# Analyze A/B Test Results

To run this task you must have the following required information:

> CSV file with experiment data. Expected columns: variant assignment, primary metric, optional secondary metrics, user/session identifier.

If you don't have all of this information, exit here, ask the user for whatever is missing, and include instructions to run this task again with ALL required information.

---

You MUST use a todo list to complete these steps in order. Never move on to a step until you have completed the previous one. If there are multiple read steps in a row, read them all at once (in parallel).

Add all steps to your todo list now and begin executing.

## Steps

1. Ask the user to upload their experiment results CSV file.

Expected data:
- Variant assignment (required) — control/treatment or A/B/C
- Primary metric (required) — conversion, revenue, engagement, etc.
- User/session identifier (required) — for counting
- Secondary metrics (optional) — other outcomes tracked
- Segment info (optional) — for heterogeneous effects

Get context:
- What was the hypothesis?
- What change was tested?
- What was the target lift?
- How long did it run?


2. [Gather Arguments: Parse CSV] The next step has the following argument requirements; do not proceed until you have all of the required information:
- `inputPath`: path to the uploaded CSV from user
- `outputPath`: output path from ui:session.product.data
- `hasHeaders` (default: "true") - Whether first row is headers: true, false
- `delimiter`: Field delimiter (auto-detected if empty)
- Packages: papaparse

3. [Run Code: Parse CSV]: Call `run_script` with:

```json
{
  "file": {
    "path": "https://sk.ills.app/code/stdlib.csv.parse/preview",
    "args": [
      "inputPath",
      "outputPath",
      "hasHeaders",
      "delimiter"
    ]
  },
  "packages": ["papaparse"]
}
```
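
The parse script itself is fetched from the URL above (it uses papaparse), so its internals aren't shown here. Purely as a sketch of what "auto-detected if empty" could mean for `delimiter` (an assumption about the script's behavior, not its actual logic), Python's `csv.Sniffer` does the equivalent job:

```python
import csv
import io

def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the field delimiter from a sample of the file.

    csv.Sniffer scores each candidate by how consistently it splits
    the sample into the same number of fields per row.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return ","  # fall back to comma when the sample is ambiguous

sample = "user_id;variant;converted\n1;control;0\n2;treatment;1\n"
delim = detect_delimiter(sample)
rows = list(csv.reader(io.StringIO(sample), delimiter=delim))
```

A few hundred bytes of sample text is usually enough; sniffing the whole file buys nothing.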

4. [Gather Arguments: Validate Product Data] The next step has the following argument requirements; do not proceed until you have all of the required information:
- `inputPath`: output path from ui:session.product.data
- `requiredColumns`: variant,metric
- `minRows` (default: "20"): 30
- `analysisType` (default: "general"): abtest

5. [Run Code: Validate Product Data]: Call `run_script` with:

```json
{
  "file": {
    "path": "https://sk.ills.app/code/product.data.validate/preview",
    "args": [
      "inputPath",
      "requiredColumns",
      "minRows",
      "analysisType"
    ]
  },
  "packages": null
}
```
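
The validation script is also fetched remotely; the sketch below shows the checks it presumably performs (required columns present, at least `minRows` rows). This is an assumed shape, not the real implementation:

```python
def validate_product_data(rows: list[dict], required_columns: list[str],
                          min_rows: int = 20) -> list[str]:
    """Return a list of warning strings; an empty list means the data passed."""
    warnings = []
    if not rows:
        return ["no rows parsed"]
    missing = [c for c in required_columns if c not in rows[0]]
    if missing:
        warnings.append(f"missing required columns: {', '.join(missing)}")
    if len(rows) < min_rows:
        warnings.append(f"only {len(rows)} rows; at least {min_rows} recommended")
    return warnings

warnings = validate_product_data([{"variant": "control", "metric": 0}] * 50,
                                 ["variant", "metric"], min_rows=30)
```

Note the checks produce warnings rather than hard failures: step 8 asks you to weigh them against the results, not abort.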

6. [Read CSV Column Interpretation Guide]: Read the documentation in: `./skills/sauna/[skill_id]/references/stdlib.csv.interpretation.md` (Semantic column interpretation guidance)

7. [Read Parsed Product Data]: Read the file at `./documents/tmp/product-data.json` and analyze its contents (Load the parsed data)

8. Review the validation output. If there are warnings about sample size or missing
columns, note these—they'll affect confidence in the results.

Interpret the parsed CSV columns semantically using the interpretation guide.
Identify:
- Variant column (e.g., "variant", "group", "treatment", "bucket")
- Primary metric column (e.g., "converted", "revenue", "clicks")
- Metric type: "binary" for conversion rates (0/1), "continuous" for revenue/time
- Any segment columns for heterogeneous analysis
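
The interpretation guide read in step 6 is authoritative here; purely as an illustrative heuristic (the hint lists below are invented for this sketch, not taken from the guide), column identification can look like:

```python
VARIANT_HINTS = ("variant", "group", "treatment", "bucket", "arm")
METRIC_HINTS = ("converted", "conversion", "revenue", "clicks", "purchased")

def guess_columns(columns: list[str]) -> dict:
    """Heuristically pick the variant and primary-metric columns by name."""
    def first_match(hints):
        for col in columns:
            if any(h in col.lower() for h in hints):
                return col
        return None
    return {
        "variant": first_match(VARIANT_HINTS),
        "metric": first_match(METRIC_HINTS),
    }

def infer_metric_type(values: list) -> str:
    """Metrics taking only 0/1 values are binary (conversion); else continuous."""
    return "binary" if set(values) <= {0, 1, "0", "1"} else "continuous"
```

When name matching is ambiguous (e.g. both `clicks` and `revenue` present), ask the user which column is the primary metric rather than guessing.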


9. [Gather Arguments: Calculate Statistical Significance] The next step has the following argument requirements; do not proceed until you have all of the required information:
- `inputPath`: output path from ui:session.product.data
- `outputPath`: output path from ui:session.abtest.stats
- `variantColumn`: identified variant column name
- `metricColumn`: identified primary metric column name
- `metricType` (default: "binary"): binary or continuous based on metric type
- `confidenceLevel` (default: "0.95"): 0.95

10. [Run Code: Calculate Statistical Significance]: Call `run_script` with:

```json
{
  "file": {
    "path": "https://sk.ills.app/code/product.stats.significance/preview",
    "args": [
      "inputPath",
      "outputPath",
      "variantColumn",
      "metricColumn",
      "metricType",
      "confidenceLevel"
    ]
  },
  "packages": null
}
```

11. [Read A/B Test Interpretation Guide]: Read the documentation in: `./skills/sauna/[skill_id]/references/product.abtest.guide.md` (Framework for interpreting statistical results)

12. [Read A/B Test Statistics]: Read the file at `./documents/tmp/abtest-stats.json` and analyze its contents (Load the statistical analysis results)

13. Review the statistical analysis output.

The code has calculated:
- Sample sizes per variant
- Conversion rates or means
- Relative lift (%)
- Z-score or t-score
- P-value
- 95% confidence interval
- Significance determination
- Preliminary recommendation
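
The numbers above come from the remote script. For a binary metric, the standard two-proportion z-test (which the script presumably implements in some form; that is an assumption) can be reproduced with only the standard library, useful for sanity-checking its output:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_c, n_c, conv_t, n_t, confidence=0.95):
    """Two-proportion z-test for a binary A/B metric (control vs treatment)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    # Pooled rate under the null hypothesis, used for the z statistic.
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    # Unpooled standard error for the CI on the absolute difference.
    se_diff = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_t - p_c
    return {
        "lift_pct": 100 * diff / p_c,
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se_diff, diff + z_crit * se_diff),
        "significant": p_value < 1 - confidence,
    }

# 10.0% vs 12.5% conversion on 2,000 users per arm: a 25% relative lift.
result = two_proportion_ztest(conv_c=200, n_c=2000, conv_t=250, n_t=2000)
```

For continuous metrics (revenue, time) the analogous check is a two-sample t-test (Welch's, since variances usually differ between arms) rather than a z-test.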

Now provide interpretation:

1. **Translate the statistics** — What do the numbers mean in plain language?

2. **Assess practical significance** — Is the lift large enough to matter?
   - For conversion: >5% relative lift is usually meaningful
   - For revenue: depends on absolute dollars

3. **Check secondary metrics** — Did they move in expected directions?

4. **Segment analysis** — If segment data exists, did effects vary?
   - Rerun stats on filtered data if needed
   - Watch for Simpson's paradox

5. **Confidence assessment** — How certain should we be?
   - Sample size adequate?
   - Test ran long enough?
   - Any data quality warnings?
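
Point 4's warning about Simpson's paradox can be made concrete with a tiny synthetic dataset (numbers invented for illustration): treatment beats control inside every segment, yet loses in the pooled totals, because the variants are unevenly split across segments with different base rates.

```python
segments = {
    #            control (conv, n)   treatment (conv, n)
    "mobile":  ((40, 400),           (150, 1400)),
    "desktop": ((450, 1600),         (180, 600)),
}

def rate(conv, n):
    return conv / n

# Pool both arms across segments.
pooled = {"control": [0, 0], "treatment": [0, 0]}
for (c_conv, c_n), (t_conv, t_n) in segments.values():
    pooled["control"][0] += c_conv
    pooled["control"][1] += c_n
    pooled["treatment"][0] += t_conv
    pooled["treatment"][1] += t_n

# Treatment wins inside every segment...
for name, ((c_conv, c_n), (t_conv, t_n)) in segments.items():
    assert rate(t_conv, t_n) > rate(c_conv, c_n), name

# ...but control wins in the pooled numbers.
assert rate(*pooled["control"]) > rate(*pooled["treatment"])
```

This is why segment-level reruns matter whenever variant assignment is correlated with a segment: the pooled lift can point the wrong way.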


14. Present the analysis using the A/B Test Results template from the interpretation guide:

```markdown
# A/B Test Results: [Test Name]

## Summary
**Result:** [SHIP / DO NOT SHIP / INCONCLUSIVE]
**Primary metric:** [Metric] — [Variant value] vs [Control value]
**Lift:** [+/-X%] (95% CI: [lower] to [upper])
**Statistical significance:** [Yes/No] (p = [value])

## Detailed Results

| Variant | Sample Size | [Primary Metric] | Lift vs Control | Significant? |
|---------|-------------|------------------|-----------------|--------------|
| Control | [N] | [Value] | — | — |
| [Variant] | [N] | [Value] | [+/-X%] | [Yes/No] |

## Sample Size Assessment
[From validation output]

## Secondary Metrics
[If applicable]

## Segment Analysis
[If applicable]

## Interpretation
[Statistical + practical significance assessment]

## Recommendation
**Decision:** [Ship / Don't ship / Iterate / Extend]
**Rationale:** [Why]
**Risks:** [If shipping]

## Next Steps
1. [Action]
2. [Follow-up experiment]
```


15. After presenting analysis:
- Ask if any results seem surprising given their hypothesis
- Offer to dig deeper on segment effects
- Discuss what the next experiment might be
- If they want to design a follow-up experiment, refer to Product Strategy Advisor

Session files `./documents/tmp/product-data.json` and `./documents/tmp/abtest-stats.json` will be cleaned up automatically at the end of the session.