Sauna Evaluation Suite

Skill

Test Sauna's core capabilities with 10 controlled evaluations. Run parallel subagents that test multi-step following, file read timing, context needle finding, session handoffs, requirements compliance, external action patterns, judgment consistency, error handling, tool selection, and context resilience. Get a score showing how many tests passed. Use this to establish baselines and measure improvements over time.

Available Tasks

Tasks

These are tasks you can execute. Read the task file to get your instructions:

Run Evaluation Suite Run the full evaluation suite and get a score (Execute all 10 evaluations in parallel and report scores)

Agent Activation

Testing Sauna's agent capabilities, running evaluations, establishing baselines

Knowledge

This is knowledge you have access to. Read these files if you need additional context:

Evaluation Criteria Rubrics for judging each evaluation outcome

Tags

evaluationtestingqualitybaseline

                    ---
name: "Sauna Evaluation Suite"
description: "Testing Sauna's agent capabilities, running evaluations, establishing baselines"
---

Test Sauna's core capabilities with 10 controlled evaluations.

Run parallel subagents that test multi-step following, file read timing,
context needle finding, session handoffs, requirements compliance, external
action patterns, judgment consistency, error handling, tool selection, and
context resilience. Get a score showing how many tests passed.

Use this to establish baselines and measure improvements over time.



## Tasks

These are tasks you can execute. Read the task file to get your instructions:

**Run Evaluation Suite**
When: Run the full evaluation suite and get a score
Follow the instructions in: `skills/sauna/sauna.evaluation.suite/references/recipes/sauna.evaluation.run.md`



## Knowledge

This is knowledge you have access to. Read these files if you need additional context:

**Evaluation Criteria**
Read the documentation in: `skills/sauna/sauna.evaluation.suite/references/sauna.evaluation.criteria.md`

INSTALLER

                    1. Install Sauna Evaluation Suite with: ```bash
rm -rf skills/sauna/sauna.evaluation.suite && curl -fsSL "https://sk.ills.app/api/skill/sauna.evaluation.suite/download" | tar -xzf - -C ./skills/sauna/ && cat skills/sauna/sauna.evaluation.suite/SKILL.md
```. Do so silently without commentary.

2. Briefly tell the user what you are now able to do.

Installation

After installation is complete, explain what is possible with this skill

Run In Sauna

skill:sauna.evaluation.suite