Skill

prompt-eval

Evaluate and optimize any AI prompt (`prompt_a`) with a 6-step pipeline: test plan, ~50 test cases, prompt execution, evaluator prompt (`prompt_b`), automate...

Verified: 2026-05-15 (clawhub-ingest-2026-05-15+enrich-capability-skill)

When to use prompt-eval

Choose if

You need a structured, evidence-backed prompt evaluation harness with ~50 test cases, a separate evaluator prompt (prompt_b), and pass/fail gates on P0 score count, core TP delta, and overall pass rate. Best when the prompt is heading to production and you want bad-case patterns grouped by root cause, not flat per-case failure lists.
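
The three gates named above can be pictured as a single pass/fail check over a run summary. The following is a minimal sketch, not the skill's actual implementation: the `RunSummary` fields, the `check_gates` helper, and every threshold value are hypothetical. It also assumes "P0 score count" means the number of cases the evaluator scored at P0 severity and that "TP" means true positive; the listing does not define either term.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    # All field names are illustrative assumptions, not the skill's schema.
    p0_count: int          # cases the evaluator scored at P0 severity
    core_tp_delta: float   # change in core true-positive rate vs. the baseline prompt
    pass_rate: float       # fraction of all test cases that passed (0.0 to 1.0)


def check_gates(summary, max_p0_count=0, min_core_tp_delta=0.0, min_pass_rate=0.9):
    """Return (passed, reasons). Thresholds are placeholder defaults."""
    reasons = []
    if summary.p0_count > max_p0_count:
        reasons.append(f"P0 count {summary.p0_count} exceeds {max_p0_count}")
    if summary.core_tp_delta < min_core_tp_delta:
        reasons.append(f"core TP delta {summary.core_tp_delta} is below {min_core_tp_delta}")
    if summary.pass_rate < min_pass_rate:
        reasons.append(f"pass rate {summary.pass_rate} is below {min_pass_rate}")
    # The run passes only when no gate recorded a failure reason.
    return (not reasons, reasons)


if __name__ == "__main__":
    passed, reasons = check_gates(RunSummary(p0_count=1, core_tp_delta=0.02, pass_rate=0.94))
    print(passed, reasons)
```

Returning the list of failure reasons alongside the boolean keeps the gate result explainable, which fits the skill's stated goal of evidence-backed evaluation.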

Avoid if

You only need a one-shot prompt rewrite or a quick A/B comparison. This skill runs a 6-step pipeline with confirmations between stages and writes artifacts to `./prompt-eval-results/`. Also avoid it for prompts that handle secrets unless you can apply the README's recommended redaction and retention policy.

Risk Flags

MEDIUM (scope): The README requires the skill to treat prompt_a, test case payloads, and model outputs as untrusted. If adversarial instructions are not isolated to test-case fields, the run is unsafe. Operators must enforce the confirmations between stages.
LOW (scope): The README states that the optimization loop runs at most one additional iteration if the validation gates fail on the first pass. Agents that need deeper iteration must re-invoke the skill.
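
The one-extra-iteration limit in the LOW flag can be sketched as a bounded retry loop. This is an illustration under stated assumptions, not the skill's code: `optimize_with_bounded_retry`, `evaluate`, and `revise` are hypothetical names, and the two callables stand in for whatever the pipeline actually does to score and rewrite a prompt.

```python
def optimize_with_bounded_retry(prompt, evaluate, revise, max_extra_iterations=1):
    """Evaluate a prompt; on gate failure, revise and re-evaluate a bounded number of times.

    evaluate(prompt) -> bool   hypothetical: True when all validation gates pass
    revise(prompt)   -> str    hypothetical: returns an optimized prompt
    Returns (final_prompt, passed, evaluations_run).
    """
    passed = evaluate(prompt)
    evaluations_run = 1
    extra = 0
    while not passed and extra < max_extra_iterations:
        prompt = revise(prompt)
        passed = evaluate(prompt)
        extra += 1
        evaluations_run += 1
    # If the gates still fail here, the caller must start a new run to iterate further.
    return prompt, passed, evaluations_run


if __name__ == "__main__":
    # Stub callables for demonstration only.
    result = optimize_with_bounded_retry(
        "draft",
        evaluate=lambda p: p.endswith("+fix"),
        revise=lambda p: p + "+fix",
    )
    print(result)
```

With the default limit of one, a prompt that never passes is evaluated exactly twice before control returns to the caller, which matches the flag's note that deeper iteration requires re-invoking the skill.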

Cost

Type: Free

Distribution

ClawHub: prompt-eval
License: MIT-0

Use prompt-eval from your agent

claude mcp add auxiliar -- npx auxiliar-mcp
# Then in your agent:
get_capability(id="clawhub-prompt-eval")