Skip to content

AI Agent Workflows

Stop Asking AI to Critically Self-Check

Why AI reviewers always find something, and the safer replacement: rubric-based review where PASS_NO_CHANGE is a valid outcome.

May 21, 2026 · Dominic Hückmann

Short Answer

Open-ended instructions like “critically self-check this” accidentally reward the model for producing criticism. The fix is not less review. It is calibrated review: explicit criteria, PASS_NO_CHANGE, evidence per finding, severity thresholds, and a tiny change budget.

The short version

“Critically self-check this” sounds responsible.

In practice, it often creates a tiny reward hack: the AI acts like a useful reviewer by finding something to change.

That does not mean self-review is useless. Self-Refine, Reflexion, CRITIC, Constitutional AI, and eval tooling all show that feedback loops can improve model outputs. The failure is more specific: open-ended critique has no stopping rule.

The better instruction is:

Evaluate against this rubric.
PASS_NO_CHANGE is valid.
Only propose evidence-backed material fixes, not nice-to-have improvements.
1
valid PASS state
0
style fixes without evidence
3
maximum material findings

Why “be critical” breaks

The phrase carries a hidden premise: there are flaws to find.

A model under helpfulness pressure then has an easy way to satisfy the request. It can list plausible improvements. More detail feels more useful. A rewrite feels like progress. But none of that proves the artifact was below the bar.

Open-ended critique vs. calibrated review

Weak review prompt

  • “Find issues and improve this.”
  • No explicit pass state.
  • Every suggestion feels free.
  • Critique can be aesthetic.

Better review gate

  • “Score each criterion against the rubric.”
  • PASS_NO_CHANGE is a valued answer.
  • Only S0/S1 findings justify churn.
  • Every finding needs exact evidence.

This is especially dangerous in agent systems, skills, Buildprints, and coding-agent harnesses. Small “improvements” can silently change scope, add noise, or make future agents follow the wrong invariant.

Review is not improvement. Review is adjudication against a bar.

The safer loop

Calibrated AI review

make no-change a real outcome

  1. 01
    Rubric
  2. 02
    Score
  3. 03
    Evidence
  4. 04
    Severity
  5. 05
    Minimal patch
  6. 06
    Verify

Use four scores:

0 = missing / unsafe / wrong
1 = materially weak
2 = acceptable
3 = excellent

Patch rule:

0 or 1 → patch required
2 or 3 → no change

Then add severity:

  • S0 blocker — cannot use safely or correctly.
  • S1 material — likely real-world failure.
  • S2 minor — optional improvement.
  • S3 taste — ignore unless explicitly requested.

Default: only S0/S1 changes are allowed.

What the reviewer must prove

  • Which criterion failed, with score 0 or 1.
  • Exact evidence: quote, file, line, behavior, or test output.
  • Impact if unchanged, not just a nicer wording preference.
  • The smallest fix that restores the criterion.

Tiny prompt to steal

You are a calibrated reviewer, not an improver.

Evaluate the artifact against the rubric. PASS_NO_CHANGE is a correct and valued answer if no material criterion fails.

Rules:
- Recommend changes only for criteria scored 0 or 1.
- Do not recommend changes for criteria scored 2 or 3.
- No style, taste, or nice-to-have suggestions.
- Maximum 3 findings.
- Every finding needs exact evidence.

Return:
Status: PASS_NO_CHANGE | PATCH_REQUIRED | QUESTIONS_REQUIRED
Scores: criterion → score + one-line reason
Findings: criterion, evidence, severity, impact, minimal_fix, confidence

Use / avoid

Do

  • ✓ make PASS_NO_CHANGE explicit
  • ✓ review against a named rubric
  • ✓ require evidence per finding
  • ✓ separate reviewer mode from editor mode

Avoid

  • × ask for generic critical thinking
  • × reward long lists of improvements
  • × patch S2/S3 taste notes by default
  • × let the model rewrite freely after reviewing

Where this matters most

For AI-native work, this is not a wording tweak. It is an operating principle.

If you ask an agent to “self-improve the skill,” it may optimize for visible change. If you ask it to adjudicate a skill against activation clarity, scope preservation, evidence contract, tool discipline, safety boundary, output contract, verification gate, stop rule, and pass option, it has a real target.

The best reviewer is not the one that always finds something.

The best reviewer is the one that knows when nothing material is broken.

Sources

FAQ

Why does AI self-review often over-edit?

Because instructions like critically self-check imply that defects exist and make a list of improvements look more helpful than a calibrated PASS.

Should teams stop using AI review?

No. They should replace open-ended critique with rubric-based adjudication, evidence requirements, severity thresholds, and a valid PASS_NO_CHANGE state.

What is the simplest replacement prompt?

Evaluate this against the rubric. PASS_NO_CHANGE is valid. Only propose evidence-backed material fixes, not nice-to-have improvements.

Need AI-first architecture support?

Send me a short note about your project or technical bottleneck.

Get in touch