AI Agent Workflows
Stop Asking AI to Critically Self-Check
Why AI reviewers always find something, and the safer replacement: rubric-based review where PASS_NO_CHANGE is a valid outcome.
Short Answer
Open-ended instructions like “critically self-check this” accidentally reward the model for producing criticism. The fix is not less review. It is calibrated review: explicit criteria, PASS_NO_CHANGE, evidence per finding, severity thresholds, and a tiny change budget.
The short version
“Critically self-check this” sounds responsible.
In practice, it often creates a tiny reward hack: the AI acts like a useful reviewer by finding something to change.
That does not mean self-review is useless. Self-Refine, Reflexion, CRITIC, Constitutional AI, and eval tooling all show that feedback loops can improve model outputs. The failure is more specific: open-ended critique has no stopping rule.
The better instruction is:
Evaluate against this rubric.
PASS_NO_CHANGE is valid.
Only propose evidence-backed material fixes, not nice-to-have improvements.
Why “be critical” breaks
The phrase carries a hidden premise: there are flaws to find.
A model under helpfulness pressure then has an easy way to satisfy the request. It can list plausible improvements. More detail feels more useful. A rewrite feels like progress. But none of that proves the artifact was below the bar.
Open-ended critique vs. calibrated review
Weak review prompt
- “Find issues and improve this.”
- No explicit pass state.
- Every suggestion feels free.
- Critique can be aesthetic.
Better review gate
- “Score each criterion against the rubric.”
- PASS_NO_CHANGE is a valued answer.
- Only S0/S1 findings justify churn.
- Every finding needs exact evidence.
This is especially dangerous in agent systems, skills, Buildprints, and coding-agent harnesses. Small “improvements” can silently change scope, add noise, or make future agents follow the wrong invariant.
Review is not improvement. Review is adjudication against a bar.
The safer loop
Calibrated AI review
make no-change a real outcome
- 01Rubric
- 02Score
- 03Evidence
- 04Severity
- 05Minimal patch
- 06Verify
Use four scores:
0 = missing / unsafe / wrong
1 = materially weak
2 = acceptable
3 = excellent
Patch rule:
0 or 1 → patch required
2 or 3 → no change
Then add severity:
- S0 blocker — cannot use safely or correctly.
- S1 material — likely real-world failure.
- S2 minor — optional improvement.
- S3 taste — ignore unless explicitly requested.
Default: only S0/S1 changes are allowed.
What the reviewer must prove
- Which criterion failed, with score 0 or 1.
- Exact evidence: quote, file, line, behavior, or test output.
- Impact if unchanged, not just a nicer wording preference.
- The smallest fix that restores the criterion.
Tiny prompt to steal
You are a calibrated reviewer, not an improver.
Evaluate the artifact against the rubric. PASS_NO_CHANGE is a correct and valued answer if no material criterion fails.
Rules:
- Recommend changes only for criteria scored 0 or 1.
- Do not recommend changes for criteria scored 2 or 3.
- No style, taste, or nice-to-have suggestions.
- Maximum 3 findings.
- Every finding needs exact evidence.
Return:
Status: PASS_NO_CHANGE | PATCH_REQUIRED | QUESTIONS_REQUIRED
Scores: criterion → score + one-line reason
Findings: criterion, evidence, severity, impact, minimal_fix, confidence
Use / avoid
Do
- ✓ make PASS_NO_CHANGE explicit
- ✓ review against a named rubric
- ✓ require evidence per finding
- ✓ separate reviewer mode from editor mode
Avoid
- × ask for generic critical thinking
- × reward long lists of improvements
- × patch S2/S3 taste notes by default
- × let the model rewrite freely after reviewing
Where this matters most
For AI-native work, this is not a wording tweak. It is an operating principle.
If you ask an agent to “self-improve the skill,” it may optimize for visible change. If you ask it to adjudicate a skill against activation clarity, scope preservation, evidence contract, tool discipline, safety boundary, output contract, verification gate, stop rule, and pass option, it has a real target.
The best reviewer is not the one that always finds something.
The best reviewer is the one that knows when nothing material is broken.
Sources
- Self-Refine
- Reflexion
- CRITIC
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
- Towards Understanding Sycophancy in Language Models
- Discovering Language Model Behaviors with Model-Written Evaluations
- Constitutional AI
- Anthropic: Define success criteria
- Anthropic: Evaluation Tool
- LangSmith Evaluation
FAQ
Why does AI self-review often over-edit?
Because instructions like critically self-check imply that defects exist and make a list of improvements look more helpful than a calibrated PASS.
Should teams stop using AI review?
No. They should replace open-ended critique with rubric-based adjudication, evidence requirements, severity thresholds, and a valid PASS_NO_CHANGE state.
What is the simplest replacement prompt?
Evaluate this against the rubric. PASS_NO_CHANGE is valid. Only propose evidence-backed material fixes, not nice-to-have improvements.
Need AI-first architecture support?
Send me a short note about your project or technical bottleneck.
Get in touch