Why does AI self-review often over-edit?

Because instructions like critically self-check imply that defects exist and make a list of improvements look more helpful than a calibrated PASS.

Should teams stop using AI review?

No. They should replace open-ended critique with rubric-based adjudication, evidence requirements, severity thresholds, and a valid PASS_NO_CHANGE state.

Stop Asking AI to Critically Self-Check

Q: What is the simplest replacement prompt?

Evaluate this against the rubric. PASS_NO_CHANGE is valid. Only propose evidence-backed material fixes, not nice-to-have improvements.

Why AI reviewers always find something, and the safer replacement: rubric-based review where PASS_NO_CHANGE is a valid outcome.

The short version

“Critically self-check this” sounds responsible.

In practice, it often creates a tiny reward hack: the AI acts like a useful reviewer by finding something to change.

That does not mean self-review is useless. Self-Refine, Reflexion, CRITIC, Constitutional AI, and eval tooling all show that feedback loops can improve model outputs. The failure is more specific: open-ended critique has no stopping rule.

The better instruction is:

Evaluate against this rubric.
PASS_NO_CHANGE is valid.
Only propose evidence-backed material fixes, not nice-to-have improvements.

valid PASS state

style fixes without evidence

maximum material findings

Why “be critical” breaks

The phrase carries a hidden premise: there are flaws to find.

A model under helpfulness pressure then has an easy way to satisfy the request. It can list plausible improvements. More detail feels more useful. A rewrite feels like progress. But none of that proves the artifact was below the bar.

Open-ended critique vs. calibrated review

Weak review prompt

“Find issues and improve this.”
No explicit pass state.
Every suggestion feels free.
Critique can be aesthetic.

Better review gate

“Score each criterion against the rubric.”
PASS_NO_CHANGE is a valued answer.
Only S0/S1 findings justify churn.
Every finding needs exact evidence.

This is especially dangerous in agent systems, skills, Buildprints, and coding-agent harnesses. Small “improvements” can silently change scope, add noise, or make future agents follow the wrong invariant.

Review is not improvement. Review is adjudication against a bar.

The safer loop

Calibrated AI review

make no-change a real outcome

01

Rubric
02

Score
03

Evidence
04

Severity
05

Minimal patch
06

Verify

Use four scores:

0 = missing / unsafe / wrong
1 = materially weak
2 = acceptable
3 = excellent

Patch rule:

0 or 1 → patch required
2 or 3 → no change

Then add severity:

S0 blocker — cannot use safely or correctly.
S1 material — likely real-world failure.
S2 minor — optional improvement.
S3 taste — ignore unless explicitly requested.

Default: only S0/S1 changes are allowed.

What the reviewer must prove

Which criterion failed, with score 0 or 1.
Exact evidence: quote, file, line, behavior, or test output.
Impact if unchanged, not just a nicer wording preference.
The smallest fix that restores the criterion.

Tiny prompt to steal

You are a calibrated reviewer, not an improver.

Evaluate the artifact against the rubric. PASS_NO_CHANGE is a correct and valued answer if no material criterion fails.

Rules:
- Recommend changes only for criteria scored 0 or 1.
- Do not recommend changes for criteria scored 2 or 3.
- No style, taste, or nice-to-have suggestions.
- Maximum 3 findings.
- Every finding needs exact evidence.

Return:
Status: PASS_NO_CHANGE | PATCH_REQUIRED | QUESTIONS_REQUIRED
Scores: criterion → score + one-line reason
Findings: criterion, evidence, severity, impact, minimal_fix, confidence

Use / avoid

Do

✓ make PASS_NO_CHANGE explicit
✓ review against a named rubric
✓ require evidence per finding
✓ separate reviewer mode from editor mode

Avoid

× ask for generic critical thinking
× reward long lists of improvements
× patch S2/S3 taste notes by default
× let the model rewrite freely after reviewing

Where this matters most

For AI-native work, this is not a wording tweak. It is an operating principle.

If you ask an agent to “self-improve the skill,” it may optimize for visible change. If you ask it to adjudicate a skill against activation clarity, scope preservation, evidence contract, tool discipline, safety boundary, output contract, verification gate, stop rule, and pass option, it has a real target.

The best reviewer is not the one that always finds something.

The best reviewer is the one that knows when nothing material is broken.