Quality · Advanced · by Agent Buildprint

Complete Agent Skills Evaluation OS

Prove whether an agent setup is installed, discoverable, useful, safe, reproducible, and worth its context cost.


methodology

agent-ready
01 Contract spine
02 Implementation phases
03 Validation gates
04 Prompt handoff

Executable packet spine

The files or runtime artifacts a fresh agent must read or produce before claiming progress.

  • ACTIVATION_EVALS.md
  • BUILDPRINT.md
  • CONTRACTS.md
  • E2E_TASK_BENCH.md
  • LOADOUT_INVENTORY.md
  • MULTI_AGENT_SAFETY.md
  • SAFETY_POLICY.md
  • README.md
  • SPEC.md
  • PLAN.md
  • TEST_MATRIX.md
  • VALIDATION_TEMPLATE.md
  • checks/acceptance.md
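
A fresh agent can check the spine mechanically before reporting progress. The sketch below is a minimal Python illustration, assuming every spine entry is a file under the build directory (for example `./my-build` from the bootstrap command). `missing_spine_files` is a hypothetical helper, not part of the `agb` tooling, and runtime artifacts that are not plain files would need their own check.

```python
from __future__ import annotations

from pathlib import Path

# File names copied from the "Executable packet spine" list above.
PACKET_SPINE = [
    "ACTIVATION_EVALS.md",
    "BUILDPRINT.md",
    "CONTRACTS.md",
    "E2E_TASK_BENCH.md",
    "LOADOUT_INVENTORY.md",
    "MULTI_AGENT_SAFETY.md",
    "SAFETY_POLICY.md",
    "README.md",
    "SPEC.md",
    "PLAN.md",
    "TEST_MATRIX.md",
    "VALIDATION_TEMPLATE.md",
    "checks/acceptance.md",
]


def missing_spine_files(build_dir: str | Path) -> list[str]:
    """Return spine entries that are absent or empty under build_dir.

    Hypothetical helper for illustration; assumes each entry is a plain file.
    """
    root = Path(build_dir)
    missing = []
    for rel in PACKET_SPINE:
        path = root / rel
        # An empty file proves nothing, so treat it the same as a missing one.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```

Called as `missing_spine_files("./my-build")`, an empty list means every entry exists and is non-empty. Presence alone does not show that the contents are correct; that is what the proof gates are for.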

Proof gates

Checks that prevent a vague implementation from being reported as complete.

  • Offline fixtures only by default
  • Scorecard produces evidence
  • Process invariants are asserted
  • Live adapters require separate acceptance
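
One way to make these gates enforceable is to treat the scorecard as data and refuse completion when any entry failed, lacks evidence, or came from a live adapter. This is a sketch under assumptions: the `ScoreEntry` shape, the `gate_failures` and `live_adapters_enabled` helpers, and the `AGB_ALLOW_LIVE` opt-in variable are hypothetical names chosen for illustration, not part of this Buildprint's contracts.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass
class ScoreEntry:
    """One scorecard row (hypothetical shape)."""
    check: str
    passed: bool
    # Paths to fixtures, transcripts, or logs backing the result.
    evidence: list[str] = field(default_factory=list)
    # True if the result came from a live provider adapter.
    live: bool = False


def live_adapters_enabled(env: Mapping[str, str] | None = None) -> bool:
    """Offline by default: live adapters run only with an explicit opt-in."""
    env = os.environ if env is None else env
    # AGB_ALLOW_LIVE is a hypothetical variable name used for illustration.
    return env.get("AGB_ALLOW_LIVE") == "1"


def gate_failures(entries: list[ScoreEntry]) -> list[str]:
    """Return reasons the scorecard must not be reported as complete."""
    failures = []
    for e in entries:
        if not e.passed:
            failures.append(f"{e.check}: failed")
        elif not e.evidence:
            # A pass with no evidence is treated as unproven.
            failures.append(f"{e.check}: passed without evidence")
        if e.live:
            # Live results never count toward the offline verdict.
            failures.append(f"{e.check}: live result needs separate acceptance")
    return failures
```

Keeping live results out of the offline verdict means a flaky or permissive provider cannot turn the scorecard green on its own; those results go through a separate acceptance step.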

Risks covered

Failure modes the Buildprint makes visible before an agent can hide them behind a “done” claim.

  • Live-provider false positives
  • Undiscoverable skills
  • Unsafe multi-agent execution
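
For the undiscoverable-skills risk, a static lint can flag skills whose metadata gives an agent little to match on. This minimal sketch assumes each skill lives in `<skills_root>/<skill>/SKILL.md` with `name` and `description` keys in `---`-fenced frontmatter. The layout, the helper names, and the 20-character threshold are illustrative assumptions, and the hand-rolled parser handles only flat `key: value` lines rather than full YAML.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical threshold: shorter descriptions give an agent little to match on.
MIN_DESCRIPTION_CHARS = 20


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines between leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # No closing fence: treat the frontmatter as missing.


def lint_skill_dir(skills_root: str | Path) -> list[str]:
    """Flag skills an agent is unlikely to discover from metadata alone."""
    problems = []
    for skill_md in sorted(Path(skills_root).glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md.read_text(encoding="utf-8"))
        name = skill_md.parent.name
        if not meta.get("name"):
            problems.append(f"{name}: missing name")
        if len(meta.get("description", "")) < MIN_DESCRIPTION_CHARS:
            problems.append(f"{name}: description missing or too short to match on")
    return problems
```

A lint pass like this only catches metadata gaps; whether an agent actually activates the skill on realistic prompts is a separate question for activation evals.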

What this ships

01 Setup snapshot
02 Static lint
03 Skill unit evals
04 Transcript invariants
05 Scorecard
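
Transcript invariants can be asserted by replaying an agent's recorded events in order. The sketch assumes a hypothetical event format (`read`, `skill_use`, `check`, `claim_done`) and checks two example invariants: a skill file is read before the skill is used, and completion is claimed only after the most recent check passed. Real transcripts, and the invariants this Buildprint actually defines, may differ.

```python
from __future__ import annotations


def check_transcript(events: list[dict]) -> list[str]:
    """Return invariant violations found in an ordered list of events.

    Hypothetical event shape:
      {"type": "read", "path": ...}
      {"type": "skill_use", "skill_file": ...}
      {"type": "check", "passed": bool}
      {"type": "claim_done"}
    """
    violations = []
    read_paths: set[str] = set()
    checks_passed = False
    for i, ev in enumerate(events):
        kind = ev["type"]
        if kind == "read":
            read_paths.add(ev["path"])
        elif kind == "skill_use":
            # Invariant 1: a skill's instructions are read before the skill is used.
            if ev["skill_file"] not in read_paths:
                violations.append(f"event {i}: used {ev['skill_file']} before reading it")
        elif kind == "check":
            # Only the most recent check result counts.
            checks_passed = ev["passed"]
        elif kind == "claim_done":
            # Invariant 2: completion is claimed only after a passing check.
            if not checks_passed:
                violations.append(f"event {i}: claimed done without a passing check")
    return violations
```

Because the check runs on a recorded transcript, it works against offline fixtures and needs no live provider.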

Validation evidence

For newer phase-flow Buildprints, this section names the replay or outcome evidence. For older registry entries, such as this one, it lists the included validation checks.

  • Offline fixtures only by default
  • Scorecard produces evidence
  • Process invariants are asserted
  • Live adapters require separate acceptance

Copyable agent prompt

Manifest
Use the Agent Buildprint: Complete Agent Skills Evaluation OS.

Bootstrap it with `agb start https://agent-buildprint.com/buildprints/complete-agent-skills-evaluation-os/package.json ./my-build` or inspect the GitHub folder. Follow BUILDPRINT.md as the authority spine, read the supporting contracts/checks before implementation, preserve the stated non-goals and safety gates, and do not claim completion until the included validation checks pass.