
Blog

huecki

Practical writing about agents, automation, and software architecture.

Detected Focus

Agent Harnesses

Several recent posts point to the same engineering problem: agents do not only need better prompts. They need a runtime contract around the prompt.

The minimal model

A useful harness defines the agent’s phase, allowed actions, required evidence, exit condition, and blocker rules. That is the difference between an AI that tries to help and an agent that can be operated.


An agent harness is the control layer around an AI agent. It defines allowed actions, state transitions, required evidence, and stop conditions.

Without a harness, the model decides too much from chat context. It may edit before inspecting, mark work done without running checks, keep looping after the useful work is finished, or create fake review comments because the prompt asked it to criticize.

The practical fix is to move those rules out of vibes and into an explicit contract: phase state controls which actions are allowed; evidence gates control when the agent may claim progress; exit conditions control when the loop stops; blocker rules control when the agent must ask for help; review rules prevent pointless self-critique.
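Here is a minimal sketch of such a contract in Python. It assumes the harness calls a single `decide` function before every proposed action. `HarnessContract`, `AgentState`, `decide`, and the verdict strings are hypothetical names that are not part of any framework. Review rules are left out to keep it short.

```python
from dataclasses import dataclass, field


# Hypothetical contract shape: the rules live in data, not in the chat context.
@dataclass
class HarnessContract:
    allowed_actions: dict[str, set[str]]  # phase -> actions permitted in that phase
    required_evidence: set[str]           # evidence keys needed before claiming done
    max_steps: int                        # exit condition: hard step budget
    max_failed_attempts: int              # blocker rule: ask for help after this many failures


@dataclass
class AgentState:
    phase: str
    evidence: set[str] = field(default_factory=set)
    steps: int = 0
    failed_attempts: int = 0


def decide(contract: HarnessContract, state: AgentState, action: str) -> str:
    """Return a verdict for a proposed action; checks run in priority order."""
    if state.failed_attempts >= contract.max_failed_attempts:
        return "BLOCKED: ask for help"
    if state.steps >= contract.max_steps:
        return "STOP: step budget exhausted"
    if action == "claim_done":
        # Evidence gate: a progress claim needs recorded evidence, not prose.
        missing = contract.required_evidence - state.evidence
        return "DONE" if not missing else f"DENY: missing evidence {sorted(missing)}"
    if action not in contract.allowed_actions.get(state.phase, set()):
        return f"DENY: {action!r} not allowed in phase {state.phase!r}"
    return "ALLOW"


contract = HarnessContract(
    allowed_actions={"inspect": {"read_file", "search"}, "edit": {"write_file", "run_tests"}},
    required_evidence={"tests_passed"},
    max_steps=50,
    max_failed_attempts=3,
)
state = AgentState(phase="inspect")
print(decide(contract, state, "write_file"))  # DENY: 'write_file' not allowed in phase 'inspect'
print(decide(contract, state, "claim_done"))  # DENY: missing evidence ['tests_passed']
state.evidence.add("tests_passed")
print(decide(contract, state, "claim_done"))  # DONE
```

The ordering is a deliberate design choice. Blocker and exit rules are checked first, so an agent that is stuck or out of budget cannot talk its way into one more edit.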

May 28, 2026 · AI-first Engineering

Your AI-Built UI Needs a Playtester, Not a Screenshot Review

AI-generated interfaces often look finished before they behave correctly. A GUI playtester loop uses a separate browser agent to interact with the artifact, record screenshots and action logs, turn broken flows into reproducible bug reports, and rerun the same script after repairs.

AI Engineering · AI Agents · Webwright · Playwright · UI Testing · Coding Agents · Developer Workflow · Evals

May 28, 2026 · AI-first Engineering

Stop Judging AI Code by the Diff

Better AI coding is not mainly about better prompts. It is about the harness around the model: explicit contracts, separate builder and reviewer roles, evidence requirements, and a loop that turns failures into better specifications.

AI Engineering · AI Agents · Coding Agents · Developer Workflow · Agent Harness · Evals

May 26, 2026 · AI Agent Workflows

Agents Don’t Need ‘Keep Going’. They Need Exit Conditions.

The useful lesson behind Claude Code /goal is not that agents can run forever. It is that long-running agent work needs an explicit, observable exit condition: what proves the work is done, what stays in scope, and when to stop and report a blocker.

AI Agents · Agent Harness · Developer Workflow · AI Engineering · Evals
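An exit condition can be written as data that the loop checks on every iteration. This is a hypothetical sketch and does not reproduce the format Claude Code /goal uses. `ExitCondition`, `evaluate`, and the status strings are assumptions.

```python
from dataclasses import dataclass


# Hypothetical exit-condition spec; this is not the Claude Code /goal format.
@dataclass(frozen=True)
class ExitCondition:
    done_checks: tuple[str, ...]     # observable checks that together prove "done"
    in_scope_paths: tuple[str, ...]  # path prefixes the agent may touch
    max_stalled_iterations: int      # consecutive no-progress iterations before stopping blocked


def evaluate(cond: ExitCondition, check_results: dict[str, bool],
             touched_paths: list[str], stalled_iterations: int) -> tuple[str, list[str]]:
    """Decide whether the loop continues, is done, or must stop; called once per iteration."""
    # Scope first: a "done" result built on out-of-scope edits does not count.
    out_of_scope = [p for p in touched_paths if not p.startswith(cond.in_scope_paths)]
    if out_of_scope:
        return "STOP_OUT_OF_SCOPE", out_of_scope
    failing = [c for c in cond.done_checks if not check_results.get(c, False)]
    if not failing:
        return "DONE", []
    if stalled_iterations >= cond.max_stalled_iterations:
        return "STOP_BLOCKED", failing  # report what still fails instead of looping
    return "CONTINUE", failing


cond = ExitCondition(done_checks=("unit_tests", "lint"),
                     in_scope_paths=("src/billing/",),
                     max_stalled_iterations=3)
print(evaluate(cond, {"unit_tests": True, "lint": False}, ["src/billing/invoice.py"], 3))
# ('STOP_BLOCKED', ['lint'])
```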

May 26, 2026 · AI Agent Workflows

Don’t Benchmark the Model. Benchmark the Agent System.

Agent evals should not only ask whether the final answer looked good. A useful benchmark measures the whole agent system: skill routing, tool policy, evidence, outcomes, hard-fail safety cases, regressions, cost, and production drift.

AI Agents · Agent Harness · Evals · Developer Workflow · AI Engineering
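The sketch below shows one way to score the system rather than the answer. It assumes each benchmark case records a handful of system-level signals. The field names, the hard-fail rule, and `regressions` are hypothetical, and production drift monitoring is left out.

```python
from dataclasses import dataclass


# Hypothetical per-case record; the dimensions follow the list above, the names are assumptions.
@dataclass(frozen=True)
class CaseResult:
    case_id: str
    routed_to_expected_skill: bool
    tool_policy_violations: int
    evidence_attached: bool
    outcome_ok: bool
    hard_fail: bool  # a safety case the agent must never get wrong
    cost_usd: float


RATE_METRICS = ("routing", "policy_clean", "evidence", "outcome")


def score_run(results: list[CaseResult]) -> dict:
    """Score the whole agent system, not just the final answer."""
    if not results:
        raise ValueError("empty benchmark run")
    failed = [r.case_id for r in results if r.hard_fail]
    if failed:
        # A single hard-fail safety case fails the run regardless of averages.
        return {"passed": False, "hard_fails": failed}
    n = len(results)
    return {
        "passed": True,
        "routing": sum(r.routed_to_expected_skill for r in results) / n,
        "policy_clean": sum(r.tool_policy_violations == 0 for r in results) / n,
        "evidence": sum(r.evidence_attached for r in results) / n,
        "outcome": sum(r.outcome_ok for r in results) / n,
        "cost_usd": sum(r.cost_usd for r in results),
    }


def regressions(current: dict, baseline: dict, tolerance: float = 0.0) -> list[str]:
    """Rate metrics that fell below a stored baseline by more than `tolerance` (both runs passed)."""
    return [m for m in RATE_METRICS if current[m] < baseline[m] - tolerance]
```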

May 25, 2026 · AI Agent Workflows

Give Your Agent Seatbelts, Not a Longer Prompt

When an agent keeps jumping from planning to editing to testing at the wrong time, the fix is not usually another paragraph of system prompt. Put the workflow into explicit states, give each state a tiny tool policy, and make phase changes visible.

AI Agents · Coding Agents · Agent Harness · Developer Workflow · AI Safety
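Here is a small sketch of explicit phases, each with its own tool policy and a visible transition log. The three phases, the tool names, and `PhaseMachine` are assumptions chosen for illustration.

```python
from enum import Enum


class Phase(Enum):
    PLAN = "plan"
    EDIT = "edit"
    TEST = "test"


# Hypothetical per-phase tool policy: each phase gets a deliberately tiny allow-list.
TOOL_POLICY = {
    Phase.PLAN: {"read_file", "search"},
    Phase.EDIT: {"read_file", "write_file"},
    Phase.TEST: {"read_file", "run_tests"},
}

# Legal phase changes; anything else is rejected instead of happening silently.
TRANSITIONS = {
    Phase.PLAN: {Phase.EDIT},
    Phase.EDIT: {Phase.TEST},
    Phase.TEST: {Phase.EDIT, Phase.PLAN},
}


class PhaseMachine:
    def __init__(self) -> None:
        self.phase = Phase.PLAN
        self.log: list[str] = []  # visible record of every phase change and its reason

    def allow(self, tool: str) -> bool:
        return tool in TOOL_POLICY[self.phase]

    def transition(self, target: Phase, reason: str) -> None:
        if target not in TRANSITIONS[self.phase]:
            raise ValueError(f"illegal transition {self.phase.value} -> {target.value}")
        self.log.append(f"{self.phase.value} -> {target.value}: {reason}")
        self.phase = target


m = PhaseMachine()
print(m.allow("write_file"))  # False: no edits while planning
m.transition(Phase.EDIT, "plan approved")
print(m.allow("write_file"))  # True
print(m.log)                  # ['plan -> edit: plan approved']
```

An illegal jump, such as going from `EDIT` straight back to `PLAN` without testing, raises an error instead of happening quietly. `log` is the visible trail of phase changes.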

May 24, 2026 · AI-first Engineering

Agent harnesses should be specs, not hidden glue code

Natural-Language Agent Harnesses give a useful name to an important shift: the agent policy should be an inspectable document that a runtime executes, not invisible glue hidden inside controller code.

AI Engineering · AI Agents · Agent Harness · Evals · Developer Workflow

May 23, 2026 · AI-first Engineering

Spec-Driven Context Resets for Coding Agents

Long agent chats rot. A better pattern is to move decisions into small spec files, clear context between layers, and let each coding-agent session read only the artifact it needs.

AI Engineering · Coding Agents · Context Engineering · Spec-Driven Development · Developer Workflow

May 21, 2026 · AI Agent Workflows

AI Agents Need Evidence Before They Click

When an agent clicks, sends, pays, deletes, or extracts data, the critical truth cannot live only in model prose. Put a small evidence gate before risky tool calls: predicate, evidence type, source, decision.

AI Agents · Multimodal AI · Browser Agents · AI Safety · Developer Workflow
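Below is a minimal sketch of such a gate. It assumes the harness intercepts tool calls by name before they run. The risky-tool set, the trusted evidence types, and `gate_decision` are hypothetical. The point is that the decision is computed from a recorded predicate, evidence type, and source, not from model prose.

```python
from dataclasses import dataclass
from typing import Optional

RISKY_TOOLS = {"click", "send", "pay", "delete", "extract"}
# Assumption: only observations from the environment count; model prose does not.
TRUSTED_EVIDENCE = {"dom_text", "api_response", "file_hash"}


@dataclass(frozen=True)
class EvidenceGate:
    predicate: str      # what must be true before the tool call runs
    evidence_type: str  # kind of observation, e.g. "dom_text" or "model_claim"
    source: str         # where the observation came from, kept for the audit log
    confirmed: bool     # did the observation actually satisfy the predicate?


def gate_decision(tool: str, gate: Optional[EvidenceGate]) -> str:
    """Decide whether a tool call may run; low-risk tools pass straight through."""
    if tool not in RISKY_TOOLS:
        return "ALLOW"
    if gate is None:
        return "DENY: risky call without an evidence gate"
    if gate.evidence_type not in TRUSTED_EVIDENCE:
        return f"DENY: {gate.evidence_type} is not accepted as evidence"
    if not gate.confirmed:
        return f"DENY: predicate not confirmed ({gate.predicate})"
    return f"ALLOW: {gate.predicate} [{gate.evidence_type} from {gate.source}]"


gate = EvidenceGate(
    predicate="payee name matches the approved vendor",  # hypothetical predicate
    evidence_type="dom_text",
    source="checkout page",
    confirmed=True,
)
print(gate_decision("pay", gate))
# ALLOW: payee name matches the approved vendor [dom_text from checkout page]
```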

May 21, 2026 · AI Agent Workflows

Stop Asking AI to Critically Self-Check

Open-ended instructions like “critically self-check this” accidentally reward the model for producing criticism. The fix is not less review. It is calibrated review: explicit criteria, PASS_NO_CHANGE, evidence per finding, severity thresholds, and a tiny change budget.

AI Agents · Prompt Engineering · Evals · AI Safety · Developer Workflow
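Calibrated review can be sketched by having the reviewer emit structured findings that the harness then filters. `Finding`, `calibrate`, the severity scale, and the defaults (`min_severity="medium"`, `change_budget=2`) are hypothetical, and so are the example findings.

```python
from dataclasses import dataclass

SEVERITY = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Finding:
    criterion: str  # must name one of the explicit review criteria
    severity: str   # "low", "medium", or "high"
    evidence: str   # quote, test output, or line reference backing the finding
    summary: str


def calibrate(findings: list[Finding], criteria: set[str],
              min_severity: str = "medium", change_budget: int = 2) -> tuple[str, list[Finding]]:
    """Filter reviewer output so that 'no change' is an explicit, legitimate result."""
    kept = [
        f for f in findings
        if f.criterion in criteria                          # explicit criteria only
        and f.evidence.strip()                              # no evidence, no finding
        and SEVERITY[f.severity] >= SEVERITY[min_severity]  # severity threshold
    ]
    if not kept:
        return "PASS_NO_CHANGE", []
    kept.sort(key=lambda f: SEVERITY[f.severity], reverse=True)  # stable: ties keep order
    return "CHANGES_REQUESTED", kept[:change_budget]  # tiny change budget


criteria = {"correctness", "security"}
findings = [
    Finding("style", "low", "", "Rename variable for taste"),
    Finding("correctness", "high", "refund test fails: tax ignored", "Refund total ignores tax"),
]
verdict, changes = calibrate(findings, criteria)
print(verdict, [f.summary for f in changes])
# CHANGES_REQUESTED ['Refund total ignores tax']
```

If nothing survives the filter, `calibrate` returns `PASS_NO_CHANGE`. That makes "no change" an acceptable outcome rather than a failure to criticize.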

May 20, 2026 · AI-first Engineering

Agents Don’t Need Longer Prompts. They Need Harnesses.

The arXiv survey “Code as Agent Harness” names the next shift in agent engineering: code is not only what agents generate. It is becoming the executable, inspectable, stateful runtime that makes agents reliable.

AI Engineering · AI Agents · Agent Harness · Coding Agents · Evals · Developer Workflow

May 20, 2026 · AI-first Engineering

Your Onboarding Is Why Your Team Is Vibe Coding

Teams do not usually start vibe coding because developers became careless. They start because onboarding is broken: docs are stale, harnesses are undocumented, system knowledge lives in people’s heads, and AI turns missing context into plausible code and Markdown.

AI Engineering · Developer Onboarding · Vibe Coding · Developer Experience · Internal Tools · Agent Harness

May 19, 2026 · AI Agent Workflows

give every client project a tiny agent

The useful move is not one mega assistant for all client work. Give each client project a small, isolated agent with its own memory, tasks, preview URL habit, and a boring daily standup.

AI Agents · Freelancing · Client Work · Automation · Workflow Design

May 18, 2026 · AI-first Engineering

Prompt Decomposition: How to Break Down AI Tasks Properly

After context engineering comes decomposition: developers should stop putting everything into one prompt and instead split tasks into direct prompts, subtasks, pipelines, agent loops, or skills.

Prompt Engineering · Decomposition · AI Agents · Skills · Developer Workflow

May 15, 2026 · AI-first Engineering

The LLM-native developer needs more than prompts

The next developer skill is not writing clever prompts. It is building the operating system around LLMs: data quality, model versioning, evals, guardrails, incident response, review UX, and repo instructions agents can actually follow.

AI Engineering · LLM · Software Architecture · Agents · Developer Workflow

May 15, 2026 · Personal AI Workflows

Voice notes are the best interface for small agent jobs

Voice is not good for everything. But for small agent jobs it is brutally useful: dictate a task while moving, transcribe it locally, let your existing agent handle it, and get only a short answer back.

AI Agents · Voice Interface · Automation · Open Source · Personal AI

May 12, 2026 · AI-first Engineering

Prompting Is Dead. Context Wins.

In 2026, good prompting is not about one magic sentence. The better approach is to curate context, define tools and schemas, set agent rules, and verify behavior with evals.

Prompt Engineering · Context Engineering · AI Agents · AI Workflows

May 11, 2026 · AI Agent Workflows

Hermes Agent: Self-Review Instead of One-Shot Output

Hermes gets interesting when an agent does not only produce output but also reviews the run: execute, measure, critique, rewrite the skill, and test again. The loop pays off mainly for repeatable workflows.

Hermes Agent · AI Agents · Self-Improvement · Nous Research

Need AI-first architecture support?

Send me a short note about your project or technical bottleneck.
