esteban@devtrillo:~/blog/writing$
$ cat your-agents-cant-see.md

Your agents can't see

Jun 29, 2026·6 min read

Your agent just told you the build passes. It never ran the build.

Mine did exactly that last week. It had renamed a type, made the same change across four files, and written a confident little summary: “All updated. The build should pass now.” It couldn’t run the build to check. So it guessed, and the guess was wrong. A type error sat three files away, invisible, waiting for me to find it the slow way.

That was my fault, not the agent’s.

Because here is the thing I keep relearning:

Your agents can’t see.

What you do without noticing

When you write code, you are swimming in feedback you don’t even register.

The red squiggle under a misspelled variable. The dev server that reloads and shows you the broken layout. The test that goes green. The gut feeling that this file is the wrong place for this function. You get a hundred tiny signals a minute, and most of them never reach your conscious mind. You just course-correct.

An agent gets none of that for free.

It can’t glance at the running page. It can’t feel that a name is off. It can’t see the red squiggle, because there is no editor and no eyes. The only feedback an agent ever receives is text that some command printed to a terminal. If a signal isn’t in that text, for the agent it does not exist.

So the agent does the only thing it can. It reasons about what probably happened and moves on. Sometimes the reasoning is right. Sometimes you get “the build should pass now.”

Agent Experience is just feedback design

We already have words for this. We design user experience so a person can reach a goal without fighting the tool. We design developer experience so an engineer can move fast without tripping over the setup.

Agent Experience is the same idea, pointed at a coworker who can’t see.

And it comes down to one loop:

Make a change → run something → read the verdict → correct.

That loop is exactly how a good human developer works, too. The difference is that for a human the “run something” and “read the verdict” steps are often invisible and instant: your eyes do them. For an agent, every step of that loop has to be a command it can run and output it can parse. Your whole job, designing for AX, is to make that loop fast, local, and unambiguous.
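As a rough sketch, the loop can be written down in a few lines of TypeScript. Everything here is hypothetical: `Verdict`, `loopUntilGreen`, and the injected `check` and `fix` callbacks are names invented for illustration, not part of any agent framework. In a real setup `check` would shell out to a command and `fix` would be the agent editing files.

```typescript
// Hypothetical shapes, for illustration only.
type Verdict = { ok: boolean; output: string };

// check: "run something" and "read the verdict" in one step.
// fix: "correct", given only the text the check printed.
function loopUntilGreen(
  check: () => Verdict,
  fix: (output: string) => void,
  maxChecks: number,
): { ok: boolean; checks: number } {
  let checks = 0;
  while (checks < maxChecks) {
    const verdict = check();
    checks++;
    if (verdict.ok) return { ok: true, checks };
    // Only attempt a fix if another check remains to confirm it.
    if (checks < maxChecks) fix(verdict.output);
  }
  return { ok: false, checks };
}
```

The detail worth noticing is that `fix` receives nothing but `verdict.output`. If the signal is not in that string, the loop has nothing to correct against.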

It splits cleanly in two: helping the agent start right, and helping it catch itself.

Half one: help it start right

The cheapest mistake to fix is the one the agent never makes. Most bad agent output isn’t a reasoning failure. It’s the agent guessing at something you simply never told it. So the first half of AX is cutting down what it has to guess.

In this repo that’s a few plain text files doing quiet work.

None of this is clever. It’s just refusing to keep the map in my own head.

NOTE

The test for a good navigation doc is simple: if a new agent (or a new teammate) would have to ask you a question to get started, the answer to that question belongs in a file, not in your memory.

Every one of those files shrinks the agent’s search space. It starts closer to right, which means it has less to course-correct later. And course-correcting is the part it’s worst at, because it can’t see.

Half two: help it catch itself

Prevention only goes so far. The agent will still write something broken. The question is whether it finds out, or whether you do.

If the only way to know the code is broken is to look at it, you’ve made yourself the agent’s eyes. You become the feedback loop, and you run at human speed while the agent waits.

The fix is to give the agent its own eyes: one command that returns an honest verdict.

When I started writing this post, my repo didn’t really have one. Tests existed. Type errors only showed up at build time, slowly, tangled in a pile of unrelated output. There was no single thing I could tell an agent to run to know whether it had broken something. So I added it:

package.json
{
  "scripts": {
    "check": "astro check",
    "test": "bun test worker/",
    "verify": "bun run check && bun run test"
  }
}

check covers types and Astro diagnostics, test runs the unit tests, and verify is the one command that matters.

Now there is exactly one instruction: bun run verify. Types, then tests. It exits 0 or it doesn’t. Nothing to interpret. That’s the whole point, because interpreting is where a blind agent invents “the build should pass now.”
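The `&&` in that script is what makes the verdict a single exit code. A minimal sketch of the same short-circuit behavior, assuming a hypothetical `RunCommand` function that takes a command string and returns its exit code (injected here so the sketch does not depend on any particular process API):

```typescript
// Hypothetical: a runner takes a command string and returns its exit code.
type RunCommand = (command: string) => number;

// Mirrors `a && b`: run in order, stop at the first non-zero exit code.
function runChain(
  commands: string[],
  run: RunCommand,
): { exitCode: number; failedAt: string | null } {
  for (const command of commands) {
    const exitCode = run(command);
    if (exitCode !== 0) return { exitCode, failedAt: command };
  }
  return { exitCode: 0, failedAt: null };
}
```

If the type check fails, the tests never run, so the agent reads one failure at a time instead of two tangled together.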

TIP

An agent-facing command should pass or fail loudly and say almost nothing in between. Verbose, ambiguous output is noise the agent has to guess about, and guessing is the failure mode you’re trying to kill.
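One way to picture "loud verdict, little else" is a small formatter that sits between a command and the agent. This is a sketch under assumptions: `formatVerdict` and its `maxLines` parameter are made-up names, and keeping only the last few non-empty lines is a simple heuristic, not a rule from any tool.

```typescript
// Hypothetical helper: turn an exit code plus raw output into a terse report.
// exitCode is null when the process ended without one (for example, killed by a signal).
function formatVerdict(
  exitCode: number | null,
  output: string,
  maxLines: number = 20,
): string {
  // Success says one word and nothing else.
  if (exitCode === 0) return "PASS";

  // On failure, keep only the last few non-empty lines of output.
  const lines = output.split("\n").filter((line) => line.trim() !== "");
  const tail = maxLines > 0 ? lines.slice(-maxLines) : [];
  const code = exitCode === null ? "no exit code" : `exit ${exitCode}`;
  return [`FAIL (${code})`, ...tail].join("\n");
}
```

A passing run costs the agent a single token of attention. A failing run starts with an unmistakable first line and then shows only the end of the output, where most tools print their summary.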

Keep the hot loop fast

Notice what verify leaves out: the full build and the formatter.

That’s deliberate. The full build rebuilds the search index and takes real time; the formatter already runs on every commit. If I put all of it in verify, the agent’s inner loop would get slow. A slow loop is one that both the agent and I start skipping.

So the loop is tiered. verify is the fast subset the agent runs constantly while it works. The slow, thorough stuff gates later, at the commit boundary, where waiting a few seconds is fine.
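The tiering can be sketched as data plus one filter. The names below (`Tier`, `CheckStep`, `commandsFor`) are hypothetical, and the `build` entry with its `bun run build` command is an assumption; the post does not show that script.

```typescript
// Hypothetical model of a tiered check list.
type Tier = "inner" | "commit";

interface CheckStep {
  name: string;
  command: string;
  tier: Tier; // the earliest point at which this step is worth running
}

const checkSteps: CheckStep[] = [
  { name: "types", command: "bun run check", tier: "inner" },
  { name: "tests", command: "bun run test", tier: "inner" },
  // Assumed script name; slow, so it only gates at the commit boundary.
  { name: "build", command: "bun run build", tier: "commit" },
];

// The inner loop runs only the fast steps; the commit boundary runs everything.
function commandsFor(trigger: Tier, steps: CheckStep[]): string[] {
  return steps
    .filter((step) => trigger === "commit" || step.tier === "inner")
    .map((step) => step.command);
}
```

The design choice is that the commit tier is a superset of the inner tier, so anything the agent saw pass while working is checked again, never replaced by a different definition of correct.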

One verdict, two trigger points

That commit boundary is the second place the same verdict shows up.

My pre-commit hook used to just format the staged files. Now it also runs the gate:

.husky/pre-commit
bunx lint-staged
bun run verify

Same command. Two moments. The agent runs verify while it works, to catch itself early. The hook runs it again at the commit, as a backstop, so that even if the agent forgot (or I did), nothing broken gets committed.

I like that it’s the same command in both places. There’s no separate “CI version” of correctness to drift out of sync with the local one. The verdict an agent gets at its keyboard is the exact verdict that guards the door.

What changes when the loop is tight

Once the agent can see, the way you work with it changes.

You stop being the type checker. You stop being the thing that notices the build is red. The agent makes a change, runs verify, reads the failure, and fixes it. Three times over, in the time it would’ve taken you to read the first diff. You come back to code that already passed its own exam.

Which frees you up for the part the agent still can’t do: deciding whether the change should have existed at all. You review outcomes, not keystrokes.

That’s the real payoff of taking AX seriously. The agent doesn’t get smarter. It just stops working blind. And a coworker who can check their own work is a fundamentally different coworker than one who hands you a confident “should pass now” and waits.

Give your agents eyes. They’ll surprise you with how little supervision they need once they have them.

NOTE

This is the inner loop: one agent at the keyboard, catching itself. There’s an outer loop too: agents running in CI, reviewing pull requests and improving the codebase on a schedule, with no human at the keyboard at all. That’s the next post.