
I trust AI to write my code, not my email

An engineer’s agent once sent a real reply by accident. The lesson isn’t “don’t use agents” — it’s that trust should scale with blast radius, and your task list should enforce it.

Here’s a story that should be required reading before you point an AI agent at anything that matters. The engineer Harper Reed wired Claude up to his Gmail, calendar, and contacts so it could triage his inbox and draft replies in his voice. It worked well — until one day it didn’t just draft a reply, it sent one: an enthusiastic “I would love to do this” about writing a book, to a real person, without him in the loop.

As he put it afterward: “I trust these agents to write code way more than I trust them to write an email… Giving your agents access to things that affect other people is scary.” The new ironclad rule in his setup: always draft, never send.

Trust doesn’t scale with capability

The instinct is to calibrate how much you trust an agent by how smart it is. That’s the wrong axis. A model good enough to write a decent email is also good enough to send a bad one. What actually matters is the blast radius of the action: how hard is it to undo, and who else gets hit if it’s wrong?

Writing code in a branch has a small blast radius — it sits there until a human reviews and merges it. Sending an email has a large one: it reaches another person instantly and can’t be recalled. Same model, wildly different risk. The decision to let an agent act unsupervised should track that difference, not the model’s benchmark scores.
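Here is a minimal Python sketch of that idea. It assumes just two yes/no properties per action: can it be undone, and does it reach anyone else? The names Action, Oversight, and required_oversight are hypothetical and do not come from any real agent framework. The model's capability is deliberately not an input.

```python
from dataclasses import dataclass
from enum import Enum


class Oversight(Enum):
    ACT_UNSUPERVISED = "agent may act without a human in the loop"
    REVIEW_FIRST = "agent proposes, a human commits"


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool      # can the effect be cleanly undone afterwards?
    affects_others: bool  # does it reach anyone besides the person delegating?


def required_oversight(action: Action) -> Oversight:
    # Only the blast radius is consulted; model capability is not a parameter.
    if action.reversible and not action.affects_others:
        return Oversight.ACT_UNSUPERVISED
    return Oversight.REVIEW_FIRST


if __name__ == "__main__":
    branch_commit = Action("commit to a feature branch", reversible=True, affects_others=False)
    send_email = Action("send an email", reversible=False, affects_others=True)
    print(required_oversight(branch_commit).name)  # ACT_UNSUPERVISED
    print(required_oversight(send_email).name)     # REVIEW_FIRST
```

A real policy would probably need more than two flags. The axis stays the same, though: what the action does to the world, not how smart the model is.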

The pattern: propose, then dispose

Reed’s “always draft, never send” is one instance of a general rule that shows up everywhere people run agents seriously: the agent produces a result, but a human commits it. Drafts, not sends. Pull requests, not pushes to main. “Ready for review,” not “done.”

This is the single most important design choice in a tool that lets agents do real work. It’s also the one most easily skipped, because in the demo everything works and the gate feels like friction. It isn’t friction — it’s the thing that lets you leave the agent running at all.
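Here is one way to build "always draft, never send" into a tool, sketched in Python. The Outbox class and its methods are hypothetical stand-ins for whatever mail integration you use. They are not Gmail's API or Reed's actual setup. The agent only ever receives the propose function. Sending sits behind a separate call that only the human-facing side can reach.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Draft:
    to: str
    body: str
    approved_by: Optional[str] = None  # set only when a human approves


class Outbox:
    """Hypothetical mail gateway: agents can propose drafts, only humans can send."""

    def __init__(self) -> None:
        self.drafts: List[Draft] = []
        self.sent: List[Draft] = []

    def propose(self, to: str, body: str) -> Draft:
        # The only write operation the agent is ever given.
        draft = Draft(to=to, body=body)
        self.drafts.append(draft)
        return draft

    def approve_and_send(self, draft: Draft, reviewer: str) -> None:
        # Called from the human review UI; no agent tool maps to this method.
        if not any(d is draft for d in self.drafts):
            raise ValueError("draft is not pending review")
        draft.approved_by = reviewer
        self.drafts = [d for d in self.drafts if d is not draft]
        self.sent.append(draft)  # a real gateway would deliver the message here


def agent_tools(outbox: Outbox) -> Dict[str, Callable[..., Draft]]:
    # The agent's tool set deliberately omits approve_and_send.
    return {"draft_email": outbox.propose}
```

In this design the gate is the tool list itself. The agent never receives approve_and_send, so no prompt it misreads can turn a draft into a sent message.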

Why the task list is the right place to enforce it

You can bolt a review gate onto each integration by hand, the way Reed did with his email server. But if your tasks already live in one place, the list itself is the natural enforcement point. When you assign a task to an agent in Lume, the agent can move it to “ready for review” and no further. Completion is a human action, by design. There’s no path where an agent quietly marks its own work done and moves on.
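The same rule can live in a task list as a status check. The Python sketch below is a hypothetical model of the behavior described above, not Lume's actual API. Agents may move a task as far as READY_FOR_REVIEW, and only a human actor may set DONE.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    TODO = "todo"
    IN_PROGRESS = "in progress"
    READY_FOR_REVIEW = "ready for review"
    DONE = "done"


class Actor(Enum):
    HUMAN = "human"
    AGENT = "agent"


# Statuses each kind of actor may set. Agents stop at READY_FOR_REVIEW;
# DONE is reachable only by a human.
ALLOWED = {
    Actor.AGENT: {Status.IN_PROGRESS, Status.READY_FOR_REVIEW},
    Actor.HUMAN: set(Status),
}


@dataclass
class Task:
    title: str
    status: Status = Status.TODO

    def move(self, to: Status, actor: Actor) -> None:
        # Reject any transition the actor is not permitted to make.
        if to not in ALLOWED[actor]:
            raise PermissionError(f"{actor.value} may not set status {to.value!r}")
        self.status = to


if __name__ == "__main__":
    task = Task("refactor parse_config")  # hypothetical task title
    task.move(Status.IN_PROGRESS, Actor.AGENT)
    task.move(Status.READY_FOR_REVIEW, Actor.AGENT)
    try:
        task.move(Status.DONE, Actor.AGENT)
    except PermissionError as err:
        print(err)  # agent may not set status 'done'
    task.move(Status.DONE, Actor.HUMAN)
```

In this model, rejecting a review is just a human moving the task back to IN_PROGRESS. That is a cheap, reversible step.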

And because every change is undoable, even the rare bad call is cheap: it’s a thing sitting in your review queue that you reject in two seconds, not a sent email you can’t take back.

Match the leash to the risk

None of this means keeping agents on a tight leash for everything — that would defeat the point. It means being deliberate about which tasks get a long leash and which get a short one. Refactor a function? Long leash, review the PR. Email a client, touch billing, delete data? Short leash, or not at all.

The skill is reading a task and knowing its blast radius before you hand it off. Reed learned it the hard way, in public, so you don’t have to. Build the gate in first — then let the agents run.

Want a list your agents can pull from?

Lume gives every task an API, an MCP server, and an assignee. Free to start.