Agents you only half-trust — First Penguin

Every agentic-product demo ends the same way. Someone asks “but do you trust it?”, the founder says yes, and the room nods as if that settled something. It didn’t, because trust isn’t a yes/no question. It’s a dial, and most teams I meet have theirs set to either zero or eleven — full lockdown or full access, nothing in between.

I run agents in production, and the dial position that actually works is somewhere uncomfortable: enough access to be useful, little enough to be survivable. This dive is about how we set that dial where I work, in language you can use without an engineering degree.

The hotel keycard

Your hotel keycard opens your room, the gym, and the lift to your floor. It does not open other guests’ rooms, and it stops working on Thursday when you check out. Nobody calls this “distrusting the guest.” It’s just how buildings work when strangers sleep in them.

Token scopes are the same idea for software. An agent gets a key that opens exactly the rooms its job requires, for exactly as long as the job runs. On the platform I work on — the architecture is our lead platform architect’s work; the scoping model is the part I conceived and specified — every agent-facing token has to answer three questions before it exists: what can it read, what can it change, and how fast can we take it away.

scope_design: read:events ✓ · write:notes ✓ · admin:* ✗

blast_radius: one workspace · one session · revocable in <60s

incident_log: 9 months in production · 0 destructive actions · 2 near-misses, both stopped by scope

// pattern-level numbers from a system I help run; details anonymized
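For readers who want the keycard in code, here is a minimal Python sketch of a token that cannot exist until the three questions have answers. It is an illustration, not the platform's actual implementation: `AgentToken`, `mint_token`, the field names, and the 30-minute lifetime are all made up for this example, and the third question is represented only by an expiry time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AgentToken:
    """Hypothetical agent-facing token; every name here is illustrative."""
    workspace: str
    read_scopes: frozenset    # what can it read?
    write_scopes: frozenset   # what can it change?
    expires_at: datetime      # when does it stop working on its own?


def mint_token(workspace, read_scopes, write_scopes, ttl_minutes):
    """Refuse to create a token that leaves any of the three questions open."""
    # Assumption for this sketch: agents never get wildcard scopes like admin:*.
    if any("*" in scope for scope in (*read_scopes, *write_scopes)):
        raise ValueError("wildcard scopes are not allowed for agents")
    if ttl_minutes <= 0:
        raise ValueError("token must have a positive lifetime")
    return AgentToken(
        workspace=workspace,
        read_scopes=frozenset(read_scopes),
        write_scopes=frozenset(write_scopes),
        expires_at=datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes),
    )


# The scope line from the card above: read events, write notes, no admin.
token = mint_token("workspace-a", ["read:events"], ["write:notes"], ttl_minutes=30)
```

The design choice worth noticing is that the refusal happens when the key is cut, not when it is used. A token with `admin:*` never gets handed out in the first place.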

The near-misses are the point. Both times, an agent tried to do something outside its lane — once from a badly written prompt, once from a model update that changed behavior overnight. Neither became an incident, and not because the agent was smart. Because the keycard didn’t open that door.
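What "the keycard didn't open that door" looks like in practice can be sketched in a few lines of Python. This is an assumed shape, not our production code: `require_scope`, `ScopeDenied`, the logger name, and the `admin:delete` scope are hypothetical. The idea it shows is deny by default: every action names the scope it needs, and anything not explicitly granted is refused and logged.

```python
import logging

logger = logging.getLogger("agent.scope")  # hypothetical logger name


class ScopeDenied(Exception):
    """Raised when an agent asks for something its token does not cover."""


def require_scope(granted, needed):
    # Deny by default: only an exact match in the granted set passes.
    if needed not in granted:
        logger.warning("blocked: agent requested %r, granted %s", needed, sorted(granted))
        raise ScopeDenied(needed)


def add_note(granted, text):
    require_scope(granted, "write:notes")
    return f"note saved: {text}"


def delete_workspace(granted, workspace_id):
    require_scope(granted, "admin:delete")  # hypothetical scope name
    return f"workspace {workspace_id} deleted"


granted = frozenset({"read:events", "write:notes"})
print(add_note(granted, "follow up Tuesday"))
try:
    delete_workspace(granted, "workspace-a")
except ScopeDenied as blocked:
    print(f"stopped by scope: {blocked}")
```

Nothing in this check depends on what the agent intended or how good the prompt was. A bad prompt and a changed model hit the same wall, and the warning in the log is how a near-miss gets counted.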

The question is never “do you trust the agent?” It’s “what’s the biggest mess it can make in the minute before anyone notices?”

Five questions to ask any vendor

You don’t need to understand OAuth to buy agentic software well. You need these, asked in order, and you should expect plain answers:

1. What’s the smallest set of permissions this agent can run with, and is that the default, or do I have to ask?
2. Can I read a log of everything it did last week, in under five minutes, without a support ticket?
3. What happens when it tries something it isn’t allowed to do? (The honest answer names a mechanism, not a hope.)
4. Do its keys expire on their own, or does someone have to remember?
5. Who can revoke access, and how long does revocation actually take?

A vendor who answers all five without flinching has done the work. A vendor who answers “our model is very safe” has answered a different question than the one you asked.
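If you want to see what a good answer to the last two questions can look like, here is a small Python sketch. It is an in-memory toy under stated assumptions, not a vendor's real API: `TokenStore` and its method names are hypothetical, and a real system would keep this state somewhere shared and durable. It shows keys that expire without anyone remembering, and a revocation that takes effect the next time the key is checked.

```python
import time


class TokenStore:
    """In-memory sketch of expiring, revocable keys; names are hypothetical."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the behavior can be tested
        self._expiry = {}        # token id -> expiry time in clock seconds
        self._revoked = set()

    def issue(self, token_id, ttl_seconds):
        # Every key gets an end date at the moment it is created.
        self._expiry[token_id] = self._clock() + ttl_seconds

    def revoke(self, token_id):
        self._revoked.add(token_id)

    def is_valid(self, token_id):
        # Checked on every request, so revocation does not wait for expiry.
        if token_id in self._revoked:
            return False
        expiry = self._expiry.get(token_id)
        return expiry is not None and self._clock() < expiry


clock = [0.0]  # fake clock so the example is deterministic
store = TokenStore(clock=lambda: clock[0])
store.issue("agent-session-1", ttl_seconds=1800)
```

Because `is_valid` runs on every request in this sketch, how long revocation takes comes down to how quickly the revoked list reaches whatever is doing the checking. That is the number to ask the vendor for.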

Where this is heading

Agent-to-software connections are standardizing fast, which means your tools will soon have more semi-trusted callers than human users. That’s not a reason for alarm; it’s a reason to make half-trust the default posture now, while the stakes are still workshop-sized. The teams that learn to hand out narrow keys this year will be the ones comfortable handing out many keys next year.