Your Agents Have Root and You Gave It to Them

Tool calls run at machine speed. Your approvals run at human speed. That gap becomes your next incident.

Dec 24, 2025

We did not have a breach. We had a tool call.

The agent did exactly what we permitted, in the wrong place, with the wrong data, at machine speed.

If that sentence makes your stomach drop, good. It should. Because the failure mode is not exotic. It is delegation without a control plane.

The recognition trigger

Platform teams spent the last decade turning infrastructure into APIs. Now we are turning authority into APIs. Agents are not just generating text. They are invoking tools. That means querying data stores, opening tickets, rotating keys, changing configs, deploying changes, pulling logs, and kicking off workflows.

This is not a copilot. This is an execution layer. And if you are adopting MCP or any similar tool connector pattern, you are standardizing how that execution layer reaches into your systems. That is powerful. It is also how you accidentally hand out root.

What root means here

Root is any delegated capability that can change reality or leak it at scale.

Definition: the ability to do any of the following

read sensitive data at scale
change production behavior
create or approve new access paths
exfiltrate through allowed channels
write durable state you will later trust

If an agent can do those without bounded scope, explicit identity, and traceable intent, you gave it root. Even if nobody typed admin.

Why this is happening now

Humans are good at controlling tools when the interaction is slow. You log in, click, confirm, wait, feel friction, and sometimes reconsider.

Agents remove friction. They chain actions, retry automatically, explore until success, and optimize for completion, not consequences. They do it at the speed of software.

That gap is the vulnerability. Most orgs are addressing this backwards. They start with "what agent should we buy?" instead of "what permissions model are we willing to live with?"

The three failure modes you will see first

First: Permission sprawl

A tool connector is created temporarily. Then it becomes permanent. Then it gets reused. Then nobody knows what it can do anymore. Then it gets copied to a second agent because it already works. Now you have two unknown roots.

The agent does not need to be malicious. It just needs access.

Second: Decision fog

When something goes wrong, nobody can answer:

Why did it do that
What did it see
What did it try before it succeeded
Who approved that capability
What policy allowed that call

You cannot govern what you cannot reconstruct.

Third: Audit theater

You will have logs. They will be incomplete, uncorrelated, or unactionable. They will tell you that a thing happened, not whether it was allowed, bounded, and attributable.

If you cannot answer who did what, under what authority, using what data, with what approvals, your audit trail is vibes.

All three converge in the same place: incidents you cannot reconstruct and access you cannot explain.

This is what shipping agents without a control plane buys you. A minimum viable control plane is described below.

Sprawl breeds fog, fog breeds theater, theater protects sprawl.

The uncomfortable truth

Agents turn your platform into the attack surface. Not because agents are evil. Because your platform is where tools, data, and permissions meet, and agents live at that junction.

So the platform team owns agent blast radius now. If you ship agents without a control plane, that is not an accident. It is a choice. You do not need a new religion. You need five boring controls that actually work.

Minimum viable Agent Control Plane

Five boring controls that keep agents from quietly having root.

First: Agent identity

Summary: Each agent has a unique identity bound to purpose, environment, and owner, expiring by default.

Prevents: Shared service accounts that erase accountability.

If your agents run as service-account-prod, you are already lost.
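
A minimal sketch of what an expiring, purpose-bound identity could look like. Everything here is an assumption for illustration: the AgentIdentity class, the issue_identity helper, the field names, and the 8-hour default lifetime are hypothetical, not a reference to any real identity provider.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import uuid


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical identity record: one agent, one purpose, one environment, one owner."""
    agent_id: str
    purpose: str
    environment: str
    owner: str
    expires_at: datetime

    def is_valid(self, now=None):
        # Expired identities are invalid; there is no "never expires" option.
        now = now or datetime.now(timezone.utc)
        return now < self.expires_at


def issue_identity(purpose, environment, owner, ttl=timedelta(hours=8), now=None):
    """Issue a unique identity. The 8-hour default TTL is an assumed value."""
    if not (purpose and environment and owner):
        raise ValueError("purpose, environment, and owner are all required")
    now = now or datetime.now(timezone.utc)
    return AgentIdentity(
        agent_id=f"agent-{uuid.uuid4()}",
        purpose=purpose,
        environment=environment,
        owner=owner,
        expires_at=now + ttl,
    )
```

The design choice worth copying is that expiry is mandatory and an identity with no owner cannot be created at all.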

Second: Allowlisted tool permissions

Summary: Explicit allowlists per tool call, deny by default. Define by tool, verb, and resource.

Prevents: Permission sprawl disguised as convenience.

If the policy cannot be read in 30 seconds, it will not be enforced in reality.
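
One way to keep the policy readable in 30 seconds is to make it a short table of tool, verb, and resource. The sketch below assumes a hypothetical in-memory ALLOWLIST and hypothetical agent, tool, and resource names; a real deployment would load this from wherever your policy lives.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Rule(NamedTuple):
    tool: str
    verb: str
    resource: str  # shell-style pattern; note that "*" also matches "/"


# Hypothetical policy: anything not listed here is denied.
ALLOWLIST = {
    "agent-log-triage": [
        Rule("logs", "read", "staging/app-*"),
        Rule("tickets", "create", "queue/platform"),
    ],
}


def is_allowed(agent_id, tool, verb, resource, allowlist=ALLOWLIST):
    """Deny by default: return True only when an explicit rule matches."""
    for rule in allowlist.get(agent_id, []):
        if (
            rule.tool == tool
            and rule.verb == verb
            and fnmatchcase(resource, rule.resource)
        ):
            return True
    return False
```

An unknown agent, an unlisted verb, or a resource outside the pattern all fall through to the same answer: no.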

Third: Data scope

Summary: Tool permission is incomplete until data boundaries are enforced across retrieval, caching, logging, and summarization.

Prevents: Silent leakage through legitimate channels.

Scope covers which datasets can be retrieved, which fields are masked or excluded, and what gets cached, logged, or summarized. If your agent can read it, assume it can leak it.
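
A sketch of field-level scoping, assuming a hypothetical DATA_SCOPE table and made-up dataset and field names. The idea is that the same filter runs before a record is returned, cached, logged, or summarized, so there is only one place where the boundary is defined.

```python
# Hypothetical scope table: datasets not listed are out of scope entirely.
DATA_SCOPE = {
    "tickets": {
        "allowed_fields": {"id", "status", "summary"},
        "masked_fields": {"reporter_email"},
    },
}

MASK = "***"


def apply_scope(dataset, record, scope=DATA_SCOPE):
    """Return a copy of record with masked fields redacted and unlisted fields dropped."""
    policy = scope.get(dataset)
    if policy is None:
        raise PermissionError(f"dataset not in scope: {dataset}")
    scoped = {}
    for field, value in record.items():
        if field in policy["masked_fields"]:
            scoped[field] = MASK
        elif field in policy["allowed_fields"]:
            scoped[field] = value
        # Any other field is excluded by default.
    return scoped
```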

Fourth: Provenance

Summary: Incident-grade provenance is a correlated, tamper-resistant event record that ties instruction to tool call to policy decision reference to outcome.

Prevents: Decision fog during incidents and audits.

At minimum, capture the instruction, tool call and parameters, policy decision reference, result classification, and downstream actions.
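
A minimal sketch of a correlated event record, assuming a hypothetical in-memory list as the log and hypothetical field names. Each event carries the hash of the previous one, which makes edits detectable. That is tamper-evident rather than tamper-proof: real tamper resistance also needs the log stored somewhere the agent cannot write.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first event


def _digest(body):
    # Canonical JSON so the same body always produces the same hash.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log, *, correlation_id, agent_id, instruction, tool, params,
                 policy_ref, result_class):
    """Append one tool-call event, chained to the previous event's hash."""
    body = {
        "correlation_id": correlation_id,  # ties downstream actions to one instruction
        "agent_id": agent_id,
        "instruction": instruction,
        "tool": tool,
        "params": params,  # must be JSON-serializable
        "policy_ref": policy_ref,
        "result_class": result_class,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    event = {**body, "hash": _digest(body)}
    log.append(event)
    return event


def verify_chain(log):
    """Return True only if every event is unmodified and correctly linked."""
    prev = GENESIS
    for event in log:
        body = {k: v for k, v in event.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != event["hash"]:
            return False
        prev = event["hash"]
    return True
```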

Fifth: Revocation and kill switch

Summary: Instant revoke of identity and permissions, plus quarantine and cache invalidation.

Prevents: Slow-motion incidents you cannot stop.

If you cannot revoke an agent in minutes, you will discover that during the incident.
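
A sketch of the shape a kill switch might take, assuming a hypothetical in-process KillSwitch class. In practice the revocation list and caches live in shared infrastructure; the point is that revoke is one call that blocks the identity and drops its cached data together.

```python
class KillSwitch:
    """Hypothetical in-memory revocation list with per-agent cache invalidation."""

    def __init__(self):
        self._revoked = set()
        self._cache = {}  # agent_id -> {key: value}

    def cache_put(self, agent_id, key, value):
        self.check(agent_id)  # revoked agents cannot repopulate their cache
        self._cache.setdefault(agent_id, {})[key] = value

    def cache_get(self, agent_id, key):
        self.check(agent_id)
        return self._cache.get(agent_id, {}).get(key)

    def revoke(self, agent_id):
        """Block the agent and drop its cache. Returns the number of entries dropped."""
        self._revoked.add(agent_id)
        return len(self._cache.pop(agent_id, {}))

    def check(self, agent_id):
        """Call before every tool call; raises if the agent has been revoked."""
        if agent_id in self._revoked:
            raise PermissionError(f"agent revoked: {agent_id}")
```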

What to do this week (Friday test)

This is Card 00. Paste it into a ticket. Run it as written.

Readiness

If you cannot name scope, owner, and environment in under one minute, stop.
If you cannot revoke in minutes, you are not ready to run the workflow.

Scope

One real agent workflow (not a demo)
One environment
One owner

Step 1: Allowlist

5 tool calls maximum
each with verb + resource scope
deny all else

OTW (Owner, Timebox, Witness)
Owner:
Timebox:
Witness:

Step 2: Provenance requirement

every tool call logged with: who, what, where, policy decision reference, result classification

OTW
Owner:
Timebox:
Witness:

Step 3: Abuse scenarios

prompt injection attempt to reach a forbidden dataset
tool chaining attempt to exfiltrate through an allowed channel
escalation attempt to request a new permission

OTW
Owner:
Timebox:
Witness:

Step 4: Outcomes that matter

time to detect
time to attribute
time to revoke (in minutes)

OTW
Owner:
Timebox:
Witness:

Do not argue. Measure.

If any of those is slow, you did not ship capability. You shipped future work in the form of an incident.
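
If you record four timestamps during the test, the three numbers fall out of simple subtraction. A sketch, with assumptions stated plainly: the function name is hypothetical, detection is measured from the moment the abuse scenario was injected, and attribution and revocation are both measured from detection. Pick your own start points if your team defines them differently, but pick them before the test.

```python
from datetime import datetime


def friday_metrics(injected_at, detected_at, attributed_at, revoked_at):
    """Return the three outcome times in minutes (start points are assumptions)."""
    def minutes(start, end):
        return (end - start).total_seconds() / 60

    return {
        "time_to_detect_min": minutes(injected_at, detected_at),
        "time_to_attribute_min": minutes(detected_at, attributed_at),
        "time_to_revoke_min": minutes(detected_at, revoked_at),
    }


# Example with made-up timestamps, not real measurements.
example = friday_metrics(
    injected_at=datetime(2025, 1, 3, 10, 0),
    detected_at=datetime(2025, 1, 3, 10, 12),
    attributed_at=datetime(2025, 1, 3, 10, 30),
    revoked_at=datetime(2025, 1, 3, 10, 20),
)
```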

What this changes in leadership posture

Stop asking "what can the agent do?" Start asking "what authority are we delegating, and how do we revoke it?"

The platform team becomes the steward of delegated action. That is a governance role, whether anyone likes that label or not.

Ends, Means, Price

Ends: safe delegated execution.

Means: Agent identity, Allowlisted tool permissions, Data scope, Provenance, Revocation and kill switch.

Price: a slightly slower rollout and much cheaper incidents.

Close

You can absolutely build agentic systems safely. But only if you admit what you are building: an execution layer that touches tools, data, and permissions. If you delegate action, you are delegating stewardship. If you give that execution layer broad access without Agent identity, Allowlisted tool permissions, Data scope, Provenance, and Revocation and kill switch, then yes:

Your agents have root and you gave it to them.

Per ignem, veritas.

If this helped, forward it to the person who owns the on-call rotation. Originally published at: https://forgedculture.com
