AI Agent Security: Lock Down Your Agent Before It Bites

AI agent security done right: least-privilege design, draft-only actions, isolated keys, and row-level data rules. The exact patterns operators use to stay safe.

June 5, 2026 · OpenClawCrew · 7 min read

An agent with your credentials and no guardrails is not an assistant. It is a liability with API access.

Most security incidents with AI agents are not exotic. They are the boring stuff: a key pasted into a chat window that ends up in a public repo, an agent given send-email rights that fires off the wrong message, one compromised role that can read every other role's secrets. AI agent security is mostly about closing those holes before they open. This guide walks through the specific patterns operators run in production: least privilege, isolated keys, draft-only actions, data-layer rules, and a cheap two-pass audit you can run today.

Treat the agent like a new hire, not a co-founder

The cleanest mental model comes from Nate Herk, who runs a multi-agent fleet: "Pretend this is an actual intern or a new employee. What access would you give them? You wouldn't just give them your credit card."

That single question kills most bad designs. You would not hand a new marketing hire your bank API keys. So your marketing agent should not have them either. Every agent gets only the credentials and tools its role requires, and nothing else. A research agent needs read access to sources. It does not need your Stripe key. A content agent needs your blog connector. It does not need finance.

Least privilege is not paranoia. It is blast-radius control. When something goes wrong, and something eventually will, the damage stops at the edge of that one role's access.
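As a rough illustration of the new-hire rule, here is a minimal Python sketch of a default-deny allowlist. The role names, tool names, and the `ROLE_TOOLS`, `is_allowed`, and `call_tool` helpers are all hypothetical. They are not part of Hermes or any other agent framework, which will have its own way of scoping tools.

```python
# Hypothetical role-to-tool allowlist; role and tool names are illustrative only.
ROLE_TOOLS = {
    "research": {"web_search", "read_sources"},
    "content": {"blog_connector"},
    "finance": {"stripe_read"},
}


def is_allowed(role: str, tool: str) -> bool:
    """Return True only if the tool is explicitly granted to the role."""
    # Unknown roles fall back to an empty set, so the default is deny.
    return tool in ROLE_TOOLS.get(role, set())


def call_tool(role: str, tool: str) -> str:
    """Refuse any tool call that falls outside the role's allowlist."""
    if not is_allowed(role, tool):
        raise PermissionError(f"{role!r} is not granted {tool!r}")
    # Placeholder for the real tool invocation.
    return f"{role} ran {tool}"
```

The point of the shape is that access is granted by listing, never by omission: a role that is missing from the table can do nothing.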

Keep secrets out of the chat window

This is the mistake almost everyone makes once. You paste an API key into the chat to get the agent working, it works, you move on. The problem is that the chat is logged. If you sync your agent folder to GitHub for backup, that key just went to a repo.

Set secrets through config, not conversation:

hermes config set GITHUB_TOKEN <your-token>

That writes the value to the agent's environment file, not the session history. On a containerized setup it lands inside the container's .env, which is exactly where you want it. To rotate or remove a key, edit that file directly rather than typing any key, old or new, into chat.

The rule is simple: a secret should never appear in any text the agent might store, summarize, or push. If you have already pasted one, treat it as compromised and rotate it.
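One way to back that rule up is to scan text before it is stored or pushed. The sketch below is a hedged example, not a complete scanner: the two patterns are only illustrative key shapes, and `redact_secrets` is a hypothetical helper. A dedicated secret-scanning tool covers far more formats.

```python
import re

# Illustrative patterns only; real providers use many other key formats.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),    # shape of a GitHub classic personal access token
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),  # shape of an "sk-" prefixed API key
]


def redact_secrets(text: str) -> tuple[str, int]:
    """Replace anything that looks like a key and report how many were found."""
    found = 0
    for pattern in SECRET_PATTERNS:
        text, count = pattern.subn("[REDACTED]", text)
        found += count
    return text, found
```

A nonzero count is also a signal to rotate: the key was already typed somewhere it should not have been.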

Isolate agents so one breach is not all breaches

If you run more than one agent, give each its own container, its own environment file, its own keys, its own memory, and its own crons. Think of it as separate offices in a building rather than one open room where everyone can read everyone's desk.

This buys you three things. A compromised or misbehaving agent cannot reach another agent's secrets. You can give each role exactly the access it needs without affecting the others. And debugging gets easier because state does not bleed across roles.

The decision gate for spinning up a separate isolated agent is short. Does the role need different secrets, separate long-term memory, or its own audience? If any answer is yes, isolate it. Otherwise add skills to your existing agent. This isolation discipline pairs naturally with running several agents at once, which we cover in multi-agent orchestration.
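Written out as code, the gate is a single OR over three questions. The function and parameter names here are hypothetical, chosen only to mirror the wording above.

```python
def should_isolate(different_secrets: bool, separate_memory: bool, own_audience: bool) -> bool:
    """Return True if the role warrants its own isolated agent."""
    # Any single "yes" is enough; only an all-"no" role stays as a skill
    # on the existing agent.
    return different_secrets or separate_memory or own_audience
```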

Name your keys so you can see the damage

Per-agent isolation has a quiet security bonus: visibility. Give each agent its own named key with your provider, something like hermes-marketing-key and hermes-finance-key. Now your provider dashboard shows you exactly which agent is spending what.

This is your early warning system. A runaway loop, a prompt injection that turns your agent into someone else's worker, or a simple bug burning tokens all show up as an anomaly on one named key instead of hiding inside a single blended bill. You catch it in hours instead of at the end of the month. Tracking spend this way also feeds directly into cost control, which we get into in AI agent optimization.
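A minimal sketch of that early warning, assuming you can export daily spend per named key from your provider's dashboard. The data shape, the `flag_spend_anomalies` name, and the 3x threshold are assumptions for illustration, not a provider API.

```python
from statistics import median


def flag_spend_anomalies(daily_spend: dict[str, list[float]], factor: float = 3.0) -> list[str]:
    """Return key names whose latest day exceeds `factor` times the median of earlier days."""
    flagged = []
    for key_name, history in daily_spend.items():
        if len(history) < 2:
            continue  # not enough history to compare against
        *earlier, latest = history
        baseline = median(earlier)
        # Keys with a zero baseline are skipped in this sketch.
        if baseline > 0 and latest > factor * baseline:
            flagged.append(key_name)
    return flagged
```

Because each key maps to one agent, a flagged name tells you which role to inspect first.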

Draft-only is a security control, not a convenience

The single most effective safety pattern for agent actions is to remove the irreversible buttons entirely.

The standard setup for email is the clearest example. The agent can find, draft, label, and archive. It cannot send. Jack Roberts, who runs email triage daily, is blunt about why: "Never give the agent the ability to send emails, only to draft. I'm still at the point where I would never let them run riot."

Apply the same thinking everywhere there is a one-way door. When you connect a tool through a connector, enable only the safe verbs. For a calendar, allow create and find but exclude delete-all. For anything that touches money, customers, or production, require a human approval gate. A wrong draft costs you nothing. A wrong send costs you a relationship. Keep the human where the stakes are. This draft-only discipline is also the backbone of safe personal ops automation.
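The safe-verbs idea can be sketched as a small gate in front of every connector call. Everything named here is hypothetical, including the verb sets and the `ActionGate` class. Real connectors expose their own verbs and permission settings, and those settings are the better place to enforce this when they exist.

```python
from dataclasses import dataclass, field

# Hypothetical verb sets; actual connector verbs depend on the tool.
SAFE_VERBS = {
    "email": {"find", "draft", "label", "archive"},
    "calendar": {"create", "find"},
}
NEEDS_APPROVAL = {"payments": {"charge"}}


@dataclass
class ActionGate:
    """Allow safe verbs, queue high-stakes ones for a human, deny the rest."""
    pending: list = field(default_factory=list)

    def request(self, tool: str, verb: str, payload: dict) -> str:
        if verb in SAFE_VERBS.get(tool, set()):
            return "allowed"
        if verb in NEEDS_APPROVAL.get(tool, set()):
            # Nothing runs until a person reviews the queued item.
            self.pending.append((tool, verb, payload))
            return "queued_for_approval"
        return "denied"
```

Note that `send` is not listed anywhere for email, so it falls through to `denied`. The irreversible button is absent rather than guarded.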

Secure the data layer with row-level rules

When your agent reads and writes to a shared database, plain table access is not enough. You want rules at the row level so each role can only touch its own data.

In a multi-agent content pipeline, for example, the research agent can write only to the topics table and the script agent only to scripts. Where roles share a table, the same idea applies per row: each role reads and changes only the rows it owns. Enforced at the data layer, this prevents one agent from corrupting another's work and makes a multi-agent system far easier to reason about when something breaks.

If you are building a real app on top of this, turn on row-level security from the start and make sure new rows inherit the policy automatically. It is a one-time setup that saves you from the classic mistake of shipping a database where any authenticated user can read everyone's records. There is more on this in building AI agent apps.
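To make "enforced at the data layer" concrete, here is a small sketch using the authorizer hook in Python's built-in `sqlite3` module. This is a stand-in technique, not row-level security: SQLite has no row-level policies, so the sketch only shows the per-table write rule from the pipeline example. The `WRITABLE_TABLES` mapping and `restrict_writes` helper are hypothetical. On a database with real row-level security you would express this as policies in the database itself.

```python
import sqlite3

# Hypothetical mapping of each role to the tables it may write.
WRITABLE_TABLES = {"research": {"topics"}, "script": {"scripts"}}
WRITE_ACTIONS = {sqlite3.SQLITE_INSERT, sqlite3.SQLITE_UPDATE, sqlite3.SQLITE_DELETE}


def restrict_writes(conn: sqlite3.Connection, role: str) -> None:
    """Deny INSERT, UPDATE, and DELETE on any table the role does not own."""
    allowed = WRITABLE_TABLES.get(role, set())

    def authorizer(action, arg1, arg2, db_name, source):
        # For these three actions, arg1 is the table name.
        if action in WRITE_ACTIONS and arg1 not in allowed:
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (title TEXT)")
conn.execute("CREATE TABLE scripts (body TEXT)")

restrict_writes(conn, "research")
conn.execute("INSERT INTO topics VALUES ('agent security')")  # permitted for this role
try:
    conn.execute("INSERT INTO scripts VALUES ('draft')")
    blocked = False
except sqlite3.Error:
    blocked = True  # the database refused the write
```

The agent's prompt never enters into it. The refusal comes from the database connection, which is what makes the rule hold even when the agent misbehaves.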

Run the two-pass audit before anyone else touches it

Here is a defense-in-depth trick that costs almost nothing. Before you ship, audit the system from a fresh context.

Open a brand-new session with none of the build history loaded, so the agent is not biased by knowing how it is supposed to work. Tell it to behave like an external user with none of your access, hunting for every vulnerability it can find. Then take that full list of findings and paste it into a second clean session for an independent re-check.

Jack Roberts summed up why this matters: "Everyone's busy shipping, nobody's busy doing quality control. Run the security audit from a fresh context as a user with none of your access, then re-check it in a second window." Two cheap passes catch the obvious holes before a client or an attacker does.
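If you want to script the two passes, the flow looks roughly like this. The sketch assumes a `new_session` factory that you supply, returning a function that sends one prompt to a brand-new session and returns the reply. That factory, the prompt wording, and the `two_pass_audit` name are all placeholders, not any particular agent's API.

```python
from typing import Callable

ATTACKER_PROMPT = (
    "Act as an external user with none of the owner's access. "
    "List every vulnerability you can find in this system:\n\n{target}"
)
RECHECK_PROMPT = (
    "Independently re-check these security findings. "
    "Confirm, reject, or extend each one:\n\n{findings}"
)


def two_pass_audit(
    target: str,
    new_session: Callable[[], Callable[[str], str]],
) -> tuple[str, str]:
    """Run the audit in one fresh session, then re-check it in a second one."""
    first = new_session()  # no build history loaded
    findings = first(ATTACKER_PROMPT.format(target=target))
    second = new_session()  # shares nothing with the first pass
    review = second(RECHECK_PROMPT.format(findings=findings))
    return findings, review
```

The only design constraint that matters is that each pass gets its own session, so the second reviewer sees the findings but none of the first session's context.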

A quick security checklist

Give each agent only the tools and keys its role needs.
Set secrets through config, never in chat. Rotate anything you have pasted.
Isolate agents with separate environments, keys, and memory.
Use named, per-agent keys so spend anomalies are visible.
Make high-stakes actions draft-only or gate them behind human approval.
Enforce row-level rules on shared data.
Run a two-pass, fresh-context audit before shipping.

FAQ

What is the biggest AI agent security risk in practice?

Over-permissioning. Most real incidents come from an agent having access it never needed: a key it can leak, a send button it can misuse, or read access to another role's data. Least privilege removes most of the risk surface.

How should I store API keys for an AI agent?

In the agent's environment configuration, set through a config command rather than typed into the chat. Chat is logged and may be synced or summarized, so a pasted key can leak. Use named keys per agent so you can track spend and spot anomalies.

Is it safe to let an AI agent send emails or make payments?

Treat those as one-way doors. Keep them draft-only or behind a human approval gate. The agent can prepare everything, but a person approves the irreversible step. The cost of a wrong autonomous action is far higher than the minor friction of approving.

How do I audit an agent I already built?

Open a fresh session with no build context, tell it to act as an external attacker with none of your access, and have it list every vulnerability. Paste those findings into a second clean session to re-check. Two passes, almost free, and they catch the obvious problems.

Lock it down without the guesswork

Security is the part people skip until it bites them. You do not have to.

OpenClawCrew's private AI agent starter kits ($49) ship with least-privilege defaults, config-based secrets, and draft-only actions already wired, so you start safe instead of retrofitting it later. For teams that want the full isolated, audited, role-separated setup stood up properly, the done-for-you setup service handles the whole thing. Your agent runs on Hermes (or OpenClaw), holds your rules, and works through your tools without holding a loaded gun. Once it is locked down, point it at real work, starting with personal ops or shipping apps.
