The thing nobody tells you about deploying AI agents in production is that the model itself is the easy part. Modern frontier models are smart, safety-trained, and improving fast. The hard part — the part where careers are made and reputations are lost — is the engineering around the agent. The wiring. The trust boundaries. The "who can do what, when, with whose approval" question that determines whether your agent is a productivity multiplier or a liability that loiters around your production database with a butter knife.
This post is the playbook we use when wiring agents into systems that matter. Our clients have run agents against payroll, against customer records, against contract repositories, against trading dashboards. Here's how we kept those agents useful without giving them the keys to everything.
The Two Failure Modes
Every agent failure in production traces back to one of two patterns:
- The agent did exactly what it was asked to do, and that was wrong. Hallucinated context, prompt injection in retrieved data, the user asking for something subtly destructive. The model wasn't broken — the surrounding system gave the agent permission to act on bad inputs.
- The agent did something nobody asked it to do. Goal drift, capability hallucination, cascading retries that compounded into something nobody intended. The model went off the rails.
The engineering response to both is the same: narrow what the agent can do, log everything it does, and put humans in the loop for actions that matter. None of this is novel. All of it is necessary.
Permission Model: Read-Only First, Always
The first deployment of any agent against a new system should be read-only. Always. We do not negotiate this with clients.
Read-only deployments let you observe the agent's behaviour against your real data without risking anything. You discover the cases where it queries pointlessly. You discover the cases where it misreads schema. You discover the prompt injection paths in your customer-submitted content. You learn the surface area of the problem before you give the agent any ability to make it worse.
After a period of read-only operation (typically 2-4 weeks for a non-trivial system), you graduate the agent to write capabilities one tool at a time. Not "now it can write to the CRM" — "now it can create draft notes, which are flagged as agent-authored and require human approval before publishing."
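One way to make that graduation explicit is a per-tool policy table that the tool runtime checks before dispatching any call. A minimal sketch, with hypothetical tool names and a made-up `ToolPolicy` shape:

```typescript
// Hypothetical per-tool policy table; the names and shape are illustrative,
// not a real framework's API.
type WriteMode = "disabled" | "draft_only" | "enabled";

interface ToolPolicy {
  reads: boolean;
  writes: WriteMode;
  requiresApproval: boolean;
}

const toolPolicies: Record<string, ToolPolicy> = {
  // Read-only from day one.
  crm_search:      { reads: true, writes: "disabled",   requiresApproval: false },
  // Graduated after the observation period: drafts only, human publishes.
  crm_create_note: { reads: true, writes: "draft_only", requiresApproval: true },
};

// Checked by the tool runtime before any mutating call is dispatched.
// Unknown tools default to "no".
function canWrite(tool: string): boolean {
  const policy = toolPolicies[tool];
  return policy !== undefined && policy.writes !== "disabled";
}
```

The useful property is that "graduate the agent" becomes a one-line policy change with a reviewable diff, not a scattering of if-statements across handlers.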
Scoped Credentials Per Tool
The default temptation is to give the agent's tools a single API key with broad access — "the agent needs to read and write the CRM, so here's a CRM admin token." Resist this. Every tool should have its own credential, scoped to exactly what that tool needs.
Concretely, for an agent that talks to Salesforce:
- `salesforce_search` uses a connected app with read-only access to Contacts and Accounts. Nothing else.
- `salesforce_create_note` uses a separate connected app with create-only access to a Notes object, no update or delete.
- `salesforce_update_status` — if it exists at all — uses a third credential restricted to one specific field on one specific object type.
This is fiddly. It's also what stops a compromised tool from being a compromised system. The blast radius of any single tool is bounded by the credentials that tool uses, not by the agent's overall trust level.
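In code, this can be as simple as a registry that maps each tool to its own credential, resolved at call time. A sketch; the secret names and scope strings are assumptions, not real Salesforce scopes:

```typescript
// Illustrative mapping from tool name to its own scoped credential.
// Secret names and scope strings are placeholders.
interface ToolCredential {
  secretName: string; // key in your secrets manager
  scopes: string[];   // what this credential is allowed to do
}

const credentials: Record<string, ToolCredential> = {
  salesforce_search: {
    secretName: "sf-connected-app-readonly",
    scopes: ["contacts:read", "accounts:read"],
  },
  salesforce_create_note: {
    secretName: "sf-connected-app-notes-create",
    scopes: ["notes:create"],
  },
};

// Each tool handler resolves only its own credential at call time,
// so there is no shared "agent token" to leak.
function credentialFor(tool: string): ToolCredential {
  const cred = credentials[tool];
  if (!cred) throw new Error(`No credential registered for tool ${tool}`);
  return cred;
}
```

A tool that was never registered simply cannot authenticate, which is the failure mode you want.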
Dry-Run Mode for Every Write Tool
Every tool that mutates state should support a dry-run flag. When set, the tool performs all validation, returns the exact payload it would send, and skips the actual side effect. The agent can use this to "preview" an operation; humans can use it to test the tool's behaviour without consequences.
```typescript
import Stripe from "stripe";
import { z } from "zod";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// A tool with built-in dry-run support
const SendInvoiceInput = z.object({
  customerId: z.string(),
  amountCents: z.number().int().positive(),
  dueDate: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  dryRun: z.boolean().default(false)
    .describe("If true, validate inputs and return the would-be payload without sending"),
});

function daysUntil(isoDate: string): number {
  return Math.max(0, Math.ceil((Date.parse(isoDate) - Date.now()) / 86_400_000));
}

async function sendInvoiceHandler(rawArgs: unknown) {
  const args = SendInvoiceInput.parse(rawArgs);
  const customer = await stripe.customers.retrieve(args.customerId);
  if (customer.deleted) throw new Error(`Customer ${args.customerId} has been deleted`);

  // Stripe invoices collect pending invoice items, so the "payload" is
  // two parts: the line item and the invoice that will pull it in.
  const itemPayload = {
    customer: customer.id,
    amount: args.amountCents,
    currency: "usd",
    description: `Invoice for ${customer.name}`,
  };
  const invoicePayload = {
    customer: customer.id,
    collection_method: "send_invoice" as const,
    days_until_due: daysUntil(args.dueDate),
  };

  if (args.dryRun) {
    return {
      content: [{
        type: "text",
        text: `DRY RUN — would create invoice for ${customer.name} ` +
          `(${customer.email}) for $${(args.amountCents / 100).toFixed(2)}.\n\n` +
          `Payload:\n${JSON.stringify({ itemPayload, invoicePayload }, null, 2)}`,
      }],
    };
  }

  await stripe.invoiceItems.create(itemPayload);
  const invoice = await stripe.invoices.create(invoicePayload);
  await stripe.invoices.sendInvoice(invoice.id);
  return {
    content: [{
      type: "text",
      text: `Created and sent invoice ${invoice.id} to ${customer.email}.`,
    }],
  };
}
```
Approval Queues for Destructive Operations
For anything irreversible at scale — customer communications, financial transactions, deletes, configuration changes — the agent doesn't execute the operation. It proposes the operation. A human approves it.
The pattern we use: every Tier 2 tool (anything destructive or irreversible at scale) actually does two things — it stages a pending action in an approval queue, then waits for a webhook from the approval UI. The agent's tool call returns once approval is granted or denied. From the model's perspective, it's a single tool call that takes a while; from the system's perspective, there's a human-in-the-loop checkpoint in the middle.
```typescript
// `approvals`, `slack`, and `auditLog` are this app's own services:
// an approval store, a notifier, and an append-only log.
type Approval = {
  id: string;
  tool: string;
  args: Record<string, unknown>;
  rationale: string;
  status: "pending" | "approved" | "rejected" | "expired";
  approver?: string;
  reason?: string;
  decidedAt?: string;
};

async function approvalGatedTool(opts: {
  tool: string;
  args: Record<string, unknown>;
  rationale: string;
  execute: () => Promise<unknown>;
  timeoutMs?: number;
}): Promise<{ content: { type: "text"; text: string }[]; isError?: boolean }> {
  const approval = await approvals.create({
    tool: opts.tool,
    args: opts.args,
    rationale: opts.rationale,
    status: "pending",
  });

  // Posts to a Slack channel with Approve / Reject buttons that hit a webhook
  await slack.postApprovalRequest(approval);

  // Wait up to N minutes (default 10) for a human decision
  const decision = await approvals.waitForDecision(approval.id, opts.timeoutMs ?? 10 * 60_000);

  if (decision.status === "approved") {
    const result = await opts.execute();
    await auditLog.write({
      type: "approved_tool_execution",
      approvalId: approval.id,
      approver: decision.approver,
      tool: opts.tool,
      args: opts.args,
      result,
    });
    return { content: [{ type: "text", text: `Approved by ${decision.approver}. Executed.` }] };
  }

  if (decision.status === "rejected") {
    return {
      content: [{
        type: "text",
        text: `Rejected by ${decision.approver}. Reason: ${decision.reason ?? "(none given)"}`,
      }],
      isError: true,
    };
  }

  return {
    content: [{ type: "text", text: "Approval request expired without a decision." }],
    isError: true,
  };
}
```
The agent now sees a structured rejection and can reason about whether to ask for clarification, try a different approach, or surface the rejection to the user. This is far better than either "approve everything automatically" or "human approves nothing because the queue is too long."
Immutable Audit Logs
Every tool call, every approval decision, every model invocation — everything is logged to an append-only store. Postgres with a trigger that prevents updates and deletes on the audit table works fine; for higher-stakes systems we use S3 with object lock or a managed audit log service (AWS CloudTrail, Datadog Audit Trail).
The audit record for every tool execution captures:
- Timestamp (server-side, never client-supplied).
- Agent session ID and user ID on whose behalf the agent was acting.
- Model name and version used for the decision.
- Tool name, full input arguments, and full result payload.
- Approval ID and approver, if applicable.
- The conversation context (or a content-addressable reference to it) that produced the tool call.
This audit log is the answer to two questions you will eventually have to answer: "what did the agent do for this customer last Tuesday?" and "how did the agent end up doing this thing?" Without it, you're guessing. With it, you have ground truth.
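For the "content-addressable reference" in the last bullet, hashing the conversation context works well: store the (large) context once, and let every audit record reference it by digest. A sketch of building such a record, with the exact field names being an assumption:

```typescript
import { createHash } from "node:crypto";

// Sketch of an audit record matching the fields listed above;
// the precise shape is illustrative.
interface AuditRecord {
  at: string;          // server-side timestamp, never client-supplied
  sessionId: string;
  userId: string;
  model: string;
  tool: string;
  args: unknown;
  result: unknown;
  approvalId?: string;
  contextRef: string;  // content-addressable reference to the conversation
}

function buildAuditRecord(
  input: Omit<AuditRecord, "at" | "contextRef"> & { context: string },
): AuditRecord {
  const { context, ...rest } = input;
  return {
    ...rest,
    at: new Date().toISOString(),
    // Hashing the context means identical conversations share one stored
    // blob, and every record that came out of it points at the same digest.
    contextRef: "sha256:" + createHash("sha256").update(context).digest("hex"),
  };
}
```

The append-only property itself lives in the store (a Postgres trigger that raises on `UPDATE`/`DELETE`, or S3 object lock), not in application code.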
Rate Limits and Circuit Breakers
Agents are remarkably capable of generating expensive runaway behaviour. A reasoning loop that calls `search_customers` a thousand times in a minute will not only blow your downstream system's rate limits, it'll burn through your model API budget at a pace that makes finance unhappy.
Two defences:
- Per-agent rate limits on every tool, enforced at the MCP server / tool runtime layer. A typical limit: 30 calls per tool per agent session, with a hard cap on total tool calls per session (say 100).
- Circuit breakers on downstream systems. If a tool starts returning errors at a rate above 20% over a 30-second window, the circuit opens and subsequent calls fail fast with a clear error. The agent learns to back off.
Both are simple to implement (Redis with sliding-window counters), and both will save you on the day an agent decides to enumerate every record in a database to "make sure it found the right one."
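The per-session limits can be sketched in a few lines. This in-memory version uses the numbers from the list above; in production you would back the counters with Redis so they hold across processes and restarts:

```typescript
// Per-session tool-call limiter, in-memory sketch. Counters are per session:
// at most `perToolLimit` calls to any one tool, `sessionCap` calls in total.
class SessionToolLimiter {
  private perTool = new Map<string, number>();    // `${session}:${tool}` -> count
  private perSession = new Map<string, number>(); // session -> total count

  constructor(private perToolLimit = 30, private sessionCap = 100) {}

  allow(sessionId: string, tool: string): boolean {
    const key = `${sessionId}:${tool}`;
    const toolCount = this.perTool.get(key) ?? 0;
    const total = this.perSession.get(sessionId) ?? 0;
    if (toolCount >= this.perToolLimit || total >= this.sessionCap) return false;
    this.perTool.set(key, toolCount + 1);
    this.perSession.set(sessionId, total + 1);
    return true;
  }
}
```

When `allow` returns false, the tool runtime should return a structured error to the model ("rate limit reached for this session") rather than silently dropping the call, so the agent can report back instead of retrying blindly.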
Why MCP Makes This Easier
Every pattern in this post is implementable without MCP. The reason we still recommend MCP for production agents is that it gives you a natural enforcement point. The MCP server sits between the agent and your real systems. Permissions, rate limits, audit logging, dry-run mode, approval gating — all of it lives in the MCP server, not scattered across model prompt logic.
This is the same architectural argument as putting all your auth at the API gateway rather than in every service. Centralise the policy, and the agent can be as creative as it wants — every action still has to pass through the same gates.
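The centralisation can be expressed as a single higher-order wrapper that every tool handler passes through. This is a sketch of the idea, not the MCP SDK's actual API; the `Policy` interface is an assumption:

```typescript
// One enforcement point: wrap every tool handler once, and every call
// passes through the same gates. The Policy interface is illustrative.
type Handler = (args: unknown) => Promise<unknown>;

interface Policy {
  allow: (tool: string, args: unknown) => boolean;                    // permissions, rate limits
  log: (entry: { tool: string; args: unknown; ok: boolean }) => void; // audit
}

function withPolicy(tool: string, policy: Policy, handler: Handler): Handler {
  return async (args) => {
    if (!policy.allow(tool, args)) {
      policy.log({ tool, args, ok: false }); // denied calls are audited too
      throw new Error(`Policy denied call to ${tool}`);
    }
    const result = await handler(args);
    policy.log({ tool, args, ok: true });
    return result;
  };
}
```

Because the wrapper is applied at registration time, a handler that somehow bypasses it simply isn't reachable from the agent, which is the same guarantee an API gateway gives you for auth.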
Key Takeaways
- Start every agent deployment read-only. Graduate to write capabilities one tool at a time, with explicit approval.
- Per-tool credentials, scoped to exactly what that tool needs. Never share a "god token" across tools.
- Every mutating tool gets a dry-run flag. Use it for testing, let the agent use it to "preview" actions.
- Tier 2 actions (anything irreversible at scale) go through an approval queue. The agent proposes; a human decides.
- Immutable audit log of every tool call, approval, and model decision. Ground truth for the questions you'll have to answer.
- Per-agent rate limits and downstream circuit breakers stop runaway loops from becoming runaway incidents.
- MCP gives you one place to enforce all of this. Use it as the policy boundary.