What Agents Need Before They Handle Real Money
There are 1.5 million agents transacting on Moltbook right now. Depending on who you ask, this is either the early singularity, a dumpster fire, or 17,000 humans puppeting bots. Wiz Research found 341 malicious skills on ClawHub stealing credentials. Karpathy called it what it is.
Here’s the thing: it doesn’t matter which take is right. The infrastructure problems are the same regardless. Agents that move money need guardrails that actually work — not guardrails that politely suggest limits and hope the application layer behaves.
We’ve been building agentic treasury and banking infrastructure at Catena — with the same focus on compliance and institutional trust that I learned building USDC at Circle — for a while now. Quietly, mostly. But the OpenClaw moment made it clear that these problems are relevant to a much larger audience, much sooner, than we expected. So we recorded a quick raw demo.
The demo
The video is six unpolished minutes: no production gloss, just a direct walkthrough. I’d recommend watching it before reading the rest of this, but here’s the short version.
Quick Cowork and OpenClaw demo of what agent treasury management and payments look like with identity and policy guardrails, wallet infra by @turnkeyhq pic.twitter.com/1RxefnSPj2
— Sean Neville (@psneville) February 6, 2026
We show an AI agent with its own USD account, powered by Turnkey’s wallet infrastructure, holding stablecoin on Base. Users see human-readable labels — “Visa •••4532”, “Chase Checking •••7891”, “Catena Treasury Agent” — not raw blockchain addresses. Then we run it through a series of transactions that exercise different policy paths:
Auto-approved rebalance. The agent sends $10 to a treasury agent. Policy says under $50/day to known treasury addresses is fine. It executes instantly. No human in the loop.
Approval required. The agent tries to pay $8 on a Visa bill. Policy requires human approval for external payments. An approval card appears. The user approves, and it goes through.
Rejection. The agent tries to withdraw $25 to a Chase checking account. Same approval flow. The user rejects. Funds don’t move. And — this is the important part — even if the AI application layer were completely compromised, Turnkey’s policy engine still wouldn’t allow a signature without that approval.
Hiring another agent. The user asks to hire a Meridian research agent. The system checks the agent’s decentralized identity, pulls its reputation score (87/100, 142 attestations), and auto-approves because the score meets the configured threshold. Agent-to-agent commerce with identity verification built in.
Standing authorization. The user delegates ongoing spending authority — $20/month on Meridian reports. Two-layer enforcement applies to recurring authorizations too. Then we check how the treasury agent is actually allocating capital across a portfolio view with APY breakdowns.
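To make the policy paths above concrete, here’s a minimal sketch of the kind of evaluation logic involved. The limits mirror the demo’s rules (under $50/day to known treasury addresses auto-approves; external payments and withdrawals need a human), but all names and the function shape are illustrative, not Catena’s or Turnkey’s actual API.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    REQUIRE_APPROVAL = "require_approval"
    REJECT = "reject"

@dataclass
class Transaction:
    amount_usd: float
    recipient: str
    category: str  # "treasury", "external_payment", "withdrawal"

# Hypothetical policy data mirroring the demo's rules
KNOWN_TREASURY = {"catena-treasury-agent"}
DAILY_TREASURY_LIMIT = 50.00

def evaluate(tx: Transaction, spent_today: float) -> Decision:
    # Small transfers to known treasury addresses execute with no human in the loop
    if (tx.category == "treasury"
            and tx.recipient in KNOWN_TREASURY
            and spent_today + tx.amount_usd <= DAILY_TREASURY_LIMIT):
        return Decision.AUTO_APPROVE
    # External payments and withdrawals always surface an approval card
    if tx.category in ("external_payment", "withdrawal"):
        return Decision.REQUIRE_APPROVAL
    # Anything the policy doesn't recognize is refused outright
    return Decision.REJECT
```

The point of the sketch is the shape of the outcome space: three distinct results, not a boolean, because “needs a human” is a first-class answer rather than a failure mode.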
The whole thing runs in two surfaces: Claude Desktop using MCP with rich interactive cards, and OpenClaw over WhatsApp as a text-based skill. Same agent, same policy enforcement, different interfaces. The surface changes. The security model doesn’t.
The core idea: two layers, not one
Most agent frameworks treat policy as an application-layer concern. The AI decides whether a transaction should happen, checks some rules in application code, and proceeds, hoping everything stays within the prescribed guardrails. This is fine for demos. It is not fine for real money.
The problem is straightforward: application-layer policy is only as secure as the application. If someone compromises the server, jailbreaks the model, or finds a bug in your policy-checking code or guardrails framework, the money moves. You’ve built a lock out of suggestions.
What agents actually need are two layers:
Layer 1 is intelligence. This is the application layer — the part that answers the questions you’d want answered before any money moves. Who controls this agent? Are they a verified entity? What’s their track record? You can see this in the demo: before the treasury agent pays another agent for research services, it resolves their identity, checks their reputation score, and evaluates whether they meet the policy threshold. An agent with a verified owner, a score of 87/100, and 142 attestations clears. An unverified agent with a dispute flag doesn’t. This is the kind of automated standards-based trust infrastructure that the agentic economy needs — not platform-specific API keys, but portable, verifiable identity that works across any agent framework.
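A rough sketch of what that trust check might look like. The thresholds here are assumptions: the demo only shows that a verified owner with 87/100 and 142 attestations clears, and an unverified agent with a dispute flag doesn’t. Nothing below is the actual ACK-ID interface.

```python
from dataclasses import dataclass

@dataclass
class AgentIdentity:
    owner_verified: bool
    reputation_score: int   # 0-100
    attestations: int
    has_dispute_flag: bool

# Hypothetical configured thresholds; the demo doesn't specify exact values
MIN_SCORE = 80
MIN_ATTESTATIONS = 25

def clears_trust_policy(agent: AgentIdentity) -> bool:
    # All conditions must hold: verified owner, clean record, enough signal
    return (agent.owner_verified
            and not agent.has_dispute_flag
            and agent.reputation_score >= MIN_SCORE
            and agent.attestations >= MIN_ATTESTATIONS)
```

Note that a passing check here only tells Layer 1 that payment is reasonable; whether a signature actually happens is still Layer 2’s call.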
Layer 2 is enforcement. In this example, this part runs in Turnkey’s secure enclave. Has the required approval been obtained? The enclave signs the transaction only if every policy condition is met. This isn’t running in our application code. It’s running in hardware that neither we nor Turnkey can tamper with after deployment.
Intelligence without enforcement is just prompt suggestions. Enforcement without intelligence is just a dumb access list. You need both.
Even if our entire backend is compromised, the enclave won’t sign transactions that violate policy. That’s not a promise from our application code; it’s enforced by the enclave’s hardware isolation and cryptography.
How the custody actually works
This is worth being specific about, because “non-custodial” gets thrown around loosely.
The signers are the customer (via passkey on their device), the customer’s agent, and the Catena treasury agent. Turnkey enforces a quorum — 2-of-2 or 2-of-3 depending on the configuration — so no single party can move funds alone.
Different combinations handle different scenarios. For routine operations within policy — say, a small rebalance into an approved vault, with spend limits and recipients guided by human-defined policy — the agents can co-sign automatically. No human approval needed. The default policy requires the customer’s passkey for sensitive operations like withdrawals — but the customer defines those policies, not us. And in a 2-of-3 setup, the customer and their agent can reach quorum without Catena entirely. There’s always an exit path.
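The quorum rules above can be sketched roughly as follows, assuming a 2-of-3 configuration in which sensitive operations additionally require the customer’s passkey. Signer names and the policy shape are illustrative, not Turnkey’s actual configuration format.

```python
# The three signers described above, in a hypothetical 2-of-3 setup
SIGNERS = {"customer_passkey", "customer_agent", "catena_agent"}
QUORUM = 2

def quorum_met(signatures: set[str], sensitive: bool) -> bool:
    valid = signatures & SIGNERS  # ignore anything outside the signer set
    # Sensitive operations (e.g. withdrawals) require the customer's passkey
    if sensitive and "customer_passkey" not in valid:
        return False
    # Otherwise any two signers reach quorum, so no single party moves funds
    return len(valid) >= QUORUM
```

This is also where the exit path falls out of the math: the customer’s passkey plus the customer’s agent is a valid quorum with no Catena signer involved.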
The policies governing all of this live both in ACK-ID verifications (including integration with ERC-8004) and inside Turnkey’s secure enclave. The policies define which identities are approved, what amount or reputation thresholds trigger additional approval, and which operations the agents can handle on their own. Beneath the agent application layer, the enclave evaluates every transaction against its part of these rules before signing. Agent code can’t override it.
The architecture reflects problems we’ve been working on together: how do you give agents real financial capability without creating a new category of catastrophic failure mode?
Why now
Six months ago, agent-to-agent payments were a theoretical concern. Now there are seven-figure sums moving through agent networks with, in many cases, no policy enforcement at all. The gap between what agents can do and what the infrastructure lets them do safely is widening fast.
We’re showing what we’re building because the people building agents right now — in the OpenClaw ecosystem and beyond — are going to run into these problems immediately if they haven’t already. We love the vision and are optimistic, but the 341 malicious ClawHub skills are not an anomaly. They’re the beginning. We have to get this right, because getting it wrong has dire consequences.
This is a glimpse of where we’re headed, not a finished product. We’re building financial infrastructure that agents and their owners can trust — and we’d rather get it right with the community than in isolation. If any of this resonates, or if you think we’re wrong about something, let us know.