The Agentic Wallet: Designing Key Management for Non-Human Signers

by Shikhar Singh • 13 min read

The most common approach I see teams take when giving an AI agent blockchain access is: generate a private key, fund it with some ETH and USDC, inject it as an environment variable, done.

This works until it doesn't. And when it doesn't, it tends to be expensive.

The private-key-as-env-var pattern gives an agent the same key management story as a 2013 Bitcoin user: one key, unlimited signing authority, no revocation, no audit trail, recovery via "don't lose the key." For a human who signs two or three transactions a day, the risk is manageable. For an agent that might sign thousands of transactions, talk to dozens of external services, and run 24/7 in an environment you only partially control, the risk profile is completely different.

I've been thinking hard about this problem while building UACP and the agentic wallet tooling around it. Here's the architecture I think is right, and why the current standard is insufficient.

Why Human Wallet Design Fails for Agents

Human wallets are designed around a specific threat model: a human controlling a key, with another human or state actor trying to steal it. The defense mechanisms reflect this:

Seed phrases: Human-memorizable recovery
Hardware wallets: Air-gapped signing for high-value decisions
Social recovery: Trusted humans can help you regain access
Manual confirmation: You see every transaction before signing

Every one of these is designed for human interaction speed and human cognitive capacity. They break for agents in different ways:

Seed phrases don't make sense for agents because agents don't have memories between sessions the way humans do. You'd store the seed phrase as a secret, which means it's just a private key with extra steps.

Hardware wallets require physical interaction. An agent running in a cloud environment can't tap a hardware wallet.

Social recovery assumes trusted humans are available to help. For an agent running at 3am executing an automated DeFi rebalance, there's no human available.

Manual confirmation is the one that really breaks the agent use case. The entire point of an agentic system is that it doesn't need a human to confirm each action. Manual confirmation reintroduces the human approval loop you built the agent to eliminate.

The result: most agent systems run with a hot wallet, unlimited signing authority, and prayer. This is not good enough.

What Agents Actually Need: The Five Primitives

After working through the UACP payment flows and building wallet tooling for agent systems, here are the five things that agent key management actually needs:

1. Capability-Scoped Keys

An agent key should only be able to sign specific types of transactions, not arbitrary ones. A pricing agent that calls quote APIs should never be able to sign a token transfer. A swap agent should be able to sign swaps up to a certain size, but not governance votes.

The right model is capability-scoped signing keys — keys where the signing authority is constrained by the key's declaration, enforced either at the smart contract level (ERC-4337 account abstraction) or at the agent framework level before any signing happens.

const agentKey = createScopedKey({
  capabilities: ['swap.execute', 'quote.read'],
  maxTransactionValue: parseUnits('100', 6), // $100 USDC max per tx
  allowedContracts: ['0xUniswapRouter', '0xAAVE'],
  expiresAt: Date.now() + 24 * 60 * 60 * 1000 // 24 hours
})


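This key cannot be used to cast a governance vote. Not because the agent was told not to, but because the capability was never granted. When the scope is enforced in the smart account's validation logic, the constraint is structural, not procedural: a request outside the declared scope is rejected no matter what the agent process does. Framework-level enforcement is weaker, since it holds only as long as the raw key stays behind the signing layer.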
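For the framework-level variant, the check that runs before any signing might look roughly like this. It is a minimal sketch: `ScopedKeyPolicy`, `SignRequest`, and `checkCapability` are hypothetical names introduced for illustration, and the policy fields simply mirror the options passed to `createScopedKey` above.

```typescript
// Hypothetical policy shape mirroring the createScopedKey options above.
interface ScopedKeyPolicy {
  capabilities: string[];
  maxTransactionValue: bigint; // token base units (USDC uses 6 decimals)
  allowedContracts: string[];
  expiresAt: number; // Unix epoch milliseconds
}

// Hypothetical description of what the agent is asking the key to sign.
interface SignRequest {
  capability: string;
  value: bigint;
  target: string;
}

interface CheckResult {
  ok: boolean;
  reason?: string;
}

// Runs before signing; the key is only touched when this returns ok: true.
function checkCapability(
  policy: ScopedKeyPolicy,
  req: SignRequest,
  now: number = Date.now(),
): CheckResult {
  if (now >= policy.expiresAt) {
    return { ok: false, reason: "key expired" };
  }
  if (!policy.capabilities.includes(req.capability)) {
    return { ok: false, reason: "capability not granted" };
  }
  const target = req.target.toLowerCase();
  if (!policy.allowedContracts.some((c) => c.toLowerCase() === target)) {
    return { ok: false, reason: "target contract not allowed" };
  }
  if (req.value > policy.maxTransactionValue) {
    return { ok: false, reason: "value exceeds per-transaction cap" };
  }
  return { ok: true };
}
```

A request for `governance.vote` fails at the second check regardless of its value or target, which is the behavior described above.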

2. Ephemeral Key Hierarchies for Task Isolation

Long-running agent keys are a security liability. If a key that has been active for six months is compromised, every transaction it signed in that period becomes suspect, because you cannot easily tell which ones the attacker signed.

The right model: derive short-lived task keys from a master key using hierarchical deterministic derivation (same principle as HD wallets, but with task semantics):

MasterKey (never leaves cold storage)
  └── Session key (24h, derived from master + date)
        └── Task key (1h, derived from session + task_id)
              └── Sub-task key (10min, derived from task + subtask_id)

Each task gets its own key. Compromise of a task key affects only that task. The master key never signs transactions directly — it only derives the session key, which derives task keys.

When a task completes, the task key is deleted. Not rotated. Deleted. This limits the window of exposure to the task duration.
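The hierarchy above can be sketched with a one-way derivation function. This is a minimal illustration, not the UACP implementation: it assumes HMAC-SHA256 as the derivation step (a production design might prefer HKDF or BIP-32 style derivation), and `deriveChild`, `deriveTaskChain`, and `destroyKey` are hypothetical helper names. In a real deployment the first derivation would happen inside the HSM or cold storage so the master secret is never loaded into the agent process.

```typescript
import { createHmac } from "node:crypto";

// Derive a child secret from a parent secret and a context label.
// HMAC-SHA256 is one-way: knowing a child does not reveal its parent.
function deriveChild(parent: Buffer, label: string): Buffer {
  return createHmac("sha256", parent).update(label, "utf8").digest();
}

interface TaskKeyChain {
  session: Buffer;
  task: Buffer;
  subtask: Buffer;
}

// Walk the hierarchy: master -> session (per date) -> task -> sub-task.
function deriveTaskChain(
  master: Buffer,
  date: string,
  taskId: string,
  subtaskId: string,
): TaskKeyChain {
  const session = deriveChild(master, `session:${date}`);
  const task = deriveChild(session, `task:${taskId}`);
  const subtask = deriveChild(task, `subtask:${subtaskId}`);
  return { session, task, subtask };
}

// Overwrite key material when the task completes. This is best effort in a
// garbage-collected runtime, since copies may exist elsewhere in memory.
function destroyKey(key: Buffer): void {
  key.fill(0);
}
```

Because each level is derived one-way from its parent, a leaked task key exposes neither the session key nor sibling task keys.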

3. Spend Limits as a Signing Primitive

I covered spending policies in the UACP article, but it's worth stating here too: spend limits should be enforced before signing, not after.

Most smart contract-based spend limits (like the allowance system in ERC-20) enforce limits at execution time, on-chain. This means a compromised agent can attempt an over-limit transaction — it'll fail, but the attempt is on-chain, attributable, and costs gas.

A better architecture enforces spend limits in the signing layer:

const signer = createBudgetedSigner({
  key: taskKey,
  dailyLimit: parseUnits('500', 6),
  perTransactionLimit: parseUnits('50', 6),
  onLimitApproaching: (threshold) => alertCoordinator(threshold),
  onLimitExceeded: () => { throw new SpendLimitError() }
})

The signer refuses to sign. No transaction is ever submitted. No gas is wasted. The coordinator gets notified. This is a defense-in-depth layer that smart contract limits don't provide.
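One way the inside of such a signer could work is sketched below. It assumes a fixed 24-hour budget window and an arbitrary 80% warning threshold; `BudgetedSigner`, `BudgetConfig`, and `SignFn` are hypothetical names for illustration, and `signFn` stands in for whatever actually signs with the task key.

```typescript
class SpendLimitError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "SpendLimitError";
  }
}

interface BudgetConfig {
  dailyLimit: bigint; // token base units
  perTransactionLimit: bigint;
  onLimitApproaching?: (spent: bigint, limit: bigint) => void;
}

// Stand-in for the function that produces a signature with the task key.
type SignFn = (payload: string) => string;

const DAY_MS = 24 * 60 * 60 * 1000;

class BudgetedSigner {
  private spent = 0n;
  private windowStart: number;
  private readonly signFn: SignFn;
  private readonly cfg: BudgetConfig;

  constructor(signFn: SignFn, cfg: BudgetConfig, now: number = Date.now()) {
    this.signFn = signFn;
    this.cfg = cfg;
    this.windowStart = now;
  }

  sign(payload: string, amount: bigint, now: number = Date.now()): string {
    // Start a fresh budget window once 24 hours have elapsed.
    if (now - this.windowStart >= DAY_MS) {
      this.windowStart = now;
      this.spent = 0n;
    }
    // Both checks run before the key is touched.
    if (amount > this.cfg.perTransactionLimit) {
      throw new SpendLimitError("per-transaction limit exceeded");
    }
    if (this.spent + amount > this.cfg.dailyLimit) {
      throw new SpendLimitError("daily limit exceeded");
    }
    // Sign first, then record the spend, so a failed signature uses no budget.
    const signature = this.signFn(payload);
    this.spent += amount;
    // Notify once 80% of the daily budget is used (threshold is an assumption).
    if (this.cfg.onLimitApproaching && this.spent * 5n >= this.cfg.dailyLimit * 4n) {
      this.cfg.onLimitApproaching(this.spent, this.cfg.dailyLimit);
    }
    return signature;
  }
}
```

The budget counter lives in process memory here; a real signer would need to persist it so a restart does not reset the limit.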

4. Automatic Key Rotation Without Seed Phrases

Human key recovery uses seed phrases because humans need a way to restore access after losing a device. Agents don't need to "remember" — they need to regenerate.

The right model for agent key rotation:

The master key lives in a hardware security module (HSM) or encrypted secret store
Derived keys are regenerated from the master on demand, using deterministic derivation
Rotation means deriving the next key in the sequence and updating any registrations
No seed phrase, no recovery ceremony, no human in the loop

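If a session key is compromised, rotation is: derive session_key(date+1), update the DID document and the UACP agent card, invalidate outstanding proofs signed by the old key. The derivation itself is sub-second and fully automated, with no human required; how quickly the updated registrations take effect depends on where they are published.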
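The rotation flow can be sketched as a small state machine. This is an illustration under assumptions: it uses an integer epoch in place of the date, HMAC-SHA256 as the derivation step, and a hypothetical `KeyRegistry` interface standing in for the DID document and agent card updates. In practice the published identifier would be a public key or address; a truncated hash of the secret is used here only to keep the sketch self-contained.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical interface for anything that publishes the agent's current key
// identifier (a DID document, an agent card, ...).
interface KeyRegistry {
  publish(keyId: string): void;
  revoke(keyId: string): void;
}

class SessionKeyRotator {
  private epoch: number;
  private readonly master: Buffer;
  private readonly registries: KeyRegistry[];

  constructor(master: Buffer, registries: KeyRegistry[], startEpoch: number = 0) {
    this.master = master;
    this.registries = registries;
    this.epoch = startEpoch;
  }

  // Deterministic: the same master and epoch always yield the same key,
  // so nothing needs to be backed up or recovered.
  private derive(epoch: number): Buffer {
    return createHmac("sha256", this.master).update(`session:${epoch}`).digest();
  }

  // Public identifier for a key, derived one-way so the secret is not exposed.
  private keyIdFor(epoch: number): string {
    return createHmac("sha256", this.derive(epoch)).update("key-id").digest("hex").slice(0, 16);
  }

  current(): Buffer {
    return this.derive(this.epoch);
  }

  currentKeyId(): string {
    return this.keyIdFor(this.epoch);
  }

  // Move to the next key in the sequence and update every registration.
  rotate(): string {
    const oldId = this.currentKeyId();
    this.epoch += 1;
    const newId = this.currentKeyId();
    for (const registry of this.registries) {
      registry.publish(newId);
      registry.revoke(oldId);
    }
    return newId;
  }
}
```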

5. Structured Audit Logs, Not Transaction History

Blockchain transaction history tells you what was signed. For agent debugging and compliance, you also need why — what triggered the transaction, what policy allowed it, what the agent's internal state was when it decided to sign.

Transaction history is public and immutable but lacks context. A structured off-chain audit log paired with the on-chain transaction provides the full picture:

{
  "timestamp": "2025-09-10T14:23:01Z",
  "task_id": "rebalance-0x1234",
  "agent": "did:ethr:0xSwapAgent",
  "action": "swap.execute",
  "trigger": "portfolio_drift > 2%",
  "policy_check": { "capability": "swap.execute", "passed": true, "limit_used": "45/100 USDC" },
  "tx_hash": "0xabc...",
  "result": "success"
}

This log entry, paired with the on-chain transaction, gives you everything you need for compliance review or post-incident debugging. Without it, you're reconstructing intent from transaction data — possible, but slow and error-prone.
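Giving the log entry a type keeps every writer honest about which fields are required. The sketch below mirrors the JSON fields shown above; the `AuditEntry` type, the `result` values other than `"success"`, and the helper names are assumptions made for illustration.

```typescript
interface PolicyCheck {
  capability: string;
  passed: boolean;
  limit_used: string;
}

interface AuditEntry {
  timestamp: string; // ISO 8601, UTC
  task_id: string;
  agent: string;
  action: string;
  trigger: string;
  policy_check: PolicyCheck;
  tx_hash: string | null; // null when the signer refused and nothing was submitted
  result: "success" | "failure" | "rejected";
}

// Stamp the entry at write time so callers cannot forget or backdate it.
function buildAuditEntry(
  input: Omit<AuditEntry, "timestamp">,
  now: Date = new Date(),
): AuditEntry {
  // Drop the milliseconds to match the second-resolution format shown above.
  const timestamp = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  return { timestamp, ...input };
}

// One JSON object per line keeps the log append-only and easy to search.
function serializeAuditEntry(entry: AuditEntry): string {
  return JSON.stringify(entry);
}
```

Allowing `tx_hash` to be `null` lets the same log record attempts the signing layer rejected, which never appear on-chain at all.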

The ERC-4337 Bridge

Everything I've described above can be implemented at the application layer without account abstraction. But ERC-4337 account abstraction makes several of these primitives easier and pushes enforcement on-chain.

With a 4337 smart account for your agent:

Spending limits can be enforced in the account's validation logic
Capability scoping can be encoded in the account's validateUserOp function
Key rotation can be done by updating the account's authorized signers without changing the account address
Paymaster integration means the agent doesn't need ETH for gas separately from its USDC budget

The tradeoff: 4337 adds complexity and gas overhead (~20-30k extra gas per transaction for the validation layer). For high-frequency, small-value agent transactions, this overhead can dominate. For lower-frequency, higher-value transactions, it's the right security tradeoff.

My current recommendation: use application-layer controls for high-frequency low-value agent operations, use 4337 accounts for agents that hold significant value or sign high-stakes transactions.
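To see why the overhead matters more for small transactions, it helps to run the arithmetic. The sketch below takes the midpoint of the gas range mentioned above; the gas price and ETH price are purely illustrative assumptions, not measurements, and the function names are hypothetical.

```typescript
// All inputs are illustrative assumptions, not measurements.
interface OverheadInputs {
  extraGas: number; // additional gas for the 4337 validation layer
  gasPriceGwei: number; // assumed gas price
  ethPriceUsd: number; // assumed ETH price
  txValueUsd: number; // value moved by the transaction
}

const GWEI_PER_ETH = 1e9;

// Cost of the extra validation gas, in USD.
function overheadCostUsd(i: OverheadInputs): number {
  return ((i.extraGas * i.gasPriceGwei) / GWEI_PER_ETH) * i.ethPriceUsd;
}

// The same cost expressed as a fraction of the value being moved.
function overheadFraction(i: OverheadInputs): number {
  return overheadCostUsd(i) / i.txValueUsd;
}

const assumed = { extraGas: 25_000, gasPriceGwei: 20, ethPriceUsd: 3_000 };

// Roughly 0.3: the overhead is about 30% of a $5 swap.
console.log(overheadFraction({ ...assumed, txValueUsd: 5 }));
// Roughly 0.0003: the same overhead is about 0.03% of a $5,000 swap.
console.log(overheadFraction({ ...assumed, txValueUsd: 5_000 }));
```

Under these assumed prices the extra gas costs about $1.50 either way; only its share of the transaction changes.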

The Open Problem: Revocation at Agent Speed

The hardest unsolved problem in agentic key management is revocation latency.

If an agent key is compromised, you want to revoke it immediately. In human wallet security, "immediately" means within minutes or hours — long enough to notice, respond, and broadcast a revocation. In an agentic system running at high frequency, a compromised key can do damage in seconds.

An on-chain revocation takes at least one block to land: about 12 seconds on Ethereum mainnet, and a few seconds or less on many L2s and sidechains. In that window, a compromised agent can still make signed requests that will be accepted by any service checking against a slightly stale revocation list.

The mitigation I use is short-lived capability tokens: credentials derived from the agent's DID that expire in 60 seconds. Any service verifying the agent's identity sees a 60-second TTL on the token. Compromise of the signing key means the attacker has at most 60 seconds of valid tokens in flight. After that, new tokens require the revoked key to sign — which will fail revocation checks.
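A minimal sketch of the token mechanics follows. It assumes Ed25519 signatures via Node's built-in `crypto` module and a simple in-memory set of revoked key IDs; the token layout and the `issueToken` and `verifyToken` names are hypothetical, not the UACP wire format.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

const TOKEN_TTL_MS = 60_000; // 60-second lifetime, as described above

interface CapabilityToken {
  agent: string; // agent DID
  keyId: string; // identifies the signing key for revocation lookups
  issuedAt: number; // Unix epoch milliseconds
  expiresAt: number;
  signature: string; // base64 Ed25519 signature over the fields above
}

// Canonical bytes that get signed; a JSON array avoids delimiter ambiguity.
function tokenPayload(t: Omit<CapabilityToken, "signature">): Buffer {
  return Buffer.from(JSON.stringify([t.agent, t.keyId, t.issuedAt, t.expiresAt]), "utf8");
}

function newAgentKeyPair(): { publicKey: KeyObject; privateKey: KeyObject } {
  return generateKeyPairSync("ed25519");
}

function issueToken(
  agent: string,
  keyId: string,
  privateKey: KeyObject,
  now: number = Date.now(),
): CapabilityToken {
  const body = { agent, keyId, issuedAt: now, expiresAt: now + TOKEN_TTL_MS };
  // Ed25519 takes no separate digest algorithm, hence the null.
  const signature = sign(null, tokenPayload(body), privateKey).toString("base64");
  return { ...body, signature };
}

function verifyToken(
  token: CapabilityToken,
  publicKey: KeyObject,
  revokedKeyIds: Set<string>,
  now: number = Date.now(),
): boolean {
  if (now < token.issuedAt || now >= token.expiresAt) return false;
  // Refuse tokens that claim a longer lifetime than the protocol allows.
  if (token.expiresAt - token.issuedAt > TOKEN_TTL_MS) return false;
  if (revokedKeyIds.has(token.keyId)) return false;
  return verify(null, tokenPayload(token), publicKey, Buffer.from(token.signature, "base64"));
}
```

The lifetime cap in `verifyToken` matters: without it, an attacker holding the key could mint a token with a far-future expiry and sidestep the 60-second window.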

This is not a complete solution. It's a risk reduction. The complete solution probably involves threshold signing (M-of-N agents must co-sign high-value transactions, making single-key compromise insufficient) and real-time anomaly detection that can trigger an emergency pause faster than a human can react.

Neither of these is fully built out in any open-source agentic framework I'm aware of. This is the work that needs to happen before agentic systems are managing truly significant financial value.

Where This Is Going

We're in the phase of agentic development where the frameworks are ahead of the security infrastructure. LangChain, AutoGPT, CrewAI — all of these let you give an agent a wallet and have it execute transactions. None of them have production-grade key management built in.

That gap will close. Either the frameworks will build it in, or there'll be enough production incidents that teams start demanding it. I hope it's the former.

The primitives aren't research problems. Capability-scoped keys, ephemeral hierarchies, spend limits in the signing layer, deterministic rotation — all of this is buildable today with existing cryptographic primitives. It's engineering work, not research work.

The agents are already here. The key management should be too.

Agentic wallet tooling built alongside UACP: github.com/0xshikhar/UACP
