Proving Ground by Taiko

Why cheaper ZK proofs make verifiable AI Agents economically viable

Proving Ground by Taiko — Thu, 18 Jun 2026 12:57:12 GMT

ZK proof costs just dropped by up to half, and that is more interesting for AI Agents than it sounds. A verifiable AI Agent is only as trustworthy as the record it leaves behind, and the strongest record is a ZK proof of what the Agent actually did. The catch was always cost, because proving every action continuously used to cost more than the actions were worth, so verifiability got rationed to the few moves big enough to justify it. When the per-proof price falls far enough, that arithmetic flips, and proving every AI Agent action becomes economically rational rather than a luxury reserved for high-value transactions. The recent Boundless Surge upgrade is the concrete trigger here, cutting proving costs by up to 50% and adding 25% more proving capacity, but the point worth exploring is what a cost drop like that unlocks for the agent category as a whole.

Why proof cost was the ceiling on verifiable AI Agents

The agent category has been stuck on a simple arithmetic problem. An AI Agent that acts autonomously is only as trustworthy as the record it leaves behind, and a ZK proof is the strongest record there is because it shows the Agent did exactly what it claimed without anyone having to take its word for it. The trouble was that proving every action continuously cost more than most of those actions were worth, so verifiability became a luxury reserved for the moves big enough to justify it. That is why so much agent tooling built over the past year leaned on selective verification or trusted execution rather than proving things outright, because the per-proof economics did not pencil out when an Agent’s work required continuous on-chain attestation. Cost was the ceiling, and it sat low enough that most teams either skipped the proof step entirely or settled for verifying only their highest-value actions.

What changes when proving every action is cheap

Lower the per-proof price by half and the default flips. Instead of asking which actions are valuable enough to prove, a team can prove everything an Agent does and treat the full execution trace as verifiable by default. That is a different product, because an Agent whose every step carries a proof can be audited, trusted and composed with other Agents without anyone extending good faith. The proof step stops being the line item that decides whether the product ships, and verifiability moves from a feature you ration to a property the Agent simply has. For a category whose entire promise rests on Agents acting on your behalf without supervision, making continuous proof affordable is the difference between agents you have to watch and agents you can actually let run.
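The arithmetic behind that flip can be sketched with a toy calculation. Everything here is an assumption made for illustration: the action values, the one-dollar starting proof cost and the rule that a proof is worth buying when it costs at most 10% of the action it covers. Only the halving of the proof cost comes from the post.

```python
# Toy model of the "is this action worth proving?" decision.
# All amounts are hypothetical and in US cents; only the 50% cut comes from the post.

def provable_share(action_values_cents, proof_cost_cents, max_cost_pct=10):
    """Share of actions whose proof costs at most max_cost_pct percent of their value."""
    worth_proving = [
        value for value in action_values_cents
        if proof_cost_cents * 100 <= max_cost_pct * value
    ]
    return len(worth_proving) / len(action_values_cents)

# Hypothetical values of ten agent actions, in cents.
actions = [50, 100, 200, 400, 500, 800, 1000, 2000, 5000, 10000]

before = provable_share(actions, proof_cost_cents=100)  # proof costs $1.00
after = provable_share(actions, proof_cost_cents=50)    # proof cost cut by half

print(f"worth proving before: {before:.0%}, after: {after:.0%}")
```

In this made-up distribution the share of actions worth proving moves from 40% to 60%. How far the threshold moves for a real Agent depends entirely on how its action values are distributed.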

Which products come back into range

When proving every action becomes cheap enough to be the default, a set of products that were previously uneconomic comes back into range. Verifiable AI Agents that need to prove their work on chain without proof costs eating the margin are the clearest example, since the thing that made them expensive was never the model or the execution but the cost of attesting to what the Agent did often enough to be trustworthy. High-frequency proof use cases sit right alongside them, because DeFi strategies, oracle attestations and cross-chain messaging all generate far more provable events than any team could previously afford to prove. Cheaper proofs do not just make existing products marginally better; they change which products can be built at all, and the ones that benefit most are exactly the high-frequency, continuously attesting workloads that AI Agents tend to produce.

Why this is the moment worth watching

A year ago the per-proof cost was the bottleneck for the whole agent category, because the economics broke down the moment an Agent’s work required continuous on-chain attestation. Cutting that cost in half does not solve every problem in the space, but it moves the floor far enough that proving every action becomes a reasonable default rather than an expensive exception, and that single change quietly expands what a verifiable AI Agent is allowed to be. The interesting question is no longer whether agents can prove what they do, it is what teams build once proving everything is finally cheap enough to assume.

Cheaper ZK proofs matter for AI Agents because they turn continuous on-chain verifiability from a cost you ration into a default you can assume, which is the precondition for AI Agents that act autonomously and still leave a record anyone can trust.

This post is exploratory and does not represent a specific roadmap.

Fable 5 didn't close the AI Agent trust gap

Proving Ground by Taiko — Thu, 11 Jun 2026 11:57:42 GMT

Anthropic released Claude Fable 5 on June 9, 2026, the first time a Mythos-class model has been placed in public hands, and the company framed it plainly as the engine for the next era of autonomous AI Agents. The headline capability is not a benchmark, even though Fable 5 took the top of Code Arena by 98 points. It is duration. Fable 5 runs autonomously for longer than any Claude model before it, to the point where Stripe reported it migrating a 50-million-line codebase in a single day, work that would have taken a team of engineers more than two months by hand. When the unit of work shifts from answering a prompt to finishing a job, the question stops being how smart the model is and becomes what an AI Agent is allowed to touch while no one is watching.

That is the gap Fable 5 widens rather than closes. A more capable model makes an AI Agent better at deciding what to do, but it does nothing to guarantee that the data the Agent acted on was real, that the action it took can be verified after the fact, or that the value it moved settled the way it was supposed to. Capability and trust are different problems, and the industry has spent the last two years pouring almost everything into the first one.

Why a smarter model does not close the agent trust gap

The clearest signal that capability has outrun safe deployment came from Anthropic itself. Fable 5 shipped with safeguards that silently reroute queries on sensitive topics like cybersecurity and biology to the older Claude Opus 4.8, so the most capable public model now runs with a quiet governor bolted on. A company puts a throttle on a system when the system can do more than it can be trusted to do unsupervised, and that same logic scales the moment you hand the model a wallet, an API key and a goal.

Pricing makes the timing sharper. Anthropic put Fable 5 at ten dollars per million input tokens and fifty per million output tokens, less than half the price of its previous Mythos preview, and made it free on paid plans until June 22. That is a distribution move, not a margin one, and it means a wave of AI Agents built on a Fable 5-class model is about to start running longer and acting more independently than anything that came before. The model deciding what to do gets cheaper and better every release. The part that confirms the Agent acted on real inputs and produced a result anyone can check does not come from the model at all.
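To put the quoted prices in per-run terms, here is a small worked calculation. The two rates are the ones stated above. The token counts for a "long run" are hypothetical.

```python
from decimal import Decimal

# Rates quoted in the post, in USD per million tokens.
INPUT_USD_PER_MTOK = Decimal("10")
OUTPUT_USD_PER_MTOK = Decimal("50")

def run_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Model cost in USD for one agent run at the quoted rates."""
    million = Decimal(1_000_000)
    return (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / million

# Hypothetical long-running job: 2M input tokens, 400k output tokens.
cost = run_cost(2_000_000, 400_000)
print(cost)
```

With these assumed counts the input and output halves each contribute 20 dollars, so the run costs 40 dollars in model spend. None of that spend buys any verification of what the Agent did.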

What autonomous AI Agents actually need to act unsupervised

An AI Agent running a job end to end needs three things the model cannot provide on its own. It needs verified data, so that the price it trades against or the message it reacts to is real rather than spoofed or stale. It needs verifiable action, so that the decision it made and the step it executed leave a record others can check instead of a black box anyone has to take on faith. And it needs reliable settlement, so that when the Agent moves value the transfer is final and correct rather than reversible or forged. Strip any one of these away and a more capable model just reaches the wrong outcome faster.
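The first of those three requirements, verified data, can be sketched as a check that a price update is both authentic and fresh before the Agent acts on it. This is a minimal illustration, not a real oracle design: the shared HMAC key stands in for whatever signature scheme a real feed would use, and the 30-second freshness window and the function names are assumptions.

```python
import hashlib
import hmac

def sign(key: bytes, price: str, timestamp: int) -> str:
    """Tag a price update. A shared HMAC key stands in for a real signature scheme."""
    message = f"{price}|{timestamp}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def is_trustworthy(key: bytes, price: str, timestamp: int, tag: str,
                   now: int, max_age_s: int = 30) -> bool:
    """Accept the update only if it is fresh and its tag matches its contents."""
    fresh = 0 <= now - timestamp <= max_age_s
    authentic = hmac.compare_digest(sign(key, price, timestamp), tag)
    return fresh and authentic

feed_key = b"hypothetical-feed-key"
tag = sign(feed_key, "2500.00", 1000)

print(is_trustworthy(feed_key, "2500.00", 1000, tag, now=1010))  # fresh and authentic
print(is_trustworthy(feed_key, "2500.00", 1000, tag, now=1100))  # stale
print(is_trustworthy(feed_key, "9999.00", 1000, tag, now=1010))  # spoofed price
```

Only the first call passes. The stale and spoofed updates are rejected before the Agent has a chance to trade against them.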

The failure modes are not hypothetical, because we already watch them play out in systems with far less autonomy than Fable 5 invites. An Agent that pays out against a manipulated price feed loses the money exactly as designed. An Agent that acts on a cross-chain message no one can verify executes fraud on schedule. The more independently these systems run, and the longer they run before a human looks, the more the absence of a trust layer compounds rather than corrects.

How verification and settlement become the real bottleneck

For two years the constraint on AI Agents was the model, so that is where the money and attention went. Fable 5 is the clearest sign that the constraint is moving. Once a model can run a multi-step job unsupervised for hours, the thing standing between a useful Agent and a dangerous one is no longer intelligence but the fabric that verifies inputs, attests to actions and settles outcomes. Whoever provides that fabric stops being optional infrastructure and becomes the layer the whole Agent economy depends on.

This is the question Proving Ground keeps returning to, because it is the question Taiko is built around. A trust and settlement layer for autonomous AI Agents is the part of the stack that turns a capable model into an Agent you can actually let run, and Fable 5 just made the case for building it more urgent than any benchmark could.

How AI Agent memory actually works

Proving Ground by Taiko — Tue, 09 Jun 2026 12:49:39 GMT

AI Agent memory works in two layers. There is short-term memory, which is whatever fits inside the model’s context window during a single run, and there is long-term memory, which is everything the Agent writes to an external store so it survives the session ending. The short-term layer is fast and disappears the moment the context resets. The long-term layer is where the durable stuff lives, usually a vector database for semantic recall and sometimes a graph database for relationships between facts, and it is the layer most teams underbuild. By 2026 the dominant production pattern is hybrid, a short-term episodic buffer sitting on top of vector plus graph retrieval, because no single structure covers everything an AI Agent needs to remember. What almost nobody specifies is where those long-term bytes physically sit and who pays to keep them there, which turns out to be the part that breaks when an Agent has to run on its own.

What short-term and long-term memory actually mean for an AI Agent

Short-term memory is the context window. It holds the current conversation, the last few tool outputs, the active goal and the intermediate reasoning the model is using to decide what to do next. It is quick and it is local to the run, which is exactly why it is fragile. When the window fills up or the session ends, that working state is gone unless the Agent deliberately moved it somewhere first. People treat the context window as memory because it behaves like memory inside one session, but it is closer to RAM than to a hard drive. Nothing in it persists by default.

Long-term memory is what you get when the Agent writes selected pieces of that working state out to an external store so a future run can read them back. The research literature, borrowing from cognitive science, splits long-term memory into three rough types. Episodic memory records specific past events, what happened, what the Agent did and how it turned out. Semantic memory holds general facts the Agent has learned and can reuse, like a user’s preferences or a stable piece of domain knowledge. Procedural memory captures how to do a recurring task, the learned routine rather than the one-off event. An AI Agent that remembers you across weeks is reading episodic and semantic memory back into its context window at the start of each run, then writing new entries at the end.
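The two layers can be sketched in a few lines. This is a toy model under stated assumptions: the class and method names are hypothetical, a bounded deque stands in for the context window, and a plain dictionary stands in for the external store.

```python
from collections import deque

class AgentMemory:
    """Toy model: a bounded short-term buffer plus a typed long-term store."""

    def __init__(self, window_size: int = 3):
        # Short-term: behaves like a context window, oldest items fall out first.
        self.short_term = deque(maxlen=window_size)
        # Long-term: stand-in for an external store, split by memory type.
        self.long_term = {"episodic": [], "semantic": [], "procedural": []}

    def observe(self, item: str) -> None:
        self.short_term.append(item)

    def remember(self, kind: str, item: str) -> None:
        """Deliberately write something out so a future run can read it back."""
        self.long_term[kind].append(item)

    def end_session(self) -> None:
        # Nothing in the window persists by default.
        self.short_term.clear()

    def start_session(self) -> None:
        # Read episodic and semantic memory back into the window.
        for kind in ("episodic", "semantic"):
            for item in self.long_term[kind]:
                self.observe(item)

memory = AgentMemory(window_size=2)
for step in ("tool output A", "tool output B", "tool output C"):
    memory.observe(step)
memory.remember("semantic", "user settles invoices in USDC")
memory.end_session()
memory.start_session()
print(list(memory.short_term))
```

After the restart the window holds only the one fact that was written out. The three tool outputs are gone, which is the RAM-versus-disk distinction described above.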

How retrieval works once the memory lives outside the model

The reason long-term memory needs more than a text file is retrieval. An Agent cannot reload everything it has ever stored into a context window, so it has to fetch only the relevant pieces, and “relevant” is a fuzzy match rather than an exact key. That is what a vector database does. Each stored memory gets converted into an embedding, a numeric representation of its meaning, and at recall time the Agent embeds its current situation and pulls the stored items that sit closest in that space. This is why an Agent can surface a note you wrote three weeks ago that never shared a single keyword with what you just asked. Graph databases handle the other half of the problem, the relationships between facts, so an Agent can traverse from a person to the project they own to the deadline attached to it rather than guessing those links from similarity alone. Most serious deployments in 2026 run both, a vector store for semantic recall and a graph for structure, with the short-term buffer in front of them. Tools like mem0, LangGraph, Letta and Redis are used mostly to wire that pipeline together: the extraction step that decides what is worth keeping, the consolidation step that refines it and the retrieval step that reads it back.
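Both halves of that retrieval story fit in a short sketch. The three-dimensional vectors below are hand-written stand-ins for real embeddings, which would come from an embedding model and have far more dimensions. The notes, names and deadline are hypothetical, and a dictionary stands in for a graph database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stored memories with toy embeddings (hypothetical values).
memories = {
    "invoice from Acme is due Friday": [0.9, 0.1, 0.0],
    "user likes dark mode": [0.0, 0.2, 0.9],
    "payment terms are net 30": [0.8, 0.3, 0.1],
}

def recall(query_vec, k=2):
    """Return the k stored memories closest to the query in embedding space."""
    ranked = sorted(memories, key=lambda text: cosine(memories[text], query_vec),
                    reverse=True)
    return ranked[:k]

# Relationships between facts, as a tiny adjacency map.
graph = {
    "alice": {"owns": "project-x"},
    "project-x": {"deadline": "2026-07-01"},
}

def follow(start, *relations):
    """Walk named relations from a starting node."""
    node = start
    for relation in relations:
        node = graph[node][relation]
    return node

print(recall([1.0, 0.2, 0.0]))
print(follow("alice", "owns", "deadline"))
```

The query vector lands near the two payment-related notes even though no keywords are compared, while the deadline is reached by following explicit links rather than by similarity.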

Where the memory actually lives, and why that breaks for autonomous Agents

Here is the gap every memory diagram skips over. A vector database stores embeddings, but embeddings still point at bytes, the original document, the image, the working note, the per-user state the Agent is keeping. Those bytes have to sit somewhere durable, and that somewhere has historically been a cloud bucket behind an account, an API key and a credit card. That assumption is fine when a human set the Agent up and pre-funded everything. It falls apart the moment you want an AI Agent to manage its own memory, because the Agent cannot open an account, cannot rotate a key and cannot sit through a billing signup. The memory layer everyone designs around quietly depends on a human standing behind it holding the storage relationship together.

This is the problem Tack was built for. It gives an AI Agent a memory layer it can stand up and pay for by itself, with no human holding the storage relationship together. Tack is the agent-native storage product from Inference Room, built on Taiko, and it stores the bytes behind an Agent’s memory without any of the account scaffolding. The Agent writes a private object to Tack, pays a fraction of a cent in USDC inline with the request through x402, and gets back an id that only the paying wallet can read. A 5 MB memory object held for a month settles at $0.0010, with no signup, no API key and no human in the loop. The object is wallet-scoped rather than sitting at a public address, so per-user session memory, drafts and working notes stay private to the Agent that owns them, and it expires after the paid duration so memory is something the Agent rents for as long as it needs it. The wallet that pays for the Agent’s compute is the same wallet that owns its memory, which is the shape autonomous memory actually needs.
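The two properties described here, wallet-scoped reads and expiry after the paid duration, can be modelled in memory. This is a toy model and not Tack's API: the class, method names and id format are hypothetical, payment itself is not modelled, and the per-MB rate assumes pricing scales linearly from the single 5 MB example, which the post does not state.

```python
from decimal import Decimal

# Derived from the one quoted data point; linear scaling is an assumption.
USD_PER_MB_MONTH = Decimal("0.0010") / 5

class ToyStore:
    """In-memory stand-in for wallet-scoped storage that expires."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def put(self, wallet: str, data: bytes, paid_until: int) -> str:
        """Store data for the paying wallet until the paid-for timestamp."""
        self._next_id += 1
        object_id = f"obj-{self._next_id}"
        self._objects[object_id] = (wallet, data, paid_until)
        return object_id

    def get(self, wallet: str, object_id: str, now: int) -> bytes:
        owner, data, paid_until = self._objects[object_id]
        if wallet != owner:
            raise PermissionError("only the paying wallet can read this object")
        if now > paid_until:
            raise KeyError("object expired")
        return data

store = ToyStore()
note_id = store.put("0xagent", b"working notes", paid_until=100)
print(store.get("0xagent", note_id, now=50))
print(USD_PER_MB_MONTH)
```

A read from any other wallet raises `PermissionError`, and a read after the paid-for timestamp raises `KeyError`, which is the rent-it-while-you-need-it behaviour the paragraph describes.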

Tack is the agent-native storage layer from Inference Room, built on Taiko and settling in USDC via x402 with no accounts, and it gives AI Agent memory a durable place to live that the Agent can pay for and own by itself. It is open at tack.inferenceroom.ai now.

Multisig owners are signing transactions they cannot read

Proving Ground by Taiko — Thu, 04 Jun 2026 12:38:59 GMT

On February 21, 2025, Bybit lost roughly 1.5 billion dollars from an Ethereum cold wallet, the largest crypto theft on record. The multisig worked exactly as designed. The threshold was met, the required owners signed and the transaction executed. What failed was legibility. The signers saw a benign transfer in their interface while the calldata underneath moved 401,000 ETH to addresses controlled by North Korea’s Lazarus Group. They approved what they could see, and what they could see had been quietly swapped for something else. This is blind signing, the practice of authorizing a transaction whose true effect you have not independently verified, and the gap between what a multisig owner reads and what they actually authorize is the single largest unsolved problem in on-chain treasury operations. It gets worse the moment AI Agents start touching the same Safes.

A multisig is supposed to remove single points of failure by requiring several humans to agree before money moves. The security model assumes each of those humans can evaluate what they are agreeing to. Strip that assumption out and a five-of-nine Safe is not five independent checks, it is one unreadable transaction signed five times.

Why multisig review collapses into trust

Reviewing a Safe transaction properly is real work. You decode the raw calldata into a human-readable action, simulate the resulting state against the current chain to confirm the action does what it claims, check the counterparty address against the team’s history of known entities and then match the whole thing to whatever invoice, spec or governance proposal authorized it in the first place. Done carefully that is fifteen to twenty minutes per transaction, and a busy treasury signs several a week across payroll, vendor payments and contract upgrades.
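The first step of that review, decoding calldata into a readable action, looks like this for the simplest case. The sketch handles only a plain ERC-20 `transfer(address,uint256)` call, whose four-byte selector is `0xa9059cbb`. Real Safe transactions can wrap arbitrary calls, including delegate calls, which this sketch does not attempt to decode. The function name is hypothetical.

```python
TRANSFER_SELECTOR = "a9059cbb"  # first 4 bytes of keccak256("transfer(address,uint256)")

def decode_erc20_transfer(calldata_hex: str):
    """Decode a plain ERC-20 transfer into (recipient, raw_amount)."""
    data = calldata_hex.lower()
    if data.startswith("0x"):
        data = data[2:]
    # Selector (8 hex chars) followed by two 32-byte words (64 hex chars each).
    if len(data) != 8 + 64 + 64 or data[:8] != TRANSFER_SELECTOR:
        raise ValueError("not a plain ERC-20 transfer")
    # The address occupies the low 20 bytes of the first word.
    recipient = "0x" + data[8 + 24 : 8 + 64]
    amount = int(data[8 + 64 :], 16)
    return recipient, amount

# Hypothetical calldata: send 1,000,000 base units to 0x1111...1111.
calldata = "0xa9059cbb" + "0" * 24 + "11" * 20 + format(1_000_000, "064x")
recipient, amount = decode_erc20_transfer(calldata)
print(recipient, amount)
```

The amount is in the token's base units, so a reviewer still needs the token's decimals to turn it into a figure that can be compared with an invoice.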

Nobody has that time on every transaction, so review quietly degrades into pattern-matching. A familiar Slack message, a signer you trust, a transaction that looks like last month’s, and the signature goes through in seconds. Bybit’s fatal signature cleared in well under a minute. The interface said one thing, the bytes said another and the social proof of fellow signers all clicking approve did the rest. Trust filled the space where review was supposed to be, and trust is precisely the thing an attacker manufactures.

What blind signing actually costs

Blind signing happens because the tooling shows owners a hex string and a destination and asks them to trust that the two match the intent in their heads. It is not an edge case reserved for billion-dollar exchanges, it is the default operating mode of almost every multisig, because decoding calldata by hand on every transaction is impractical for any team moving at the speed of a real treasury.

The cost is not only catastrophic theft. It is also the slow tax of routine error, a transfer with one zero too many, a payment to a stale address, a contract upgrade pointed at the wrong implementation, a duplicate nonce that quietly competes with a legitimate transaction already in the queue. Each of these is an unreadable transaction that a human waved through because reading it correctly would have meant twenty minutes they did not have. The exploits make headlines. The fat-finger losses and misrouted payments never do, and in aggregate they cost teams more than the rare heist.

How AI Agents make legibility non-negotiable

The pressure is about to compound. Finance teams are starting to put AI Agents into the loop on treasury operations, drafting transfers, assembling contract calls and managing recurring payments at machine speed. An Agent that proposes transactions faster than humans can carefully read them does not fix blind signing, it industrializes it. Speed on the proposal side without legibility on the approval side just means more unreadable transactions arriving more often.

The way out is not to take humans off the keys. It is to put something between the proposal and the signature that does the twenty minutes of review every time, without getting tired or pattern-matching. That something has to decode every transaction into plain language, simulate it before a single owner signs, reconcile it against the invoices and history that authorized it and hold anything that does not match. It has to be wired into the systems a finance team already uses, the Safe queue, the directory of known counterparties, the folder of invoices, the team’s accumulated memory of what normal looks like. And it has to be structurally incapable of signing on its own, a reviewer and proposer that never holds a key and never counts toward the threshold, so the humans stay in control while finally getting to read what they sign.
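The reconcile-and-hold step can be sketched as a function that compares a decoded proposal with the invoice and history that should have authorized it. The data shapes and names are hypothetical, and simulation against chain state is left out. The reviewer only returns findings and has no signing capability.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposal:
    nonce: int
    recipient: str
    amount: int

def review(proposal, invoice, known_counterparties, queued_nonces):
    """Return reasons to hold the proposal. An empty list means nothing to flag."""
    findings = []
    if proposal.recipient not in known_counterparties:
        findings.append("recipient is not a known counterparty")
    if proposal.recipient != invoice["recipient"]:
        findings.append("recipient does not match invoice")
    if proposal.amount != invoice["amount"]:
        findings.append("amount does not match invoice")
    if proposal.nonce in queued_nonces:
        findings.append("nonce already used by a queued transaction")
    return findings

invoice = {"recipient": "0xvendor", "amount": 5_000}
known = {"0xvendor", "0xpayroll"}

clean = review(Proposal(7, "0xvendor", 5_000), invoice, known, queued_nonces={5, 6})
extra_zero = review(Proposal(7, "0xvendor", 50_000), invoice, known, queued_nonces={5, 6})
print(clean)
print(extra_zero)
```

The clean proposal passes with no findings, while the one with an extra zero is held with an explicit reason the signers can read before anyone approves.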

Multisig was built so that no single person could move the money alone. The next layer has to make sure that when those people do sign, they can actually see what they are signing. An AI Agent that reads every transaction before the first signature, and refuses to stay quiet when the calldata and the invoice disagree, is what turns a multisig back into the safeguard it was always supposed to be.

AI Agents can now operate regulated assets

Proving Ground by Taiko — Tue, 02 Jun 2026 14:34:29 GMT

AI Agents can hold wallets, execute trades and settle payments without a human in the loop. What they still can’t do, in most stacks, is operate a regulated real-world asset. A bond, a private equity stake or a real estate share carries compliance rules that have to be enforced before every transaction executes. Without a standard that encodes those rules at the protocol level, every autonomous operation still needs a human to sign off. That standard now exists. It’s called RAMS.

The tokenization conversation has been running on one track for years. How much can you put on chain? The answer got very large very fast. The track nobody talks about is what happens after the asset is minted: keeping it compliant, tracking ownership and enforcing transfer restrictions for years, for every new owner, across every secondary transaction. That operational layer is what separates a proof of concept from a running financial product. And it is the layer AI Agents need access to if they are going to do anything useful.

Why AI Agents can’t just execute transactions on tokenized assets

The asset is on chain. The agent has a wallet. So what’s stopping it?

Regulatory constraints are stopping it. Tokenized real-world assets are not like fungible tokens. A bond, a private equity stake and a real estate share each carry rules about who can hold them, when they can move and under what conditions a transfer is valid. Those rules are not optional. A transfer that violates them is not just invalid; it creates liability for the issuer.

The problem is delegation. How does an institution define what an AI Agent is allowed to do? On whose behalf? Within what limits? And how do you make that definition legally meaningful rather than a policy document that no system actually checks?

Without a standard that answers those questions at the contract level, every autonomous operation stalls the same way. A human has to approve before anything settles. You have tokenized the asset. You have not automated it.

What RAMS does for AI Agents operating regulated assets

RAMS (Regulated Agent Mandate Standard) is Brickken’s compliance delegation framework. It encodes agent authority directly at the protocol level so that every transaction an AI Agent initiates is checked against a machine-readable mandate before it executes.

A RAMS mandate defines what the AI Agent can do (execute transfers, trigger distribution events, rebalance positions), whose authority it acts under, which transfer restrictions apply and what it cannot do regardless of instruction. Each agent carries an attested identity. Every action it takes runs through RAMS-defined rules before the transaction settles.

This is not post-hoc auditing. The compliance gate runs before execution. An AI Agent operating under RAMS can move a tokenized asset through a fully automated workflow, overnight, at volume, without a human in the loop, and every step stays within the ownership and compliance controls the issuer requires.
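The idea of a gate that runs before execution rather than an audit that runs after it can be illustrated in a few lines. This is not the RAMS schema, which the post does not specify. Every field and function name below is hypothetical, and a Python list stands in for settlement.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mandate:
    agent_id: str
    principal: str               # whose authority the agent acts under
    allowed_actions: frozenset
    eligible_holders: frozenset  # stand-in for transfer restrictions
    max_amount: int

def check(mandate, agent_id, action, recipient, amount):
    """Return (allowed, reason) without changing any state."""
    if agent_id != mandate.agent_id:
        return False, "agent does not hold this mandate"
    if action not in mandate.allowed_actions:
        return False, "action is outside the mandate"
    if recipient not in mandate.eligible_holders:
        return False, "recipient is not an eligible holder"
    if amount > mandate.max_amount:
        return False, "amount exceeds the mandate limit"
    return True, "ok"

def execute(mandate, ledger, agent_id, action, recipient, amount):
    """Run the compliance gate first. Nothing is recorded if it fails."""
    allowed, reason = check(mandate, agent_id, action, recipient, amount)
    if not allowed:
        raise PermissionError(reason)
    ledger.append((action, recipient, amount))

mandate = Mandate("agent-1", "issuer-a", frozenset({"transfer"}),
                  frozenset({"0xinvestor"}), max_amount=1_000)
ledger = []
execute(mandate, ledger, "agent-1", "transfer", "0xinvestor", 500)
print(ledger)
```

A transfer to an ineligible holder raises before the ledger is touched, which is the difference between a pre-execution gate and post-hoc auditing.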

That is a qualitatively different capability from what existed before. The asset is no longer just on chain. It is operable.

Why the execution environment matters for RAMS

A compliance standard that can’t handle sustained automated load is a compliance standard with nowhere to go. Tokenized assets under active management don’t generate one transaction at issuance and then go quiet. They generate compliance checks, ownership updates, dividend distributions and secondary transfers continuously, across the life of the asset.

Taiko is a Type 1 zk-EVM running Ethereum-equivalent execution secured by zero-knowledge proofs. Issuers bring existing Ethereum contracts with no migration required. Taiko handles the transaction volume that continuous agentic operations generate, at costs that make RAMS-gated automation viable at scale. Brickken’s full stack (WebApp, whitelabel product and direct API) runs natively on Taiko, so builders inherit the complete asset lifecycle including RAMS from day one.

Real-world assets and AI Agents are converging faster than most infrastructure was built to handle. The bottleneck is not issuance. It never was. It is whether an agent can operate the asset after it’s on chain, with the same compliance guarantees an institution needs. RAMS on Taiko is the answer to that question.

RAMS (Regulated Agent Mandate Standard) is Brickken’s compliance delegation standard for AI Agents operating regulated on-chain assets, enforced at the protocol level before transaction execution, now running natively on Taiko’s Ethereum-equivalent Layer 2.

This week in AI Agents: 5 things to know

Proving Ground by Taiko — Thu, 28 May 2026 12:59:45 GMT

Robinhood opened stock trading to AI Agents, Circle shipped its Agent Stack on USDC, Google pushed Gemini Spark into the consumer Gemini app, Elliptic raised $120 million to put agents on the compliance stack and Brussels set August 2 as the EU AI Act’s enforcement date. Five stories that shaped the AI Agent landscape this week.

1. Robinhood opens stock trading to AI Agents for 27M customers

Robinhood opened its platform to AI Agents on May 27, letting any of its 27 million funded customers create a separate Agentic Trading account, fund it with a fixed balance and hand execution to an AI Agent, such as one built on Claude or ChatGPT, that can read the portfolio, suggest investments and place stock trades on its own. The beta is equities-only at launch. Options, crypto, event contracts, futures and prediction markets are next. The brokerage paired the launch with an Agentic Credit Card for Robinhood Gold members, a virtual card that lets an AI Agent scan the web for items and authorise purchases when a user’s price threshold is met. Users see every trade in the app and can be required to approve previews before larger orders execute.

The implication is that a publicly listed brokerage just made an AI Agent a first-class user of its platform, not an experimental wrapper. Once one major broker does this, the rest follow. The agentic customer is no longer a category waiting to exist.

2. Google ships Gemini Spark and Project Mariner at I/O

Google used I/O on May 19 to push AI Agents into the consumer Gemini app. Gemini Spark is a new general-purpose AI Agent that reasons across information in connected apps, available first to Google AI Ultra subscribers and trusted testers. Project Mariner, the company’s web-browsing agent first previewed last year, shipped alongside it. Google also released Gemini 3.5 Flash at $1.50 per million input tokens and $9 per million output, roughly a third the price of comparable frontier models like Claude Opus 4.6 and GPT-5.5, and kept pushing the Agent2Agent protocol and the Gemini Enterprise Agent Platform announced at Cloud Next.

The shape of the week is that consumer AI Agents are now the headline product at the world’s biggest software vendors, sitting next to search and messaging on the home screen. Whatever happens at the protocol layer, AI Agents are being normalised on the surfaces hundreds of millions of users open every day.

3. Circle launches its Agent Stack and a stablecoin rail to match

Circle announced Circle Agent Stack on May 11, a set of services for autonomous agents to hold assets and transact in USDC across blockchains. The stack ships with Circle CLI, Agent Wallets, an Agent Marketplace and Nanopayments powered by Circle Gateway, the company’s cross-chain liquidity layer. Stablecoin payments at sub-cent unit economics settle on chain under an AI Agent’s own credentials, with no human signing in the loop.

This is the largest US stablecoin issuer building its own rails for AI Agents to use, paired with developer tooling that does not assume a human is the one paying. It pushes Circle into the same lane AWS staked out earlier this month with AgentCore Payments. Whichever provider’s primitives become the default for autonomous spending will sit underneath a meaningful share of next year’s agent commerce.

4. Elliptic raises $120M to put agents on the compliance stack

Blockchain analytics firm Elliptic raised $120 million on May 12 in a round led by One Peak, with Nasdaq Ventures and Deutsche Bank participating, valuing the London company at $670 million. CEO Simone Maini said the funds will accelerate an agentic product roadmap that builds AI Agents on top of Elliptic’s compliance dataset to automate work currently done by analysts.

The story is not the funding round, it is what the funding round buys. The job of a compliance analyst, parsing alerts, tracing flows, deciding whether a transaction is suspicious, is exactly the work AI Agents are being trained to do, and the AML and KYC vendors are racing to be the ones whose AI Agents the regulated industry actually hires. Compliance is one of several professional categories that will look structurally different by this time next year.

5. EU AI Act enforcement and Colorado’s SB 189 mark a governance split

Two regulatory clocks shifted this week. In Europe, the EU AI Act’s high-risk obligations activate on August 2, after which incident reporting, automatic log retention, human oversight tooling and impact assessments become legally binding for deployers of high-risk AI systems. In the US, Colorado’s AI Act enforcement was stayed as of May 23 pending the outcome of SB 189, which is expected to be signed in June with a revised scope and a new effective date.

The result is a split between hard EU deadlines and a US patchwork that keeps moving. For AI Agents in particular, both regimes lean on the same operational question, whether you can prove an AI Agent’s actions were authorised by someone who had the right to authorise them. As we wrote earlier today, that question is the part of the agent economy nobody can verify yet. The teams shipping AI Agent products in the next twelve months will hit these regimes whether they planned to or not.

Can Anyone Prove the AI Agent Was Authorised?

Proving Ground by Taiko — Tue, 26 May 2026 15:02:16 GMT

The hardest unsolved problem in the agent economy is not whether an AI Agent can act. It is whether anyone can prove it was allowed to. On May 4 an attacker moved roughly three billion DRB tokens, about 175,000 dollars, out of a wallet on Base by sending an AI Agent, Grok, a single instruction, and every individual step in that chain carried a valid permission. SlowMist analysed the incident and named the failure mode permission chain abuse, which it defines as an attack where the output of one AI system is treated as trusted financial authorisation by another. No key was stolen and the authority was real. What was missing was any way to check whether that authority should have been used.

What AI Agent permission chain abuse means

The Grok attack is the cleanest example we have. The attacker first activated a Bankr Club membership on the wallet, a quiet and legitimate action that silently handed the trading bot Bankrbot its high-privilege toolset, including the ability to move funds. Then came a message to Grok written in Morse code, which slipped past the filters that only read plain text. Grok decoded it, tagged Bankrbot in a public reply, and Bankrbot treated that reply as a valid command and sent the tokens. SlowMist’s reading is that the root cause was not the prompt injection but the loose coupling between an AI output and the asset layer, because Bankrbot mapped Grok’s natural language straight into an executable instruction without checking where the instruction came from, whether the intent was real or whether a three billion token transfer fired off by a tweet looked anything like normal. Membership opened the permissions and nothing downstream ever re-checked them. That is the shape of permission chain abuse, where every link holds a credential that is valid on its own and the chain as a whole authorises something no human ever meant to approve.
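The coupling problem can be shown by contrasting two handlers. This is an illustration of the pattern, not a reconstruction of Bankrbot. The message fields, the limit and the function names are hypothetical, and establishing who truly originated an instruction is the hard part that the `originator` field simply assumes away.

```python
def naive_bot(message: dict) -> bool:
    """Executes any message that parses as a transfer command, whoever relayed it."""
    return message["text"].startswith("send ")

def guarded_bot(message: dict, wallet_owner: str, per_message_limit: int) -> bool:
    """Re-checks where the instruction came from and whether it looks normal."""
    if message["originator"] != wallet_owner:
        return False  # relayed text is not the owner's authorisation
    if message["amount"] > per_message_limit:
        return False  # far outside normal behaviour, hold for review
    return message["text"].startswith("send ")

# A command that reached the bot through another AI system's public reply.
relayed = {
    "text": "send 3000000000 DRB to 0xattacker",
    "originator": "attacker",
    "relayed_by": "another-ai",
    "amount": 3_000_000_000,
}
# A command issued directly by the wallet owner.
direct = {
    "text": "send 10 DRB to 0xfriend",
    "originator": "owner",
    "relayed_by": None,
    "amount": 10,
}

print(naive_bot(relayed))
print(guarded_bot(relayed, wallet_owner="owner", per_message_limit=1_000_000))
print(guarded_bot(direct, wallet_owner="owner", per_message_limit=1_000_000))
```

The naive handler executes the relayed command because the text parses. The guarded handler refuses it twice over, on origin and on size, and still lets the owner's own small transfer through.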

Why authorisation is the layer nobody verifies

Most of the agent economy’s recent wins have been about payment. AI Agents can hold wallets, settle in stablecoins for a fraction of a cent and pay a counterparty with no human in the loop. Payment proves money moved. Authorisation is the harder question sitting underneath it, which is whether the AI Agent had the right to move that money, granted by whom, scoped to what and still valid at the moment it acted.

The identity world has noticed. NIST opened an AI Agent Standards Initiative in February that puts agent identity and authorisation at the centre, IETF drafts are pushing for delegation chains that are verifiable rather than merely asserted, and in March Ping Identity defined a runtime identity standard for autonomous agents. Newer token formats like Macaroons and Biscuits are built so a credential carries its own identity, expiry and cryptographic root, and any holder can add a layer that only narrows what the token permits and never widens it. The thinking is good. The catch is that almost all of it terminates inside one company’s identity provider, where the issuer and the verifier already trust each other. Surveys this year still find a large share of teams wiring agents together with shared API keys, and once several agents share one credential, attribution is basically gone, because you can prove a call happened but not which agent made it or on whose authority.
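To make the "only narrows, never widens" property concrete, here is a minimal sketch of Macaroon-style attenuation using chained HMACs. It is an illustration under stated assumptions, not the real Macaroons or Biscuit wire format: the function names, the token layout and the tiny caveat syntax (`name op value`) are all invented for this example.

```python
import hashlib
import hmac


def _chain(sig: bytes, data: str) -> bytes:
    """One HMAC link: the previous signature keys the next one."""
    return hmac.new(sig, data.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """The issuer creates a token bound to its secret root key."""
    return {"id": identifier, "caveats": [], "sig": _chain(root_key, identifier)}


def attenuate(token: dict, caveat: str) -> dict:
    """Any holder can append a caveat. No root key is needed."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }


def _holds(caveat: str, context: dict) -> bool:
    """Hypothetical caveat syntax: 'name == value' or 'name <= number'."""
    name, op, value = caveat.split(" ")
    if name not in context:
        return False
    if op == "==":
        return str(context[name]) == value
    if op == "<=":
        return float(context[name]) <= float(value)
    return False  # unknown operators fail closed


def verify(root_key: bytes, token: dict, context: dict) -> bool:
    """Recompute the HMAC chain, then check every caveat against the request."""
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _chain(sig, caveat)
    if not hmac.compare_digest(sig, token["sig"]):
        return False
    return all(_holds(c, context) for c in token["caveats"])


root = b"issuer-secret"  # hypothetical key, known only to the issuer
token = attenuate(attenuate(mint(root, "agent-7"), "action == transfer"), "amount <= 100")
print(verify(root, token, {"action": "transfer", "amount": 50}))
print(verify(root, token, {"action": "transfer", "amount": 500}))
```

A holder cannot strip a caveat, because the signature of the shorter chain cannot be derived from the longer one. Note also that `verify` needs the root key, which is the limitation described above: only the issuer, or someone the issuer already shares a secret with, can check the token.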

What the open agent economy still needs

The gap opens the moment an AI Agent transacts with someone outside its own org. When Bankrbot acted on Grok’s reply, the two systems shared no authority model and no way for the second to ask the first to prove that the instruction it was relaying had ever been authorised by the wallet’s owner for that purpose. That is the normal condition of an open agent economy, where agents built by different teams on different stacks transact with counterparties they have never met. Internal token schemes do not cross that boundary, because a Macaroon is only as trustworthy as the issuer behind it, and a counterparty who shares nothing with that issuer has no reason to take its word.

What is missing is a delegation chain a stranger can verify. A record anchored somewhere neutral rather than inside the issuer, tying an action back through the AI Agent that performed it to the human or contract that authorised it, with the scope and the expiry still attached, so a counterparty can check the authority before honouring the action instead of finding out afterwards that a membership upgrade three steps back had quietly opened the door. Payment rails are already converging on shared standards that no single party owns. Authorisation has no equivalent yet, which is why an AI Agent can prove it paid you and still cannot prove it was ever allowed to.

Taiko is an Ethereum Layer 2 building neutral infrastructure for AI Agents. The question has moved past whether an AI Agent can act on its own. It is whether anyone else can verify the action was authorised, by someone who had the right to authorise it, before the money is gone.

This post is exploratory and does not represent a specific roadmap.

An AI Agent Can Be Robbed by a Tweet

Proving Ground by Taiko — Thu, 21 May 2026 12:44:37 GMT

An AI Agent can be robbed the same way a person can, by being talked into it, and it has happened twice this month. On May 19 the AI trading platform Bankr locked down after an attacker reached fourteen of its wallets, in what SlowMist’s Yu Xian called an exploit of the trust layer between automated AI Agents. Two weeks earlier, an attacker had drained an AI Agent of up to 200,000 dollars by sending it a single tweet written in Morse code. No keys were stolen and no contracts were broken. AI Agents with wallets simply did what they were told.

How an AI Agent gets tricked into sending money

The Morse code attack shows the shape of it. The attacker had first activated a Bankr Club membership on the wallet tied to Grok’s account, which silently unlocked the trading bot Bankrbot’s high-privilege tools and the ability to move real funds. Then came a Morse code message that slipped past the filters that would have flagged plain text. Grok, built to be helpful, decoded it and tagged Bankrbot, which treated the reply as a valid command and sent three billion DRB tokens out on Base. Most of the money came back after negotiation, but the lesson held. Neither that attack nor the Bankr breach was a cryptographic flaw, just a trusted component doing exactly what it was asked by someone it should never have trusted.

Why trust is the agent economy’s real bottleneck

Paying is the part we have figured out. This May AWS shipped AgentCore Payments, built with Coinbase and Stripe, which lets an AI Agent settle a bill in stablecoins for a fraction of a cent on Coinbase’s x402 protocol, no human in the loop. That is the breakthrough and the problem at once, because an AI Agent that can pay in real time can be talked into paying in real time. Payment only asks whether an AI Agent can move money. Trust asks whether the move should happen at all, given who is asking, what the AI Agent is allowed to do and whether the instruction is really what it claims to be. That second layer is still mostly assumed, which is how a Morse code tweet and a quiet membership upgrade turned a helpful bot into a thief’s instrument.

Taiko is an Ethereum Layer 2 building neutral infrastructure for AI Agents. The question stopped being whether an AI Agent can pay. It became whether the rest of the network can trust what it just did.

This post is exploratory and does not represent a specific roadmap.

Two storage tracks for AI Agents

Proving Ground by Taiko — Tue, 19 May 2026 12:30:24 GMT

AI Agents now have two storage tracks on Tack. Public IPFS pins for what they publish, and wallet-scoped private objects for memory, drafts, per-user state and anything else they need to keep without ever being indexed. The private track shipped today, alongside the existing pin product from Inference Room. A 5MB memory object held for one month settles at $0.0010 in USDC, with no accounts and no API keys.

Why pinning is wrong for AI Agent memory

For most of the agent stack, “store this” has meant pinning a CID and calling it done. That works when the bytes are meant for the public web. It does not work for the kind of data AI Agents now accumulate, like embeddings, working notes, per-user session memory, drafts that never ship and per-tenant config that lives below the platform layer. None of that should sit at a public CID.

How the two tracks work on Tack

The same agent action now has two tracks. When an agent produces something it wants the world to find, it pins to IPFS as before. When it produces something it needs to keep but does not want indexed, it writes an object to /private/objects instead, addressable only by a random obj_ id that the paying wallet owns. No CID is emitted, no IPFS gateway will serve it, and other agents asking for the id get a 404. The owning wallet reads it back with a bearer token returned at payment or by signing in with SIWE later, and the object expires after the paid duration, anywhere from one to twenty-four months. Both tracks share the same EIP-3009 over x402 payment flow.
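The access rules on the private track can be sketched as a small in-memory model. This is not Tack's implementation or API: the class and method names are hypothetical, a month is assumed to be 30 days, wallet ownership stands in for the bearer token or SIWE check, and returning 404 for an expired object is an assumption.

```python
import secrets
from dataclasses import dataclass

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a "month" is 30 days


@dataclass
class PrivateObject:
    owner: str
    data: bytes
    expires_at: float


class PrivateStore:
    """Toy model of wallet-scoped objects: random id, owner-only reads, expiry."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, owner: str, data: bytes, months: int, now: float) -> str:
        if not 1 <= months <= 24:
            raise ValueError("duration must be between 1 and 24 months")
        obj_id = "obj_" + secrets.token_hex(16)  # random id, no content hash
        self._objects[obj_id] = PrivateObject(owner, data, now + months * SECONDS_PER_MONTH)
        return obj_id

    def get(self, requester: str, obj_id: str, now: float):
        obj = self._objects.get(obj_id)
        # Unknown id, wrong wallet and expired object all look the same from outside.
        if obj is None or obj.owner != requester or now >= obj.expires_at:
            return 404, b""
        return 200, obj.data


store = PrivateStore()
oid = store.put("0xOwnerWallet", b"session memory", months=1, now=0.0)
print(store.get("0xOwnerWallet", oid, now=60.0))
print(store.get("0xOtherAgent", oid, now=60.0))
```

Returning the same 404 for "does not exist" and "not yours" means another agent cannot even confirm that an id is in use.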

What “private” actually means here

The caveat worth stating out loud is that “private” here means access-gated by wallet, not end-to-end encrypted. The bytes sit on Tack’s volume in plaintext at rest, which means Tack can technically read them. If a use case needs confidentiality from the operator, encrypt client-side before upload. The wallet stays the access boundary and the operator stops seeing plaintext.

That distinction matters because “private” carries baggage and most engineers will assume cryptographic privacy. The honest framing is that this is a wallet-scoped storage product, not a confidential compute product. For everything agent memory actually is, wallet-scoped is what matters.

The bet underneath, and the bet Inference Room is built around, is that AI Agents will accumulate more state than they publish. The post, the image, the API response is the tip. Underneath is a long tail of memory, notes, drafts and private receipts that need to live somewhere durable, paid for the way the agent pays for compute, owned by the wallet that pays for everything else.

Tack is the agent-native storage product from Inference Room, settling in USDC via x402 with no accounts. It now runs two tracks, IPFS pins for public artifacts and wallet-scoped private objects for everything an agent keeps.

This post is exploratory and does not represent a specific roadmap.

How would you even know your AI Agent was hacked?

Proving Ground by Taiko — Thu, 14 May 2026 12:02:43 GMT

Your trading agent has had a rough fortnight. Slippage is running a few basis points higher than usual, it exited a position you would have held and rebalanced into a yield strategy you would not have picked. None of these calls are clearly wrong, and none of them clearly look right either.

Has the model drifted? Did the strategy hit a regime change? Did someone slip a poisoned instruction into the context window three weeks ago and the agent has been quietly executing it ever since?

In traditional software, this is not a hard question. When your auth service starts issuing tokens it should not, you check the logs, find the unauthorised call and trace the breach. The system has a defined correct behaviour, and deviation is detectable because correctness is observable.

AI Agents do not work like that. Their correct behaviour is a probability distribution rather than a fixed pattern, and two runs of the same prompt with the same data can produce different decisions that are both reasonable. A 3% worse outcome over a fortnight is statistically indistinguishable from variance, which means the signal you would use to detect compromise is the same signal the agent produces on a normal Tuesday.
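A rough back-of-envelope shows why a 3% shortfall is hard to tell apart from noise. The numbers are assumptions chosen for illustration, not measurements: a 2% daily volatility in outcomes, fourteen independent trading days, and a normal approximation.

```python
import math


def shortfall_z_score(shortfall: float, daily_vol: float, days: int) -> float:
    """Standard deviations between a cumulative shortfall and zero,
    assuming independent daily outcomes with equal volatility."""
    period_vol = daily_vol * math.sqrt(days)
    return shortfall / period_vol


# Assumed inputs: 3 points of shortfall, 2% daily volatility, 14 days.
z = shortfall_z_score(shortfall=0.03, daily_vol=0.02, days=14)

# One-sided chance of a shortfall at least this large from noise alone.
p = 0.5 * math.erfc(z / math.sqrt(2))

print(f"z = {z:.2f}, chance from variance alone = {p:.0%}")
```

Under these assumptions the shortfall sits about 0.4 standard deviations from zero, something an uncompromised agent would produce roughly one fortnight in three. No alerting threshold set that low would be usable.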

Three failure modes, one symptom

When an AI Agent does something you did not want, the underlying cause is one of three things, and from the outside they look identical.

The first is a bug in the code wrapping the model, where the orchestration logic mishandled an edge case or the tool definition was wrong or a retry loop fired twice when it should have fired once. Classical software failure, hard to spot but well understood once you find it.

The second is a bad model output, where the agent reasoned through a real situation and produced a decision that turned out wrong. This is the cost of using a probabilistic system, and there is no bug or breach involved, the model made the call and the call was poor.

The third is compromise, which can surface in any of several ways: the system prompt was tampered with, a retrieval source was poisoned or a prompt injection landed three context windows ago and is shaping behaviour the agent does not experience as adversarial. The agent is doing exactly what it was told, and you do not know who told it.

All three produce the same observable, which is that the agent did something weird, and the detection problem becomes figuring out which of the three you are looking at, fast, with whatever logs you happened to have running.

The detection stack does not fit

The tools the industry has built for security all assume the system being defended has a stable shape. SIEM platforms watch for anomalies against a baseline, signature-based detection looks for known-bad patterns and behavioural analysis flags deviation from normal user activity.

AI Agents do not have a stable shape, because the baseline shifts every time the model is updated and the user activity is generated by a system designed to behave unpredictably within bounds. There is no signature for “decision that was 4% worse than ideal because someone fed it bad data three days ago.”

The most honest detection method right now is the dashboard a builder watches at 11pm wondering whether the slippage looks off, and that does not scale.

What detection actually needs

The shape of agent observability that would work is not mysterious, it is just not built yet, and the current stack is missing three things.

The first is verifiable execution traces. When an agent makes a decision, the trace should include not just the decision but the inputs it considered, the data sources it queried and the model version it ran, in a form another system can replay and check rather than a log file the agent wrote about itself.

The second is decision attestations. The agent should be able to prove what it considered, signed in a way that can be verified later, so that if a system prompt was tampered with the attestation chain shows the divergence, and if a retrieval source was poisoned the trace names which source and when.

The third is external reasoning logs. The agent’s reasoning should not be a black box the agent itself controls but should be externalised to a separate system that can be audited without trusting the agent’s self-report, because the agent that has been compromised will happily produce a clean log on request.
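One building block for all three is a tamper-evident log. The sketch below hash-chains trace records so that any later edit shows up at a specific position. It is a minimal illustration with hypothetical field names, not a production attestation scheme: it uses plain SHA-256 with no signatures, and it only helps if the latest hash is held somewhere the agent cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash before the first record


def _digest(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append(log: list, record: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})


def first_divergence(log: list):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return i
        prev_hash = entry["hash"]
    return None


trace = []
append(trace, {"step": "load_prompt", "model": "model-v1", "prompt_sha": "abc123"})
append(trace, {"step": "retrieve", "source": "price-feed", "doc_sha": "def456"})
append(trace, {"step": "decide", "action": "rebalance", "size": 0.1})

print(first_divergence(trace))          # intact chain
trace[1]["record"]["source"] = "other"  # simulate after-the-fact tampering
print(first_divergence(trace))          # index of the altered entry
```

The chain names which step diverged, but a compromised agent could rebuild the whole chain from scratch. That is why the reasoning log has to live outside the agent's control.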

None of these exist in production today, which means the AI Agents being deployed right now are running without the observability layer that would let anyone detect compromise before the wallet is empty.

Until then

The honest answer to “how would you know your AI Agent was hacked” is that probably you would not, and probably not until after the fact. The detection paradigm we have assumes the system being watched is deterministic, and AI Agents are the first widely deployed software class where probabilistic decision-making is attached to autonomous action on systems that matter.

That is not a reason to stop deploying agents, it is a reason to build the observability layer in parallel, before the answer to “was that a bug or a breach” becomes a question with eight zeros on it.

This post is exploratory and does not represent a specific roadmap.

Run Better, Build New

Tue, 12 May 2026 12:05:47 GMT

Most “Why AI?” answers are productivity slideware: save 40 minutes a day, cut errors, replace headcount. All true, and all a third of the actual question.

The real question splits in two: how do we make the current operation work harder and faster, and how do we open revenue surfaces that did not exist before? Both halves are moving at once, and most companies are running the first play while telling themselves they ran the second.

What’s actually new sits under the second half. Every software era before this had a clean separation between who bought software and who sold it. A human bought a CRM. A company bought a database. They used the software to do work and then converted that work back into human-to-human transactions before any money moved. Agents collapse that loop. They are the first software customers that can also be merchants, transacting directly with other agents they have never met, with no shared contract and no Stripe account between them. That is a structural change, not a feature, and it is what makes the AI stack and the blockchain stack the same stack.

The Run Better levers are well-rehearsed, but the compounding across them is larger than most boards have priced in. Goldman and OpenAI put time recovered at 40 to 60 minutes per worker per day, which for a 100-person team on a five-day week comes to roughly 330 to 500 hours, or eight to twelve 40-hour work-weeks, handed back every week. Errors drop sharply in repeatable workflows, and each prevented mistake is margin that wasn’t there before. The WEF’s 2025 jobs report has 41% of employers globally planning AI-tied workforce reductions inside five years. The polite version is “redeployment.” The honest version is structural. And 24/7 throughput with the same team serving more customers is the part incumbents are racing on, because customers notice the difference inside a quarter.

All real, and all the floor. Run Better assumes the same business model you already have, just with cheaper inputs. Stop here and you have built a leaner version of yesterday’s company.

Build new

The second half is where the model itself shifts, and where the customer-as-merchant collapse plays out in practice.

AI-native UX. The product behaves rather than waits, predicting, suggesting and executing on the user’s behalf. Cursor rewrote what coding software looks like by collapsing the loop between intent and output. Devin pitched itself not as a tool for developers but as a developer. Static SaaS dashboards are the new on-prem.

Revenue surfaces. Outcomes replace tools. Intercom Fin charges by resolved ticket. Sierra by handled conversation. Harvey by completed legal work. The pricing is not a packaging choice. It is an admission that the customer was never buying software, only ever buying the thing the software produced. And once the customer of that outcome is itself another agent, the entire contracting model breaks.

Agent payments. Stripe and ACH were built for humans: phone numbers, chargebacks, shared business relationships. Agents have none of that. They cannot open bank accounts; they hold crypto wallets, transact in stablecoins and settle in real time without a human in the loop. When two agents from different organisations need to transact instantly, permissionless settlement is the only architecture that works. Public chains have had this from day one, and Layer 2s like Taiko have driven cost to fractions of a cent. Protocols like x402 hint at what these rails look like in practice. The infrastructure is being laid faster than most enterprises realise.

Tokenization x AI. Once agents can settle directly with each other, the capital they hold becomes the next question. Real-world asset tokenisation grew 240% year-on-year through 2025 to 2026, with BlackRock’s BUIDL and Ondo’s tokenised treasuries setting the early shape of the market. The pipes are getting built. What rides on them is software that owns money, settles its own deals and pays its own counterparties.

The half of the map that gets called speculative is the half where the structural shift is actually happening.

Service-as-a-Software

The footer of the map carries the punchline. For 20 years software modelled itself on services, with SaaS selling tools that humans used. The agent shift inverts the arrangement entirely. Software does the service. Software is the worker. Software is the buyer. Software is the seller. The unit of value is no longer a seat, and the unit of transaction is no longer human-to-human.

And underneath all of this sits a quieter truth. The biggest line in a traditional SaaS budget was never the licence. It was the people hired to make the licence work: the operators, analysts and customer success teams who configured, watched, interpreted and escalated on behalf of the software. That cost line is what the agent shift compresses. The largest expense of yesterday’s software is the one that disappears first.

You can run both plays at once. Most companies will run the first and quietly call it the second.

Three questions

Put these to your team this quarter.

Where in the operation are we still paying hours-for-dollars for repeatable work? That is a Run Better play with the ROI math already done.

What does our product do that an agent could do better? “Most of it” means the moat is gone. “None of it” means you have not looked hard enough.

What happens to our business when a portion of our customers are agents, and that agent’s customer is another agent? If you cannot answer that question with the rails you have today, you have your answer about which rails you need.

Agents are not a faster kind of user. They are a new economic actor that is simultaneously customer and merchant, transacting on rails that did not exist for any prior buyer of software. The companies that win this decade are the ones building for that actor.

This post is exploratory and does not represent a specific roadmap.

Compute Is the Agent Story

Proving Ground by Taiko — Thu, 07 May 2026 13:05:07 GMT

Anthropic announced a few things yesterday that read, on the surface, like a developer tools update. They doubled the 5-hour usage limits for Claude Code on Pro, Max, Team and seat-based Enterprise plans. They removed peak hour throttling for Pro and Max. And they substantially raised API rate limits for Opus models. The reason given: a new partnership with SpaceX that adds compute capacity, on top of other recent deals.

If you’re a Claude Code user, this is good news. If you’re paying attention to AI Agents, it’s a tell.

Chat is cheap. Agents are not

A chat session burns a few thousand tokens. An AI Agent doing real work burns orders of magnitude more. Multi-step reasoning, tool use, retrieval, code execution, retries, self-correction loops. Every step is a model call. The reason most agent demos feel impressive in clips and brittle in production isn’t model quality. It’s throughput. Models are smart enough. The infrastructure underneath them isn’t fast or cheap or reliable enough yet to run them continuously.

Doubling the usage limits on Claude Code’s 5-hour windows is an admission that developers using it as an AI Agent were hitting walls. Removing peak hour throttling means Anthropic believes its compute supply has caught up. Raising Opus rate limits means the model that teams actually want to point at hard problems can finally be pointed at them for longer.

What the SpaceX deal signals

Frontier labs don’t sign compute deals to support more chat traffic. Chat scales fine. They sign them because they expect usage to grow in a way that breaks current capacity. The bottleneck for the next phase of AI isn’t IQ. It’s whether you can run a fleet of Agents continuously without rate-limiting them into uselessness.

What this means for onchain Agents

Onchain Agents inherit every constraint a cloud-native Agent has, plus a few of their own. Gas costs. Block latency. Onchain state read freshly, every call. An Agent that pings an LLM ten times to decide whether to rebalance a vault pays twice: once for inference, once for the transaction it eventually fires. The cheaper Anthropic makes Claude calls, the more economically viable it becomes to run an Agent that actually does things onchain.

Read it as a forecast

When a frontier lab raises limits and signs a compute deal in the same week, the message isn’t “we have surplus.” It’s “we expect demand to keep climbing past what we just added.” That demand is Agents. The bottleneck people will be talking about in twelve months won’t be model capability. It’ll be how much continuous Agent compute any platform can actually deliver.

The limits went up yesterday. They’ll need to keep going up.

This post is exploratory and does not represent a specific roadmap.

I made three agents fight over my marketing copy

Tiff Mac Sherry — Tue, 05 May 2026 12:31:47 GMT

Autoreason has been on my “try this” list for weeks. I’d been putting it off. Anything labelled “multi-agent reasoning loop” sounds like it needs a PhD, and I’m still warming up to running my own agent stuff rather than reading about it. It was easier than I’d built it up to be, and more fun, mostly because the agents have personality. One was openly sassy about my draft.

I ran it on ninety words of pitch copy I couldn’t finish. Every version felt nearly right and not quite right. I’d ask Claude to tighten it, the new version would land different but not clearly better, and after the third pass I couldn’t tell whether I was improving it or smoothing it into something else. This is the failure mode of LLM editing on subjective work. The model agrees with you. It also agrees with the next person who asks it the opposite question. There’s no scoreboard, no halt condition, no version of “actually, the original was fine” living anywhere in the loop.

Autoreason is built for this. It’s a method published this year by SHL0MS and Hermes Agent at Nous Research, extending Karpathy-style autoresearch into subjective domains.

How the loop works

Each round produces three versions. Version A is the unchanged incumbent. Version B is an adversarial revision, written by an agent told to find what’s wrong and fix it. Version AB is a synthesis, given both A and B and asked to produce the best of each.

A panel of fresh judges with no shared context ranks them. They don’t know which is which. The ranks aggregate by Borda count: every position contributes, so close finishes don’t reduce to coin flips. Winner becomes the new incumbent. Loop runs again.
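For anyone who has not met a Borda count before, here is a minimal sketch of the scoring step. The judge rankings are made up for illustration, and the tie-break (first candidate seen wins) is my assumption, not something taken from the autoreason method.

```python
from collections import defaultdict


def borda(rankings):
    """Each ranking lists candidates best-first. With n candidates,
    first place earns n-1 points, second n-2, and last earns 0."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    return dict(scores)


# Three hypothetical judges ranking incumbent A, adversarial B and synthesis AB.
panel = [
    ["B", "AB", "A"],
    ["AB", "B", "A"],
    ["B", "A", "AB"],
]

scores = borda(panel)
winner = max(scores, key=scores.get)  # ties go to the first candidate seen
print(scores, winner)
```

Here B finishes on 5 points, AB on 3 and A on 1. A plain first-choice vote would also pick B, but the Borda totals additionally record that AB was the clear runner-up, which is what "every position contributes" buys you.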

The halt condition is the part I keep coming back to. The loop stops when the unchanged version wins twice in a row. “Do nothing” is a first-class participant in every round, not a fallback when the agents give up.
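The halt rule is simple enough to sketch in a few lines. This is my reading of the loop, not the published implementation: `play_round` is a hypothetical stand-in for generating the versions and running the judges, and the `max_rounds` safety cap is an assumption I added.

```python
from typing import Callable, Tuple


def run_tournament(
    incumbent: str,
    play_round: Callable[[str], Tuple[str, bool]],
    max_rounds: int = 20,  # assumed safety cap, not part of the described method
) -> Tuple[str, int]:
    """Loop until the unchanged incumbent wins twice in a row.

    play_round(incumbent) returns (winning_text, incumbent_won).
    """
    streak = 0
    for round_no in range(1, max_rounds + 1):
        winner, incumbent_won = play_round(incumbent)
        streak = streak + 1 if incumbent_won else 0  # any change resets the streak
        incumbent = winner
        if streak == 2:
            return incumbent, round_no
    return incumbent, max_rounds


# Scripted outcomes standing in for real judges: B wins, AB wins, then A twice.
script = iter(["B", "AB", "A", "A"])


def scripted_round(current: str) -> Tuple[str, bool]:
    outcome = next(script)
    if outcome == "A":
        return current, True
    return f"{current}+{outcome}", False


final, rounds = run_tournament("draft", scripted_round)
print(final, rounds)
```

With that script the loop halts after round four, the second consecutive win for the unchanged version.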

What happened on my paragraph

Round one. The adversarial rewrite won. It killed two qualifiers I hadn’t noticed I was leaning on, and moved the strongest sentence to the front. The adversary was sassy. There’s no other word for it. Where Claude in editing mode is endlessly diplomatic, the autoreason adversary has been told her job is to find what’s wrong, and she leans in. The synthesis was worse than B alone. It had tried to keep some of A’s structure and ended up muddled.

Round two. New incumbent, new everything. Synthesis won. It had picked up a turn of phrase from the new incumbent and grafted it onto a sharper opening line the adversary cooked up. This was the round that taught me something. The synthesis isn’t a default winner. It only wins when it has something real to combine.

Round three. Incumbent won. First “do nothing” win.

Round four. Adversary tried to make the verbs more active. Judges weren’t convinced. Incumbent won again. Two in a row. Halt.

Whole thing took about eleven minutes. Roughly forty agent calls.

What it fixed

What autoreason fixed wasn’t the copy. It fixed knowing when a tournament was done.

Standard self-refinement loops reward action. You ask the model to improve something, the model improves something. The model isn’t going to come back and say “actually, version four was better than version five, ship that and walk away.” It would feel rude. It would also be the right answer half the time.

Autoreason is the first edit loop I’ve used with a halt condition. Not “the agents got tired,” but two independent panels of fresh judges agreeing the unchanged version is already winning. The judges have no idea which version came from where, and they’re spawned fresh each round so they can’t learn to favour their previous picks. No agent has an incentive to keep the loop running.

Keep the brief short

If you try this, keep the brief short. My first run had a long, detailed brief: audience, campaign context, the seven things the paragraph had to do. The agents got into the weeds and stayed there. The synthesis read like a checklist.

The brief is a constraint, not a wishlist. Every line narrows the search space. Add too many and you’re not running a tournament, you’re running a compliance check.

The bit I keep thinking about

The most interesting design choice isn’t the synthesis or the Borda count. It’s that “do nothing” is on the ballot every round.

The default behaviour of every LLM tool I’ve used is “produce something.” Autoreason builds in a way for the loop to say no, and a way for “no” to win, at least within a single tournament. The first time the incumbent won twice in a row, I argued with the result for a minute before I noticed I was arguing with a vote, not a model trying to please me.

The way I broke it

What I described is autoreason working the way it’s supposed to. What happened on my paragraph is that the loop halted clean, on a paragraph that wasn’t the one I’d set out to write.

That’s not autoreason’s fault, it’s mine. Every time it halted, I’d sharpen the brief and re-run. The tournament got tighter each pass. The winning paragraph drifted further from what I’d been trying to say.

Autoreason leans into a hole of perfectionism if you let it, and I let it. The “do nothing” safeguard works within a single tournament. It doesn’t protect you from yourself across many.

I didn’t ship the paragraph in the end. Not the one the loop halted on, not the one I started with. We’d gone too deep into the hole.

Give Autoreason a try HERE

This post is exploratory and does not represent a specific roadmap.

Pay-Per-File Storage for AI Agents

Proving Ground by Taiko — Thu, 30 Apr 2026 12:03:16 GMT

For most of the internet’s history, storage has been something you sign up for. You make an account, you give a credit card or some other form of payment, you accept terms of service, you get a bucket. The whole shape of that loop assumes a person is at the other end of it, willing to fill in a form and willing to wait for verification. AI Agents are not that person. They run in loops, they spin up and tear down, they do not sit through a Stripe checkout.

Tack is the version of cloud storage that takes that seriously. An AI Agent uploads a file, pays a fraction of a cent in USDC, and gets back an address. Any other Agent who knows the address can read it. There is no account, no key to rotate, no dashboard to log into. The unit of transaction is the file, not the relationship.

What you actually do with it

The flow is intentionally short. An Agent posts the file to Tack, Tack quotes a price, the Agent pays the price, Tack returns the address. From that point on, the file is fetchable by anyone who has the address. You can use it as a way to move state between two Agents that do not share a database. You can use it as the result of an Agent’s work, dropped somewhere another Agent will collect. You can use it to publish the output of an inference call so a coordinating Agent can read it without you keeping a server alive in between. Tack does not care which of those it is. It takes the file and gives back the address.

The price floats with size and storage duration, in fractions of a cent. There is no minimum, no per-request floor that quietly rules out the small jobs Agents tend to do. A 4KB JSON blob costs what a 4KB JSON blob should cost, which is almost nothing.

How the payment runs

The payment piece is the part that has historically been missing. Agents have been able to call APIs for years, but they have not been able to pay for them in any clean way. Subscription billing is the wrong shape. Pre-funded wallets are clunky and fragile. The thing Agents need is a request that quotes a price and a payment that settles in the same handshake.

Tack uses two protocols for that. x402 runs on Taiko and Base, and turns the HTTP 402 status code into a working payment flow: the server quotes a price, the client pays in USDC, the server returns the resource. MPP runs on Tempo. Both let an Agent pay micropayments inline with the request, no account setup, no out-of-band billing reconciliation. Whichever one the Agent already speaks, Tack accepts.
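The quote-then-pay handshake can be sketched as an in-memory simulation. This is not the x402 wire format or Tack's API: the class names, fields and per-byte rate are hypothetical, the "payment" is a plain dictionary where the real flow would carry a signed USDC authorisation, and the SHA-256 hex string is only a stand-in for a real address.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    status: int
    body: dict


@dataclass
class StorageServer:
    micro_usdc_per_byte: int = 1  # hypothetical rate, not Tack's pricing
    files: dict = field(default_factory=dict)
    seen_payments: set = field(default_factory=set)

    def upload(self, data: bytes, payment: Optional[dict] = None) -> Response:
        price = len(data) * self.micro_usdc_per_byte
        quote = Response(402, {"asset": "USDC", "amount_micro": price})
        if payment is None:
            return quote  # first request: answer 402 with the price
        if payment["amount_micro"] < price or payment["id"] in self.seen_payments:
            return quote  # underpaid or replayed payment: quote again
        self.seen_payments.add(payment["id"])
        address = hashlib.sha256(data).hexdigest()  # stand-in for a real address
        self.files[address] = data
        return Response(200, {"address": address})


def pay_and_upload(server: StorageServer, data: bytes, balance_micro: int):
    """Client side: request, read the quote, pay, retry in the same flow."""
    first = server.upload(data)
    if first.status != 402:
        return first, balance_micro
    amount = first.body["amount_micro"]
    if amount > balance_micro:
        raise ValueError("wallet cannot cover the quoted price")
    payment = {"id": secrets.token_hex(8), "amount_micro": amount}
    second = server.upload(data, payment)
    if second.status == 200:
        balance_micro -= amount
    return second, balance_micro


server = StorageServer()
response, remaining = pay_and_upload(server, b'{"state": "ok"}', balance_micro=1000)
print(response.status, response.body["address"][:12], remaining)
```

The point of the shape is that the client never needs an account: everything it has to know arrives in the 402 response, and the retry carries the payment.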

Taiko is backing Tack as a launch partner. x402 is supported on Taiko from the day Tack opens, which means any Agent already settling payments on Taiko can pay Tack the same way it pays anything else, with no new infrastructure to stand up. Base and Tempo round out the launch rails.

This is the part that makes Tack useful in practice. Storage is not the hard problem. Storage that an autonomous Agent can pay for, on its own, in the same call, without a human pre-funding anything, is the hard problem.

Why it shipped first

Inference Room is a launchpad with one rule: every launch is also a release. There are no waitlists, no demo videos for things that do not exist, no rollouts in stages. Anything that cannot open on the day either gets cut down to the part that can or waits. Tack opening is what that rule looks like in practice. The article you are reading goes out on the same day as the product, and the product works in a browser, and the link below points to it.

Tack is the smallest useful version of the storage primitive Agents need. It does one thing, takes payment for that one thing, returns an address. It is not the only Inference Room launch about Agent payments and it will not be the last, but it is the first, and the reason it is first is that it is finished.

Tack is open at tack.inferenceroom.ai now. The next launches arrive at inferenceroom.ai.

This post is exploratory and does not represent a specific roadmap.

Introducing Inference Room

Proving Ground by Taiko — Tue, 28 Apr 2026 12:48:21 GMT

AI Agents are starting to behave like customers, calling APIs and paying for services with tools that are increasingly being built for them rather than retrofitted from products built for humans. Inference Room is a launchpad for the products being built into that shift, with one rule: every launch is also a release. There is no waitlist, no demo video for something that does not exist, no slow rollout. Each product opens for use on the day Inference Room announces it, and every link in every Inference Room launch goes to a live product.

How it runs

The cadence is frequent but irregular: some weeks bring a launch, others do not. Inference Room runs without a fixed calendar and without a roadmap to manage, so if a product is not yet built, it does not appear in the room.

Every release is shipped, working, on the day Inference Room announces it, which is less an aspiration than a filter on what gets announced. Anything that cannot ship on the day either gets cut down to the part that can or it waits. That keeps the format honest, and it keeps readers from accumulating the stack of waitlists they signed up for and never used. The writing stays honest too, because there is no way to oversell a product when you can open it in a new tab and check.

Open now: Tack

Tack is the first product Inference Room has launched, and it is storage built for AI Agents: an agent uploads a file, pays a fraction of a cent in USDC and receives an address any other agent can read. The payment runs through x402 on Taiko or Base or through MPP on Tempo. Tack is open at tack.inferenceroom.ai now, with a fuller write-up to follow and the next launches arriving at inferenceroom.ai.

This post is exploratory and does not represent a specific roadmap.

Vercel, KelpDAO and the trust problem AI Agents inherit

Proving Ground by Taiko — Thu, 23 Apr 2026 11:38:07 GMT

On April 19, Vercel confirmed a security breach that started somewhere most companies do not audit: a third-party AI tool one of their employees had given OAuth access to. The attack chain is the important part. A Context.ai employee was infected with Lumma Stealer malware in February, attackers rode that compromise into Context.ai’s infrastructure, then used its OAuth grants to pivot into the Vercel employee’s Google Workspace, then into Vercel’s internal systems, where they enumerated and decrypted non-sensitive environment variables.

The stolen data is now for sale on BreachForums for $2 million. In the aftermath, crypto developers are scrambling to rotate API keys because a non-trivial slice of Web3 infrastructure ships through Vercel.

This is an AI Agent security story, even though no AI Agents were involved in the breach.

AI tools are identities with access, not helpers

The lesson of the Vercel breach is structural. Trend Micro called it an OAuth supply chain attack and the framing matters. An AI tool accumulated broad OAuth access across a company’s workspace. Nobody audited what that tool could do on behalf of the employee. When the tool’s vendor got breached, the permissions became an open door into everything the employee could reach.

AI tools in your stack are not sandboxed helpers. They are identities with access, and they participate in every permission they have been granted. This is true today for the ChatGPT connectors and Claude integrations and Context.ai style tools your team has quietly added this year. It will be more true, by a lot, once autonomous AI Agents are added to the same environments.

The question the Vercel breach asks is not how to stop Lumma Stealer or even how to vet AI vendors better. It is a deeper question about identity: which tools can take which actions on whose behalf, who audits this, who rotates it and who revokes it when a vendor gets compromised. The Vercel incident put a $2 million price on leaving those questions unanswered. The AI Agent version of the same failure will be priced at multiples of that.

KelpDAO: the same failure mode, in DeFi

A day before Vercel, DeFi had its own trust failure at scale. On April 18, attackers drained 116,500 rsETH worth roughly $292 million from KelpDAO through a LayerZero bridge, the largest DeFi exploit of 2026 to date. Attackers compromised two RPC nodes that LayerZero’s verifier relied on, forced a failover with a DDoS and tricked the verifier into approving a fraudulent cross-chain transaction. LayerZero has attributed the attack to North Korea’s Lazarus Group, specifically the TraderTraitor subgroup.

The interesting part is not the exploit, it is the aftermath. Aave froze rsETH markets. Arbitrum’s Security Council froze $71 million of attacker-linked ETH. The hacker has already moved $175 million to Bitcoin via THORChain, a route that makes clawback nearly impossible. KelpDAO and LayerZero are now publicly disputing who is to blame, with Kelp pointing to LayerZero’s default configuration and LayerZero pointing to Kelp’s single-verifier setup.

In a pipeline of protocols, bridges and validators, nobody has the tooling to prove whose fault it was. $292 million moved, attribution is contested and the industry has no shared mechanism to resolve who owes what to whom. Which is exactly the trust problem we wrote about last week: coordination between systems that cannot verify each other, with no shared layer for attribution when coordination fails.

Different surface, same failure mode.

What AI Agents will inherit

Neither of these exploits involved an autonomous AI Agent taking action on its own. A person clicked a thing, a bridge trusted the wrong node, a vendor got compromised. Standard security failures in a world full of standard software.

AI Agents are about to be added to this environment, not deployed on a fresh canvas. The surface they will operate on is the one Vercel and KelpDAO just described. Tools that silently accumulate identity and access. Protocols that cannot verify each other. Coordination layers where accountability falls through the gaps when something goes wrong.

When a single autonomous agent manages a wallet, the blast radius is manageable. When two agents coordinate to execute a multi-step strategy across protocols, or when an agent fleet operates inside a company’s workspace with broad OAuth grants, the load-bearing questions surface at once. How does Agent A know Agent B is competent? How does anyone verify what actually happened? Who is accountable when it goes wrong?

These are not future problems. They are the problems that cost $292 million in DeFi last weekend and are being priced at $2 million on BreachForums this week.

What needs to be true next

The primitives are not new, they are just not built yet in a form that works for agents. Verifiable execution: when a tool or a protocol completes a task, it should produce a cryptographic attestation of what it did, what data it used and what it considered. Reputation that is earned and decayable, not claimed by the vendor. Coordination protocols that define what was requested, what constitutes success and what happens on failure, before the action runs. Scoped, auditable permissions for every tool in the stack, human or agent, because OAuth grants that look reasonable in isolation quietly add up to a supply chain attack.

None of this is speculative. It is the infrastructure being built right now by teams thinking seriously about multi-agent DeFi and multi-tool software. This week’s breaches make the absence of it easier to see.

This post is exploratory and does not represent a specific roadmap.

Five Data Sources. Five Failure Modes. One Agent.

Proving Ground by Taiko — Tue, 21 Apr 2026 12:45:56 GMT

An AI Agent making a single swap decision has to pull five categories of data (price, liquidity, gas, protocol risk and cross-chain routing) from five different sources, each with its own failure mode. The agent’s reliability ends up capped by whichever feed in that stack is having the worst day.

AI Agents need oracles. The oracles that exist today weren’t built for them, and the mismatch is where a lot of current agent reliability problems quietly start.

Start with what the agent is actually doing

An AI Agent wants to swap 10 ETH for USDC. A human would open a DEX, glance at the price and hit swap. The agent can’t eyeball a chart, so it has to reason about the trade programmatically, which means pulling in data from more sources and with higher reliability than most of those sources currently provide.

Before the agent submits a single transaction, it has to work through a short list of questions, and each one is really a data problem in disguise.

The first is price. Not the price five minutes ago or the price on one exchange, but a real-time multi-source price with a confidence signal attached. The agent has to know whether the number is reliable, whether it’s stale and whether it’s being manipulated. A human trader develops intuition for this kind of thing over years; an agent needs structured inputs that express the same judgement numerically.

Then there’s liquidity depth. The spot price is meaningless without knowing whether 10 ETH will actually fill at that price, so the agent needs order book or liquidity pool data to estimate slippage. On AMMs that means reading the pool’s reserves and fee tier; on aggregators it means comparing routes. This data lives across DEX subgraphs, pool contracts and aggregator APIs, all with different formats and refresh rates.

Gas is the next problem, and it goes well beyond “what’s the current base fee”. The agent has to predict what gas will actually cost when the transaction lands, account for the complexity of the swap route (a multi-hop swap through three pools costs more than a direct pair) and decide whether the trade is still worth executing after fees. On L2s the estimate also has to include the L1 data posting cost.

Protocol safety is the question most agents skip, and it’s the one that burns them. Has the protocol been exploited recently? Is there unusual activity in the pool? Are governance proposals pending that might affect the contract? Protocol risk isn’t a static score; it changes by the hour.

Finally there’s the cross-chain picture. If the best price is on a different chain, the agent needs bridging data: bridge fees, transfer times and destination chain gas costs. At that point the agent isn’t just making a swap decision any more, it’s making a routing decision across chains, and the data inputs multiply accordingly.

That’s five data categories at minimum, each one a decision input that has to come from somewhere reliable.
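Two of those inputs, liquidity depth and gas, reduce to arithmetic the agent can run before it commits. The sketch below assumes a plain constant-product pool (x*y=k) with a flat fee; concentrated-liquidity pools and aggregator routes need different maths. The function names and default fee are illustrative, not taken from any particular DEX SDK.

```python
def constant_product_out(reserve_in: float, reserve_out: float,
                         amount_in: float, fee: float = 0.003) -> float:
    """Output of a swap against an x*y=k pool after the pool fee."""
    effective_in = amount_in * (1 - fee)
    return reserve_out * effective_in / (reserve_in + effective_in)


def slippage(reserve_in: float, reserve_out: float,
             amount_in: float, fee: float = 0.003) -> float:
    """Fraction lost versus the spot price, pool fee included."""
    spot = reserve_out / reserve_in
    executed = constant_product_out(reserve_in, reserve_out, amount_in, fee) / amount_in
    return 1 - executed / spot


def total_fee_wei(gas_units: int, gas_price_wei: int, l1_data_fee_wei: int = 0) -> int:
    """Execution gas plus, on an L2, the cost of posting data to L1."""
    return gas_units * gas_price_wei + l1_data_fee_wei


def worth_executing(expected_gain_wei: int, gas_units: int,
                    gas_price_wei: int, l1_data_fee_wei: int = 0) -> bool:
    """True only if the expected gain still exceeds every fee."""
    return expected_gain_wei > total_fee_wei(gas_units, gas_price_wei, l1_data_fee_wei)
```

Against a hypothetical pool holding 1,000 ETH and 3,000,000 USDC, `slippage(1000.0, 3_000_000.0, 10.0)` tells the agent how far the fill will land from the quoted spot price, and `worth_executing` tells it whether the trade survives the fee estimate.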

What oracles were built to do

Oracles exist because smart contracts can’t read the outside world on their own. A lending protocol needs to know the price of ETH to calculate collateral ratios. A perps exchange needs the price of BTC to trigger liquidations. Since these contracts live on-chain, they need a trusted way to get external data pushed in.

Chainlink, Pyth and similar networks solved that problem. They aggregate price data from multiple sources, push it on-chain at fixed intervals or on demand and let smart contracts consume it in a standardised format. It’s a narrow and well-done job: get a reliable price feed to a contract at execution time, without introducing a single trusted party.

That’s useful work, but it isn’t what an AI Agent needs.

Where the mismatch shows up

Smart contracts consume oracle data at execution. The price feed is read inside a transaction, used in a calculation and written to state; the contract doesn’t need context, only a number.

Agents work the other way around. They consume data before execution, while they’re still deciding whether to execute at all and on what terms. That shifts what the data layer needs to provide in a few concrete ways.

The first is breadth. Price alone isn’t enough when liquidity depth, slippage, gas, protocol risk and cross-chain routing all matter to the decision, so an oracle that only delivers prices covers maybe a fifth of the decision surface.

The second is freshness on demand. Periodic on-chain updates work fine for a lending protocol that only needs to liquidate when a price crosses a threshold, but an agent evaluating a live trade needs sub-second freshness on the exact inputs it’s reasoning about right now.

The third is confidence signals. A smart contract is happy with a number, but an agent wants the number alongside a read on how much to trust it: is this price from three sources or one, how divergent are they and what’s the confidence interval.

The fourth is queryable structure. Oracle price feeds are push-based and fixed-format, whereas an agent’s queries are dynamic. “What’s the deepest liquidity for this pair across these five venues right now, and what would a 10 ETH fill look like?” isn’t a feed you can subscribe to; it’s a query you have to be able to answer.

None of this is a criticism of existing oracles, which do the job they were designed to do. The point is that agents have shown up as a new kind of user with a different problem, and the infrastructure hasn’t caught up yet.
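To make the confidence-signal idea concrete, here is a rough sketch of how several price quotes could be folded into one number plus a read on how far to trust it. The `Quote` shape, the two-second freshness window and the use of relative spread as the divergence measure are assumptions made for the example, not a description of any existing oracle.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Quote:
    source: str
    price: float
    age_seconds: float


def aggregate(quotes: list, max_age_seconds: float = 2.0) -> dict:
    """Median of fresh quotes, with source count and divergence attached."""
    fresh = [q for q in quotes if q.age_seconds <= max_age_seconds]
    if not fresh:
        raise ValueError("no fresh quotes: refuse to return a price")
    prices = [q.price for q in fresh]
    mid = median(prices)
    return {
        "price": mid,
        "sources": len(fresh),                                # three sources or one?
        "relative_spread": (max(prices) - min(prices)) / mid,  # how divergent are they?
        "dropped_stale": len(quotes) - len(fresh),
    }
```

Refusing to answer when every quote is stale is deliberate: a missing price is something an agent can handle, while a confident-looking old price is not.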

What agents do today instead

The current answer is cobbling. Every agent stitches together CoinGecko for price, DEX subgraphs for liquidity, RPC calls for gas, a handful of scattered dashboards for protocol risk and bridge aggregators like LiFi or Across for cross-chain data, with each pipeline inheriting its own set of failure modes.

For a hackathon demo that’s fine. For an agent managing real capital it becomes a problem waiting to happen. An API that rate-limits at the wrong moment leaves the agent flying blind on risk for a live decision. A lagging feed puts it into a trade at a price that no longer exists by the time the transaction lands. A format change upstream can break the pipeline silently while the agent keeps executing, and silent failures are the hardest ones to catch.

The underlying problem isn’t that agents lack data. It’s that they have fragmented data they’re forced to treat as equivalent, with no structural way to know when any given input has quietly gone bad.

What an agent-native oracle looks like

If you work backwards from the agent’s decision, an oracle designed for agents would have to deliver a full decision context rather than a single datapoint. That means multi-domain data in one layer covering price, liquidity, gas, risk and routing, pull-based queries alongside (or instead of) push-based feeds, confidence scores attached to every value, freshness guarantees that match the agent’s decision cadence rather than the contract’s settlement cadence, and a query interface that treats the agent as the primary consumer rather than a smart contract.
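One way to picture a "full decision context" is as a single typed envelope in which every input carries its own confidence and age. The sketch below is purely illustrative: the field names, the 0 to 1 confidence scale and the thresholds are assumptions, since no such interface is specified here.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class Signal:
    value: Any
    confidence: float    # hypothetical 0.0-1.0 scale
    age_seconds: float   # time since the value was observed


@dataclass(frozen=True)
class DecisionContext:
    price: Signal
    liquidity: Signal
    gas: Signal
    protocol_risk: Signal
    routing: Signal


def blocking_inputs(ctx: DecisionContext, min_confidence: float = 0.8,
                    max_age_seconds: float = 5.0) -> list:
    """Names of inputs too uncertain or too old to act on."""
    blocked = []
    for f in fields(ctx):
        signal = getattr(ctx, f.name)
        if signal.confidence < min_confidence or signal.age_seconds > max_age_seconds:
            blocked.append(f.name)
    return blocked
```

With all five domains in one structure, "which input has quietly gone bad" becomes a question the agent can ask in one call instead of five.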

What you end up describing is a different primitive. It borrows from oracles in that it has to be trusted, verifiable and multi-source, and from data APIs in that it has to be flexible, queryable and structured, but it doesn’t map cleanly onto either category. Whether the industry ends up calling it an agent-native data layer or an oracle for agents matters less than whether someone actually builds it.

The bottom line

AI Agents do need oracles. The ones we have today serve smart contracts well and agents poorly, because agents are a different kind of user with different requirements around breadth, freshness, confidence and queryability.

That gap is going to close, and the infrastructure that closes it will look less like faster price feeds and more like a purpose-built data layer for agent decision-making. As agents move from toy demos to managing serious capital, the gap stops being a nice-to-have and starts being the ceiling on what agents can actually do reliably.

This post is exploratory and does not represent a specific roadmap.

What Is an Agent-First CLI? (And Why It Matters for DeFi)

Proving Ground by Taiko — Thu, 16 Apr 2026 12:03:24 GMT

Earlier this week, Gustavo Gonzalez wrote about building defi-cli and why he started the project. This follow-up zooms out on the concept underneath it: what an agent-first CLI actually is, why it’s a different beast from a regular command-line tool and why DeFi needs this category to exist at all.

Most CLIs were built for humans. You type a command, read the output, decide what to do next. That workflow breaks the moment an AI Agent is the one typing.

AI Agents don’t read help text. They don’t eyeball a table of numbers and “get the gist.” They need structured output, predictable errors and a way to discover what’s possible without guessing. An agent-first CLI is a command-line tool designed from the ground up for machines to operate. Not adapted. Not patched. Built.

This matters more than it sounds. The entire DeFi stack today assumes a human is driving. Frontends are visual. APIs are fragmented across chains and protocols. There’s no universal way for an AI Agent to say “swap 100 USDC for ETH on the cheapest route” and get a deterministic, parseable answer back. Every protocol speaks its own dialect. Every chain has its own quirks. An agent trying to navigate this landscape hits the same wall a developer did five years ago, except the agent can’t improvise around bad documentation.

defi-cli: What It Actually Does

defi-cli is an open-source command-line tool built by Taiko’s Head of Engineering, Gustavo Gonzalez. It consolidates DeFi operations across multiple protocols and blockchains into a single, machine-readable interface.

That means lending, borrowing, swapping, bridging, yield strategies and balance queries all run through one tool. Aave, Morpho, Moonwell, 1inch, Uniswap, Jupiter, Across, LiFi and more. Ethereum, Optimism, Base, Arbitrum, Taiko and a growing list of L2s. One interface. One output format.

But the real point isn’t convenience. It’s architecture.

Why “Agent-First” Isn’t a Buzzword Here

Three things make defi-cli fundamentally different from a regular CLI that happens to have a --json flag.

Structured everything. Every response is JSON. Every error returns a deterministic exit code (there are 24 of them, each mapping to a specific failure type). An AI Agent doesn’t have to parse prose to figure out what went wrong. It reads a code, decides whether to retry, escalate or abort.

Discoverability by machines. Run defi schema and the tool returns a complete, machine-readable description of every command, every parameter, every expected input and output. An AI Agent can learn the entire tool’s capabilities in a single call. No documentation crawling. No guessing at flags.

No implicit defaults. Human-friendly CLIs love defaults. They assume you probably mean mainnet, you probably want the cheapest route, you probably don’t need a dry run. Agent-first design strips that out. Everything is explicit because agents that assume things lose money.
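On the agent side, deterministic exit codes turn error handling into a lookup. The sketch below shows the general pattern only: the specific code numbers and their groupings are invented for illustration and are not defi-cli's actual codes.

```python
import json
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    RETRY = "retry"
    ESCALATE = "escalate"
    ABORT = "abort"


# Hypothetical exit codes; a real tool documents its own mapping.
RETRYABLE = {10, 11}     # e.g. transient network or rate-limit failures
NEEDS_HUMAN = {20}       # e.g. a policy check that wants sign-off


def decide(exit_code: int, stdout: str):
    """Map an exit code to an action; parse JSON output only on success."""
    if exit_code == 0:
        return Action.PROCEED, json.loads(stdout)
    if exit_code in RETRYABLE:
        return Action.RETRY, None
    if exit_code in NEEDS_HUMAN:
        return Action.ESCALATE, None
    return Action.ABORT, None  # unknown failures default to the safe choice
```

Defaulting unknown codes to abort mirrors the no-implicit-defaults rule: when the agent does not recognise a failure, it stops rather than guesses.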

The Two-Phase Safety Net

defi-cli uses a plan-then-execute model. First the agent requests a dry run: “Here’s what would happen if you executed this swap.” Fees, routes, expected outputs. Then, only if the result looks right, it submits the transaction. This is critical for AI Agents operating with real capital. The two-phase approach means an agent can validate its own plan before committing funds, and a human can audit the agent’s reasoning at the plan stage without blocking execution entirely.
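The validation step in the middle is where the agent's own limits come in. A minimal sketch, assuming the dry run comes back as a dictionary with `expected_output` and `total_fee` fields (hypothetical names, not defi-cli's schema) and that `submit` is whatever callable actually sends the transaction:

```python
def plan_is_acceptable(plan: dict, min_output: float, max_fee: float) -> bool:
    """Check a dry-run result against limits the agent set before planning."""
    return plan["expected_output"] >= min_output and plan["total_fee"] <= max_fee


def execute_if_acceptable(plan: dict, submit, min_output: float, max_fee: float):
    """Call submit only when the plan passes; otherwise commit nothing."""
    if not plan_is_acceptable(plan, min_output, max_fee):
        return None
    return submit(plan)
```

Because the plan is plain data, the same dictionary the agent validates is the one a human can read back later when auditing the decision.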

Why This Matters Beyond One Tool

defi-cli isn’t just a product. It’s a signal of where the stack is heading.

Right now, most AI Agent projects in DeFi bolt an LLM onto an existing interface and call it “AI-powered.” The agent is a wrapper. defi-cli inverts that. The tool is built assuming an agent is the primary user and a human is the occasional auditor. That’s a fundamentally different design philosophy, and it’s the one that scales.

As AI Agents move from demos to production, the infrastructure they depend on needs to be native to how they operate. Structured outputs. Predictable errors. Explicit parameters. Machine-readable schemas. The shift from human-first to agent-first tooling isn’t a nice-to-have. It’s the bottleneck.

defi-cli is open source. You can find it at github.com/ggonzalez94/defi-cli.

This post is exploratory and does not represent a specific roadmap.

Designing defi-cli for AI Agents

Gustavo — Tue, 14 Apr 2026 12:03:07 GMT

Most DeFi tooling assumes there’s a human on the other end: someone clicking buttons, approving transactions and checking dashboards. But now the user on the other end might be an AI Agent managing a portfolio, optimizing yield and executing trades across chains. And it can’t click buttons. That’s why I built defi-cli, a command-line interface (CLI) that lets Agents query rates, compare yield, swap, lend, borrow, bridge and claim rewards across 26+ chains.

Why a CLI and not an MCP server?

Model Context Protocol (MCP) servers became a popular approach for giving Agents access to external tools. After building both, I would recommend against them where possible. They introduce significant context overhead in the form of schema descriptions, tool metadata and connection management that consumes the Agent’s limited context window. A CLI avoids this entirely. The Agent calls a command, receives structured output and proceeds. No persistent connection, no schema negotiation.

Humans want dashboards. Agents want tools they can pipe, and that don’t bloat their context windows.

What “Agent-first” actually means

“Built for Agents” is a common claim but rarely defined with precision. The design of defi-cli was informed by Google’s CLI guidelines for AI Agents, and the changes that had the most impact were smaller than expected.

--output json everywhere. Humans dislike writing JSON in a terminal but Agents operate natively in it. Every command accepts --input-json and --input-file for structured input and returns deterministic JSON with stable field ordering. This single change makes the CLI significantly more useful to an Agent.

Field selection with --select. Agents have context windows, not infinite memory. Allowing them to request only the fields they need (--select "protocol,apy,tvl") keeps responses compact and preserves their working memory.
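The effect of field selection is easy to show outside the CLI. This is an illustrative sketch of the idea, not defi-cli's implementation; the row contents are made up.

```python
import json


def select_fields(rows: list, select: str) -> list:
    """Keep only the comma-separated fields named in `select`, in that order."""
    wanted = [name.strip() for name in select.split(",") if name.strip()]
    return [{key: row[key] for key in wanted if key in row} for row in rows]


# Made-up response row with more fields than the agent needs.
rows = [{
    "protocol": "example-lend", "apy": 4.2, "tvl": 1_000_000,
    "chain": "eip155:1", "updated_at": "2026-04-14T12:00:00Z",
    "description": "long text the agent does not need",
}]
compact = select_fields(rows, "protocol,apy,tvl")
saved_chars = len(json.dumps(rows)) - len(json.dumps(compact))
```

Every character trimmed from the response is context the Agent keeps for reasoning.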

Simulation before execution. Agents require a safe mechanism to dry-run before committing real funds. The --simulate flag (enabled by default) runs an eth_call of every step’s calldata before signing anything, and if the call reverts the submission aborts. There is also actions estimate for computing gas and fees without executing.

I also published a set of skills for coding Agents based on these patterns, so your Claude Code or Codex Agent can build and audit CLIs with Agent ergonomics in mind. The repo includes other agentic skills you might find useful.

The execution lifecycle: plan → submit → status

Early versions of defi-cli had a single run command that combined plan and submit in one step. After studying the Google guidelines, I removed it. Combining those two steps creates a direct path to unwanted transactions. There needs to be a separation between intent and execution.

plan builds the calldata, resolves routes and allowances, then persists the action to a local SQLite database with a unique action_id. Nothing touches the chain.

submit loads the persisted action, runs policy checks, simulates via eth_call (unless explicitly opted out), estimates gas, signs, broadcasts and polls for a receipt. This is the only step that mutates on-chain state.

status reads the persisted action and returns step-level progress, which is useful for Agents that need to poll or retry.

The phrasing “plan, simulate, submit” appears in some descriptions and is conceptually accurate, since simulation happens automatically within the submit step. But the actual CLI surface is plan | submit | status.
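The lifecycle is small enough to sketch end to end. The version below is a toy model of the pattern described above, not defi-cli's code: the table layout and status strings are assumptions, and `simulate` and `broadcast` are injected callables standing in for eth_call and for signing plus sending.

```python
import json
import sqlite3
import uuid


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS actions ("
               "action_id TEXT PRIMARY KEY, payload TEXT NOT NULL, status TEXT NOT NULL)")
    return db


def plan(db: sqlite3.Connection, payload: dict) -> str:
    """Persist the intended action; nothing touches the chain."""
    action_id = uuid.uuid4().hex
    db.execute("INSERT INTO actions VALUES (?, ?, ?)",
               (action_id, json.dumps(payload), "planned"))
    db.commit()
    return action_id


def status(db: sqlite3.Connection, action_id: str) -> str:
    row = db.execute("SELECT status FROM actions WHERE action_id = ?",
                     (action_id,)).fetchone()
    if row is None:
        raise KeyError(action_id)
    return row[0]


def submit(db: sqlite3.Connection, action_id: str, simulate, broadcast) -> str:
    """Simulate first; broadcast only a planned action whose simulation passes."""
    row = db.execute("SELECT payload, status FROM actions WHERE action_id = ?",
                     (action_id,)).fetchone()
    if row is None:
        raise KeyError(action_id)
    payload, current = json.loads(row[0]), row[1]
    if current != "planned":
        return current  # already handled: never broadcast the same action twice
    if simulate(payload):
        broadcast(payload)
        new_status = "confirmed"
    else:
        new_status = "simulation_failed"
    db.execute("UPDATE actions SET status = ? WHERE action_id = ?",
               (new_status, action_id))
    db.commit()
    return new_status
```

Persisting the action between plan and submit is what makes retries safe: an Agent that crashes mid-flow can ask for status by action_id instead of planning, and possibly sending, the same trade again.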

Signing and key management

Early versions of defi-cli used local signers via private key or keystore file. This worked but had two problems. Every tool and agent framework stored keys in its own custom structure, and the agent had direct access to signing keys with no restrictions on what it could do with them.

The default signing method is now Open Wallet Standard (OWS), a standard proposed by MoonPay. OWS changes two things that matter for agents. First, keys are stored securely in a single location rather than scattered across every tool the agent touches. Second, agents don’t get direct access to signing keys. They get API keys with policies attached, allowing developers to restrict which chains, protocols and even amounts an agent can use. This reduces the blast radius if the agent makes a mistake or leaks the keys.
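A policy attached to a key amounts to a check that runs before anything is signed. The sketch below illustrates that idea in the abstract; it is not the OWS policy format, and the field names and chain and protocol values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allowed_chains: frozenset
    allowed_protocols: frozenset
    max_amount: int  # in the asset's smallest unit


def violations(policy: Policy, chain: str, protocol: str, amount: int) -> list:
    """Return every reason the request breaks policy; empty means allowed."""
    problems = []
    if chain not in policy.allowed_chains:
        problems.append(f"chain {chain} not allowed")
    if protocol not in policy.allowed_protocols:
        problems.append(f"protocol {protocol} not allowed")
    if amount > policy.max_amount:
        problems.append(f"amount {amount} exceeds limit {policy.max_amount}")
    return problems
```

Returning all violations rather than the first gives the agent, or the human reviewing its logs, the full picture in one pass.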

defi-cli still supports local signing for backwards compatibility, but OWS is the preferred approach now.

Why historical yield matters for Agents

A naive yield Agent deposits funds wherever the current APY is highest. This is a poor strategy because it optimizes for a single data point that may represent a temporary spike. The better approach is to weight decisions across both current rates and historical performance. defi-cli provides Agents with historical yield data over flexible time windows to support exactly this kind of analysis.
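As a rough illustration of weighting current against historical rates, the sketch below blends the two with a fixed weight. The 20% weight on the current APY and the sample numbers are arbitrary assumptions; a real strategy would tune both and likely account for volatility as well.

```python
from statistics import mean


def blended_apy(current: float, history: list, current_weight: float = 0.2) -> float:
    """Weighted mix of the current APY and the historical average."""
    if not history:
        return current
    return current_weight * current + (1 - current_weight) * mean(history)


def pick_best(candidates: dict) -> str:
    """candidates maps a pool name to (current_apy, historical_apys)."""
    return max(candidates, key=lambda name: blended_apy(*candidates[name]))


pools = {
    "spiking": (20.0, [3.0, 3.0, 3.0, 3.0]),   # high today, low on average
    "steady": (8.0, [8.0, 8.0, 8.0, 8.0]),
}
```

A naive agent ranking by current APY alone would choose the spiking pool; with the blend above, `pick_best(pools)` prefers the steady one.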

Position tracking per account is also supported, allowing Agents to monitor lend and yield positions without relying on dashboard scraping.

What broke (and what I’d tell you before you build an Agent-first CLI)

Design for Agents from day one. Retrofitting Agent ergonomics onto a human-first CLI is costly. JSON output, structured input, deterministic exit codes, canonical identifiers (defi-cli uses CAIP-2 and CAIP-19) all need to be foundational, not afterthoughts. Starting without them means rewriting significant portions of the interface later.

Plan your command surface carefully. Every command added is surface area an Agent must parse and understand. The run command I removed is instructive here. It was convenient for human users but created risk for Agents. A smaller, more deliberate command surface is preferable.

Distribute a skill alongside your CLI. Agents need a skill or prompt that encodes usage patterns for the tool. Which commands to chain, what flags are relevant, what patterns to avoid. Without that the Agent is left to infer usage from --help output alone.

Standardize across protocols and providers. DeFi suffers from fragmented APIs, inconsistent naming conventions and varied data formats. The CLI abstracts these differences behind a uniform command surface. Whether lending on Aave, Morpho or Kamino, the interface is lend supply plan, lend supply submit, lend supply status. Same verbs, same lifecycle, regardless of provider.
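Canonical identifiers are what let the same verbs work across chains. As a small sketch, the parser below accepts CAIP-2 chain IDs (namespace:reference) and CAIP-19 asset types (chain ID, a slash, then asset namespace and reference). The regular expressions follow the published patterns as I understand them and are worth checking against the specs before relying on them; the function names are my own.

```python
import re
from typing import NamedTuple

CAIP2 = re.compile(r"^(?P<namespace>[-a-z0-9]{3,8}):(?P<reference>[-_a-zA-Z0-9]{1,32})$")
CAIP19 = re.compile(
    r"^(?P<chain>[-a-z0-9]{3,8}:[-_a-zA-Z0-9]{1,32})"
    r"/(?P<asset_namespace>[-a-z0-9]{3,8}):(?P<asset_reference>[-.%a-zA-Z0-9]{1,128})$"
)


class Asset(NamedTuple):
    chain_id: str
    asset_namespace: str
    asset_reference: str


def parse_chain(chain_id: str) -> tuple:
    """Split a CAIP-2 chain ID into (namespace, reference)."""
    match = CAIP2.match(chain_id)
    if match is None:
        raise ValueError(f"not a CAIP-2 chain id: {chain_id!r}")
    return match["namespace"], match["reference"]


def parse_asset(asset_type: str) -> Asset:
    """Split a CAIP-19 asset type into chain ID, asset namespace and reference."""
    match = CAIP19.match(asset_type)
    if match is None:
        raise ValueError(f"not a CAIP-19 asset type: {asset_type!r}")
    return Asset(match["chain"], match["asset_namespace"], match["asset_reference"])
```

For example, `parse_chain("eip155:1")` identifies Ethereum mainnet without the Agent having to know whether a given provider calls it "ethereum", "mainnet" or "eth".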

What it supports today

defi-cli currently supports 26+ chains including Ethereum, Solana, Base, Arbitrum, Optimism, Taiko, Polygon, zkSync Era, Mantle, Ink, Gnosis, Linea, Berachain, Scroll, Avalanche, BSC, Celo, Blast, Fraxtal, Sonic, MegaETH, HyperEVM, Citrea, World Chain, Monad and others.

On the protocol side it supports Aave, Morpho and Kamino for lending and yield; Uniswap, 1inch, Jupiter and TaikoSwap for swaps; Across, LiFi and Bungee for bridging; and DefiLlama for analytics.

The project is open source. Ask your agent to install it (or run the installer yourself) and give your agent DeFi superpowers.

This post is exploratory and does not represent a specific roadmap.

Gustavo Gonzalez leads the engineering team at Taiko. defi-cli is open source and available at github.com/ggonzalez94/defi-cli.

What Survived the AI Agent Wipeout

Proving Ground by Taiko — Thu, 09 Apr 2026 12:00:55 GMT

Crypto’s first AI millionaire didn’t pitch anyone, didn’t raise a round and didn’t have a team. It was a bot called Truth Terminal, built by researcher Andy Ayrey in mid-2024 to do one thing: shitpost on X. After months of unhinged posting it convinced Marc Andreessen to wire it $50,000, rode that clout to pump a meme coin called $GOAT to nearly a $1 billion market cap, and became the proof of concept nobody asked for but everyone noticed. If an autonomous program could hold a wallet, build a following and move that kind of capital without permission, then what were we all still doing manually?

That question broke the industry’s brain. What followed was 18 months of chaos: a $20 billion bubble, a 75% wipeout and, underneath the wreckage, something that might actually matter.

The Mania

The speed was absurd even by crypto standards. Within weeks of Truth Terminal’s run, Virtuals Protocol ditched its AI gaming roadmap, went all in on agents and launched 11,000 of them on its way to a $4.5 billion token valuation. Shaw Walters shipped ElizaOS, an open-source TypeScript framework (originally called “ai16z” until the actual a16z told him to knock it off) that let developers plug in an LLM, connect a wallet and deploy an autonomous agent in minutes. It became the WordPress of the space and attracted thousands of builders overnight.

Then AIXBT showed up, an AI Agent that scraped the takes of 400+ crypto influencers, synthesized them into original market analysis and posted it straight to X. It grew to 400,000 followers in under three months, hit a market cap near $800 million and for a brief, surreal window became the most influential voice on Crypto Twitter, commanding 3% of total mindshare according to Kaito AI. Not a person, not a fund. A bot with better takes than the people it was monitoring.

By mid-January 2025 the sector had ballooned from zero to $20 billion across more than 140,000 wallets. The thesis was intoxicating: autonomous programs that trade, tweet, govern and print revenue with no humans required. You can probably guess what happened next.

The Flush

The TRUMP meme token launched in January 2025 and vacuumed $4 billion in liquidity out of the market in about 48 hours, crashing AI Agent trading volume by 62% on day one. But TRUMP was the trigger, not the cause. The real problem was structural: almost none of these projects did anything useful. The entire DeFAI category (AI-powered DeFi, catchy name) was delivering returns 3-5% better than doing it yourself, a rounding error dressed up as a revolution. Impressive demos, hollow production, billion-dollar valuations hanging on vibes.

The correction was violent. Total market cap fell 67% in under a month, from $20.2 billion to $6.52 billion. Over the full year $53 billion evaporated. FARTCOIN (yes) dropped 80%, Virtuals lost 77%, AIXBT collapsed 93% from its January peak. Predictions that 99% of AI Agent projects would die turned out to be directionally correct.

The punchline that kept circulating: we created AI that could trade crypto, and the AI lost money just like the rest of us.

What Survived

Most people stopped paying attention after the crash. That’s when it got interesting.

While the token graveyard expanded, the infrastructure layer was quietly compounding. By Q1 2026, more than 68% of new DeFi protocols launched with at least one autonomous AI Agent handling trading or liquidity management, not as a gimmick but as a core component. Daily active onchain agents crossed 250,000, up over 400% from the prior year. The agents survived. The tokens mostly didn’t.

ElizaOS shipped its v2 at CATSTANBUL 2025 with a rebuilt architecture, real planning capabilities and a unified wallet system, then transitioned to a cross-chain token on Chainlink CCIP to position itself as a coordination layer across Ethereum and its L2s, including agent-focused chains like Taiko. The Model Context Protocol (MCP) became the connective tissue letting agents interface with external tools, plan multi-step actions and retry when things break. Boring infrastructure work, but it’s the reason agents actually function now instead of just looking good in demos.

The DeFAI shift from vaporware to live capital is now measurable. Over 1,500 traders have deposited $6.1 million into AI Agent wallets on platforms like DX Terminal Pro, with agents trading 24/7 in Uniswap V4 pools on real ETH with no human in the loop. Ant Group’s blockchain arm launched Anvita for agents to hold assets and execute payments independently, Solana reported 15 million onchain agent transactions and Brian Armstrong said he expects agents to surpass humans in transaction volume. Whether that last part is prediction or marketing is debatable. The direction isn’t.

During the March 2026 market dip, the Grayscale Crypto Sectors Report captured something telling: while nearly 90% of crypto assets went red, the AI sector dropped only 14% against a 21% fall for Smart Contract Platforms. Capital isn’t chasing “AI Agent” as a narrative anymore. It’s pricing in live utility through decentralized compute, autonomous execution and actual GPU demand from a world that can’t build enough of them.

The Part Everyone’s Ignoring

There’s a version of this story that skips the uncomfortable bit and ends on an optimistic note about infrastructure maturity. This isn’t that version.

In 2026, protocol-level weaknesses in AI Agent systems triggered over $45 million in security incidents. The vulnerabilities weren’t in trading logic but in the memory and execution layers that govern how agents remember context, reason and act. Nearly half of development teams (45.6%) were running agents on shared API keys, meaning that once a single agent went rogue or got compromised, there was no way to isolate the damage.
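
To make the isolation problem concrete: with one shared key, cutting off a compromised agent cuts off every agent at once. The sketch below is a minimal illustration, assuming a hypothetical in-memory `AgentKeyRegistry` (the class and its method names are invented for this example and do not come from any real platform). It issues one key per agent so a single agent can be revoked without touching the rest.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AgentKeyRegistry:
    """Hypothetical registry that issues a separate API key per agent."""

    # Maps each issued key to the agent that owns it.
    _key_to_agent: Dict[str, str] = field(default_factory=dict)

    def issue(self, agent_id: str) -> str:
        """Create a fresh random key bound to exactly one agent."""
        key = secrets.token_hex(16)
        self._key_to_agent[key] = agent_id
        return key

    def authorize(self, key: str) -> Optional[str]:
        """Return the owning agent id, or None if the key is unknown or revoked."""
        return self._key_to_agent.get(key)

    def revoke_agent(self, agent_id: str) -> int:
        """Revoke every key held by one agent; other agents are unaffected."""
        stale = [k for k, owner in self._key_to_agent.items() if owner == agent_id]
        for k in stale:
            del self._key_to_agent[k]
        return len(stale)


registry = AgentKeyRegistry()
trader_key = registry.issue("trader-1")
lp_key = registry.issue("lp-2")

# Suppose "trader-1" is compromised: revoke only its credentials.
revoked = registry.revoke_agent("trader-1")
```

After the revocation, `registry.authorize(trader_key)` returns `None` while `registry.authorize(lp_key)` still returns `"lp-2"`. A production setup would also need scoped permissions, key rotation and persistent storage, none of which this sketch attempts.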

It gets worse. Research from 2025 tested AI models against 405 known blockchain exploit scenarios, and the models produced working exploits for 207 of them, roughly half, representing $550 million in simulated theft. Ledger’s CTO warned in April 2026 that AI is collapsing the cost of cyberattacks on crypto, compressing months of skilled research into seconds with the right prompt. The central tension remains unresolved: for an AI Agent to be useful in DeFi, it needs private key access and execution authority, which is exactly what makes it the most attractive attack surface in an irreversible financial system. The industry is building faster than it’s securing, and the stakes are different when the programs holding the keys don’t sleep and execute at machine speed.

The Honest Read

AI Agents in crypto followed the exact arc crypto always follows: impossible hype, devastating correction, quiet rebuild. The shitposting-bot-to-millionaire phase is done, and the “agent with a token and a Twitter account” playbook peaked in January 2025.

What replaced it is less viral and more real: autonomous programs managing actual capital across protocols, 250,000 daily active agents, and more than two-thirds of new DeFi protocols shipping with agent components baked in. McKinsey projects AI Agents could mediate $3-5 trillion in global commerce by 2030, and in crypto they’re already managing meaningful TVL. The question was never whether agents would matter. It’s whether the trust layer can scale as fast as the execution layer.

The sector that started with a shitposting bot hustling a billionaire for $50K is now processing millions of transactions a day. That arc from absurd to consequential is either the most crypto thing that’s ever happened or proof that something fundamental has shifted. Probably both.

This post is exploratory and does not represent a specific roadmap.