I made three agents fight over my marketing copy
Autoreason is a self-refinement loop with a clever halt condition. It halted on my paragraph. I kept running it anyway.
Autoreason has been on my “try this” list for weeks. I’d been putting it off. Anything labelled “multi-agent reasoning loop” sounds like it needs a PhD, and I’m still warming up to running my own agent stuff rather than reading about it. It was easier than I’d built it up to be, and more fun, mostly because the agents have personality. One was openly sassy about my draft.
I ran it on ninety words of pitch copy I couldn’t finish. Every version felt nearly right and not quite right. I’d ask Claude to tighten it, the new version would land different but not clearly better, and after the third pass I couldn’t tell whether I was improving it or smoothing it into something else. This is the failure mode of LLM editing on subjective work. The model agrees with you. It also agrees with the next person who asks it the opposite question. There’s no scoreboard, no halt condition, no version of “actually, the original was fine” living anywhere in the loop.
Autoreason is built for this. It’s a method published this year by SHL0MS and Hermes Agent at Nous Research, extending Karpathy-style autoresearch into subjective domains.
How the loop works
Each round produces three versions. Version A is the unchanged incumbent. Version B is an adversarial revision, written by an agent told to find what’s wrong and fix it. Version AB is a synthesis, given both A and B and asked to produce the best of each.
A panel of fresh judges with no shared context ranks them. They don’t know which is which. The ranks aggregate by Borda count: every position contributes, so close finishes don’t reduce to coin flips. Winner becomes the new incumbent. Loop runs again.
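Here is a minimal Python sketch of the Borda step, to show why close finishes don't collapse into coin flips. It is not autoreason's actual code. The function names, the four made-up ballots, and the tie-break rule are all assumptions for illustration.

```python
from collections import defaultdict

def borda_scores(ballots):
    """Sum Borda points over ranked ballots.

    Each ballot lists candidate labels, best first. With n candidates,
    first place earns n - 1 points and last place earns 0.
    """
    scores = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, label in enumerate(ballot):
            scores[label] += n - 1 - position
    return dict(scores)

def borda_winner(ballots):
    scores = borda_scores(ballots)
    # Assumed tie-break: alphabetical order, so "A" (the incumbent) wins ties.
    # The post does not say how the real method resolves ties.
    return max(sorted(scores), key=scores.get)

# Hypothetical panel of four judges. B and AB each get two first-place votes.
ballots = [
    ["B", "AB", "A"],
    ["B", "AB", "A"],
    ["AB", "B", "A"],
    ["AB", "A", "B"],
]
print(borda_scores(ballots))  # {'B': 5, 'AB': 6, 'A': 1}
print(borda_winner(ballots))  # AB
```

Counting only first-place votes, B and AB tie two to two. The Borda count separates them because the fourth judge ranked B last, and that lower placement still counts.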
The halt condition is the part I keep coming back to. The loop stops when the unchanged version wins twice in a row. “Do nothing” is a first-class participant in every round, not a fallback when the agents give up.
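The control flow can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the published implementation: `revise`, `synthesize`, and `pick_winner` are hypothetical callables standing in for the adversary, the synthesizer, and the judge panel, and the `max_rounds` cap is my own safety net rather than part of the method as described here.

```python
def run_tournament(draft, revise, synthesize, pick_winner, max_rounds=20):
    """Run rounds until the unchanged incumbent wins twice in a row.

    revise(text) -> adversarial rewrite (version B)
    synthesize(a, b) -> merged version (version AB)
    pick_winner(candidates) -> winning label: "A", "B", or "AB"
    All three are hypothetical stand-ins for real agent calls.
    """
    incumbent = draft
    unchanged_streak = 0
    history = []
    for _ in range(max_rounds):  # assumed cap, so a bad judge can't loop forever
        b = revise(incumbent)
        candidates = {"A": incumbent, "B": b, "AB": synthesize(incumbent, b)}
        winner = pick_winner(candidates)
        history.append(winner)
        if winner == "A":
            unchanged_streak += 1
            if unchanged_streak == 2:  # "do nothing" won twice in a row: halt
                break
        else:
            unchanged_streak = 0
            incumbent = candidates[winner]
    return incumbent, history

# Replay a run where the winners are B, AB, A, A, using toy stand-ins.
scripted = iter(["B", "AB", "A", "A"])
final, history = run_tournament(
    "v0",
    revise=lambda text: text + "+B",
    synthesize=lambda a, b: a + "+AB",
    pick_winner=lambda candidates: next(scripted),
)
print(history)  # ['B', 'AB', 'A', 'A']
print(final)    # v0+B+AB
```

Version A goes through the same vote as the other two every round, and a win for B or AB resets the streak, so the loop halts only on two consecutive incumbent wins.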
What happened on my paragraph
Round one. The adversarial rewrite won. It killed two qualifiers I hadn’t noticed I was leaning on, and moved the strongest sentence to the front. The adversary was sassy. There’s no other word for it. Where Claude in editing mode is endlessly diplomatic, the autoreason adversary has been told her job is to find what’s wrong, and she leans in. The synthesis was worse than B alone. It had tried to keep some of A’s structure and ended up muddled.
Round two. New incumbent, new everything. Synthesis won. It had picked up a turn of phrase from the new incumbent and grafted it onto a sharper opening line the adversary cooked up. This was the round that taught me something. The synthesis isn’t a default winner. It only wins when it has something real to combine.
Round three. Incumbent won. First “do nothing” win.
Round four. Adversary tried to make the verbs more active. Judges weren’t convinced. Incumbent won again. Two in a row. Halt.
Whole thing took about eleven minutes. Roughly forty agent calls.
What it fixed
What autoreason fixed wasn’t the copy. It fixed the problem of knowing when to stop.
Standard self-refinement loops reward action. You ask the model to improve something, the model improves something. The model isn’t going to come back and say “actually, version four was better than version five, ship that and walk away.” It would feel rude. It would also be the right answer half the time.
Autoreason is the first edit loop I’ve used with a halt condition. Not “the agents got tired,” but two independent panels of fresh judges agreeing the unchanged version is already winning. The judges have no idea which version came from where, and they’re spawned fresh each round so they can’t learn to favour their previous picks. No agent has an incentive to keep the loop running.
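One way to keep judges from knowing which version came from where is to shuffle the candidates and hand them out under neutral labels, then map the ballots back afterwards. The Python sketch below is an assumption about how that could be done, not a description of how autoreason does it. The names `blind` and `unblind` and the "Option N" labels are invented for illustration.

```python
import random

def blind(candidates, rng=random):
    """Shuffle candidates and relabel them with neutral names.

    Returns (shown, key): `shown` maps neutral label -> text for the judge,
    `key` maps neutral label -> original label ("A", "B", "AB").
    """
    items = list(candidates.items())
    rng.shuffle(items)
    neutral = [f"Option {i + 1}" for i in range(len(items))]
    shown = {label: text for label, (_, text) in zip(neutral, items)}
    key = {label: source for label, (source, _) in zip(neutral, items)}
    return shown, key

def unblind(ballot, key):
    """Translate a judge's ranking of neutral labels back to original labels."""
    return [key[label] for label in ballot]

candidates = {"A": "incumbent text", "B": "adversarial text", "AB": "synthesis text"}
shown, key = blind(candidates)
# A judge would rank the keys of `shown`. Here we fake a ballot in display order.
fake_ballot = list(shown)
print(unblind(fake_ballot, key))
```

Calling `blind` again for each judge gives every judge a different ordering, which also guards against a position bias such as always favouring the first option shown.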
Keep the brief short
If you try this, keep the brief short. My first run had a long, detailed brief: audience, campaign context, the seven things the paragraph had to do. The agents got into the weeds and stayed there. The synthesis read like a checklist.
The brief is a constraint, not a wishlist. Every line narrows the search space. Add too many and you’re not running a tournament, you’re running a compliance check.
The bit I keep thinking about
The most interesting design choice isn’t the synthesis or the Borda count. It’s that “do nothing” is on the ballot every round.
The default behaviour of every LLM tool I’ve used is “produce something.” Autoreason builds in a way for the loop to say no, and a way for “no” to win, at least within a single tournament. The first time the incumbent won twice in a row, I argued with the result for a minute before I noticed I was arguing with a vote, not a model trying to please me.
The way I broke it
What I described is autoreason working the way it’s supposed to. What happened on my paragraph is that the loop halted clean, on a paragraph that wasn’t the one I’d set out to write.
That’s not autoreason’s fault, it’s mine. Every time it halted, I’d sharpen the brief and re-run. The tournament got tighter each pass. The winning paragraph drifted further from what I’d been trying to say.
Autoreason will follow you down a hole of perfectionism if you let it, and I let it. The “do nothing” safeguard works within a single tournament. It doesn’t protect you from yourself across many.
I didn’t ship the paragraph in the end. Not the one the loop halted on, not the one I started with. We’d gone too deep into the hole.
Give Autoreason a try.
This post is exploratory and does not represent a specific roadmap.




