TL;DR Ultracode binds deep reasoning and automatic team orchestration into a single switch. It lets AI assess tasks independently, decompose them into steps, and even replicate itself into tens or hundreds of parallel agents working simultaneously. When machines have learned how to divide labor, human value is pushed toward two more fundamental questions: Is this worth the compute? And once it’s done, what makes the result count?

Not long ago, I opened Claude Code and spotted a new option at the bottom of the menu: Ultracode. I switched it on, handed it a tedious code cleanup task, and then did nothing — just watched the screen to see how it worked.

It didn’t behave like a conventional AI that charges forward the moment it receives an instruction. It took time first to compare several diverging version histories, confirmed there were no file conflicts, and only then calmly selected the safest merge path. It synced the mirror, wrote the history record, and filled in the day’s work log. “Almost done thinking” kept flashing on the screen. By the time it finished, what struck me wasn’t how elegantly it had performed — it was that the shape of how it worked had changed. It operated with more caution than most engineers would.

Caption: I handed Ultracode a complex version-cleanup task and did not intervene at any point. It independently compared versions, selected merge logic, updated the mirror, and completed the log.

Beneath the Surface: Deep Reasoning and Dynamic Division of Labor

Let me clarify one thing first: Ultracode is not a new model, nor does it simply make the AI “think longer.” It is an operating mode within Claude Code, released alongside Opus 4.8 in late May 2026 (see the official documentation). Enabling the switch turns two gears simultaneously:

  1. Maximum reasoning intensity (xhigh): Forces the model to mentally simulate all potential risks and architectural boundaries before touching a single line of code.
  2. Automatic dynamic orchestration (dynamic workflow): The AI assesses the scale of the task itself, determines whether it warrants decomposition, and distributes the work accordingly.

The second point is what matters most, and it is conditional. Ultracode spins up sub-agents only when a task is large enough and can actually be split. If the task is inherently linear (like the version cleanup I gave it, where a single rebase cannot be handed to ten agents simultaneously), it simply works through the steps one by one. The same switch produces different strategies for different problems. That was the first signal I read from the screen: Ultracode was on, but it wasn’t using the capability just to use it.
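The conditional behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Ultracode's actual logic: the `Task` type, the `independent` flag, and the `min_parallel_steps` threshold are hypothetical names introduced here, and plain threads stand in for sub-agents.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    name: str
    steps: List[Callable[[], str]]
    independent: bool  # assumption: True when no step needs another step's output


def run(task: Task, min_parallel_steps: int = 3, max_workers: int = 8) -> Tuple[str, List[str]]:
    """Return (strategy, results). Fan out only when the work is large and splittable."""
    if task.independent and len(task.steps) >= min_parallel_steps:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map returns results in the same order as the input steps
            results = list(pool.map(lambda step: step(), task.steps))
        return "parallel", results
    # Linear work (for example, a rebase) runs one step after another
    return "sequential", [step() for step in task.steps]


rebase = Task("rebase", [lambda: "fetch", lambda: "rebase", lambda: "push"], independent=False)
cleanup = Task("cleanup", [lambda i=i: f"file{i}" for i in range(4)], independent=True)

print(run(rebase)[0])   # sequential
print(run(cleanup)[0])  # parallel
```

The point of the sketch is that the decision lives in one place: the same entry point picks a different strategy depending on the shape of the task.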

When the Task Is Large Enough, AI Assembles Its Own Agent Team

When the task is sufficiently large, Ultracode reveals its real force. It generates a script on the fly within the same session, spinning up tens or hundreds of sub-agents, each one tackling a distinct portion of the codebase.

What makes the validation approach even more elegant is adversarial verification. One group of agents attacks the problem from multiple angles; another group is dedicated to challenging and refuting the conclusions the first group reached. The two sides argue within the system, back and forth, until the answer converges and no further vulnerabilities can be surfaced.
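The back-and-forth described above can be pictured as a simple loop. The sketch below is a toy model under stated assumptions, not the mechanism Claude Code uses: `converge`, `toy_solver`, and the two challenger functions are hypothetical, ordinary functions stand in for agents, and `max_rounds` is a safety cap added for this illustration.

```python
from typing import Callable, List, Tuple

Solver = Callable[[str, List[str]], str]      # (problem, objections so far) -> candidate
Challenger = Callable[[str, str], List[str]]  # (problem, candidate) -> new objections


def converge(problem: str, solver: Solver, challengers: List[Challenger],
             max_rounds: int = 5) -> Tuple[str, int, bool]:
    """Alternate solving and challenging until no challenger objects.

    Returns (candidate, rounds_used, converged).
    """
    objections: List[str] = []
    candidate = solver(problem, objections)
    for round_no in range(1, max_rounds + 1):
        new = [o for challenge in challengers for o in challenge(problem, candidate)]
        if not new:
            return candidate, round_no, True  # nobody can find a hole
        objections.extend(new)
        candidate = solver(problem, objections)  # revise with all objections in view
    return candidate, max_rounds, False


# Toy stand-ins: the solver "fixes" whatever has been objected to.
def toy_solver(problem: str, objections: List[str]) -> str:
    return problem + "".join(f" +{fix}" for fix in sorted(set(objections)))


def needs_tests(problem: str, candidate: str) -> List[str]:
    return [] if "+tests" in candidate else ["tests"]


def needs_docs(problem: str, candidate: str) -> List[str]:
    return [] if "+docs" in candidate else ["docs"]


print(converge("migrate parser", toy_solver, [needs_tests, needs_docs]))
# ('migrate parser +docs +tests', 2, True)
```

In this toy run the first candidate draws two objections, the revised candidate draws none, and the loop stops in round two.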

This is not laboratory theory. It has already happened.

📊 Field Data: Bun’s Language Migration

Jarred Sumner, the author of Bun (the well-known JavaScript runtime), shared an extreme case: using this mechanism, he migrated nearly one million lines of Bun’s core codebase from Zig to Rust.

| Item | Public Data |
| --- | --- |
| Engineering scale | Approximately 960,000 lines of source code, over 6,000 commits |
| Development time | Less than 10 days from start to merge into main |
| Quality verification | Hundreds of AI agents working in parallel, two AI reviewers per file, final test pass rate of 99.8% |

In the past, this would have been a workload that an entire team of senior engineers might spend multiple quarters on — with no guarantee of finishing. With this mechanism, a mergeable result was produced within ten days. (Sumner also noted this was more of an experiment and may not replace the existing Zig version.)

What I Thought Was My Core Advantage Has Become a Built-In Feature

Seeing this capability, I had complicated feelings — because I had only just finished building my own version of “multi-agent collaboration” the hard way.

Not long ago — really just two months ago — I had painstakingly structured my AI workflow to approximate team-style collaboration across different windows. I divided the process into clearly defined roles: Chat handled search and strategic judgment, Cowork handled synthesis and execution, and Code handled programming and technical verification. Between each stage, I used documents and memory systems to synchronize state, preventing information fragmentation, duplication, or conflict.

This kind of division of labor was, in essence, harness engineering: not simply throwing problems at AI, but designing a system to constrain, guide, distribute, hand off, and validate AI’s work.

I had thought this was my moat as an independent worker.

Ultracode filled it in completely. The sophisticated collaboration techniques that once required careful human planning, manual isolation, and cautious context-switching across windows have been reduced to an ordinary button at the software layer. This signals one thing clearly: the technical premium for manually orchestrating AI has been zeroed out.

When Execution Becomes Free, What Remains Scarce?

When the tool takes over the most cognitively demanding work — dividing labor and orchestrating agents — and the cost is open-ended (no cap; it runs until the answer stabilizes), the human role is pushed back one step.

You no longer need to think about how to decompose and delegate, because the machine does it faster and better. What genuinely tests a person at that point is two things that cannot be automated:

  • Compute judgment: Does the depth this problem requires actually warrant pressing the switch and letting this swarm of agents burn through an unpredictable amount of compute until they converge?
  • Output taste: When hundreds of agents deliver a logically rigorous, architecturally substantial artifact, what gives you the standing to trust it? And how do you define its scope and acceptance criteria?

Acceptance criteria are, at their core, an expression of taste. You have to already know deeply what “good” means before you can look at the tens of thousands of lines the machine produces and immediately sense what’s off, what belongs, and what should be cut.
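One way to make that taste explicit is to write the acceptance criteria down before pressing the switch, then check the result against them. The following is a minimal sketch under stated assumptions: `AcceptanceCriteria`, `RunReport`, and `evaluate` are hypothetical names, and the thresholds simply echo the figures in the Bun table rather than any recommended values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AcceptanceCriteria:
    min_pass_rate: float            # e.g. 0.998
    min_reviewers_per_file: int     # e.g. 2
    allowed_paths: Tuple[str, ...]  # scope: path prefixes the run may touch


@dataclass(frozen=True)
class RunReport:
    tests_passed: int
    tests_total: int
    reviewers_per_file: Dict[str, int]  # changed file -> number of reviewers


def evaluate(report: RunReport, criteria: AcceptanceCriteria) -> List[str]:
    """Return a list of problems; an empty list means the result is accepted."""
    problems: List[str] = []
    if report.tests_total == 0:
        problems.append("no tests ran")
    else:
        rate = report.tests_passed / report.tests_total
        if rate < criteria.min_pass_rate:
            problems.append(f"pass rate {rate:.1%} below {criteria.min_pass_rate:.1%}")
    for path, reviewers in sorted(report.reviewers_per_file.items()):
        if not path.startswith(criteria.allowed_paths):
            problems.append(f"{path}: outside agreed scope")
        if reviewers < criteria.min_reviewers_per_file:
            problems.append(f"{path}: {reviewers} reviewer(s), need {criteria.min_reviewers_per_file}")
    return problems


criteria = AcceptanceCriteria(min_pass_rate=0.998, min_reviewers_per_file=2, allowed_paths=("src/",))
report = RunReport(tests_passed=999, tests_total=1000,
                   reviewers_per_file={"src/a.rs": 2, "docs/b.md": 1})
for problem in evaluate(report, criteria):
    print(problem)
```

A check like this only covers the part of "good" that can be written as a rule. Choosing the thresholds and the scope is still the human judgment the paragraph above describes.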

In my piece Beyond Man-Days, I argued that as AI lowers the barrier to execution, output shifts from “how much time did you invest” toward “how well did you define the problem, allocate the tasks, and maintain quality control” — humans move from operators to project managers. Ultracode is one step further down that path: even the act of allocating tasks is something the tool now handles on its own. What remains in hand is more purely judgment. It hasn’t reduced the amount of judgment I exercise — it has relocated where judgment lives, moving it from the details of execution to two prior questions: should this even be triggered, and once it runs, what justifies trusting the result. Neither question can be automated, because both are fundamentally about value trade-offs, not computation.

Closing: The Real Moat

Watching Ultracode run through that cleanup on my screen, the question that surfaced wasn’t “will humans be replaced?” It was this: when machines take over orchestration, execution, debugging, and can now directly complete large-scale code migrations — what capabilities should humans keep in their own hands?

The answer is probably not faster operation, or more skilled execution, because those capabilities are being absorbed into automation at an accelerating rate. Today it can help handle nearly a million lines of code; tomorrow the scope of what it can take on will only expand. Individuals and teams who once prided themselves on execution advantage need to reckon honestly with this new reality: the importance of pure execution capability will keep getting compressed.

But this isn’t necessarily bad news. It forces us to distinguish more clearly: what is merely busyness, and what is genuinely valuable capability.

What I think remains, in the end, is judgment and taste.

Judgment is knowing what’s worth doing and what isn’t. Taste is knowing what it means to do something well enough that it’s actually done right. Neither of these grows automatically from tool upgrades, and neither can be instantly acquired by flipping a switch. They come from real experience — from mistakes made, from bad designs witnessed, from hard trade-offs navigated, and from a capacity for discernment built up over long stretches of time.

Machines will run fast, and they will keep doing more. But in the end, there still needs to be someone who can affirm: this was worth doing, and it was done right. That capacity is the human moat.