Intelligence & Order

All My Automations Were Working Perfectly, Then They Ganged Up and Bit Me Back

Resilience Engineering × Multi-Agent Collaboration × Release Pipeline: A postmortem of one article triggering six system issues, and reflections on judgment sovereignty, agency, and the civilization of work

Paul Kuo 郭曜郎 · June 2026 · 13 min read · AI-translated from the Chinese original

TL;DR: All I wanted was to publish one article. It took three hours and surfaced a cascade of system issues: bot commits leaving three mirrors misaligned, auto-translation overwriting human translation, a homepage cache hiding the new article, and specification drift between different AI windows. Most ironically, every component was working “correctly.” The fix wasn’t to dismantle the automation but to give each automation a contract readable by all participants, with four essential items: trigger conditions, side effects, opt-out flags, and verification entry points. And behind the contract lies a more fundamental question: the future risk of work isn’t automation breaking, but all of it working correctly while no one knows how the pieces are affecting each other.

Last night, I published an article.

Four language versions, files staged in ten minutes. Cover image generated, confirmed, finalized—smooth sailing. Then I hit push. The next three hours were spent battling my own system.

Not battling bugs. In those three hours, I didn’t fix a single broken thing. The SEO bot correctly indexed new URLs, the translation bot correctly translated articles, cache correctly cached pages, scheduled scripts correctly logged data. Every component, examined individually, was doing exactly what it should.

But together, they ganged up and bit me back.

I originally just wanted to test a new publishing workflow, but ended up discovering I was actually standing at the threshold of a fundamental transformation in how we work.

Testing Tools, Encountering a New Work Order

Over this period, I’ve been trying to integrate several AI collaboration tools into my workflow. On the surface I was just testing tools, but what I actually ran into was a deeper problem: when requirements, data, interfaces, testing, and corrections all start to involve AI, what humans truly need to manage is no longer individual tasks but an entirely new order of work.

Initially, I thought AI was just helping me fill in code snippets, make commits, or organize repetitive work. But as I continued, I began to realize AI wasn’t simply accelerating existing processes—it was forcing me to reconceptualize “the process itself.”

The following six pitfalls were the tuition for this lesson. Four of them are the most representative and worth dissecting one by one; the other two are more operational, and in the end they fold into the same problem: the automation was never written into a contract that all participants could read.

Why Do Three Mirrors Never Align After Push?

First, some context. This website went through an unexpected GitHub account suspension, after which I spread the repo across three mirrors (Codeberg, GitLab, and GitHub) as equal peers, with local git as the single source of truth. The day before, GitHub had been restored and the three-mirror-plus-local setup had just reconnected. It was only when publishing this article that I truly experienced this new system’s temperament.

After pushing the article, GitHub’s automation kicked in: one bot registered the four-language URLs in the SEO index list, adding a commit; another bot generated social media materials, adding another commit; then an automatic merge. The result: every time I pushed, GitHub ended up one or two commits ahead of the other mirrors.

My first instinct was to level them out. Wrong move. Force-pushing overwrote the bot’s commits; on the next round the bot added them again, I overwrote them again, and it added them again. Classic whack-a-mole. After two rounds I stopped to think: the bot wasn’t interference. It was working for me, and its commits were work products. The correct approach was to fetch and fast-forward, absorbing its results locally, and then bring the other two mirrors up to date.

The essence of this pitfall: I built a system that grows its own commits, but my muscle memory was still stuck in the “only I touch remote” era.
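The fetch-plus-fast-forward rule described above can be sketched as a small decision function. This is a minimal sketch, not the author’s actual script: it assumes the ahead/behind counts come from `git rev-list --left-right --count <local>...<remote>`, and the names `parse_counts`, `plan_sync`, and the action labels are hypothetical.

```python
from typing import Tuple


def parse_counts(rev_list_output: str) -> Tuple[int, int]:
    """Parse the 'ahead<TAB>behind' output of
    `git rev-list --left-right --count <local>...<remote>`."""
    ahead, behind = rev_list_output.split()
    return int(ahead), int(behind)


def plan_sync(ahead: int, behind: int) -> str:
    """Decide how to reconcile local with one mirror without discarding commits."""
    if ahead == 0 and behind == 0:
        return "in-sync"
    if ahead == 0:
        # The mirror holds commits we lack (e.g. bot commits): absorb them.
        return "fast-forward-local"
    if behind == 0:
        # Local is strictly ahead: a plain push brings the mirror up to date.
        return "push"
    # Both sides have unique commits: resolve by hand, never by force-push.
    return "diverged"
```

In the situation from the article, where GitHub sat one or two commits ahead and local had nothing unique, `plan_sync(0, 2)` returns `"fast-forward-local"`. The point of the design is that no branch of the function ever answers “force-push”.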

Why Did Auto-Translation Overwrite Human Translation?

The second bite back went deeper.

The English, Japanese, and Simplified Chinese versions of this article were manually polished beforehand: terminology, tone, localization all adjusted. I staged all four files and pushed. Minutes later, GitHub’s auto-translation pipeline woke up, found no entry for this new article in the translation records, and faithfully executed its task: re-translate the three language versions, overwriting the files I’d uploaded.

More interesting was the direction of the quality change. Comparing the versions afterward showed that the machine re-translation had actually regressed on localization. The human version used “进化生物学” (evolutionary biology) and “集群” (cluster), the usage mainland Chinese readers expect; the machine version used “演化生物学” and “丛集”, Taiwanese terminology converted directly into Simplified characters. The automation didn’t just overwrite human work; it overwrote it with an inferior version.

The fix itself wasn’t hard: restore the files, add the [skip-translate] flag to the commit message, and align the hashes in the translation records so the pipeline knows “this one’s already translated, don’t touch.” The real lesson came earlier: this pipeline had an opt-out mechanism, but only the pipeline itself knew it existed. My operating manual (and that of every AI window working for me) made no mention of it.
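The decision the translation pipeline makes can be sketched in a few lines. This is an illustration under stated assumptions, not the real pipeline: the article only says the pipeline checks a commit-message flag and a record of hashes, so the record layout (path mapped to a SHA-256 of the source text) and the function names here are hypothetical.

```python
import hashlib
from typing import Dict

SKIP_FLAG = "[skip-translate]"  # the opt-out flag named in the article


def content_hash(text: str) -> str:
    """Hash the source text; SHA-256 is an assumption, not the pipeline's known choice."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def should_translate(commit_message: str, path: str, source_text: str,
                     records: Dict[str, str]) -> bool:
    """Return True when the pipeline would (re)translate this article."""
    if SKIP_FLAG in commit_message:
        # Explicit opt-out for this push: leave human translations alone.
        return False
    # No record, or a record for different content, means "not translated yet".
    return records.get(path) != content_hash(source_text)
```

With an empty record and no flag, a new article is always re-translated, which is exactly the overwrite described above. Either the flag or an aligned record hash is enough to stop it.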

Article Went Live, Why Doesn’t the Homepage Know?

The third bite came from caching.

All four language versions of the article page returned 200, titles correct, covers normal. I thought I was done. Then I noticed the homepage’s “Latest Thinking” section was still stuck on the previous article.

The debugging process was like peeling an onion. I first suspected the CDN cache and used Cloudflare’s API to purge the homepage URL: the purge succeeded, with no effect. I checked the response headers and saw cf-cache-status: DYNAMIC, so the CDN layer wasn’t even caching the page. Finally I found the website’s own middleware: it uses the Workers Cache API to cache each page’s HTML at edge nodes for twenty-four hours, and the cache key includes a version number suffix. As long as the version number stays the same, so does the cached page; and because the key doesn’t match the plain URL, even a zone purge couldn’t match it.

The person who designed this mechanism—me from a few weeks ago (through some AI window)—wrote clearly in code comments: “bump version number after content updates to clear cache.” But this line only lived in code comments, never entered the publishing process checklist. So between “deploy successful” and “readers can see it” lay a wall no one could remember.
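Why the purge missed and why bumping the version works can be shown with a toy model. This is a Python stand-in, not the Workers Cache API: the `EdgeCache` class, the `cache_key` function, and the `?__v=` key format are all hypothetical, chosen only to mirror “cache key = URL plus version suffix.”

```python
from typing import Dict, Optional


def cache_key(url: str, version: str) -> str:
    """Build a versioned cache key (the suffix format is an assumption)."""
    return f"{url}?__v={version}"


class EdgeCache:
    """Toy in-memory stand-in for an edge cache that matches keys exactly."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, key: str, html: str) -> None:
        self._store[key] = html

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def purge(self, key: str) -> bool:
        """Remove an entry only on an exact key match; report whether it matched."""
        return self._store.pop(key, None) is not None


url = "https://example.com/"
cache = EdgeCache()
cache.put(cache_key(url, "v1"), "<html>previous article</html>")

purged = cache.purge(url)                # plain URL: does not match the versioned key
stale = cache.get(cache_key(url, "v1"))  # the old HTML is still there
fresh = cache.get(cache_key(url, "v2"))  # bumped version: a miss, so the page is rebuilt
```

In this model the purge reports no match and the stale entry survives, while a lookup under the bumped version misses. That is why “bump the version number” belongs in the publishing checklist rather than in a code comment.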

Why Does Each AI Window Get Different Specifications?

After fixing the first three pitfalls, I thought it was over. Until I looked back at “why these pitfalls existed” and dug into the fourth layer: specification drift.

My work method involves multiple parallel AI windows: some handle writing, some handle engineering, some handle reconnaissance, each loading the same writing specification as a startup prerequisite. I’ve written about this governance architecture of one person managing four AI windows before. When a product begins real operation, the question is no longer just “can it be built” but how different tasks are decomposed, delivered, retrieved, and re-judged. Some AIs handle documents, some handle interfaces, some handle testing and feedback; humans retreat to higher positions, becoming architects and judges.

The problem: that specification had multiple copies. There was the master in the repo, mirrored documents, and copies loaded by the desktop application. When the master was upgraded, the copies didn’t necessarily follow. Worse, the document claimed “copies auto-sync,” but when I looked at the actual git hooks, the auto-sync step had been written and never wired in.

In other words: the specification document itself committed the error it aimed to prevent. It described a non-existent automation, and every AI window reading it assumed sync would happen automatically.

The relationship between map and territory was completely inverted here. It wasn’t the map falling behind the territory. The map declared what the territory looked like, and then everyone followed the map onto bridges that didn’t exist in the territory.

In a human team, an outdated document does limited damage: a new hire follows it, hits a wall, asks a question, someone corrects it, and the error stops there. But in an AI-collaborative workflow, documents are the agent’s basis for action. A wrong document is not a static error—it becomes the shared source for the next round of automated action. Every window acts on the same wrong map, no one hits a wall and turns around to ask, and the error doesn’t get diluted—it gets faithfully executed in sync. This is no longer a document management problem; it’s a knowledge governance problem of the AI collaboration era: when agents act on outdated specifications, the document itself is a system risk.
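One way to stop trusting a “copies auto-sync” claim is to check it. The sketch below compares every copy of a specification against the master by content hash and reports the ones that have drifted. It is a minimal illustration, not the author’s tooling: the function names and the example copy names are hypothetical, and the texts are passed in as strings so the check does not depend on any particular file layout.

```python
import hashlib
from typing import Dict, List


def digest(text: str) -> str:
    """Content fingerprint used to compare a copy against the master."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_drifted_copies(master: str, copies: Dict[str, str]) -> List[str]:
    """Return the names of copies whose content differs from the master."""
    expected = digest(master)
    return sorted(name for name, text in copies.items() if digest(text) != expected)


master_spec = "Writing spec v2: bump the cache version after every publish."
copies = {
    "mirror-copy": master_spec,                    # followed the upgrade
    "desktop-app-copy": "Writing spec v1: ...",    # silently left behind
}
drifted = find_drifted_copies(master_spec, copies)
```

Run at window startup or in a hook, a non-empty `drifted` list turns a silent assumption into a visible failure before any agent acts on the stale copy.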

Resilience Lesson Two: Automation Needs Contracts

My first resilience engineering piece came out of the GitHub suspension, which taught me: don’t let a single platform control your lifeline. That was resilience lesson one: redundancy.

This second lesson was completely different. The attacker wasn’t a platform—it was what I built myself. None of the six pitfalls came from external dependencies; all came from “I built automation, then forgot to tell all participants.” Participants included future me and every AI window working for me. Those two operational pitfalls I didn’t expand on earlier? Taken apart, they come down to the same sentence: someone changed the system’s behavior, and no one updated the explanation that all participants read.

That’s when I truly felt that AI brings not tool upgrades, but reorganization of “division of labor logic.” We used to ask: who does this? Now we ask: which part goes to AI? Which part still requires human judgment? Which part needs retained sovereignty?

The fix list was actually short. Give each automation a contract, written into shared documents that all windows load at startup, containing four items:

When it activates: After push? Commit message contains specific words? Every ten minutes?
What it touches: Which files it modifies, what commits it adds, which cache layers it clears.
How to opt out this time: Flags like [skip-translate]; if none exist, add them.
Where to verify afterward: Log paths, version numbers, manifests, enough to confirm within three minutes whether it ran when things go wrong.

Together, these four items form a minimum viable automation contract. It’s not a personal note written after an incident; it’s a piece of workflow governance any team can adopt as-is, paired with two enforcement disciplines: no new automation goes into the pipeline until its contract is fully written, and whenever a contract changes, the documents that all participants (human and AI) load at startup must be updated in sync. Team size is not the threshold. As long as a system has two or more participants touching the same state, this contract should exist.
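The four-item contract and the “no contract, no pipeline” rule can be sketched as a small data structure with a completeness check. This is an illustrative sketch rather than the format the author uses: the class name, field names, and the example values are hypothetical, with the example loosely modeled on the translation pipeline described earlier.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class AutomationContract:
    name: str
    trigger: str   # when it activates
    touches: str   # files, commits, or cache layers it changes
    opt_out: str   # how to skip it for one run
    verify: str    # where to confirm afterward that it ran


def missing_items(contract: AutomationContract) -> List[str]:
    """List the fields that are still blank."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name).strip()]


def may_enter_pipeline(contract: AutomationContract) -> bool:
    """Enforcement rule from the text: no automation ships with an incomplete contract."""
    return not missing_items(contract)


translation_bot = AutomationContract(
    name="auto-translation",
    trigger="after push, when the article has no entry in the translation records",
    touches="re-translates and overwrites the translated language files",
    opt_out="[skip-translate] in the commit message",
    verify="",  # left blank on purpose: nobody wrote down where the logs are
)
gaps = missing_items(translation_bot)
```

Here `gaps` names the one unfilled item, and `may_enter_pipeline(translation_bot)` stays false until it is written down. Keeping the check this blunt is deliberate: the contract is only useful if an incomplete one cannot slip through.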

When platforms like Google begin integrating postmortem documents, AI collaboration processes, and development systems, what we see isn’t individual product features but a new organizational form taking shape. AI is starting not just to assist engineers but to enter organizational memory, repair, feedback, and decision loops. One-person companies just compress this trend to its limit: there is no “other person on call” to blame, all pitfalls are self-dug, and all fixes directly become the next window’s startup specifications. Every lesson you write gets loaded, executed, and validated by some window the next day. The ROI of a postmortem is actually higher here than in a team.

Future team boundaries might no longer be determined solely by staffing, but jointly constituted by people, models, data, processes, and permissions. Products are no longer just developed objects, but systems that continuously learn, correct, and evolve.

Handing Over Tasks, or Judgment Sovereignty?

Contracts solve the “invisible” problem. But these three hours left a deeper problem that contracts can’t solve.

When we throw out a prompt, it seems we’re just issuing a command, but we’re actually handing over part of our judgment sovereignty to the system. What’s truly worth considering isn’t whether AI can complete tasks, but whether we know what we’ve handed over.

The biggest reminder from this implementation: AI doesn’t just complete work for humans—it’s entering the front end of human thinking. It doesn’t just respond to requirements; it’s starting to influence how we describe problems, divide tasks, set priorities, and understand how products should be completed.

Therefore, the real question might not be whether AI will replace humans, but whether humans will gradually abandon their judgment position in pursuit of efficiency. As tools become increasingly organ-like, we must reconfirm: does this organ serve me, or am I becoming an extension of some system?

Here I want to offer a concrete criterion, not just a metaphor: when a system begins to trigger itself, rewrite state on its own, and influence the next round of decisions, it is no longer merely a tool—it has entered the scope of governance. A hammer doesn’t start working at midnight on its own, but a system that translates by itself, appends its own commits, and decides what the homepage should look like does. The former only needs maintenance; the latter needs contracts, permissions, and accountability. Where you draw this line determines your stance toward the system: on this side of the line you are a user; on the other side, a governor.

This returns to the question I wrote about just the day before: Is AI just an external tool, or is it becoming an organ in our working organism? If it’s just a tool, humans remain the subject; but if it becomes an organ, we must ask clearly who controls this organism.

So this looks like a technology issue on the surface, but it is fundamentally a governance issue. What matters about AI is not just model capability, but how permissions, data, processes, responsibility, and agency are rearranged.

What Bites Back Is Alive

After three hours of chaos ended, I looked again at this system: repos that grow their own commits, pipelines that translate themselves, schedulers that keep their own books, caches that remember every page at edge nodes.

It’s indeed becoming more like a living thing. And that’s how living things are: they don’t wait for your commands to act; they have their own rhythms, their own reflexes, their own metabolism. You feed it and expand it, and someday its behavior exceeds your map. Then, on some late night, it bites you back as a reminder: time to update the map.

I’ll continue exploring this direction, because I’m increasingly certain this isn’t simply an efficiency revolution, but a rewriting of work civilization. What truly matters isn’t how many AI tools we use, but whether we can still maintain human agency, judgment, and governance capability in an age when AI is gradually becoming an organ of our work.

In the end, the future risk of work isn’t automation breaking, but all of it working correctly while no one knows how the pieces are affecting each other. Broken things throw errors, leave traces, force you to stop and fix them; correctly working things don’t. They just quietly stack up, until some late night they all surface at once.

Taking a bite back isn’t failure; it’s an inevitable part of how a system evolves. Errors will keep appearing, and that’s nothing to fear. Laozi was right: “Misfortune is what fortune leans upon; fortune is where misfortune hides.” What truly matters is that every error leaves a trace and drives a correction, making the map more accurate, the contracts clearer, and each iteration more mature than the last.

Frequently Asked Questions

Why do more automations make systems more fragile?

Because the failure mode changes. When you have fewer components, failures usually stem from the components themselves breaking: fix them and you're done. With more automation, the primary failure source becomes 'every component working correctly but unaware of the others' existence.' The six issues I encountered weren't bugs: the SEO bot, translation bot, cache, and sync scripts were all doing exactly what they should, but their correct behaviors stacked up to create mirror misalignment, overwritten human translations, and a homepage showing a stale version. Fragility hides in the gaps between components, not within the components themselves.

What is an 'automation contract'? What specifically should it contain?

It means documenting each automation's trigger conditions, behaviors, side effects, and exit mechanisms in a shared document that all participants (human and AI) read. At minimum it has four items: when it activates (after push? commit message contains keywords?), what it touches (modifies files? adds commits? clears cache?), how to opt out this time (flags like [skip-translate]), and how to verify it ran when things go wrong (where the logs are). My version lives in the repo's CLAUDE.md and writing skills; every AI window loads it before starting work.

Will AI cause people to unconsciously lose judgment capability?

The risk isn't AI replacing humans, but humans gradually abandoning their judgment position for efficiency. Each prompt we throw out actually hands over a portion of judgment sovereignty. This isn't inherently bad: division of labor is civilization's engine. The key is knowing what you've handed over and what you've retained: which decisions go to the system, and which must return to human oversight. My approach is to explicitly write this boundary into specifications that every AI window loads at startup, making 'handing over' a visible action rather than silent erosion.
