Your articles have been devoured by AI.
This isn’t metaphor. This is fact. OpenAI crawled most of the internet’s content to train GPT. Google, Meta, Microsoft—they’re all doing the same thing. Your GitHub repos, your blog, your social media posts are all sitting in training datasets in some data center.
The question is: is this legal?
The answer depends on where you are.
Countries worldwide have vastly different legal attitudes toward web crawling and AI training across four dimensions: “copyright protection,” “technological innovation,” “commercial accommodation,” and “TDM/AI exemptions.” If you plotted them on a radar chart, some countries look like open palms, others like clenched fists.
Japan has taken the most radical position globally. Its 2018 revision of the Copyright Act, which introduced Article 30-4, essentially says: crawling and using copyrighted works for computer-based "information analysis" is permitted, regardless of purpose. AI research? Yes. Commercial product training? Also yes. Watching American innovation on one side and EU regulation on the other, Japan deliberately chose the most permissive path and positioned itself as an AI-friendly legal environment.
Singapore follows closely. Its 2021 copyright law revision explicitly includes commercial “computational data analysis” exceptions, making it a crucial hub for AI innovation in the Asia-Pacific region.
The US takes a different route—not through broad statutory exemptions, but through judicial precedent. The core concept is called "transformative fair use": if your usage fundamentally changes the purpose and meaning of the original content, it might not constitute infringement. Google Books scanned millions of books, copyright holders sued, and courts ruled: this is transformative, therefore legal. AI training follows the same logic—your article was taken, but the model isn't selling your article; it transformed your words into statistical weights, into a system that can generate new text.
America’s position: most permissive. Innovation first. Let the market speak.
The EU passed the Directive on Copyright in the Digital Single Market in 2019, with a fundamentally different design: TDM is allowed, but with a massive "but"—rights holders can opt out. If Le Monde or Der Spiegel says "don't crawl us," crawlers must stop.
This framework’s logic: innovation matters, but creators’ rights matter too, so provide an option. In practice? Large publishers have the capacity to establish opt-out mechanisms; small media and individual creators often lack bargaining power. The result is more market concentration, where big winners win even more.
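In practice, an opt-out like this is usually expressed in machine-readable form. One common approach—illustrative only, since no single format is legally mandated—is to disallow known AI training crawlers in robots.txt. The user-agent tokens below (GPTBot, Google-Extended, CCBot) are real tokens published by their operators, but honoring them is voluntary:

```
# Sketch: reserving rights against AI training crawlers via robots.txt
# Compliance with these rules is voluntary, not legally enforced everywhere.

User-agent: GPTBot          # OpenAI's training crawler
Disallow: /

User-agent: Google-Extended # Google's AI-training control token
Disallow: /

User-agent: CCBot           # Common Crawl's crawler
Disallow: /

# Ordinary search indexing remains allowed
User-agent: *
Allow: /
```

Note the asymmetry this creates: a publisher with an engineering team can maintain such rules and monitor compliance; an individual blogger often cannot.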
The UK currently has only a limited TDM exception for "non-commercial research," and its policy is still wavering. Overall, its attitude is more conservative than the EU's.
Then there’s Taiwan.
Taiwan lacks a clear TDM legal framework. Copyright law exists, but it never spells out which purposes make data crawling permissible. In the 2024 Lawsnote case, a legal database platform received a harsh ruling over its crawling of content. The message was clear: crawling carries risk; you might get sued.
The result is a chilling effect. Taiwanese companies that want to pursue AI innovation cannot be sure they are acting legally. Taiwanese creators who want to protect their content must sue case by case, on their own. This isn't "balance"; it's a void with no guidance.
Taiwan’s silence equals surrender. Because international companies follow the laws most favorable to them—usually US law. Your content gets crawled by American companies under American law, while Taiwanese law neither protects you nor empowers you.
I’m in this predicament myself. Every article I write on paulkuo.tw could potentially be used to train some model. I know this happens. I cannot stop it. If I lived in the EU, I could request Googlebot to stop crawling. If I lived in Japan, I’d at least know what the rules are. But I’m in Taiwan, where law gives me no tools, only vague threats.
This isn’t a personal problem. This is systemic inequality.
TDM—text and data mining—is the use of programs to automatically analyze large volumes of digital data, identifying patterns and knowledge in unstructured text. It is the foundational technology behind training large language models and generative AI. AI exemptions are legal exceptions that let AI training proceed without obtaining copyright permission work by work. Countries differ on two points: how broad these exceptions are, and whether creators have the right to opt out.
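The mechanics of TDM are mundane: a program ingests raw text and reduces it to statistics. A minimal sketch of the idea in Python—illustrative only, nothing like how a production model is actually trained:

```python
import re
from collections import Counter

def mine_term_frequencies(documents):
    """Toy text-and-data-mining step: tokenize unstructured text
    and count term frequencies across a corpus."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z']+", doc.lower())
        counts.update(tokens)
    return counts

# Hypothetical two-document corpus standing in for crawled web pages
corpus = [
    "AI training needs text. Text comes from the web.",
    "Web crawling feeds AI training pipelines.",
]
freqs = mine_term_frequencies(corpus)
print(freqs.most_common(3))
```

Real training pipelines operate at a vastly larger scale and learn statistical weights rather than raw counts, but the legal question is the same: the input is other people's text, copied without asking.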
There are only three paths forward. First, Taiwan could follow the EU model and establish a framework where creators can opt out—but that requires legislative will. Second, Taiwan could follow Japan's lead and open everything up—but that sacrifices creators. Third, it could maintain the current void—and the cost of a void is always borne by those with the least bargaining power.
The current question isn’t whether AI can use your content. Of course it can. The question is whether you have any voice in the matter.
In America, markets and courts speak. In the EU, regulation speaks. In Japan, national policy speaks.
In Taiwan, silence speaks. And the cost of silence is paid by every creator.
Perhaps it’s time to make this matter no longer silent.