American AI company Anthropic publicly alleges that several Chinese firms, including DeepSeek, Moonshot AI and MiniMax, extracted capabilities from its Claude model through millions of carefully structured queries. This was allegedly done not by hacking or by stealing code, but by asking questions – over and over again, at enormous scale.
The accusation sounds dramatic – “industrial-scale distillation”.
Distillation is a method in which a smaller AI model learns by studying the answers produced by a larger, more powerful AI model, so it can perform similarly but more efficiently. In other words, the “student” model learns from the “teacher” model’s responses, rather than from raw data alone.
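In machine-learning terms, the textbook version of this trains the student to match the teacher's softened output distribution rather than hard labels. Here is a minimal sketch of that loss in PyTorch (illustrative only, following the classic Hinton et al. formulation rather than any particular lab's pipeline):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: the student learns to mimic the
    teacher's output distribution, softened by a temperature."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student distributions,
    # rescaled by T^2 as in the original formulation.
    return F.kl_div(student_log_probs, soft_targets,
                    reduction="batchmean") * temperature ** 2
```

In the API setting at issue here, a rival typically sees only generated text, not raw logits, so the student is instead fine-tuned on harvested prompt-response pairs – but the underlying idea is the same.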
Does that sound like theft? Pause for a moment and ask yourself: when a model learns from another model's answers, what exactly has been stolen?
That question is where the law begins to wobble.
Is this piracy, or something else?
We instinctively understand piracy. It’s the familiar act of copying a film, downloading music without permission, or duplicating someone else’s code. The law was built for that world, a world of tangible copying and visible duplication.
But distillation doesn’t copy files. It doesn’t replicate source code. It doesn’t extract model weights. It watches outputs and learns patterns.
Imagine a brilliant professor giving public lectures. A student attends every session, takes meticulous notes and later builds a condensed course based on what they learned. Did the student steal the professor’s brain? Or did they learn?
Distillation sits in that uncomfortable space between imitation and appropriation. It's not duplication; it's behavioural emulation at scale. And that distinction matters more than most headlines admit.

What about copyright?
Let’s move to the obvious legal tool – copyright.
Copyright protects original works of human authorship. That principle was reinforced decades ago in Feist Publications, Inc. v. Rural Telephone Service Co., where the US Supreme Court made clear that facts are not protected, only original expression.
Now consider AI outputs. Are they human-authored works?
Most copyright offices around the world have repeatedly said no. Courts have echoed that view. In Thaler v. Perlmutter, the court confirmed that copyright requires a human author.
So here’s the uncomfortable answer to the earlier question:
If AI-generated outputs are not copyrightable in the first place, then using those outputs to train another model does not infringe copyright, at least under traditional doctrine.
Distillation copies behaviour, not text. And copyright does not protect ideas, systems, methods of operation, or functional processes.
In Lotus Development Corp. v. Borland International, a court held that even a software menu structure was merely a “method of operation”, not protectable expression.
A model’s reasoning structure looks far more like a method of operation than a novel.
So ask yourself: If a student model learns to “reason like” Claude or GPT, is that copying expression or learning a system?
Under the current legal regime of copyright, it looks like the latter.
Now, let’s confront the emotional truth. Frontier AI labs spend hundreds of millions, sometimes billions of dollars, training these models. If a rival can reproduce 80% or 90% of the performance by querying an API millions of times, it feels wrong.
Surely the law should protect that investment?
But here’s the hard principle – the law does not protect effort alone. The US Supreme Court in Feist Publications, Inc. v. Rural Telephone Service Co. rejected the “sweat of the brow” doctrine (the “sweat of the brow” doctrine is the idea that someone deserves copyright protection simply because they worked hard to create something). Hard work does not automatically create exclusive rights.
Business leaders reading this may feel, as a matter of economic intuition, that this is unfair. Nevertheless, copyright doctrine says imitation of unprotected elements is permitted.

That tension is not accidental. It’s structural.
Think of reverse engineering.
Reverse engineering refers to the process of analysing a finished product in order to understand how it works, particularly when its internal design or source code is not publicly available. In the context of software, this often involves examining a program’s behaviour, structure or outputs to identify its functional principles.
Importantly, the objective is not to copy protected expression, but to uncover unprotected functional elements such as systems, methods or operational logic.
In cases such as Sega Enterprises Ltd. v. Accolade, Inc. and Sony Computer Entertainment, Inc. v. Connectix Corp., courts permitted reverse engineering because it targeted functionality rather than expressive content. The legal tolerance stemmed from the core copyright principle that protection extends to expression, not to ideas, systems or methods of operation.
Model distillation fits structurally within this framework. It doesn’t access source code, extract model weights or replicate internal architecture. Instead, it observes outputs generated through lawful access and statistically infers patterns. Like reverse engineering, it studies behaviour rather than internal design.
The model is treated as a black box – inputs are supplied, outputs are analysed and functional inference follows.
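In practice, that loop is strikingly simple. A deliberately stripped-down sketch (query_model here is a stand-in for a call to any hosted chat API, not a real client library):

```python
import json

def query_model(prompt: str) -> str:
    """Placeholder: in reality, an HTTP call to a hosted model."""
    return "<model response>"

def harvest(prompts, out_path="distill_data.jsonl"):
    # Black-box behavioural harvesting: supply inputs, record
    # outputs, and keep the pairs as supervised fine-tuning
    # data for a student model.
    with open(out_path, "w") as f:
        for prompt in prompts:
            pair = {"prompt": prompt, "response": query_model(prompt)}
            f.write(json.dumps(pair) + "\n")
```

Nothing in that sketch touches weights or source code; it records what any user could see, just at industrial volume.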
Contract law: The real battleground
At this point, you might think: fine, copyright fails, but what about terms of service?
Most frontier AI developers expressly prohibit automated extraction, large-scale querying and the use of model outputs to train competing systems. On paper, these restrictions appear comprehensive. They seek to contractually prevent precisely the kind of behavioural harvesting that distillation entails.

Yet contract law is not an absolute barrier. Its effectiveness depends on foundational principles – clear notice, valid assent, identifiable parties and enforceable remedies. A contractual prohibition binds only those who are legally party to it.
If access is mediated through layered accounts, intermediaries or cross-border entities, establishing privity and attribution becomes complex. Moreover, breach of contractual terms does not automatically convert into criminal liability or broader statutory wrongdoing. The law distinguishes between private breach and unauthorised access in a technical or criminal sense.
There’s also a practical limitation. Contractual enforcement presumes violations can be detected, traced and quantified.
Large-scale distillation conducted through distributed querying, proxy infrastructure or fragmented accounts transforms enforcement into an evidentiary challenge. The issue becomes not merely whether a term was breached, but whether one can prove who acted, under which account, with what intent, and with what measurable harm.
In this sense, contract law functions as a contingent shield rather than an impermeable one.
Could trade secret law help?
Trade secret law offers another potential line of protection, but it operates within carefully defined boundaries. To qualify as a trade secret, information must derive economic value from not being generally known, and the holder must take reasonable measures to maintain its secrecy.
In the context of frontier AI systems, certain elements clearly meet this threshold – model weights, training datasets, optimisation techniques and internal architectural designs are closely guarded and technologically shielded.
The difficulty arises, however, at the interface between secrecy and disclosure. When a model is deployed through an application programming interface (API), its outputs are intentionally made available to users. The internal mechanics remain hidden, but the behavioural performance is observable.
Trade secret doctrine traditionally tolerates competitive learning from what is lawfully acquired and externally accessible.
If a firm exposes the functional behaviour of its system to the market, even in a controlled way, the law must determine whether observing and analysing that behaviour constitutes misappropriation or legitimate competition.
This distinction turns on the method of acquisition. Trade secret law prohibits improper means, such as theft, espionage or breach of confidence, but does not generally forbid independent discovery or analysis of publicly-available outputs.
Large-scale querying occupies an ambiguous position within this framework. It does not necessarily involve intrusion into confidential systems or circumvention of technical safeguards. Yet its scale and strategic intent may feel qualitatively different from ordinary user interaction.
The core question, then, is conceptual rather than purely technical: When does systematic observation of publicly-exposed behaviour cross the line into improper appropriation? Is high-volume probing merely an aggressive form of competitive analysis, or does it amount to a de facto extraction of protected knowledge?
Trade secret doctrine provides tools for addressing deception and wrongful acquisition, but it doesn’t offer a straightforward answer when the alleged misappropriation arises from analysing what was voluntarily disclosed. The boundary between lawful inference and unlawful extraction remains unsettled.
Maybe patents are the real weapon?
Patent law appears, at least conceptually, to offer a firmer foundation. Unlike copyright, which protects expressive works, patent law protects functional inventions: new and non-obvious processes, systems, machines or methods. Its focus is not on copying language or text, but on the unauthorised use of a claimed technological solution.
In that sense, it’s better-aligned with the technological realities of AI systems, where value lies in architecture, optimisation methods and training techniques rather than in expressive content.
If an AI developer successfully patents a particular model architecture, training pipeline, optimisation method or distillation technique, then infringement does not depend on literal copying. It’s sufficient that another party practises the patented invention without authorisation.
Even if a competitor arrives at a similar result through behavioural replication, the legal question becomes whether the patented claims are being implemented in substance. Patent protection, therefore, extends beyond surface similarity and reaches into functional equivalence.
Yet patent law carries its own structural trade-off. Patents are granted only in exchange for public disclosure. The applicant must describe the invention in sufficient detail to enable others skilled in the field to reproduce it once the patent expires. This disclosure requirement is not incidental; it’s the core of the patent bargain. The state grants a time-limited monopoly in return for transparency.
This creates a strategic dilemma for frontier AI firms. Secrecy preserves competitive advantage indefinitely, provided confidentiality can be maintained. Patenting, by contrast, offers strong but time-limited exclusivity at the cost of revealing technical details.
One cannot simultaneously maximise both secrecy and patent protection over the same invention. The choice becomes strategic – retain knowledge as a trade secret and risk reverse engineering, or disclose it through patenting and secure enforceable exclusivity for a defined period.
In the context of rapidly-evolving AI technologies, that decision carries profound implications for innovation strategy and competitive positioning.
Beyond doctrine: When law lags behind intelligence
Now, let’s zoom out. The uncomfortable truth is this: The law is not yet conceptually prepared to decide who owns intelligence expressed as behaviour rather than code.
Courts were designed to adjudicate disputes over books, machines and tangible inventions. They’re now being asked to determine the boundaries of machine reasoning itself.
Companies such as OpenAI and Anthropic frame distillation not only as intellectual property harm but as national security risk. If export controls restrict access to advanced chips and models, but distillation allows capability transfer through outputs, then the debate shifts.
The issue becomes: is this copyright infringement, or strategic circumvention? That's no longer a question of private law. It becomes a question of public law.
Until doctrine evolves (if it evolves at all), frontier AI firms cannot rely solely on existing legal frameworks to safeguard their models.
Companies developing systems like Claude must increasingly look to technical countermeasures – rate-limiting, watermarking, usage anomaly detection, architectural compartmentalisation and hardware-software integration strategies.
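To make one of those concrete: rate-limiting is commonly implemented as a token bucket, which caps how fast any single account can query. A toy version (parameters purely illustrative):

```python
import time

class TokenBucket:
    """Each account accrues 'tokens' at a fixed rate and each
    request spends one, capping sustained query volume."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

As noted earlier, distillation run through fragmented accounts and proxy infrastructure blunts exactly this kind of per-account control – which is why providers pair it with cross-account anomaly detection.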
In the current environment, engineering protections may prove more immediate and reliable than doctrinal expansion.
Hence, the fundamental question remains unresolved: Should intelligence that can be observed be freely learnable?
The law does not yet provide a stable answer. And until it does, technological solutions may be the only practical line of defence.