Legal development

AI training vs. copyright: US Court rules AI company's use of millions of literary works for AI training is fair use

    What you need to know

    • A US Court has determined that copying whole books for the purposes of training an AI large language model was a transformative use which fell within the fair use defence.
    • Equally, the Court found that purchasing print books and scanning them into digital copies was fair use.
    • By contrast, storing pirated copies of books as part of a library was not fair use.
    • Australia's copyright laws have a defence of fair dealing which is much narrower than fair use. The differences mean that the result in this case is likely to be different in Australia. This will have implications for companies considering whether to train large language models in Australia and may lead to calls for legislative reform.

    In a landmark decision that could reshape the future of AI and copyright law (Bartz v Anthropic), a US District Court has drawn a sharp line between innovation and copyright infringement. As AI companies race to build increasingly powerful language models, the ruling may redefine the boundaries for how they source and use data. It could also revitalise discussion of potential reforms to Australian copyright law to accommodate AI companies.

    Background

    On 23 June 2025, the United States District Court for the Northern District of California issued a summary judgment addressing a claim for copyright infringement brought by several authors against Anthropic, an AI company.

    Anthropic downloaded over seven million e-books without making any payment and with full knowledge that the books were pirate copies. Anthropic also purchased hardcopy books and scanned them into digital form to store them as digitised, searchable files. Anthropic conducted these activities to amass a central library of "all the books in the world". From this library, various digitised books were used to train its large language models (LLMs) to develop Anthropic's AI service, Claude, which itself generates over one billion US dollars in annual revenue.

    The Court was asked to determine whether any of Anthropic's various uses of the authors' works qualified as "fair use" under US copyright law, including the use of both pirated and lawfully purchased books to: (i) train its LLMs for Claude; and (ii) build a digital library. The Court was not asked to address whether any of the output from Claude might infringe copyright.

    Overview of findings

    A. Legitimate format shifting = fair use

    • The Court held that Anthropic’s practice of purchasing print books, scanning them (while destroying the originals), and storing the digital versions for internal use was fair use. This was on the basis that the digital copies merely replaced the purchased print copies for reasons of storage and searchability, without creating new works or distributing copies outside the company.

    B. Training AI models = fair use

    • The Court found that using literary works (including both the pirated and the lawfully purchased books) to train LLMs to generate new, non-infringing outputs is a "highly transformative" use and constitutes fair use. The use was held to be "transformative" as it allows the AI model to "turn a hard corner and create something different… like any reader aspiring to be a writer"; the use is not intended to allow the AI model to "race ahead and replicate or supplant" the inputted works.
    • This reasoning was supported by the fact that the authors did not claim that any LLM output infringed their works. Rather, specific software was deployed with Claude to prevent any output from infringing copyright. The Court may have come to a different conclusion if Claude had been trained on literary works in order to create infringing works, given the purpose of the input would no longer be "transformative".

    C. Download and retention of pirated copies of books ≠ fair use

    • The Court held that Anthropic’s downloading and retention of pirated e-books to build a permanent, general-purpose digital library was not fair use. The creation of a central library from pirated sources, even if some of those works were later used for transformative purposes like AI training, was not justified.
    • The Court did not make a finding on the question of liability and damages for the unauthorised download and retention of those literary works. This case will proceed to trial on those issues.

    Implications for Australian law

    This decision highlights the tension between technological innovation and copyright protection. While there is ongoing discussion about reforming Australian copyright law to accommodate new technologies like AI, currently there is no general "fair use" defence for copyright infringement in Australia. The Australian defence to copyright infringement is "fair dealing". However, this only applies when the work is used for specific and narrow purposes such as research, study, criticism, review, parody or satire.

    Given that Australia's fair dealing defence is much narrower than the fair use defence in the United States, the use of electronic books (pirated or legitimately purchased) to train an AI service for profit in Australia could lead to a different result. Those building LLMs will therefore need to carefully consider the copyright laws of the jurisdiction in which they are training their models. It remains to be seen whether the Australian legislature will take steps to encourage the building of LLMs in Australia by undertaking copyright law reform.

    Other authors: Elise Jensen, Lawyer.

    The information provided is not intended to be a comprehensive review of all developments in the law and practice, or to cover all aspects of those referred to.
    Readers should take legal advice before applying it to specific issues or transactions.