Following the news of the lawsuit filed by The New York Times against OpenAI and Microsoft for copyright infringement, OpenAI has publicly responded to the allegations. In a blog post, OpenAI states that the lawsuit is “without merit” and emphasizes its support for journalism and partnership with news organizations.
OpenAI’s Claims and Collaborations
The blog post from OpenAI makes three broad claims. First, it asserts that the company collaborates with news organizations and creates new opportunities for them. Second, OpenAI defends its use of publicly available material as fair use for training, while offering publishers an opt-out process as a matter of ethical responsibility. Finally, OpenAI addresses "regurgitation," which it describes as a rare bug in its AI models, and highlights its ongoing efforts to minimize such occurrences.
Notably, OpenAI must reconcile its recent content licensing deals with news outlets and publishers such as Axel Springer and the Associated Press with its earlier practice of scraping public websites for training data. The existence of those deals raises questions about whether the company can lawfully use publicly available information without infringing copyright.
The Lawsuit and Content Licensing Deal
The New York Times filed the lawsuit in late December 2023, alleging that OpenAI trained its AI models on copyrighted articles without permission or compensation. The complaint also includes specific examples of ChatGPT generating text nearly identical to previously published NYT articles, which the Times argues constitutes direct copyright infringement.
OpenAI states that it believes using publicly available internet materials for training purposes is fair use, supported by well-established precedents. However, the company acknowledges that it introduced an opt-out process for publishers only after ChatGPT launched in November 2022, which leaves open questions about data scraped before that point.
Furthermore, OpenAI accuses The New York Times of intentionally manipulating prompts to elicit specific responses from ChatGPT in support of its case. The company emphasizes that such manipulation is not typical user behavior and says it is actively working to harden its systems against adversarial attacks and inappropriate regurgitation of training data.
In response to OpenAI’s blog post, a representative from Trident DMG, a communications firm representing The New York Times, provided a statement from an NYT lawyer. The statement asserts that OpenAI used The Times’ work, along with others’, to build ChatGPT, which the lawyer argues is not fair use and constitutes unauthorized use of copyrighted material.
The lawsuit will proceed before Federal District Court Judge Sidney H. Stein. While no hearing date has been set, OpenAI's blog post is expected to be entered into the case as argument or evidence.
As disputes over AI services reproducing copyrighted material continue, this lawsuit and similar cases involving contested training data sources will shape the legal landscape for the technology in 2024.