GPT-4o: The complaint alleges that Google used pirated materials from notorious websites such as Z-Library and OceanofPDF, which have been publicly linked to copyright infringement.

It is claimed that Google openly admitted to training its Gemini models on copyrighted works hosted on platforms like Z-Library, despite the known illegal nature of such platforms.

Jan 06, 2025

Question 1 of 3 for ChatGPT-4o: Please read the "Google GenAI Copyright Litigation" complaint and tell me what the main grievances are and what kind of evidence is being presented. Describe the nature of the evidence in great detail.

ChatGPT-4o’s answer:
The "Google Generative AI Copyright Litigation" complaint identifies several key grievances and provides detailed evidence to support the claims. Here's a summary:

Main Grievances

Unauthorized Use of Copyrighted Works: Google allegedly copied and used the plaintiffs' copyrighted works without permission to train its generative AI models, including Bard, Gemini, Imagen, and others.
Monetary Gain from Infringement: Google is accused of commercializing the AI models trained on these copyrighted works and profiting significantly from the integration of generative AI into its products (e.g., Google Search, Google Docs, Google Cloud, etc.).
Harm to Plaintiffs: The plaintiffs claim they suffered direct financial harm, including lost licensing revenue and diminished market value of their works. The unauthorized use allegedly depresses the overall market for literary and visual works by flooding it with AI-generated content.
Willful Infringement: Google is accused of knowingly engaging in copyright infringement and taking insufficient measures to prevent its AI models from replicating protected content.

Evidence Presented

Training Data Sources: Specific datasets like "C4," "LAION-400M," and "Infiniset" were cited as containing unauthorized copies of the plaintiffs' works. Examples of such works were detailed, including where they were found on pirated content websites like Z-Library, OceanofPDF, and Scribd.
Use in Training Models: The complaint specifies how these datasets were used to train AI models like Gemini and Bard, with evidence pointing to the systematic extraction and reproduction of text from copyrighted works during the data collection and training processes.
Commercial Benefit: Financial records and public statements from Google executives (e.g., Sundar Pichai) are referenced to show that Google's generative AI products yielded substantial revenue, with a reported $88.3 billion in quarterly revenue and a 29% year-over-year increase attributed to AI integration.
Direct Harm to Market Value: The complaint alleges that the AI models' ability to produce high-quality imitations of creative works undermines the value of the original works. For instance, plaintiffs argue that Google's actions devalue the market for legitimate licensing and content creation.
Pattern of Infringement: Google’s extensive use of publicly available datasets containing pirated materials, combined with the large-scale deployment of its AI products, is cited as evidence of a systematic approach to leveraging infringing content for technological and commercial advantage.

This detailed evidence underscores the plaintiffs' allegations of intentional copyright infringement, significant financial gains by Google, and resultant harm to the creators of the works in question.

Question 2 of 3 for ChatGPT-4o: List the most surprising, controversial and valuable statements made in the complaint.

ChatGPT-4o’s answer:
Here’s a categorized list of the most surprising, controversial, and valuable statements made in the complaint, along with why they stand out:

Surprising Statements

Google Used Pirated Content from Z-Library, OceanofPDF, and Scribd:
The complaint alleges that Google used pirated materials from notorious websites such as Z-Library and OceanofPDF, which have been publicly linked to copyright infringement. This direct association with illicit sources adds a dramatic layer to the allegations.
Explicit Admission of Using Copyrighted Data for Training AI Models:
It is claimed that Google openly admitted to training its Gemini models on copyrighted works hosted on platforms like Z-Library, despite the known illegal nature of such platforms.
Financial Impact of AI Integration:
According to the complaint, Google reported $88.3 billion in quarterly revenue and a 29% year-over-year increase attributed to AI integration. This illustrates the immense commercial impact of the allegedly infringing activities.

Controversial Statements

Willful Infringement Despite Legal Awareness:
The complaint claims that Google knew its actions constituted copyright infringement but chose to proceed regardless. It further alleges that Google set up an indemnification program to protect customers from potential legal risks associated with its AI models, signaling premeditated risk management for illegal practices.
Google’s Strategic Shift to Generative AI at All Costs:
Google is described as prioritizing AI development so aggressively that it shifted its entire corporate trajectory to dominate the field, allegedly at the expense of copyright law compliance. This paints a picture of a corporate culture willing to bypass legal and ethical boundaries.
Flooding the Market with AI-Generated Content:
The plaintiffs argue that Google’s practices undermine the entire creative market by saturating it with AI-generated works, effectively reducing the value of original content and creating an unfair competitive advantage.

Valuable Statements

Detailed Financial Data Supporting Claims of Unjust Enrichment:
The complaint provides specific figures showing the massive financial gains Google has achieved from AI integration, including the $10.9 billion increase in revenue attributed to AI in July 2024. These figures are pivotal for establishing the scale of damages sought.
Potential Class Action Scope:
The proposed class includes all U.S.-based copyright holders whose works were used to train Google’s AI models without permission. If certified, this could involve thousands of creators and result in a monumental lawsuit against Google.
Harm to Licensing Revenue and Creative Industry:
By quantifying the harm to licensing revenue and claiming that Google’s actions have devalued entire creative portfolios, the complaint underscores the broader economic and societal impact of such copyright violations.

These points not only highlight the legal stakes but also underscore the broader societal and economic implications of generative AI's rapid development at the expense of creators' rights.

Question 3 of 3 for ChatGPT-4o: What's your view on the quality of the evidence presented? Explain how any of the information shared in the complaint could be of use to other, potentially similar legal cases worldwide.

ChatGPT-4o’s answer:
Assessment of the Quality of the Evidence

The evidence presented in the "Google Generative AI Copyright Litigation" complaint appears to be comprehensive, detailed, and highly impactful. Below are key factors that highlight the strengths and weaknesses of the evidence:

Strengths of the Evidence

Specificity in Cited Sources and Data: The complaint provides explicit references to datasets (e.g., "C4," "Infiniset," and "LAION-400M") used to train Google’s AI models. It also names specific pirated content repositories (Z-Library, OceanofPDF, Scribd) and links them directly to the unauthorized use of plaintiffs' copyrighted works. This level of specificity makes it difficult for Google to claim ignorance.
Financial Impact and Corporate Admissions: The inclusion of financial figures (e.g., $88.3 billion in quarterly revenue, with growth attributed to AI integration) and public statements from Google executives (like Sundar Pichai’s statements about the critical role of AI) strengthens the argument that Google profited directly from the alleged infringement. This bolsters the claim of unjust enrichment.
Clear Connection to Market Harm: The complaint alleges that Google’s generative AI models directly devalue the market for original creative works, leading to lost licensing opportunities and diminished portfolio value for plaintiffs. Establishing this link is a crucial element in proving economic harm.
Evidence of Willfulness: The allegation that Google knowingly infringed copyrights, coupled with claims of indemnification programs to shield customers from legal risks, portrays Google as aware of potential legal issues but proceeding anyway. If proven, this willfulness significantly raises the stakes in terms of potential damages.

Weaknesses of the Evidence

Reliance on Circumstantial Evidence: While the plaintiffs assert that their works were used to train AI models, they rely heavily on the inclusion of these works in publicly available datasets. Proving that Google's AI models specifically replicated or incorporated these works in a way that infringes copyright may require more direct evidence of misuse (e.g., verbatim outputs or identifiable content replication).
Challenge of Linking Pirated Sources to Model Outputs: While the inclusion of pirated materials in datasets is compelling, the plaintiffs may face difficulty proving how these specific works influenced the outputs of models like Bard or Gemini. Without clear examples of AI-generated outputs that closely mimic the plaintiffs’ works, this connection might remain abstract.
Global Applicability of Copyright Law: Copyright laws vary significantly between jurisdictions, and while the case relies on U.S. law, Google operates globally. Plaintiffs in similar cases elsewhere may struggle to apply these findings directly without alignment in legal standards.

Usefulness of the Information in Similar Legal Cases

The details in this complaint could serve as a foundational blueprint for similar legal cases around the world. Here's how:

Precedent for Copyright Protection in AI Training

This case could set a legal precedent for defining the scope of copyright infringement in the context of AI training datasets. It may clarify whether using copyrighted works to train AI models without explicit permission constitutes fair use or a violation of copyright law.

Framework for Proving Harm and Unjust Enrichment

The financial evidence and the argument of market devaluation provide a solid framework for other plaintiffs seeking to prove that the unauthorized use of their works for AI purposes caused tangible economic harm.

Highlighting Corporate Responsibility

The claim that Google implemented indemnification programs despite awareness of potential legal violations could encourage courts in other jurisdictions to scrutinize corporate policies and risk management strategies, particularly in AI development.

Emphasis on Willful Infringement

Proving willfulness is critical for plaintiffs to seek higher statutory damages. This case may serve as an example for others to gather evidence of corporate intent, such as internal communications or public statements, to bolster their claims.

Relevance to Regulatory Discussions

Governments and policymakers worldwide can use this case to shape regulations that mandate greater transparency in AI training processes and the permissible use of copyrighted materials. For instance, clearer guidelines on licensing and fair use exemptions in AI training might emerge as a direct response to litigation like this.

Broadening the Scope of Class Actions

The scope of this class action—targeting all U.S.-based copyright holders whose works were used without authorization—could inspire similar collective lawsuits in other jurisdictions, particularly in regions with strong copyright protections like the EU, Canada, and Australia.

Conclusion

The evidence presented in this complaint is strong in its specificity and alignment with key legal standards, but it may face challenges in definitively linking individual works to the outputs of AI models. Nevertheless, the information and arguments laid out could have far-reaching implications, providing a valuable template for creators, litigators, and policymakers to address copyright infringement in the age of generative AI.

