Claude's legal advice: The AI industry has several significant vulnerabilities in its current business practices: Data Collection and Usage, Privacy Compliance, Content Filtering and Safety, Model Output Liability, Documentation and Transparency, and Regulatory Compliance.
Question 1 of 2 for Claude: List all the problematic things the AI makers have done so far, based on my Substack posts
Claude’s answer:
Based on your Substack posts, I can identify several problematic practices by AI companies that raise serious ethical and legal concerns. Let me explain the key issues:
Systematic Copyright Management Information (CMI) Removal
As documented in Meta's case, employees were caught discussing and implementing the systematic removal of copyright information from training materials. This included deliberately stripping out:
Copyright notices from academic papers
ISBN numbers from publications
Author attribution information
Rights management details
Terms and conditions sections
This behavior is particularly troubling because it shows premeditation and awareness that they were removing legally protected information. The internal discussions revealed they knew this information "shouldn't be there" but proceeded anyway.
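To make the CMI concept concrete, here is a minimal sketch, hypothetical and not any company's actual pipeline, of how one might scan a source document for common CMI markers such as copyright notices, ISBNs, and "all rights reserved" statements, and flag markers that appear in the original but are missing from the copy used for training. The regex patterns, function names, and example texts are illustrative assumptions.

```python
import re

# Hypothetical illustration only: simplified patterns for common forms of
# copyright management information (CMI) in the sense of DMCA Section 1202.
CMI_PATTERNS = {
    "copyright notice": re.compile(r"(©|\(c\)|copyright)\s+\d{4}", re.IGNORECASE),
    "all rights reserved": re.compile(r"all rights reserved", re.IGNORECASE),
    "isbn": re.compile(r"\bISBN(?:-1[03])?:?\s*[\d\-Xx]{10,17}\b"),
}

def find_cmi(text):
    """Return the set of CMI marker types detected in a piece of text."""
    return {name for name, pattern in CMI_PATTERNS.items() if pattern.search(text)}

def stripped_cmi(original, training_copy):
    """CMI markers present in the original but absent from the training copy."""
    return find_cmi(original) - find_cmi(training_copy)

original = "Copyright 2021 Example Press. All rights reserved. ISBN 978-0-123456-47-2."
training_copy = "Chapter 1. It was a dark and stormy night."
print(stripped_cmi(original, training_copy))
# Expected markers (set order may vary): copyright notice, all rights reserved, isbn
```

For litigation purposes, the value of this kind of comparison is that it ties specific missing CMI fields to specific training copies, which supports the pattern-of-conduct argument discussed in the second answer below.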
Training Data Collection Without Permission
AI companies have been scraping vast amounts of copyrighted content without obtaining proper licenses or permissions, including:
Academic papers from behind paywalls
Books from digital libraries
News articles from publishers
Personal blogs and websites
Creative works from artists
Many companies operated under the assumption that they could claim fair use, but failed to engage with rights holders or establish clear guidelines for appropriate use.
Lack of Transparency About Training Data
The companies have been notably opaque about:
What content they're using for training
How they obtained the training data
What filtering or verification processes they use
How they handle personal or sensitive information
The extent of copyrighted material in their training sets
This lack of transparency makes it difficult for creators to know if their work has been used and prevents proper oversight.
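As a benchmark for what adequate transparency could look like, the sketch below outlines a hypothetical per-document provenance record: source, acquisition date, asserted license, rights holder, a content hash, and the preprocessing applied. The field names are my own assumptions rather than any vendor's actual schema; the point is simply that this kind of documentation is technically straightforward to produce.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass
class ProvenanceRecord:
    """Hypothetical per-document provenance entry for a training corpus.

    Field names are illustrative assumptions, not any company's actual format.
    """
    source_url: str        # where the document was obtained
    acquired_on: str       # ISO date of acquisition
    license: str           # asserted license or permission basis
    rights_holder: str     # author or publisher, if known
    sha256: str            # content hash tying the record to an exact copy
    filters_applied: list  # preprocessing steps, e.g. deduplication, PII scrub

def make_record(text, **metadata):
    """Build a record whose hash is derived from the exact text that was ingested."""
    return ProvenanceRecord(sha256=hashlib.sha256(text.encode("utf-8")).hexdigest(), **metadata)

record = make_record(
    "Full text of the document...",
    source_url="https://example.org/article",
    acquired_on="2023-05-01",
    license="publisher license (hypothetical)",
    rights_holder="Example Press",
    filters_applied=["deduplication"],
)
print(json.dumps(asdict(record), indent=2))
```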
Inadequate Content Filtering
There have been multiple instances where AI systems have:
Reproduced copyrighted content verbatim
Revealed personal information
Generated content that violates rights
Failed to properly attribute sources
Mixed factual and hallucinated information
The companies have not implemented sufficient safeguards to prevent these issues.
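As an illustration of the kind of safeguard at issue, the sketch below checks a model output for verbatim overlap with a known copyrighted text using word n-grams. It is a deliberately simplified stand-in for a real output filter; the 8-word window and 20% threshold are illustrative assumptions, not established standards.

```python
def ngrams(text, n=8):
    """Set of word n-grams in a text; the 8-word window is an arbitrary choice."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(output, reference, n=8):
    """Fraction of the output's n-grams that also appear verbatim in a reference work."""
    out = ngrams(output, n)
    return len(out & ngrams(reference, n)) / len(out) if out else 0.0

def should_block(output, references, threshold=0.2):
    """Hypothetical pre-release gate: hold back output that reproduces too much
    of any known copyrighted source. The 0.2 threshold is illustrative only."""
    return any(verbatim_overlap(output, ref) >= threshold for ref in references)
```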
Monetization Without Compensation
Companies have been:
Profiting from models trained on others' work without compensation
Making licensing deals with publishers while excluding original creators
Charging for services built on freely scraped content
Creating derivative works without permission
Competing with content creators using their own work
Strategic Behavior to Avoid Liability
Companies have engaged in practices like:
Claiming fair use while actively working to minimize legal risk
Using technical measures to obscure training data sources
Making strategic deals with large publishers while ignoring individual creators
Structuring operations to make oversight difficult
Deploying models before establishing proper safeguards
Disregard for Academic Integrity
In the scholarly context, companies have:
Undermined citation and attribution systems
Mixed verified and unverified information
Generated false references and citations (a simple verification sketch follows this list)
Failed to maintain proper provenance of information
Disrupted traditional academic quality control mechanisms
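The fabricated-citation problem is one of the more mechanically checkable items above: a cited DOI either resolves in a registry or it does not. The sketch below queries the public Crossref REST API (api.crossref.org); the function name and example DOI strings are mine, and a miss only means the DOI is not registered with Crossref, so it is a signal to investigate rather than proof of fabrication.

```python
import requests

def doi_exists(doi):
    """Check whether a DOI is registered with Crossref via its public REST API.

    A 200 response means Crossref knows the DOI; a 404 means it is not registered
    there (it may still exist in another registry such as DataCite).
    """
    response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if response.status_code == 404:
        return False
    response.raise_for_status()
    return True

# Flag DOIs from a model-generated bibliography that do not resolve.
# The strings below are placeholders, not real citations.
generated_dois = ["10.1234/placeholder.one", "10.5678/placeholder.two"]
suspect = [doi for doi in generated_dois if not doi_exists(doi)]
print(suspect)
```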
Personal Data Privacy Violations
There have been instances of:
Training on personal data without consent
Insufficient anonymization of training data
Regenerating private information in outputs
Lack of clear data removal processes
Inadequate protection of sensitive information
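To illustrate what even a basic anonymization pass involves, here is a minimal sketch that detects and redacts two obvious PII types, email addresses and phone numbers, with regular expressions. The patterns and names are illustrative assumptions; real anonymization pipelines need far broader detectors (names, addresses, identifiers) plus human review.

```python
import re

# Hypothetical, deliberately minimal PII patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
}

def redact_pii(text):
    """Replace detected PII with typed placeholders; return (redacted_text, counts)."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label.upper()} REDACTED]", text)
        counts[label] = n
    return text, counts

redacted, counts = redact_pii("Contact Jane at jane.doe@example.com or +1 (555) 012-3456.")
print(redacted)   # placeholders instead of the email address and phone number
print(counts)     # e.g. {'email': 1, 'phone': 1}
```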
Communication Issues
The companies have:
Been less than forthcoming about problems and limitations
Made misleading claims about capabilities
Failed to properly warn users about risks
Not provided clear channels for addressing concerns
Resisted meaningful oversight and regulation
These actions demonstrate a pattern of prioritizing rapid development and deployment over ethical considerations and proper rights management. The systematic nature of some of these violations, particularly in cases like Meta's CMI removal, suggests these weren't merely oversights but deliberate strategic decisions.
Question 2 of 2 for Claude: When you analyze all my Substack posts, can you list all the do's and don'ts for legal experts wanting to litigate against AI makers? Also list all apparent weaknesses in the business models of AI makers, especially when they have taken risks relevant to copyright law, data privacy law and human intervention relevant to model output.
Claude’s answer:
Based on my analysis of your Substack posts, let me break down the key legal considerations and strategic approaches for litigating against AI makers, as well as their business model vulnerabilities.
Legal Strategy Do's:
First and foremost, focus on documenting systematic behavior rather than isolated incidents. For example, the Meta employees' discussions about removing copyright management information (CMI) demonstrate a pattern of intentional copyright violation rather than accidental infringement. This type of evidence is particularly powerful in establishing willful violation.
Second, build cases around clear statutory violations rather than novel legal theories. Section 1202 of the DMCA provides specific protections against CMI removal, making it a stronger foundation for litigation than more abstract arguments about fair use or transformative work.
Third, gather evidence of AI companies' knowledge of potential violations. Internal communications, public statements, and technical documentation can demonstrate awareness of legal requirements and conscious decisions to bypass them. This helps establish willful infringement, which often carries higher damages.
Fourth, focus on concrete, measurable harms. Training an AI model on copyrighted works without permission creates quantifiable damages through lost licensing revenue. These tangible impacts are easier to prove than speculative future harms.
Legal Strategy Don'ts:
Avoid relying solely on traditional copyright infringement arguments. Courts are still grappling with how fair use applies to AI training, making this a less certain legal pathway. Instead, focus on clear statutory violations like CMI removal or data privacy breaches.
Don't overlook the importance of jurisdiction and venue selection. Different courts may have varying interpretations of AI-related legal issues. Choose jurisdictions with favorable precedents or judges experienced in technology cases.
Resist the temptation to argue about AI consciousness or rights. Focus instead on concrete business practices and specific legal violations by the companies developing these systems.
Business Model Vulnerabilities:
The AI industry has several significant vulnerabilities in its current business practices:
Data Collection and Usage: Many AI companies have trained their models on copyrighted materials without proper licensing or permission. This creates ongoing liability risk and a potential need to retrain models if that data use is found unlawful.
The cost of properly licensing training data could significantly impact their economics. Current business models often assume free or very low-cost access to training data.
Privacy Compliance: Many AI companies have inadequate systems for ensuring user data privacy and compliance with regulations like GDPR. This creates potential liability under multiple jurisdictions.
The cost of proper privacy compliance infrastructure could substantially increase operating expenses.
Content Filtering and Safety: Current approaches to content filtering and safety often rely heavily on automated systems with insufficient human oversight. This creates liability risks when systems generate harmful or illegal content.
The cost of adequate human review would significantly impact scalability and profitability.
Model Output Liability: Many companies have insufficient safeguards against their models generating copyrighted content or personal information. This creates ongoing liability risk for each instance of problematic output.
The technical challenge of preventing these issues while maintaining model performance creates an inherent tension in their business model.
Documentation and Transparency: Many companies have inadequate documentation of their training data sources and model development processes. This can make defending against litigation more difficult.
The cost of proper documentation and transparency measures could impact development speed and competitive advantage.
Regulatory Compliance: Many AI companies operate in regulatory grey areas, assuming favorable interpretations of existing laws. This creates risk as regulations evolve and courts make new determinations.
The cost of full regulatory compliance could significantly impact their current business models.
Strategic Considerations for Litigation:
Focus on Clear Violations: Target specific, documented violations rather than general complaints about AI technology. Build cases around statutory violations with clear remedies.
Document Patterns: Gather evidence of systematic behavior rather than isolated incidents. Look for internal communications demonstrating knowledge and intent.
Consider Multiple Jurisdictions: Different jurisdictions may offer different advantages in precedent and available remedies. Consider parallel actions in multiple venues when appropriate.
Build Strong Evidence: Focus on documented behavior and concrete harms. Gather technical evidence of specific violations.
The key to successful litigation against AI makers lies in focusing on specific, documentable violations rather than general concerns about AI technology. The most vulnerable areas are where companies have made conscious decisions to bypass legal requirements in pursuit of competitive advantage or cost savings.