The Wiley Network: Breaking Down Emerging Segments in the AI Content Licensing Landscape
November 18, 2024
Josh Jarrett, Senior Vice President and General Manager, AI Growth
Wiley has successfully navigated new technological eras for over 200 years, and much of that success has been grounded in our approach to partnering with the communities we serve. As leader of the team at Wiley that is working to optimize and safeguard our shared AI future through direct interaction with AI model and application developers, and in that spirit of partnership, I want to share my thoughts on the evolution of the AI landscape and the parameters Wiley uses when considering an AI licensing partnership.
Let me share three foundational statements before we dive in:
We've made the decision at Wiley to lean into AI. AI presents both opportunities and risks, and we believe that we will learn faster and have a greater impact on how the field evolves by leaning in and participating. We want Wiley and our partners to shape the AI future, not be shaped by it.
We are committed to applying our partnership philosophy in this exciting new space: we will share what we are learning, listen to our stakeholders, and evolve as we go. We will continue to be good partners to authors, societies, and associations. We want to tread this path together—soliciting input and co-creating approaches that work for the communities we serve.
We believe licensing is a critical lever for publishers and copyright owners to protect intellectual property and shape the AI ecosystem. Licensing is not a new activity—enabling broad dissemination of high-quality content is core to our mission. However, it’s particularly important here. Let’s acknowledge the obvious — many of these large language models are already trained on huge amounts of copyrighted content, probably trillions of words, all under the claim of fair use. There are court cases pending, but it could be years before we have any resolution there. In the meantime, licensing is an established mechanism to secure protections for copyright, improve the reach and impact of quality content, establish frameworks for attribution, and secure compensation. We, along with most other publishers and author advocacy groups, are in favor of licensing for these reasons. The devil, of course, is in the detail: how can publishers license in the most thoughtful way?
The AI Licensing Landscape: Context and Use Cases
The AI landscape is evolving rapidly, and it's essential to understand the different players and use cases within this broad field. We can identify three primary market segments emerging in the AI landscape:
Foundational Large Language Model (LLM) Training: This segment includes big tech companies and a few institutions that can afford the cost and complexity of building these models. These developers seek vast amounts of data to train their models to be highly knowledgeable, akin to a university student studying for a test by reading extensive material. The training process involves running trillions of words through neural networks to establish 'model weights,' which are used to generate future results. The content doesn't persist in the model; only the weights do. As a result, the original content is not displayed verbatim, making author attribution challenging. LLM developers prefer comprehensive books for training due to their length, subject matter expertise, and language complexity.
Fine-Tuning Customized Models: This segment involves optimizing an LLM to understand a specific use case, such as pharmaceutical questions or local language translation. Developers in this segment seek high-quality, specialized content in a particular domain, similar to sending a smart university student to graduate school. The process is akin to foundational model training, with the original model's weights adjusted to produce better results for the specific domain. While books remain valuable, there is growing interest in journal content for fine-tuning.
Retrieval Augmented Generation (RAG): This segment combines the broad power of an LLM with a tailored set of specific content that the model can reference to answer specific prompts, like a university student taking an open-book test. This approach, applied at what’s called the “inference” stage, is gaining attention for its applications in science, such as conducting meta-studies or identifying drug interaction scenarios. Companies in this segment prefer authoritative, frequently updated journal content over books. Attribution, citation, and backlinks are more relevant in this segment, and there is a need to enable these features.
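To make the RAG segment concrete, here is a minimal sketch in Python of the "open-book test" idea: passages are retrieved from a licensed corpus at inference time and handed to a model together with their citations and backlinks. Everything in it is an assumption for illustration and does not describe any Wiley or partner system: the `Passage` type, the `retrieve` and `build_prompt` helpers, the placeholder corpus, and the simple word-overlap scoring (a production system would more likely use embeddings or a search index).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    text: str
    citation: str  # hypothetical attribution string shown to the user
    url: str       # hypothetical backlink to the source article


# Placeholder corpus standing in for licensed journal content (illustrative only).
CORPUS = [
    Passage("Warfarin combined with aspirin raises bleeding risk.",
            "Example Journal A (placeholder)", "https://example.org/a"),
    Passage("A meta-study of statin trials found modest benefit.",
            "Example Journal B (placeholder)", "https://example.org/b"),
    Passage("Drug interaction screening should include warfarin.",
            "Example Journal C (placeholder)", "https://example.org/c"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,;:?!()").lower() for w in text.split()} - {""}


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Return up to k passages sharing the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p) for p in corpus]
    scored = [(score, p) for score, p in scored if score > 0]
    # The sort is stable, so ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query: str, passages: list[Passage]) -> str:
    """Assemble a prompt whose sources are numbered so answers can cite them."""
    lines = ["Answer using only the numbered sources and cite them by number.", ""]
    for i, p in enumerate(passages, start=1):
        lines.append(f"[{i}] {p.text} ({p.citation}, {p.url})")
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


if __name__ == "__main__":
    question = "Does warfarin interact with aspirin?"
    print(build_prompt(question, retrieve(question, CORPUS)))
```

Because each retrieved passage carries its citation and backlink into the prompt, the application can show the user exactly which sources informed an answer. That is why attribution is more tractable here than in foundational model training, where only weights persist.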
As we navigate this dynamic landscape, it's crucial to match the right content to the right uses and establish objectives to guide us. Licensing is a critical lever for publishers and copyright owners to protect intellectual property and shape the AI ecosystem. By understanding these market segments, we can better position ourselves to take advantage of the opportunities AI presents while mitigating its risks.
Wiley’s AI Content Licensing Objectives
At Wiley, we have established clear objectives to guide our AI licensing efforts. Our primary goal is to act as good stewards for our authors and partners, ensuring that our actions align with our AI Principles. We believe that by adhering to these principles, we can foster ethical and safe use of AI technologies.
One of our key objectives is to disseminate published content as widely as possible. Licensing is not a new activity for Wiley: enabling the broad dissemination of high-quality content is core to our mission. We believe it is in the public interest that AI models are trained on high-quality content like ours.
Transparency is another crucial objective. Many early deals, including two signed by Wiley, are bound by confidentiality clauses. We are now pushing for the industry standard to become non-confidential deals – and believe regulatory policy updates will push the industry in that direction. Wiley supports this: we believe society and users should know what content a model was trained on. For example, if someone is making a healthcare choice, they should know whether the model was trained on a Reddit discussion board, a preprint server, or a validated, peer-reviewed journal. We are advocating for greater transparency in future deals and the field as a whole.
We also require attribution where more than minimal content is displayed. While attribution for foundational LLMs can be challenging, we are focused on the RAG use case, where authoritative content is directly referenced. In such cases, citation and attribution are critical, and we are working on requirements and standards to facilitate this.
Protecting copyright is another fundamental objective. We firmly believe that AI model developers should not use copyrighted material without permission. Licensing establishes a framework for respecting intellectual property rights and securing compensation for rightsholders.
Matching the right content with the right uses is also essential. We recognize that we cannot anticipate all the eventualities of AI in academic research and publishing. Therefore, we are pursuing key limitations, such as limiting the content available for foundational LLM training to backlist and archive content and restricting the grant of rights for certain applications.
Finally, we aim to secure compensation for our book authors and society partners, and to enhance and protect the value of our core publishing business. We believe that copyright holders should receive fair compensation when their intellectual property is used, consistent with contractual arrangements. By leveraging our collective scale, we can negotiate the best deals with big tech companies and ensure that our book authors and partners benefit from these opportunities.
These objectives guide our approach to AI licensing, ensuring that we act responsibly and in the best interests of our content creators and the broader community.
Predictions for the Evolution of the Market
Wiley’s leading position in AI licensing affords us unique insight into the future evolution of this market. We foresee a shift away from large-scale deals in which big tech companies license huge amounts of data from publishers’ archives and toward new areas. These include, but are likely not limited to, hyper-specialized, high-quality content in particular domains; real-time data such as news and academic journal content; and multi-modal content like image, video, and voice. We also expect the market to move beyond model building. The work so far can be likened to the iOS vs Android moment in the development of operating system technology: the real value will be in the applications built on top.
Excitingly, we are seeing emerging opportunities to advance scientific discovery through AI, and we want Wiley and our publishing partners to be at the leading edge of that development. That is why Wiley is actively participating in AI application development, particularly for science and discovery, and why we launched Wiley AI Partnerships: A Co-Innovation Program, to develop cutting-edge AI applications that address the needs and challenges of researchers and practitioners. Through this groundbreaking program, we will transform the pace of research by delivering unparalleled efficiency and accuracy to the research process, empowering informed decisions and propelling breakthroughs.
Reimagining the Future Together
We are excited by the emerging opportunities to advance scientific discovery through AI, and we want Wiley and our publishing partners to be at the leading edge of that development. Our commitment to our partners as we explore these opportunities is to listen, to learn, and to share our progress as we move forward together.
Source: https://www.wiley.com/en-us/network/trending-stories/breaking-down-emerging-segments-in-the-ai-content-licensing-landscape