OpenAI Offers $1 Million to $5 Million Annually to License Copyrighted News Articles for AI Training

ICARO Media Group
04/01/2024 20h48

OpenAI, a leading artificial intelligence (AI) company, has reportedly offered news publishers between $1 million and $5 million per year to license their copyrighted articles for training its AI models. The offers point to a growing trend of AI companies paying for licensed material to train their models. The figures come from a report by The Information, which also says Apple has been seeking partnerships with media companies, offering at least $50 million over multiyear deals for access to their data.

The amounts OpenAI has proposed appear to be in line with some earlier licensing deals that were not specifically related to AI. For instance, when Meta launched the Facebook News tab, it reportedly offered publishers up to $3 million a year to license news stories, headlines, and previews. It remains unclear, however, whether OpenAI's offers will approach the larger sums seen in other agreements. In 2020, Google announced a $1 billion investment to partner with news organizations, and under a new Canadian law it recently agreed to pay Canadian publishers a total of $100 million annually for linking to their articles.

Large language models rely heavily on text gathered from the internet for training. While AI developers do not always disclose their training data sources, details about the datasets or web crawlers they use are often available. Prices for training datasets vary with the provider, the dataset's size, and its content. Some providers, such as LAION, offer open-source datasets for free, while other companies run their own web crawlers to gather training data from the internet. This approach has run into significant obstacles. Some publishers, including The New York Times and Vox Media (The Verge's parent company), have blocked OpenAI's GPTBot crawler from accessing their sites. Various organizations also argue that training AI models on their content without permission violates copyright law. The New York Times, for example, has sued OpenAI and Microsoft for copyright infringement, alleging that their AI models can generate output that closely resembles its articles.

To overcome these challenges, AI companies like OpenAI are increasingly seeking partnerships with news organizations. OpenAI has already secured deals with publishers such as Axel Springer, the parent company of Politico and Business Insider, and The Associated Press. These collaborations involve licensing news stories to train models like GPT-4 and developing news gathering technologies.

OpenAI and Apple are not the only companies courting news organizations. Google reportedly demonstrated an AI tool called Genesis to executives from The New York Times, The Wall Street Journal, and The Washington Post; the tool generates news stories from factual information. Newsrooms have also experimented with generative AI tools, with mixed results.

As AI companies place more value on licensed news content for refining their models, these partnerships could reshape both how AI systems are trained and how news is gathered. The financial commitments from OpenAI, Apple, and others underscore the growing importance of collaboration between AI developers and publishers in building more capable and accurate AI systems.

The views expressed in this article do not reflect the opinion of ICARO, or any of its affiliates.
