Apple, Nvidia, and Anthropic Accused of Unauthorized Use of YouTube Videos for AI Training

ICARO Media Group
17/07/2024 10h28

Apple Inc. has come under fire for allegedly training its AI models on content from YouTube, the Alphabet Inc. subsidiary, without the creators' consent. Tech YouTuber Marques Brownlee, also known as MKBHD, took to social media to voice his concerns about Apple's use of YouTube content for AI training.

According to Brownlee, Apple sourced data from various companies, one of which scraped data and transcripts from YouTube videos, including his own. Although Apple is technically not "at fault" because it was not directly involved in the scraping, Brownlee believes the issue is likely to persist and become an ongoing problem.

Brownlee also pointed out that he pays a service to produce more accurate transcriptions of his videos, which he then uploads to YouTube's back end. Companies that scrape those transcripts are therefore not only taking work he has paid for but also profiting from it.

A report from 9to5Mac, shared by Brownlee, revealed that several tech giants, including Apple, trained their AI models using subtitle files downloaded by a third party from over 170,000 videos. The dataset included transcripts from renowned creators such as Brownlee, MrBeast, PewDiePie, Stephen Colbert, John Oliver, and Jimmy Kimmel.

A Proof News investigation found that the subtitle data was part of EleutherAI's dataset, known as the Pile, which major companies such as NVIDIA Corp. and Salesforce Inc. also used to train their AI models. They did so even though YouTube's rules explicitly prohibit harvesting material from the platform without permission.

The companies involved, Apple, Nvidia, Google, and Anthropic, did not immediately respond to Benzinga's request for comment on the allegations.

Unauthorized content scraping for AI training has become a growing concern within the tech industry. OpenAI and Anthropic were recently accused of ignoring web scraping rules, sparking controversy. The companies allegedly bypassed the robots.txt protocol, which websites use to tell automated crawlers which pages they should not access; compliance with it is voluntary.

Prompted by such practices, Reddit Inc. changed its platform to block automated content scraping. The policy change was followed by a 9% jump in Reddit's stock price, suggesting how sensitive the market has become to data privacy concerns.

In another notable case, Meta Platforms Inc. pursued legal action against a Chinese company over data scraping, highlighting how widespread the problem is across social media platforms.

Elon Musk has specifically cited AI scraping as a reason for putting tweets behind paywalls on X (formerly Twitter). Users now need an account to read tweets, and those who want to view more than 600 posts per day must pay for Twitter Blue.

As the debate over unauthorized content scraping for AI training continues, it raises fundamental questions about data privacy and the responsibility of tech companies to protect creators' rights on platforms like YouTube. Efforts to address these concerns have already led to policy changes and legal action, underscoring the urgency of solving this evolving problem.

The views expressed in this article do not reflect the opinion of ICARO, or any of its affiliates.
