Apple and Salesforce Address Allegations of Data Usage from YouTube Videos for AI Training
ICARO Media Group
In response to a recent report by Proof News, tech giants Apple and Salesforce have addressed allegations regarding the use of data from thousands of YouTube videos to train their artificial intelligence (AI) models. The investigation claimed that subtitles from 173,000 YouTube videos, including content from educational channels such as Khan Academy, MIT, and Harvard, were utilized by these companies.
While Anthropic, which was also named in the report, has yet to respond to the allegations, both Apple and Salesforce have issued statements clarifying their positions. Apple confirmed that its open-source language model, OpenELM, was trained on the dataset mentioned in the report. However, the company emphasized that the model was developed for research purposes only and does not serve as the foundation for any of Apple's machine learning-powered hardware or AI services, including Apple Intelligence.
Apple Intelligence, unveiled at the WWDC 2024 event, is a suite of AI features offered by Apple. These features include text summarization for quicker interactions, entertainment-focused features like Genmoji that generate new iOS emojis, and Image Playground, which allows users to create AI-generated images.
As for its consumer-facing AI, Apple highlighted that it gives websites the option to opt out of having their content used for AI training. The company said that it uses high-quality data, including licensed content from publishers and stock image companies, alongside publicly available web data, to build and fine-tune its generative models.
Salesforce, on the other hand, explained that The Pile, the dataset referenced in the research paper, was used for academic and research purposes in 2021. According to a Salesforce representative, the dataset was publicly available and released under a permissive license.
Nvidia, known for its integration of AI in gaming hardware and services, has not issued a statement regarding these allegations.
Should there be any updates or additional information from Anthropic, this article will be updated accordingly.
In conclusion, both Apple and Salesforce have acknowledged using the YouTube dataset in AI training. Apple says the resulting model does not form the basis of its AI services and that its OpenELM project is intended to benefit the broader research community, while Salesforce maintains that its usage was limited to academic and research purposes.
Kimberly Gedeon, a tech explorer at Mashable, has been keeping a close eye on the developments in the tech industry since 2023. Her expertise lies in exploring the latest gadgets and technology trends, ranging from iPhones to virtual reality headsets. She has a particular interest in avant-garde and unusual tech innovations, such as 3D laptops and transformative gaming rigs. Her career in journalism began a decade ago, covering tech and business at MadameNoire before joining Laptop Mag as a tech editor in 2020.
The dataset in question, known as The Pile, contains content from renowned institutions such as Harvard, NPR, and 'The Late Show With Stephen Colbert.'