Tag: data scraping

Here you will find all the articles with the tag: data scraping.

OpenAI trained GPT-4 on millions of hours of YouTube audio

OpenAI trained GPT-4 on one million hours of audio from YouTube videos. The AI giant did not ask Google for permission. Google, however, did not object, because it also uses YouTube to train its own LLMs. In 2021, OpenAI lacked reliable English-language data available online ...

16 days ago
Sites can now block OpenAI data scraping, but should they?

OpenAI has revealed how site owners can identify its web crawler. Sites can now block the GPTBot user agent if they wish. Doing so should keep their content out of the training data for future OpenAI LLMs, but is that advisable? The documentation states that OpenAI can us...

8 months ago
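Blocking GPTBot is done through a site's robots.txt file: a group addressed to the GPTBot user agent with `Disallow: /` asks the crawler to stay away from the whole site. The Python sketch below is illustrative only. The robots.txt contents and the example.com URLs are assumptions, and it uses the standard library's `urllib.robotparser` to show how a crawler that honors robots.txt would interpret such a file. Keep in mind that robots.txt is a request, not an enforcement mechanism; it only works for crawlers that choose to respect it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: block GPTBot site-wide,
# allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # example.com is a placeholder domain; "SomeOtherBot" is a made-up agent name.
    print(rp.can_fetch("GPTBot", "https://example.com/articles/"))        # False
    print(rp.can_fetch("SomeOtherBot", "https://example.com/articles/"))  # True
```

To keep GPTBot out of only part of a site, the `Disallow` path can be narrowed (for example, `/private/`) instead of `/`.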