OpenAI trained GPT-4 on a million hours of YouTube audio
OpenAI trained GPT-4 on one million hours of audio from YouTube videos. The AI giant did not ask Google's permission to do so. Google did not object, however, because it uses YouTube to train its own LLMs.
In 2021, OpenAI lacked reliable English-language data available online ...
Sites can now block OpenAI data scraping, but should they?
OpenAI has revealed how others can identify its web crawler. From now on, sites can block the GPTBot user agent if they want to. By doing so, they can potentially ensure that their content is not used to train a future OpenAI LLM, but is that advisable?
The documentation states that OpenAI can us...
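As a rough illustration of what blocking the GPTBot user agent can look like, the sketch below checks a robots.txt policy with Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URL, the `SomeOtherBot` name, and the `can_fetch` helper are assumptions made for this example; they are not taken from the article or from any specific site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that blocks only GPTBot
# and leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URL and crawler names only.
gptbot_allowed = can_fetch("GPTBot", "https://example.com/articles/1")
other_allowed = can_fetch("SomeOtherBot", "https://example.com/articles/1")
```

A robots.txt rule is a request rather than an enforcement mechanism, so it only works to the extent that the crawler chooses to honor it.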
700 million LinkedIn user records are being offered for sale on a hackers’ forum
A database containing 700 million LinkedIn users’ records has been put up for sale on a hacking forum. The database has been on offer since June 22 and is in the possession of a user named TomLiner on Raid Forums.
There is a sample of 1 million records that can ...