
Confusion about the training data of Sora, OpenAI's video-generating model

Update 15/03/2024 – A month after the unveiling of Sora, OpenAI CTO Mira Murati has given an interview about the new model. The interview reveals a little more about the training data but introduces new confusion.

When the Wall Street Journal asked what data was used to train the model, Murati replied, "We used publicly available and licensed data." She then confirmed that this includes Shutterstock content, as OpenAI has a partnership with the company. However, the WSJ pressed further, asking whether content from YouTube, Facebook, and Instagram was also used. That is where the confusion begins.

"I'm actually not sure about that," Murati responds to the question about YouTube videos. On the use of Facebook and Instagram, she says that if the videos are publicly available, they may have been used. However, she is "not sure, not confident" about it. She then cuts the discussion short. "I'm just not going to go into the details of the data that was used — but it was publicly available or licensed data," the CTO concludes.

Original – The creator of ChatGPT has developed a model that can create one-minute-long videos based on text.

Based on a prompt or a still image, Sora can create a video of up to a minute, with a video quality of 1080p. The user’s prompt is accurately followed. The generated video can include multiple characters and background details. The model can also expand existing video clips by adding missing details.

"The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style," OpenAI explains. OpenAI's website also features videos generated by Sora.

Optimizing Sora

OpenAI indicates that Sora is not perfect. For example, the model may have difficulty accurately simulating the physics of a complex scene. It also may not correctly understand some cases of cause and effect: a person might take a bite out of a cookie, yet the cookie may not show a bite mark.

OpenAI continues to develop the model, which may eventually eliminate the above limitations. Sora also relies on OpenAI’s research from DALL-E, the company’s model that can generate images based on prompts.

For now, Sora has limited availability. Red teams can work with it to identify potential problems. In addition, a limited number of visual artists, designers, and filmmakers will be given access so they can provide feedback on how to make the model more useful for creatives.

Tip: Gemini 1.5 is much more than a new foundation model