
Confusion about the training data of Sora, OpenAI's video-generating model

Update 15/03/2024 – A month after the unveiling of Sora, OpenAI CTO Mira Murati has given an interview about the new model. The interview reveals a little more about the training data but introduces new confusion.

When the Wall Street Journal asked what data was used to train the model, Murati replied, "We used publicly available and licensed data." She then confirmed that this includes Shutterstock content, as OpenAI has a partnership with the company. However, the WSJ pressed further, asking whether content from YouTube, Facebook, and Instagram was also used. That is where the confusion begins.

"I'm actually not sure about that," Murati responds to the question about YouTube videos. On the use of Facebook and Instagram, she says that if the videos are publicly available, they may have been used. However, she is "not sure, not confident" about it. She then cuts the discussion short. "I'm just not going to go into the details of the data that was used — but it was publicly available or licensed data," the CTO concludes.

Original – The creator of ChatGPT has developed a model that can create one-minute-long videos based on text.

Based on a prompt or a still image, Sora can create a video of up to a minute, with a video quality of 1080p. The user’s prompt is accurately followed. The generated video can include multiple characters and background details. The model can also expand existing video clips by adding missing details.

"The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style," OpenAI explains. OpenAI's website also features videos generated by Sora.

Optimizing Sora

OpenAI indicates that Sora is not perfect. For example, the model may have difficulty accurately simulating the physics of a complex scene. It also may not correctly understand some cases of cause and effect: a person might take a bite out of a cookie, yet the cookie may not show a bite mark.

OpenAI continues to develop the model, which may eventually eliminate the above limitations. Sora also relies on OpenAI’s research from DALL-E, the company’s model that can generate images based on prompts.

For now, Sora has limited availability. Red teams can work with it to identify potential problems. In addition, a limited number of visual artists, designers, and filmmakers will be given access so they can provide feedback on how to make the model more useful for creatives.

Tip: Gemini 1.5 is much more than a new foundation model