AI from DeepMind can generate realistic videos from YouTube videos

DeepMind, a sister company of Google, has published a paper describing an artificial intelligence (AI) model that can generate realistic videos after being trained on YouTube videos.

The AI, called Dual Video Discriminator GAN (DVD-GAN), can create coherent 256 by 256 pixel videos of remarkable fidelity, up to 48 frames long, VentureBeat reports.

Generating natural video is an obvious next challenge, but one that is plagued by growing data complexity and computational demands, according to the paper’s authors.

According to them, much earlier work on video generation has therefore centered on relatively simple datasets, or on tasks where strong temporal conditioning information is available. The authors say they focus instead on video synthesis and video prediction, and aim to bring the strong results of generative image modelling to the video domain.

GANs

Specifically, the researchers used generative adversarial networks (GANs), two-part AI systems consisting of a generator that produces samples and a discriminator that tries to distinguish the generated samples from real-world ones. The researchers built mainly on BigGAN, an architecture distinguished by its large batch sizes and millions of parameters.
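The tug-of-war between generator and discriminator can be made concrete with a small sketch. The article does not state which loss DVD-GAN uses, so the binary cross-entropy form below is an assumption chosen for familiarity, and the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw discriminator score (logit) to a probability of 'real'."""
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_loss(real_logits, fake_logits):
    """Assumed binary cross-entropy loss for the discriminator.

    The loss is low when real samples get high scores and
    generated samples get low scores.
    """
    real_term = -sum(math.log(sigmoid(s)) for s in real_logits) / len(real_logits)
    fake_term = -sum(math.log(1.0 - sigmoid(s)) for s in fake_logits) / len(fake_logits)
    return real_term + fake_term


def generator_loss(fake_logits):
    """Assumed non-saturating generator loss.

    The loss is low when the discriminator scores generated samples as real.
    """
    return -sum(math.log(sigmoid(s)) for s in fake_logits) / len(fake_logits)


if __name__ == "__main__":
    # A discriminator that separates real from fake well has a low loss,
    # while the generator it is beating has a high loss.
    print(discriminator_loss([5.0], [-5.0]), generator_loss([-5.0]))
```

The two losses pull in opposite directions on the same scores, which is what makes the training adversarial.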

DVD-GAN uses two discriminators. The first critiques the content and structure of single frames by randomly sampling frames from a video and processing them individually. The second provides a learning signal for generating movement. Finally, there is a Transformer module, which allows learned information to spread across the entire AI model.
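The division of labour between the two discriminators can be illustrated with a toy sketch. Everything here is hypothetical: the number of sampled frames `k`, the stand-in scoring functions, and the choice to add the two scores together are assumptions for illustration, not details taken from the article.

```python
import random


def sample_frames(video, k, rng):
    """Pick k distinct frames at random, kept in their original order."""
    indices = sorted(rng.sample(range(len(video)), k))
    return [video[i] for i in indices]


def dual_discriminator_score(video, spatial_d, temporal_d, k, rng):
    """Combine a per-frame score with a whole-clip score.

    spatial_d judges one frame at a time and never sees motion;
    temporal_d judges the full frame sequence.
    Summing the two scores is an assumption made for this sketch.
    """
    frames = sample_frames(video, k, rng)
    spatial_score = sum(spatial_d(frame) for frame in frames) / k
    temporal_score = temporal_d(video)
    return spatial_score + temporal_score


def toy_spatial_d(frame):
    """Hypothetical stand-in: mean pixel value of a single frame."""
    return sum(frame) / len(frame)


def toy_temporal_d(video):
    """Hypothetical stand-in: penalize large jumps between consecutive frames."""
    jumps = [abs(b[0] - a[0]) for a, b in zip(video, video[1:])]
    return -sum(jumps) / len(jumps)


if __name__ == "__main__":
    rng = random.Random(0)
    # A toy 48-frame "video" in which each frame is a list of 4 pixel values.
    video = [[t / 48.0] * 4 for t in range(48)]
    print(dual_discriminator_score(video, toy_spatial_d, toy_temporal_d, 8, rng))
```

Because the per-frame discriminator only ever sees a handful of frames, its cost does not grow with the length of the clip, which is one plausible motivation for splitting the work this way.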

Kinetics-600

DVD-GAN was then trained on Kinetics-600, a dataset of natural videos. The dataset consists of 500,000 ten-second, high-resolution YouTube clips and was originally compiled for recognising human actions. According to the researchers, the dataset is diverse and unconstrained, which eases concerns about overfitting. Overfitting refers to models that match a specific dataset too closely and therefore fail to predict future observations reliably.

DVD-GAN was finally trained for between 12 and 96 hours on Google’s Tensor Processing Units. After that, the AI could generate videos with object composition, movement and even complicated textures, such as the side of an ice rink.

This news article was automatically translated from Dutch to give Techzine.eu a head start. All news articles after September 1, 2019 are written in native English and NOT translated. All our background stories are written in native English as well. For more information read our launch article.
