A Deep Neural Network That Turns Any Image into a Playable Game! All on Consumer GPUs, Not in a Datacenter

Community Article Published June 1, 2026

Written by @abhisheksensharma


Introducing a supercut 🤗✨ of rollouts from my 516M-parameter world model.

At lucidml, I've been working on a simple idea:

Can a neural network learn to simulate interactive worlds in real time on consumer hardware?

The model takes an initial image plus my real-time keyboard input, then generates future frames autoregressively.
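The interaction loop described above can be sketched as a short program: each new frame is generated from the recent frames plus the current key press, then appended to the context for the next step. This is a minimal illustration under stated assumptions, not lucidml's code. `predict_next_frame` is a hypothetical stand-in for the real denoiser (it just shifts pixels according to the key), the 4-frame context window is an assumed value, and frames are tiny lists of integers instead of real images.

```python
from collections import deque
from typing import Deque, List

Frame = List[List[int]]  # a tiny grayscale "image": rows of pixel values

# Hypothetical key-to-motion mapping; a real model would embed actions instead.
KEY_TO_SHIFT = {"left": -1, "right": 1, "none": 0}


def roll(row: List[int], shift: int) -> List[int]:
    """Circularly shift a row of pixels (positive = right, negative = left)."""
    n = len(row)
    return [row[(i - shift) % n] for i in range(n)]


def predict_next_frame(context: Deque[Frame], action: str) -> Frame:
    """Hypothetical stand-in for the world model's denoiser.

    A real model would denoise a new frame conditioned on all context frames
    and the action; this toy version only rolls the latest frame sideways.
    """
    last = context[-1]
    return [roll(row, KEY_TO_SHIFT[action]) for row in last]


def rollout(initial_frame: Frame, actions: List[str], context_len: int = 4) -> List[Frame]:
    """Generate one frame per action, feeding each output back in as context."""
    # deque(maxlen=...) keeps only the most recent `context_len` frames.
    context: Deque[Frame] = deque([initial_frame], maxlen=context_len)
    generated: List[Frame] = []
    for action in actions:  # in the live system, actions arrive from the keyboard
        frame = predict_next_frame(context, action)
        generated.append(frame)
        context.append(frame)  # autoregression: the output becomes the next input
    return generated


if __name__ == "__main__":
    frames = rollout([[1, 0, 0, 0]], ["right", "right", "left"])
    print(frames)  # [[[0, 1, 0, 0]], [[0, 0, 1, 0]], [[0, 1, 0, 0]]]
```

The bounded context window is one simple way to keep per-frame cost constant during long play sessions; how the actual model handles long context is not described here.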

Every clip in this video comes from a real play session in which I was controlling the action with my keyboard.

All of it was streamed from an RTX 5090 machine.

None of the initial images were from the training dataset. Every one of them was taken from Google Image Search.


Research Method

I started from the weights of lucidml's 420M-parameter image DiT (Diffusion Transformer) model.

(lucidml.ai/image)

From there, I added temporal mixing modules and trained on video and gameplay data to model dynamics over time.
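One common way to add temporal mixing to an image DiT is self-attention along the time axis, applied independently at each spatial token. The post does not say which design lucidml uses, so the numpy sketch below is an assumption for illustration: a single attention head, a causal mask so each frame attends only to itself and earlier frames (matching autoregressive generation), and a zero-initialized output projection. That last choice is a common trick for inserting new layers into a pretrained model: the block starts as an identity residual, so the image model's per-frame behavior is preserved at the start of video training. The class name `TemporalMixing` is hypothetical.

```python
import numpy as np


class TemporalMixing:
    """Hypothetical single-head causal temporal self-attention block.

    Input shape: (T, N, D) = (frames, spatial tokens per frame, channels).
    Each spatial token attends only across time, never across space.
    """

    def __init__(self, dim: int, rng: np.random.Generator):
        std = 1.0 / np.sqrt(dim)
        self.w_q = rng.normal(0.0, std, (dim, dim))
        self.w_k = rng.normal(0.0, std, (dim, dim))
        self.w_v = rng.normal(0.0, std, (dim, dim))
        # Zero-initialized output projection: the block is an identity at init,
        # so the pretrained image model's output is unchanged at step 0.
        self.w_o = np.zeros((dim, dim))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        t, _, d = x.shape
        # Reorder to (N, T, D): one time sequence per spatial token.
        h = x.transpose(1, 0, 2)
        q, k, v = h @ self.w_q, h @ self.w_k, h @ self.w_v
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (N, T, T)
        # Causal mask: frame i may only attend to frames 0..i.
        mask = np.triu(np.ones((t, t), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
        scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        mixed = (weights @ v) @ self.w_o  # (N, T, D)
        # Residual connection, reordered back to (T, N, D).
        return x + mixed.transpose(1, 0, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = TemporalMixing(dim=8, rng=rng)
    x = rng.normal(size=(3, 5, 8))  # 3 frames, 5 tokens per frame, 8 channels
    print(np.allclose(block(x), x))  # True: identity at initialization
```

In a full model, a block like this would typically sit alongside each pretrained spatial attention layer, so the image weights handle within-frame structure while the new modules learn motion.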

That means the core denoiser's video dynamics were trained from scratch on top of image-only weights; it is not an AR (autoregressive) distillation of an existing video model.

All of this was developed under a compute budget that is a fraction of what frontier labs typically have access to.


The Funny Part

What's funny is that I think this is barely 1% of what's achievable.

As I write this, I'm finishing training a newer 800M-parameter model that builds on the ideas explored here and significantly improves motion quality, diversity, and long-context behavior.

I've not even touched quantisation yet.
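For context on why quantisation matters for consumer GPUs: it stores weights at lower precision to cut memory use and bandwidth. No quantisation scheme has been tried for this model yet, so the following is only a generic sketch of symmetric per-tensor int8 weight quantisation, plus the weight-only memory arithmetic for a 516M-parameter model, assuming 2-byte (fp16/bf16) weights as the baseline. The function names are hypothetical.

```python
from typing import Tuple

import numpy as np


def quantize_int8(w: np.ndarray) -> Tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantisation: w is approximated by q * scale."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights; error per weight is at most scale / 2."""
    return q.astype(np.float32) * scale


def weight_gib(num_params: int, bytes_per_param: float) -> float:
    """Weight storage only; activations and caches need extra memory."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    params = 516_000_000
    print(f"fp16: {weight_gib(params, 2):.2f} GiB, int8: {weight_gib(params, 1):.2f} GiB")
    # prints "fp16: 0.96 GiB, int8: 0.48 GiB"
```

Per-tensor scaling is the simplest variant; per-channel scales usually lose less accuracy, at the cost of storing one scale per output channel.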


About the Video

This video is a supercut of some of my favorite rollouts so far.

PS: This was my first time ever editing a video in Final Cut Pro.

The red car rollout is sped up 2×.

PPS: I used HandBrake to reduce the file size; I hope that hasn't affected the video quality much.

Follow Along

I'll write regularly about:

  • World models
  • Generative gaming
  • Training runs
  • Failures
  • Breakthroughs
  • Everything I'm learning while building lucidml
