But it's not a game. It's a memory of a game video, predicting the next frame based on the previous few frames, like "I can imagine what happened next".
I would call it the world's least efficient video compression.
What I would like to see is the actual predictive strength, aka imagination, which I did not notice mentioned in the abstract. The model is trained on a set of classic maps. What would it do, given a few frames of gameplay on an unfamiliar map as input? How well could it imagine what happens next?
> But it's not a game. It's a memory of a game video, predicting the next frame based on the previous few frames, like "I can imagine what happened next".
It's not super clear from the landing page, but I think it's an engine? Like, its input is both the previous frames and the player's input for the next frame.
So as a player, if you press "shoot", the diffusion engine needs to output an image where the monster in front of you takes damage/dies.
> We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality.
No, it’s predicting the next frame conditioned on past frames AND player actions! This is clear from the article. Mere video generation would be nothing new.
It's a memory of a video looped to controls: frame 1 is "I wonder how it would look if the player pressed D instead of W", then frame 2 is based on frame 1, and so on, and a couple of frames in it's no longer remembering but imagining the gameplay on the fly. It's not prerecorded; it responds to inputs during generation. That's what makes it a game engine.
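Roughly, that loop looks like the sketch below. Everything here is a hypothetical stand-in rather than the paper's actual API: `denoise` represents the fine-tuned Stable Diffusion sampler, and the context length and resolution are illustrative guesses, not figures from the paper.

```python
# Minimal sketch of an action-conditioned autoregressive rollout,
# assuming a GameNGen-style setup. All names are hypothetical.

from collections import deque
import numpy as np

CONTEXT_LEN = 4    # past frames/actions the model conditions on (assumed)
H, W = 240, 320    # frame resolution (assumed)

def denoise(past_frames, past_actions, noise):
    """Stand-in for the diffusion model: the real system would run the
    denoising steps conditioned on recent frames and player actions.
    Here it just returns the noise so the sketch is runnable."""
    return noise

def play(initial_frames, get_player_action, n_steps):
    """Autoregressive rollout: each generated frame joins the context
    that conditions the next one, so after a few steps the model is no
    longer 'remembering' a recorded video but imagining new frames."""
    frames = deque(initial_frames, maxlen=CONTEXT_LEN)
    actions = deque([0] * CONTEXT_LEN, maxlen=CONTEXT_LEN)
    for _ in range(n_steps):
        actions.append(get_player_action())  # e.g. MOVE_FORWARD, SHOOT
        noise = np.random.randn(H, W, 3)     # fresh noise per frame
        frame = denoise(list(frames), list(actions), noise)
        frames.append(frame)                 # feed the output back in
        yield frame
```

The key point is that the player's action enters as conditioning at every step, which is exactly what distinguishes this from plain video generation.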
They could pare the model down to just the subset of Stable Diffusion's weight matrices it actually uses. That approach might improve internet bandwidth efficiency, assuming consumers in the future have powerful enough computers.
But it's trained on the actual screen pixel data, AFAICT. It's literally a visual imagination model, not a gameplay/geometry imagination model. They had to make special provisions for the HUD pixels, which are by nature different from the rendered picture of a 3D world.