Google DeepMind has recently introduced Genie, an innovative generative AI model capable of generating 2D video games from a single image or text prompt. A few weeks earlier, OpenAI launched Sora, a text-to-video generator that surprised tech enthusiasts. Just as people were wondering what to expect next from the tech giants, Google introduced Genie AI to the world.
Google has explained that Genie AI is a "world model" trained on a massive 200,000 hours of video footage, mainly of 2D platformer games. Genie is not like traditional AI models that require significant supervision or labeled data. Instead, Genie learns by observing the actions within these videos and can work from a single prompt.
Tim Rocktäschel, who leads the Open-Endedness team at Google DeepMind, announced the model on X (formerly Twitter): "We introduce Genie, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts."
How Does Genie AI Actually Work?
With the launch of this new model, everyone is asking how Genie actually works. Let's take a brief look and try to understand the workings of this video-generating model. According to the available sources, Genie can understand an image or text prompt and then create a virtual world that users can play in and interact with.
Genie has three main components. Let's look briefly at each of them:
1. Video Tokenizer
The video tokenizer's main task is to efficiently compress the massive video data into simpler units called "tokens". These tokens are the foundation that Genie works on: each video is broken down into smaller discrete pieces so that Genie can better understand the visual world.
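To get an intuition for what tokenization means here, consider the minimal sketch below. It uses a toy vector-quantization codebook to map video-frame patches to discrete token ids; the codebook size, patch shape, and nearest-neighbour lookup are all illustrative assumptions, not Genie's actual implementation.

```python
import numpy as np

# Toy vector-quantization tokenizer: each video patch is mapped to the
# nearest entry in a codebook, and its index becomes the "token".
# Codebook size and patch shape here are illustrative, not Genie's.
CODEBOOK_SIZE = 1024
PATCH_DIM = 16 * 16 * 3  # a 16x16 RGB patch, flattened

rng = np.random.default_rng(0)
codebook = rng.normal(size=(CODEBOOK_SIZE, PATCH_DIM))  # stands in for learned codes

def tokenize_frame(frame: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, 3) frame into patches and return one token id per patch."""
    h, w, _ = frame.shape
    tokens = []
    for y in range(0, h - h % patch, patch):
        for x in range(0, w - w % patch, patch):
            vec = frame[y:y + patch, x:x + patch].reshape(-1)
            # Nearest codebook entry by Euclidean distance -> discrete token.
            tokens.append(int(np.argmin(np.linalg.norm(codebook - vec, axis=1))))
    return np.array(tokens)

frame = rng.random((64, 64, 3))      # a dummy 64x64 video frame
print(tokenize_frame(frame).shape)   # 16 tokens: a 4x4 grid of patch ids
```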
2. Latent Action Model
After the videos are broken down into tokens, the latent action model analyzes consecutive frames in the video to infer the actions that connect them. "Genie's learned latent action space is not just diverse and consistent, but also interpretable. After a few turns, humans generally figure out a mapping to semantically meaningful actions (like going left, right, jumping, etc.)," said Rocktäschel.
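A minimal sketch of the idea, assuming a small discrete set of latent actions: given two consecutive latent states, pick the action whose transition best explains the change. The eight-action budget echoes the small, interpretable action space Rocktäschel describes; the linear "action effects" are purely hypothetical stand-ins for what the model learns.

```python
import numpy as np

# Toy latent-action inference: choose, from a small discrete set of latent
# actions, the one whose (hypothetical) effect best explains how frame t
# changed into frame t+1. Everything here is illustrative, not Genie's code.
NUM_ACTIONS = 8
TOKEN_DIM = 32

rng = np.random.default_rng(1)
# Stand-ins for learned per-action transition offsets in latent space.
action_effects = rng.normal(size=(NUM_ACTIONS, TOKEN_DIM))

def infer_latent_action(z_t: np.ndarray, z_next: np.ndarray) -> int:
    """Return the discrete action id that best explains z_t -> z_next."""
    delta = z_next - z_t
    errors = np.linalg.norm(action_effects - delta, axis=1)
    return int(np.argmin(errors))

z_t = rng.normal(size=TOKEN_DIM)
z_next = z_t + action_effects[3] + 0.01 * rng.normal(size=TOKEN_DIM)
print(infer_latent_action(z_t, z_next))  # recovers action 3
```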
3. Dynamics Model
This model is responsible for predicting the future from past observations and actions. It learns what followed when similar actions were taken in similar situations, so it is mainly responsible for understanding how things change over time. This helps Genie track the current state of the game and work out what should happen next given the player's action.
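Conceptually, the dynamics model answers "given where we are and what the player just did, what comes next?" The sketch below rolls a toy state forward one action at a time; the linear update is an assumed stand-in for the large autoregressive transformer a real world model would use.

```python
import numpy as np

# Toy dynamics step: predict the next latent state from the current state
# plus a chosen latent action. Matrices and embeddings are illustrative.
TOKEN_DIM = 32
NUM_ACTIONS = 8

rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(TOKEN_DIM, TOKEN_DIM))    # state transition (illustrative)
action_embed = rng.normal(size=(NUM_ACTIONS, TOKEN_DIM))  # per-action effect (illustrative)

def predict_next_state(z_t: np.ndarray, action: int) -> np.ndarray:
    """One rollout step: next latent state from current state and action."""
    return z_t + W @ z_t + action_embed[action]

# Rolling the model forward action-by-action is what makes the world playable:
z = rng.normal(size=TOKEN_DIM)
for action in [1, 1, 4, 2]:  # e.g. right, right, jump, left
    z = predict_next_state(z, action)
print(z[:4])  # first few dims of the imagined future state
```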
However, Google Genie AI has not been released yet; it is still in research and development at DeepMind and is not available for public use. Genie can be prompted with images, a first for a model of this kind, and it will allow users to interact with the virtual worlds they have created.
In its research phase, Google Genie AI serves as a foundation world model, with a focus on 2D platformers and robotics.
In Rocktäschel's words: "Genie's model is general and not constrained to 2D. We also train a Genie on robotics data (RT-1) without actions and demonstrate that we can learn an action-controllable simulator there too. We think this is a promising step towards general world models for AGI." For now, Google Genie AI can only generate video at a low frame rate, which affects its visual quality. But when Genie is released for public use, it could well usher in a new era of creativity and technology.