Google has launched Veo 2, its AI video generation model, which is now available in Gemini. You can also create videos with Veo 2 for free in AI Studio. I took the opportunity to test Veo 2 thoroughly and assess how well it generates realistic AI videos. In this review, I examine Veo 2’s physics accuracy, motion consistency, spatial and temporal coherence, human realism, and image-to-video features. Veo 2 produces remarkable AI-generated videos, but not every output is flawless.
Veo 2: Evaluating Physics Accuracy
For any video generation model, a solid grasp of physics is crucial for creating believable visuals. To begin my assessment, I prompted Veo 2 to create a video depicting a cat nudging a glass of water to observe how it manages collision, gravity, acceleration, and liquid dynamics.
You may notice that the glass tilts slightly before the cat nudges it and then holds that position for too long, which defies the laws of physics. It almost appears suspended in mid-air, unaffected by gravity.
On the plus side, the remainder of the video is more convincing, as the liquid spills realistically once the cat interacts with the glass. Overall, I’d conclude that Google Veo 2’s grasp of physics is still developing. While it surpasses OpenAI’s Sora in its understanding of physics, there’s still significant room for improvement.
Veo 2: Evaluating Motion
Next, I assessed how well Veo 2 handles motion, another intricate aspect of video generation. Generating a smooth animation of a moving subject requires a comprehensive understanding of the physical world. In this evaluation, Veo 2 delivered commendable results.
I asked Veo 2 to generate a video showing a man strolling through a forest. In the linked video, the character remains consistent and convincingly portrayed throughout the frames, which is a significant win for this AI video generation model.
Additionally, the movement appears fluid, and the background remains coherent. Even the falling leaves and passing squirrels are rendered accurately.
However, I observed that Veo 2 struggles with scenes containing multiple characters or objects. Maintaining visual coherence appears challenging, likely due to limited capacity for managing complexity. Nevertheless, when it comes to motion and character consistency, Veo 2 performs exceptionally well.
Veo 2: Evaluating Spatial and Temporal Coherence
Spatial and temporal coherence are vital for ensuring that AI-generated videos are consistent and lifelike. Spatial coherence refers to consistency within each individual frame, such as shadows and reflections that match the scene, while temporal coherence means keeping motion and object identity consistent from one frame to the next.
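To make the idea of temporal coherence more concrete, here is a minimal Python sketch of one crude proxy for it: the average pixel change between consecutive frames, with a large spike hinting at an abrupt break. This is only an illustration and not the method used in this review. The function names, the toy frames, and the threshold are all hypothetical, and a real check would need decoded video frames and a far more robust measure.

```python
# Crude temporal-coherence proxy (illustrative assumption, not the review's method):
# mean absolute pixel change between consecutive grayscale frames.
# Each frame is a nested list: rows of integer pixel values.

def frame_difference(a, b):
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def temporal_jumps(frames, threshold):
    """Indices i where frame i differs from frame i-1 by more than threshold."""
    return [
        i
        for i in range(1, len(frames))
        if frame_difference(frames[i - 1], frames[i]) > threshold
    ]


# Toy clip: three nearly identical frames, then an abrupt scene change.
clip = [
    [[10, 10], [10, 10]],
    [[11, 10], [10, 11]],
    [[12, 11], [10, 11]],
    [[200, 190], [180, 210]],
]

# The threshold of 20 is an arbitrary value chosen for this toy data.
jumps = temporal_jumps(clip, threshold=20)
print(jumps)
```

A smooth clip yields small differences between neighbouring frames, while a sudden scene regeneration shows up as a single large jump. Fast but legitimate motion can also trigger a jump, which is why this is only a rough proxy.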
In a previous review of Sora, I noted that OpenAI’s model struggles significantly with temporal coherence, resulting in erratic videos. To examine Veo 2, I asked it to generate a video of a ball bouncing on a table and colliding with three dice.
Veo 2 produced a fairly believable video, showing the ball moving past a blue mug on the table with convincing shadows and lighting. The ball’s movement was smooth. At one point, however, Veo 2 appeared to regenerate the scene from scratch, and the objects lost coherence: the dice moved unnaturally, and a hand appeared unexpectedly, breaking temporal coherence.
Despite this, Google Veo 2 shows significant improvements over earlier AI video generation models. I believe these issues can be resolved with future iterative updates.
Veo 2: Evaluating Cinematic Realism
For my next test, I prompted Veo 2 to generate a cinematically realistic video of a man standing in a rainy city to check its ability to render authentic human faces and express subtle emotions. I also wanted to evaluate rain simulation and its interaction with the surrounding environment.
Veo 2 exceeded expectations. The video looks believable and natural, with convincing depth of field and a clear focus on the man’s face. The rain simulation is impressive as well.
In another assessment, Veo 2 produced a futuristic video of a spacecraft entering Earth’s orbit. The motion was fluid, and the sci-fi elements were visually captivating. Veo 2 excels at creating cinematic clips, making it a useful tool for content creators looking to incorporate brief segments into their projects.
Veo 2: Evaluating Image-to-Video Capability
Since Veo 2 supports image-to-video generation, I uploaded a still image of a forest featuring rivers and mountains. I asked Veo 2 to create a video based on this image, complete with lush flora and fauna, and to add a herd of elephants and some birds near the riverbank. As the video below shows, Veo 2 handled this very well.
Admittedly, the elephants appear rather abruptly, which is a bit jarring, but the rest of the video is quite convincing. The shadows and motion are well executed, and Veo 2 preserved the original image. When I uploaded the same image to OpenAI’s Sora in an earlier attempt, it failed to produce a coherent video.
Conclusion: Google Veo 2 Shows Great Promise but Needs Improvement in Physics
In summary, Google’s Veo 2 is the most capable video generation model I have tested, outshining rivals including OpenAI’s Sora. Although it occasionally breaks the laws of physics, it generally produces believable output. Veo 2 also renders human expressions and emotions convincingly.
However, it’s worth noting that Veo 2 encounters difficulties when faced with numerous human subjects or objects in a single scene. Additionally, it struggles with complex prompts. Nonetheless, credit must be given to Google for developing a robust AI model that establishes a new standard in video generation.