AIs Struggle with IQ Tests: DeepMind's Revealing Experiments
AIs Struggle to Ace IQ Tests: DeepMind's Revealing Experiments Show Limitations in Physical Understanding. Insights into how AI systems perform on visual reasoning tasks and their current gaps in grasping the physical world.
March 22, 2025

Discover the surprising limitations of AI systems as they face a series of physics-based challenges, revealing the gap between visual realism and true physical understanding. This insightful exploration offers a thought-provoking look at the current state of artificial intelligence and its journey towards human-like intelligence.
The Simplest Question: Rotating Teapot
The Painting Challenge
Light vs. Heavy: A Classic Physics Experiment
The Match in Water Test
The Surprising Results: AI's Struggle with Physics
Visual IQ Test: Asking AI About Images
Why the Struggle? Reasons Behind AI's Limitations
Conclusion
The Simplest Question: Rotating Teapot
The first experiment posed a simple task: a rotating teapot. The AI models were shown the start of the video and asked to predict what would happen in the next 5 seconds.
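DeepMind's actual evaluation code is not part of this article, but the protocol can be sketched roughly as follows. The `VideoModel` class, its `continue_video` method, and the frame counts below are hypothetical placeholders for illustration, not the benchmark's real API:

```python
import numpy as np

# Hypothetical stand-in for a video generation model; the real systems
# (Sora, Pika, Lumiere, VideoPoet, Runway Gen3) expose different APIs.
class VideoModel:
    def continue_video(self, conditioning_frames: np.ndarray,
                       num_future_frames: int) -> np.ndarray:
        # Trivial stub: repeat the last observed frame. Real models
        # generate genuinely new frames from the conditioning clip.
        last = conditioning_frames[-1]
        return np.stack([last] * num_future_frames)

def run_physics_trial(model: VideoModel, video: np.ndarray,
                      fps: int = 25, horizon_s: float = 5.0):
    """Show the model the start of a clip and ask it to predict
    the next `horizon_s` seconds, as in the teapot experiment."""
    split = len(video) // 2                      # conditioning / future split
    conditioning, ground_truth = video[:split], video[split:]
    horizon = int(fps * horizon_s)
    prediction = model.continue_video(conditioning, horizon)
    return prediction, ground_truth[:horizon]

# Usage: a random 10-second "video" of 64x64 RGB frames.
video = np.random.rand(250, 64, 64, 3)
pred, truth = run_physics_trial(VideoModel(), video)
print(pred.shape, truth.shape)  # (125, 64, 64, 3) (125, 64, 64, 3)
```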
Pika 1.0 completely failed this test, predicting that the teapot would not rotate but would instead grow a pedestal, a "complete disaster" according to the narrator. Lumiere also struggled, failing to accurately predict the rotation of the teapot and the location of the handles.
On the other hand, OpenAI's Sora and Runway's Gen3 performed better, though not perfectly, in predicting the rotation of the teapot. The narrator described their performance as "not bad" compared to the other models.
This simple experiment highlighted the limitations of these AI models in understanding basic physical concepts, even for a task as straightforward as a rotating teapot.
The Painting Challenge
The second experiment involved a painting task, where the AI models were shown the start of a painting and asked to predict what would happen next. This challenge was a bit more complex than the rotating teapot, as it involved both rotation and a clear progression of the painting.
When presented with this task, the results were quite mixed. OpenAI's Sora, which had performed reasonably well in the previous task, completely missed the mark, generating an output that had little to do with the actual progression of the painting. Pika 1.0 also failed to accurately predict the next steps, instead suggesting a zooming-in effect that was not reflective of the actual painting process.
Lumiere continued to struggle here as well, again demonstrating a lack of understanding of the physical dynamics involved in the painting task.
However, the VideoPoet model showed a more reasonable, though not perfect, prediction of the painting's progression. While not entirely accurate, its response was more aligned with the expected outcome compared to the other models.
This experiment highlighted the limitations of the AI models in understanding and predicting complex physical processes, even when presented with visual cues. The inability to accurately anticipate the next steps in the painting task suggests that these systems still have a long way to go in developing a true understanding of the physical world.
Light vs. Heavy: A Classic Physics Experiment
In this experiment, the AI systems were presented with a classic physics scenario: a heavy kettlebell and a light scrap of paper, each placed on a pillow. The expectation was that the heavy kettlebell would leave a much larger imprint on the pillow than the light paper.
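As a rough back-of-the-envelope check on that expectation, the imprint depth is driven by the static pressure each object exerts on the pillow,

$$p = \frac{F}{A} = \frac{mg}{A},$$

so a kettlebell massing several kilograms and pressing on a small base deforms the cushion far more than a paper scrap whose weight is on the order of a hundredth of a newton. (This simplification ignores impact dynamics, but the ordering of the two outcomes is unambiguous.)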
However, the results were disastrous. VideoPoet incorrectly predicted that the "evil pillow" would eat the paper, and then the paper would stab the pillow as revenge. Pika 1.0 simply stated that there was "nothing to see here" and just kept zooming in. OpenAI's Sora did not provide a coherent response.
This experiment highlighted the AI systems' lack of understanding of basic physical principles, such as the relationship between an object's mass and its impact on a surface. Despite their ability to generate photorealistic footage, the AI models struggled to comprehend and predict the outcome of this simple physics scenario, demonstrating a significant gap between visual realism and physical understanding.
The Match in Water Test
In this test, the models had to predict what happens when a lit match is placed in water. Runway Gen3 claims that the match will float, which is clearly incorrect. Lumiere asserts that the flame will continue to burn underwater, which is also wrong. VideoPoet suggests that an explosion will occur, which is not the case either.
The best performer in this test is OpenAI's Sora, which somewhat understands the concept but then incorrectly concludes that the water will set the match on fire.
Overall, the AI systems demonstrate a poor understanding of basic physics principles, as they fail to correctly predict the outcome of this simple experiment where a lit match is placed in water. This highlights the limitations of these AI techniques in comprehending fundamental physical phenomena, despite their impressive capabilities in generating photorealistic imagery.
The Surprising Results: AI's Struggle with Physics
The experiments conducted by the scientists at Google DeepMind revealed that despite the impressive capabilities of AI systems in generating photorealistic footage, they often struggle to understand the underlying physics governing the world around us.
The results showed that these AI models performed poorly in tasks that tested their knowledge of solid dynamics, fluid dynamics, optics, thermodynamics, and magnetism. Even in relatively simple scenarios, such as a rotating teapot or a heavy object interacting with a pillow, the AI systems often failed to accurately predict the outcomes.
Interestingly, the AI models seemed to have a better grasp of fluid mechanics compared to solid dynamics, which is counterintuitive given the perceived complexity of fluid dynamics. This suggests that the AI's understanding of the physical world is not necessarily aligned with human intuition.
Furthermore, the study found that as the AI systems were trained on more data, their performance on these physics-based tests did not improve significantly. This indicates that the physical understanding of these models is fundamentally different from human intelligence, and developing it may require an altogether different approach.
The results of these experiments highlight the limitations of current AI techniques in truly comprehending the world around us. While these systems can generate impressive visual outputs, they often lack the deeper understanding of the underlying physical principles that govern the behavior of objects and phenomena. This underscores the need for continued research and development in the field of AI to bridge the gap between visual realism and physical understanding.
Visual IQ Test: Asking AI About Images
Scientists have conducted experiments to test whether AI systems truly understand the visual information they are processing, or if they are simply generating photorealistic outputs without a deeper comprehension of the underlying physics and dynamics.
These experiments involved showing AI models the start of a video and asking them to predict what would happen in the next few seconds. The results were quite surprising - while the AI models could generate visually convincing footage, they often failed to accurately predict the physical outcomes, even for relatively simple scenarios.
Further tests were conducted, probing the AI's understanding of solid dynamics, fluid mechanics, optics, thermodynamics, and other physical phenomena. The results were consistently poor, with the best-performing model (a multiframe version of VideoPoet) scoring below 30% on these "visual IQ tests."
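The article does not reproduce the study's exact scoring pipeline, but a minimal sketch of how such a percentage score could be aggregated, reusing `VideoModel` and `run_physics_trial` from the earlier sketch, might look like this. The per-trial similarity metric here is an illustrative placeholder, not the benchmark's actual measure:

```python
def physics_score(prediction: np.ndarray, ground_truth: np.ndarray) -> float:
    """Illustrative per-trial score in [0, 1]: how closely the predicted
    frames match the real continuation. The actual study uses more
    nuanced measures (e.g. where, when, and how much motion occurs)."""
    mse = float(np.mean((prediction - ground_truth) ** 2))
    return 1.0 / (1.0 + mse)   # 1.0 = perfect pixel match

def visual_iq(model: VideoModel, trials) -> float:
    """Average score over all physics trials (solid dynamics, fluids,
    optics, thermodynamics, magnetism), expressed as a percentage."""
    scores = []
    for video in trials:
        pred, truth = run_physics_trial(model, video)
        scores.append(physics_score(pred, truth))
    return 100.0 * float(np.mean(scores))

# With the stub model, even trivial "repeat the last frame" predictions
# score poorly on clips with real motion -- the reported best result
# across the actual models was below 30%.
trials = [np.random.rand(250, 64, 64, 3) for _ in range(3)]
print(f"visual IQ: {visual_iq(VideoModel(), trials):.1f}%")
```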
The findings suggest that while current AI techniques can produce highly realistic-looking visuals, they do not necessarily have a deep, human-like understanding of the world around us. The disconnect between visual realism and physical understanding highlights the limitations of these AI systems and the need for further advancements in developing AI that can truly comprehend the world in the way humans do.
Why the Struggle? Reasons Behind AI's Limitations
The results of the experiments conducted by the scientists reveal that current AI techniques, despite their impressive capabilities in generating photorealistic footage, struggle to truly understand the physical world around them. This disconnect between visual realism and physical understanding highlights two key reasons behind the limitations of these AI systems:
- Mismatch between Training and Physical Understanding: The tasks that these AI systems are typically trained for differ significantly from the kind of physical understanding required to accurately predict the outcomes of the experiments. Simply teaching them more about physics does not necessarily translate to better performance on these tests.
- Lack of a Fundamental Shift in Intelligence: Surprisingly, as these AI algorithms are trained on more data and become more sophisticated, they do not show a corresponding improvement in their ability to reason about physical phenomena. This suggests that the current AI systems, while capable of remarkable feats, are fundamentally different from human intelligence and still have a long way to go in terms of developing a deeper understanding of the world around them.
In essence, the struggle of these AI techniques to accurately predict the outcomes of the experiments highlights the fact that visual realism and physical understanding are not necessarily synonymous. The AI systems may be able to generate highly realistic-looking footage, but they lack the fundamental grasp of the underlying physical principles that govern the behavior of the objects and phenomena they are attempting to depict.
Conclusion
The results of the experiments conducted by the scientists at Google DeepMind reveal that current AI techniques, despite their impressive capabilities in generating photorealistic footage, still struggle to truly understand the physical world around them. The AIs tested, including Sora, Pika 1.0, Lumiere, VideoPoet, and Runway Gen3, consistently failed to accurately predict the outcomes of various physical scenarios, such as a rotating teapot, a painting in motion, and the interaction between light and heavy objects.
The findings suggest that visual realism and physical understanding do not necessarily go hand in hand. The best-performing AI system, the multiframe version of VideoPoet, still scored below 30% on the tests, indicating that these systems have a long way to go in developing a deeper comprehension of the fundamental laws of physics.
Interestingly, the AIs performed better in understanding fluid mechanics compared to solid dynamics, which is counterintuitive, as fluids are generally considered more complex to model. This highlights the unique nature of the intelligence exhibited by these AI systems, which differs significantly from human intelligence.
The study also reveals that simply teaching these algorithms more about physical concepts does not necessarily lead to improved performance on tests like these. The underlying issue seems to be that the tasks these systems are typically trained for do not align well with the requirements for understanding the physical world.
In conclusion, the findings presented in this section underscore the limitations of current AI techniques and the need for further advancements in developing AI systems that can truly comprehend the world around them, beyond just generating visually appealing outputs.