
DeepMind unveiled Gemini Robotics 1.5 and Gemini Robotics‑ER 1.5, embodied AI models that turn visual input into motor commands and reason before acting.
Viral demos show a mobile robot mastering household tasks, board games and even stacking cups, sparking debates about robot agency and safety.
Researchers tout state‑of‑the‑art performance on 15 embodied reasoning benchmarks, while social media users wonder when these robots might leave the lab.
Introduction
Gemini Robotics 1.5 is Google DeepMind’s latest advance in embodied artificial intelligence, and the name has been everywhere online since yesterday morning. A short demo video posted on Reddit’s r/Artificial and cross‑posted on X and Hacker News shows a humanoid robot calmly navigating a kitchen, pouring juice, cleaning up spills and then sitting down to play a game of Jenga – all while narrating its reasoning in a conversational tone. That clip amassed tens of thousands of upvotes and shares within hours, and the hashtag #GeminiRobotics15 trended in multiple countries. But beyond the splashy demos, what exactly did DeepMind release, and why do technologists and ethicists care? This article breaks down the models’ capabilities, the engineering advances behind them and the safety questions they raise.
What Is Gemini Robotics 1.5?
At its core, Gemini Robotics 1.5 is a pair of models – Gemini Robotics 1.5 and Gemini Robotics‑ER 1.5 – designed to give robots the same kind of multimodal “mind” that the Gemini family of models brings to text, images and audio. According to DeepMind’s announcement, these models “can convert multimodal observations and instructions into motor commands”. In plain English, that means a robot can look at the world through its cameras and sensors, listen to verbal instructions and then figure out the sequence of joint movements necessary to complete a task. Unlike earlier systems that react reflexively, the new models “think before taking action,” using an internal planning step to predict the consequences of different actions.
The Robotics 1.5 model handles low‑level control, translating visual and proprioceptive data into motor outputs. Its companion, Robotics‑ER 1.5, is an embodied reasoning module that performs high‑level planning and maintains temporal context. DeepMind highlights that these models can be swapped across different robot bodies, letting the system generalise tasks such as grasping, sorting or stacking across mobile manipulators, stationary arms and even simulated avatars. Together, they achieved state‑of‑the‑art performance on 15 embodied reasoning benchmarks, outperforming previous methods in tasks ranging from object rearrangement to long‑horizon planning.
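To make that division of labour concrete, here is a minimal Python sketch of how a high‑level reasoner and a low‑level visuomotor policy could be wired together. Every class, method and data structure below is an illustrative assumption, not DeepMind’s actual API, and the bodies are placeholders.

```python
# Hypothetical sketch of the planner/controller split described above.
# None of these names correspond to a public DeepMind interface.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Observation:
    rgb: bytes                  # latest camera frame
    joint_angles: List[float]   # proprioceptive state

class EmbodiedReasoner:
    """High-level planner (Robotics-ER-style): turns an instruction into subgoals."""
    def plan(self, instruction: str, obs: Observation) -> List[str]:
        # Placeholder: a real model would reason over the instruction and the scene.
        return ["open cupboard", "grab bread", "spread butter"]

class VisuomotorPolicy:
    """Low-level controller (Robotics 1.5-style): maps a subgoal and observation to motor commands."""
    def act(self, subgoal: str, obs: Observation) -> List[float]:
        return [0.0] * len(obs.joint_angles)  # placeholder joint velocities

def run_task(instruction: str, reasoner: EmbodiedReasoner, policy: VisuomotorPolicy,
             get_obs: Callable[[], Observation],
             send_commands: Callable[[List[float]], None]) -> None:
    """Execute each subgoal in sequence, refreshing the observation before every step."""
    for subgoal in reasoner.plan(instruction, get_obs()):
        send_commands(policy.act(subgoal, get_obs()))

# Toy usage with stubbed sensing and actuation:
obs = Observation(rgb=b"", joint_angles=[0.0, 0.0, 0.0])
run_task("make me breakfast", EmbodiedReasoner(), VisuomotorPolicy(),
         get_obs=lambda: obs, send_commands=print)
```

Because the planner only exchanges subgoals and observations with the controller, the same reasoning module could in principle drive different robot bodies, which is the cross‑embodiment idea DeepMind describes.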
How Does the Agentic Framework Work?
What sets this release apart is the agentic framework. The Robotics‑ER 1.5 model does not just send commands to motors; it performs a “thought” phase where it anticipates how its plan will unfold. DeepMind calls this an internal simulator or “mind’s eye.” By rolling out multiple potential action sequences and scoring them, the robot can select the strategy most likely to succeed. This is similar to how large language models perform chain‑of‑thought reasoning in text – the robot is essentially writing down its plan before executing it. The approach reduces trial‑and‑error, so tasks are completed faster and with fewer mistakes.
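A rough way to picture the “mind’s eye” step is a rollout‑and‑score loop: generate candidate plans, simulate each one internally, and execute the plan with the best predicted outcome. The sketch below assumes a generic simulator callback and scoring scheme; it is an illustration of the general technique, not DeepMind’s implementation.

```python
# Minimal sketch of "think before acting": score candidate plans with an
# internal model and pick the best. The simulator here is a stand-in, not
# anything from the Gemini Robotics stack.
import random
from typing import Callable, List, Sequence

def select_plan(candidate_plans: Sequence[List[str]],
                simulate: Callable[[List[str]], float],  # predicted success probability
                n_rollouts: int = 8) -> List[str]:
    """Average several simulated rollouts per plan and return the highest-scoring plan."""
    def score(plan: List[str]) -> float:
        return sum(simulate(plan) for _ in range(n_rollouts)) / n_rollouts
    return max(candidate_plans, key=score)

# Toy usage: a fake simulator that slightly prefers shorter plans.
plans = [["grab cup", "pour juice"],
         ["find cup", "grab cup", "pour juice", "wipe counter"]]
best = select_plan(plans, simulate=lambda p: 1.0 / len(p) + random.uniform(0.0, 0.05))
print(best)
```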
In practice, this manifests in surprising ways. In a viral clip from TikTok (replicated in the screenshot below), a robot asked to “make me breakfast” wanders through a domestic kitchen, opening cupboards and refrigerator doors, retrieving a loaf of bread and butter, and correctly using a knife to spread butter without crushing the bread. When faced with an unexpected obstacle (a utensil drawer left open), the robot pauses, looks back, re‑plans its path and gently closes the drawer before proceeding. These small moments of foresight have captivated viewers because they suggest an almost human‑like awareness.
Safety, Alignment and Societal Reactions
Despite the excitement, DeepMind stresses that Gemini Robotics 1.5 is being released only to “trusted testers” and emphasises safety. The company notes that it has incorporated “alignment techniques such as adversarial robustness, interpretability and user‑defined constraints”. In other words, the models are trained to avoid harmful actions, provide transparency into their reasoning and respect physical safety rules. Early testers have been given checklists and override mechanisms to stop the robot if it behaves unexpectedly.
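DeepMind has not published the details of those override mechanisms, but conceptually they amount to a gate between the planner and the motors: user‑defined constraints filter every command, and a human stop signal takes priority over everything else. The sketch below is an assumption‑laden illustration of that idea, not the released system.

```python
# Hedged illustration of user-defined constraints plus an operator e-stop.
# The predicate style and command format are assumptions made for this sketch.
from typing import Callable, List

class SafetyGate:
    def __init__(self, constraints: List[Callable[[List[float]], bool]]):
        self.constraints = constraints  # each returns True if the command is acceptable
        self.estop = False              # flipped by a human operator at any time

    def trigger_estop(self) -> None:
        self.estop = True

    def filter(self, command: List[float]) -> List[float]:
        """Zero the command if the e-stop is active or any constraint is violated."""
        if self.estop or not all(check(command) for check in self.constraints):
            return [0.0] * len(command)
        return command

# Example constraint: cap joint velocities at a conservative limit (rad/s).
gate = SafetyGate(constraints=[lambda cmd: max(abs(v) for v in cmd) < 1.0])
print(gate.filter([0.3, -0.2, 2.5]))  # too fast -> [0.0, 0.0, 0.0]
print(gate.filter([0.3, -0.2, 0.5]))  # within limits -> passed through
```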
That hasn’t stopped speculation. Threads on Reddit and LinkedIn ask whether robots with such planning abilities could be weaponised or used for surveillance. Others worry about labour displacement; after all, if a robot can cook, clean and assemble IKEA furniture, will it replace domestic workers? These fears aren’t unfounded, but experts argue that embodied AI is still far from general autonomy. The robot in the demo is limited to a controlled environment, and tasks require extensive training data. Moreover, DeepMind researchers emphasise that their goal is to build robots that can augment human productivity rather than replace it.
Benchmarks and Early Results
DeepMind’s blog highlights that Gemini Robotics 1.5 achieved top scores on a suite of 15 embodied reasoning benchmarks. These benchmarks measure how well models can manipulate objects, perform navigation tasks and plan over long horizons. For example, the robot surpassed previous state‑of‑the‑art methods on the BEHAVIOR dataset, which involves tasks like sweeping floors, setting tables and stacking blocks. In addition, the model was tested on simulation environments such as VirtualHome and Habitat, where it demonstrated robust performance across different scenes and lighting conditions.
To visualise the improvements, the chart below compares Gemini Robotics 1.5’s success rates on key benchmarks to those of the previous Gemini Robotics baseline. Across all tasks, the new model shows a notable boost in success and planning efficiency.
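For readers unfamiliar with how such charts are built, the metric behind them is usually a plain success rate: the fraction of evaluation episodes in which the task was completed. The snippet below shows that calculation with made‑up placeholder outcomes; the numbers are not DeepMind’s results.

```python
# Illustration of a per-task success-rate comparison. All outcomes below are
# invented placeholders, not figures from DeepMind's evaluation.
from statistics import mean
from typing import Dict, List

def success_rate(outcomes: List[bool]) -> float:
    """Fraction of evaluation episodes in which the task succeeded."""
    return mean(1.0 if ok else 0.0 for ok in outcomes)

baseline: Dict[str, List[bool]] = {"stack blocks": [True, False, True, False]}
new_model: Dict[str, List[bool]] = {"stack blocks": [True, True, True, False]}

for task in baseline:
    print(f"{task}: {success_rate(baseline[task]):.0%} -> {success_rate(new_model[task]):.0%}")
```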
Why This Matters
The release of Gemini Robotics 1.5 is more than a flashy demo; it represents a shift in how AI systems perceive and interact with the physical world. By combining multimodal perception with internal planning, the model inches closer to what robotics researchers call “embodied intelligence.” This could accelerate progress in assistive robotics, warehouse automation and even healthcare. For consumers, it signals that robots may soon be able to handle household chores, elderly care and other tasks that require dexterity and common sense.
From an industry perspective, DeepMind’s focus on safety and alignment acknowledges the ethical stakes. As robots become more capable, companies must ensure they do not cause harm or reinforce societal inequalities. Regulatory bodies are watching closely; the European Union’s proposed AI Act explicitly covers high‑risk AI systems like autonomous robots. DeepMind’s limited release suggests that researchers recognise both the promise and the perils of giving machines agency.
For AI enthusiasts following these developments, the best way to stay informed is to follow comprehensive coverage and analysis. Our AI research news hub has been tracking advances in embodied AI for years, providing context and expert commentary to help readers understand the implications of new models like Gemini Robotics 1.5.
What’s Next
DeepMind has not announced a public release date for Gemini Robotics 1.5, but the company plans to expand testing to more diverse robotic platforms, including quadrupeds and drones. Researchers are also exploring how to integrate language models with robotic controllers, enabling natural dialogue and instruction following. Meanwhile, the AI community will continue to debate the social and ethical implications. Cross‑domain comparisons – such as between embodied systems like Gemini Robotics and proactive assistants like ChatGPT Pulse – highlight how AI is evolving toward agents that act with foresight in both digital and physical spaces. Expect more viral clips as trusted testers push the robots into creative scenarios – maybe a robot chef attempting a soufflé or a toddler‑proof robot nanny. The next year promises to be fascinating for embodied AI.