Introduction
Google has launched a new version of its Gemini family of models called Gemini 2.5 Deep Think. The model adopts a parallel thinking approach to reasoning and uses reinforcement learning to explore many solution paths before selecting the best answer. It is rolling out to subscribers of Google’s AI Ultra plan as a variation of the system that earned a gold medal at the International Mathematical Olympiad (IMO). The company claims state‑of‑the‑art results on code‑generation and reasoning benchmarks and promises to share the gold‑medal model with select researchers.
What Happened?
- Release to subscribers: Google’s official blog announced that Gemini 2.5 Deep Think is now available through the Gemini app for AI Ultra subscribers. The subscription costs about $250 per month and offers access to a model that runs multiple agents in parallel to explore many solution paths.
- Parallel thinking and reinforcement learning: Deep Think improves on earlier Gemini releases by extending the model’s thinking time and using reinforcement learning so the system can generate several ideas concurrently, evaluate them, and refine the best answer (see the sketch after this list). According to a 9to5Google report, this approach allows for longer, more detailed responses and improved performance on long‑context tasks.
- Best‑in‑class benchmarks: Google says the model achieved top scores on LiveCodeBench V6 (87.6% accuracy) and outperformed competing systems such as OpenAI’s o3 and xAI’s Grok 4 on the Humanity’s Last Exam (HLE) reasoning benchmark. A special variant of Deep Think achieved a gold medal at the 2025 IMO; Google is releasing a bronze‑level version to subscribers and plans to share the gold‑medal version with select mathematicians.
- Large context window: Subscribers receive a one‑million‑token context window, enabling the model to handle large documents and complex coding tasks.
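Google has not published the internals of Deep Think’s parallel reasoning, but the behaviour described above resembles a best‑of‑N pattern: sample several solution paths concurrently, score them, and keep the strongest. The minimal Python sketch below illustrates that shape only; generate_candidate and score_candidate are hypothetical stand‑ins for model and evaluator calls, not any real Google API.

```python
# Illustrative sketch of "parallel thinking" approximated as best-of-N sampling.
# generate_candidate() and score_candidate() are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor


def generate_candidate(prompt: str, seed: int) -> str:
    """Hypothetical call that asks a model for one reasoning attempt."""
    return f"candidate solution for '{prompt}' (seed={seed})"


def score_candidate(prompt: str, candidate: str) -> float:
    """Hypothetical evaluator that grades a candidate (higher is better)."""
    return float(len(candidate) % 7)  # placeholder scoring logic


def parallel_think(prompt: str, n_paths: int = 4) -> str:
    """Explore several solution paths concurrently, then keep the best-scored one."""
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        candidates = list(pool.map(lambda s: generate_candidate(prompt, s), range(n_paths)))
    return max(candidates, key=lambda c: score_candidate(prompt, c))


if __name__ == "__main__":
    print(parallel_think("Prove that the sum of two even numbers is even."))
```

The parallelism and scoring heuristic here are placeholders; the point is the workflow’s shape: many attempts explored at once, then a single selection step.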
Why It Matters
- New paradigm for reasoning: Deep Think’s parallel thinking introduces a workflow where multiple AI agents tackle a problem simultaneously, then reconcile their outputs. This paradigm could enable AI systems to solve complex problems, such as advanced mathematics or multi‑step coding tasks, more reliably than sequential large‑language‑model inference.
- Boost to generative coding and research: By scoring 87.6% on LiveCodeBench V6, Deep Think shows major improvements in automated code generation. Its ability to handle long contexts also opens doors for research in scientific discovery and algorithm development, as noted by Google’s blog.
- Commercial pressure and accessibility: Access to the model is currently gated behind a $250‑per‑month subscription, which has sparked debate. Some developers on Hacker News felt the price and daily usage limits were steep and that the model’s improvements did not justify the cost (news.ycombinator.com). Others said the model delivered impressive results but still wished for broader access. The cost and limited availability may push other providers to increase transparency around pricing and performance.
Web Reactions
- Enthusiasm and skepticism: On social media, many users praised the model for beating human contestants in mathematics competitions, but others criticised the high cost and subscription model (news.ycombinator.com). The official Google Gemini X account pinned a post celebrating the release and emphasised the extended thinking time (x.com).
- Industry commentary: TechCrunch’s coverage highlighted that Deep Think’s improved reasoning requires running multiple agents, which is resource‑intensive and hence expensive. Commenters speculated that the high cost might limit adoption to enterprise and research labs.
Expert Breakdown
Deep Think signals a shift from single‑agent large language models toward multi‑agent reasoning. By running parallel chains of thought, the model can explore different solution paths. This is reminiscent of the way human teams collaborate on problems and may reduce susceptibility to hallucinations by cross‑checking multiple hypotheses.
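One way to picture that cross‑checking is a self‑consistency‑style vote: sample several independent answers and keep the one most reasoning paths agree on. The snippet below is an illustrative sketch only, with sample_answer as a hypothetical stand‑in for a model call rather than any published Deep Think interface.

```python
# Illustrative sketch of reconciling parallel hypotheses by majority vote.
# sample_answer() is a hypothetical placeholder for an independent model sample.
from collections import Counter


def sample_answer(question: str, seed: int) -> str:
    """Hypothetical call that returns one independently sampled final answer."""
    return ["42", "42", "41"][seed % 3]  # placeholder outputs for demonstration


def reconcile(question: str, n_samples: int = 5) -> str:
    """Sample several independent answers and keep the one most paths agree on."""
    answers = [sample_answer(question, s) for s in range(n_samples)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer


if __name__ == "__main__":
    print(reconcile("What is 6 * 7?"))
```

Agreement across independent samples is a crude proxy for correctness, but it captures why parallel hypotheses can reduce the impact of any single hallucinated chain of thought.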
Comparatively, OpenAI has been experimenting with tool use and plug‑ins, but its models still rely mostly on sequential reasoning. Deep Think’s edge on coding benchmarks suggests that exploring many solutions before committing may yield higher‑quality code. However, the reinforcement‑learning training required to make this process stable is likely expensive. Analysts note that this design could inform the next generation of AGI research, where multiple agents collaborate to solve tasks, moving closer to human‑level reasoning.
Final Thoughts
Google’s release of Gemini 2.5 Deep Think is a landmark moment for AI. Its parallel thinking technique and record‑breaking benchmark scores showcase how scaling reasoning and training can yield substantial gains. Yet the high price and restricted availability mean that only a select audience can experiment with the model for now. As competitors race to match this performance, the industry will have to balance computational costs with accessibility. Researchers should watch whether multi‑agent architectures become the new norm and whether subsequent versions open up to wider audiences.