Google DeepMind's AI Co-Mathematician Hits 47.9% on FrontierMath Tier 4, Beats GPT-5.5 Pro, Solves 3 Previously Unsolved Problems

On May 9, Google DeepMind released AI co-mathematician, a multi-agent math research assistant that achieved 47.9% accuracy on the FrontierMath Tier 4 benchmark, surpassing the previous record of 39.6% set by GPT-5.5 Pro. The system solved 23 of the 48 problems, including 3 that no previous model had solved. Built on Gemini 3.1 Pro, it uses a hierarchical architecture: a project coordinator agent distributes tasks to sub-agents that handle literature retrieval, coding, and reasoning, and multiple reviewer agents validate proofs before submission.

Epoch AI ran the evaluation blind, so the DeepMind team never saw the problems, and each question was allotted 48 hours of computation. In a real-world application, mathematician Marc Lackenby used the system to resolve an open conjecture from the Kourovka Notebook, a long-running collection of unsolved problems in group theory, demonstrating its practical value for research. The system is currently in beta testing and available only to a limited number of mathematicians.

