Five Frontier AI Models Disagreed on 67% of Fact-Check Claims in Latest Study

According to researcher Kosta Jordanov at Lenz Research, five frontier AI models disagreed on 67% of 1,000 real-world fact-check claims tested this month. The models (GPT-5.4, Claude Opus 4.7, Gemini 3 Pro, Gemini 3 Pro with Search, and Sonar Pro) were each asked to classify every claim as true, mostly true, misleading, or false. In 34% of cases the disagreement was severe: one model called a claim true while another labeled it false.
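To make the two headline figures concrete, here is a minimal Python sketch of how such rates could be counted from a table of verdicts. The function name `disagreement_rates` and the toy data are hypothetical illustrations, not the study's code or data, and the "severe" rule simply encodes the true-versus-false split described above.

```python
from typing import Sequence


def disagreement_rates(verdicts: Sequence[Sequence[str]]) -> tuple[float, float]:
    """Return (share of claims with any disagreement, share with a true-vs-false split).

    Each row holds the verdicts that all models gave for one claim.
    """
    total = len(verdicts)
    # A claim counts as disputed if the models produced more than one distinct label.
    any_split = sum(1 for row in verdicts if len(set(row)) > 1)
    # A claim counts as severely disputed if both "true" and "false" appear.
    severe = sum(1 for row in verdicts if "true" in row and "false" in row)
    return any_split / total, severe / total


# Hypothetical toy verdicts: one row per claim, one column per model.
toy = [
    ["true", "true", "true", "true", "true"],
    ["true", "mostly true", "misleading", "false", "true"],
    ["false", "false", "misleading", "false", "false"],
    ["mostly true", "true", "true", "true", "true"],
]

any_rate, severe_rate = disagreement_rates(toy)
print(f"any disagreement: {any_rate:.0%}, severe: {severe_rate:.0%}")
```

Under this counting rule, a single dissenting "mostly true" is enough to mark a claim as disputed, which is one reason the headline disagreement rate is so much higher than the severe rate.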

The study measured agreement using Krippendorff's alpha, a statistic on which 1.0 indicates perfect agreement. The models' alpha was 0.639, and researchers generally consider values below 0.8 weak. All five models agreed on only 328 of the 1,000 claims (32.8%), and notably, no claim received a unanimous "mostly true" verdict. The researchers used claims submitted by real users to Lenz's fact-checking platform rather than standard benchmarks, which reduces the likelihood that models were pattern-matching against training data.
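For readers unfamiliar with the statistic, the following is a minimal Python sketch of Krippendorff's alpha for nominal labels with no missing ratings. It treats the four verdicts as unordered categories, which is an assumption: the article does not say whether the study used the nominal or an ordinal variant. The function name and toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's alpha for nominal data; each unit lists every rater's label."""
    # Build the coincidence matrix: every ordered pair of labels given by two
    # different raters to the same unit contributes 1 / (m - 1), m = raters per unit.
    coincidences: Counter = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a unit with a single rating carries no agreement information
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label, and the overall number of pairable ratings.
    totals: Counter = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (both left unnormalised by n; it cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # alpha is undefined when only one label ever appears
    return 1 - observed / expected


# Hypothetical toy data: two raters, four claims.
toy_units = [["true", "true"], ["true", "false"], ["false", "false"], ["false", "false"]]
print(round(krippendorff_alpha_nominal(toy_units), 3))
```

Alpha compares the disagreement actually observed with the disagreement expected if labels were assigned by chance, so a value such as 0.639 means the models agree well above chance but still fall short of the 0.8 level mentioned above.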
