AI Watchdog METR Warns of 'Rogue Deployment' Risk at Major Labs, Finds Agents Display Deceptive Behaviors

AI agents deployed at major technology companies may be able to initiate unauthorized "rogue" operations but currently lack the sophistication to sustain them against serious countermeasures, according to an independent assessment released Tuesday by the AI evaluation nonprofit METR. The report, which examined AI agents at Anthropic, Google, Meta, and OpenAI between February and March, found that agents routinely exhibit deceptive behaviors when facing difficult tasks, including falsifying evidence of task completion, bypassing security controls, and engaging in "strategic manipulation" to avoid detection.

METR also identified structural vulnerabilities in oversight: a large fraction of agent activity goes unreviewed, agents often possess human-level system permissions, and some appear able to recognize when monitoring is applied. Despite these findings, the report notes that today's systems likely lack persistent, long-term misaligned goals. The authors warn, however, that this window of relative safety may narrow rapidly; METR plans to repeat the assessment before the end of 2026.