Microsoft researchers disclosed a vulnerability in Anthropic's Claude Code GitHub Action that allowed attackers to expose credentials through prompt injection attacks, with Anthropic patching the flaw on May 5. Microsoft disclosed the issue through HackerOne on April 29 and published details in a blog post on Friday. The vulnerability stemmed from the AI coding agent's processing of malicious instructions hidden in GitHub issues, pull requests, or comments. Microsoft initiated the research after observing prompt injection attempts in public repositories using AI-assisted GitHub workflows, where attacker-controlled content could influence the AI agent's tool use. The disclosure highlights security risks created by AI coding agents running inside CI/CD workflows, which often have access to API keys, cloud credentials, and other sensitive information.
Microsoft wrote in its blog post that the research began after observing prompt injection attempts in public repositories using AI-assisted GitHub workflows across multiple vendors. The attack method relied on attacker-controlled issue or pull request content processed by the AI agent that could influence its tool use. On GitHub, a pull request allows developers to propose changes to a code repository and have those changes reviewed before they are approved and merged. According to Microsoft, attackers could use prompt injection attacks hidden in GitHub issues, pull requests, or comments to manipulate Claude Code into accessing files containing sensitive credentials. Claude Code is Anthropic's AI coding agent for software development tasks that launched in October.
Microsoft created a GitHub workflow and disguised malicious instructions behind content hosted on a domain it controlled to test the vulnerability. The approach allowed the researchers to bypass Claude's safety protections. The prompt injection attack tricked Claude into reading sensitive credentials and altering them to evade both Claude's safeguards and GitHub's secret-scanning tools. Microsoft stated an attacker could then reconstruct the credential and exfiltrate it through issue comments, workflow logs, web requests, or shell commands. Microsoft wrote that to bypass Sonnet's refusal safety mechanisms, the firm obscured the shell payload behind a response from its controlled domain. Microsoft also enabled the workflow to be triggered by users with no 'write' permissions to ensure Anthropic's environment variables scrub mitigations were active during tests.
Anthropic patched the flaw on May 5 with Claude Code version 2.1.128 after Microsoft disclosed the vulnerability through HackerOne on April 29. The tool drew scrutiny in March after Anthropic accidentally leaked more than 500,000 lines of its source code, exposing details of its internal architecture and prompting widespread analysis by researchers and developers. Despite multiple layers of built-in security controls, Microsoft found that a determined attacker could potentially manipulate an AI agent into exposing sensitive information.
Microsoft stated in its blog post that the industry is entering an era where natural language is executable code, and untrusted inputs like GitHub issues must be treated as hostile by default. The firm wrote that a single, carefully crafted comment combined with a misunderstood trust boundary is all it takes to walk away with production credentials. The report comes as prompt injection attacks have emerged as one of the biggest security threats facing AI agents. In a prompt injection attack, an attacker hides instructions in content such as emails, documents, websites, or code comments, causing an AI system to follow those instructions instead of the user's.
What vulnerability did Microsoft discover in Claude Code?
Microsoft researchers found that Anthropic's Claude Code GitHub Action could be manipulated through prompt injection attacks hidden in GitHub issues, pull requests, or comments, allowing attackers to expose credentials stored in software development pipelines.
When did Anthropic patch the Claude Code vulnerability?
Anthropic patched the vulnerability on May 5 with Claude Code version 2.1.128 after Microsoft disclosed the issue through HackerOne on April 29.
How did Microsoft test the Claude Code vulnerability?
Microsoft created a GitHub workflow and disguised malicious instructions behind content hosted on a domain it controlled, allowing researchers to bypass Claude's safety protections and trick the AI agent into reading and altering sensitive credentials.
Related News