Google’s critical warning on indirect prompt injections targeting 1.8 billion Gmail users – Mountain Theory

Google has issued comprehensive security warnings about a sophisticated new attack vector threatening its 1.8 billion Gmail users worldwide. These indirect prompt injection attacks exploit fundamental vulnerabilities in AI language models, allowing attackers to manipulate Google’s Gemini AI assistant through hidden instructions embedded in emails. Unlike traditional phishing that relies on malicious links or attachments, these attacks weaponize the AI itself, transforming trusted assistants into unwitting accomplices in credential theft, data exfiltration, and social engineering campaigns.

The discovery by Mozilla’s 0Din bug bounty program demonstrates that attackers can embed invisible instructions using simple HTML techniques like zero-size fonts or white text on white backgrounds. When users click “Summarize this email” in Gmail, Gemini processes these hidden commands as legitimate instructions, potentially displaying fake security warnings or extracting sensitive information. This represents a paradigm shift in email security – traditional security gateways that scan for malicious URLs and attachments are completely blind to these attacks, as the malicious payload exists only as hidden text that the AI interprets.

How indirect prompt injections exploit AI assistants like Gemini

The technical architecture of large language models creates an inherent vulnerability that makes these attacks possible. LLMs process all text input as a continuous prompt without clear boundaries between trusted system instructions and potentially malicious external content. When Gemini summarizes an email, it cannot reliably distinguish between Google’s legitimate instructions and attacker-embedded commands hidden in the email body.

The attack chain follows a deceptively simple pattern. First, attackers craft emails containing hidden instructions using HTML steganography techniques, text rendered invisible through CSS properties like font-size:0px or color:# ffffff on white backgrounds. These emails pass through traditional security filters undetected since they contain no suspicious attachments or URLs. When the recipient uses Gemini’s summarization feature, the AI processes the entire email content, including the hidden instructions. The model then executes these instructions, which can range from displaying fake security alerts to reconstructing obfuscated malicious links.

The vulnerability extends beyond simple text hiding. Researchers have demonstrated attacks using Unicode tag characters in the 0xE0000 to 0xE007F range, which remain invisible to users but are reconstructed by the tokenizer during processing. More sophisticated attacks employ what researchers call “many-shot jailbreaking,” exploiting extended context windows to gradually override safety instructions through repetition and reinforcement.

HiddenLayer’s research revealed that a single “Policy Puppetry” attack template achieves a 99.8% success rate across all major LLMs, including ChatGPT, Claude, and Gemini, using structured data formats like XML or JSON to masquerade malicious instructions as system policies. This universal effectiveness demonstrates that the vulnerability isn’t specific to Google’s implementation but represents a fundamental challenge in current AI architecture.

Critical differences between direct and indirect prompt injection attacks

Understanding the distinction between direct and indirect prompt injection is crucial for developing effective defenses. Direct prompt injection occurs when users intentionally manipulate AI systems through the primary interface – like attempting to jailbreak ChatGPT with “ignore previous instructions” commands. These attacks require direct interaction with the AI system, making them easier to monitor and filter through input validation.

Indirect prompt injection fundamentally changes the threat model by eliminating the need for direct attacker interaction with the AI system. Instead, attackers manipulate third-party content that the AI will eventually process. In Gmail’s case, the attack vector is email content that appears benign to both users and traditional security tools. The attacker never directly interacts with Gemini – they simply send an email and wait for the victim to request a summary.

This indirection creates several strategic advantages for attackers. The malicious payload can remain dormant indefinitely until triggered by user action, bypassing time-based security scans. One compromised email can affect multiple users if forwarded or shared. The attack surface expands dramatically as it includes any external content the AI might process – emails, documents, calendar invites, or shared files. Most critically, traditional security boundaries become meaningless when the attack vector is text that only becomes malicious in the context of AI processing.

The persistence and scalability of indirect attacks pose particular challenges for enterprise security. A single compromised newsletter subscription could become what researchers call “thousands of phishing beacons,” continuously delivering malicious prompts to all subscribers. Supply chain attacks become trivial when any text-based communication channel can carry hidden AI instructions.

Real-world attacks demonstrate escalating threat severity

The evolution from theoretical vulnerability to operational weapon has occurred with alarming speed. Mozilla’s GenAI Bug Bounty Programs Manager, Marco Figueroa, successfully demonstrated that attackers can embed instructions causing Gemini to display fabricated security warnings, complete with fake support phone numbers. When users click “Summarize this email,” they receive what appears to be an urgent security alert from Google, complete with reference numbers and callback instructions.

The sophistication of active threat groups has increased dramatically. Scattered Spider (UNC3944), responsible for over £1 billion in damages to UK retailers in 2025, has integrated AI manipulation into their standard toolkit. Their campaigns against Marks & Spencer, Co-op, and Harrods demonstrate how prompt injection amplifies traditional social engineering. The group uses AI to generate convincing personas, create deepfake videos for identity verification, and craft highly targeted phishing content that adapts based on victim responses.

SafeBreach researchers revealed even more concerning attack vectors through their “Targeted Promptware” research. They demonstrated that prompt injection could enable smart home control, allowing attackers to manipulate IoT devices through AI assistants. Calendar-based attacks can schedule malicious meetings or exfiltrate schedule information. Document sharing attacks through Google Drive can hijack collaborative sessions and spread to other users. The potential for “AI worms” that self-replicate across email systems represents an existential threat to enterprise communications.

The financial impact has been severe. A single UK retailer reported losses exceeding £300 million from AI-enhanced attacks in 2025. Healthcare organizations face particular risk, with 56% of tested AI models proving susceptible to prompt injection, potentially exposing patient data and violating HIPAA regulations. The emergence of “Skynet” malware in June 2025 – the first malware attempting to evade AI-based security tools through prompt injection – signals that attackers are now weaponizing AI against AI defenses.

Google implements multilayered defenses with acknowledged limitations

Google’s response demonstrates both the seriousness of the threat and the fundamental challenges in addressing it. The company has deployed a five-layer defense strategy that represents current best practices in AI security, yet openly acknowledges that complete protection remains impossible with current technology.

The first line of defense employs proprietary machine learning classifiers trained to detect prompt injection attempts. These models analyze incoming content for patterns associated with hidden instructions, drawing from Google’s extensive AI Vulnerability Reward Program dataset. However, adaptive attackers continuously evolve techniques to bypass detection, creating an endless cat-and-mouse game.

Security thought reinforcement surrounds untrusted content with targeted instructions designed to maintain the AI’s focus on legitimate tasks. This approach attempts to create context boundaries that prevent the model from following embedded commands. Markdown sanitization and URL redaction prevent the reconstruction of malicious links, while blocking external image rendering stops “EchoLeak” vulnerabilities where instructions hide in image metadata.

The human-in-the-loop framework requires explicit user confirmation for sensitive actions, preventing autonomous execution of potentially harmful operations. When the system detects possible injection attempts, it displays contextual security warnings to alert users. Despite these measures, Google reports reducing attack success rates from 99.8% to only 53.6% with their latest Gemini 2.5 model – a significant improvement but far from complete protection.

The limitations are sobering. Google initially classified some reported vulnerabilities as “intended behavior,” acknowledging that certain attack vectors cannot be addressed without fundamentally breaking AI functionality. The company admits they have found “no evidence of incidents manipulating Gemini” in real-world attacks but states the technique “remains viable today.” They are “mid-deployment on several updated defenses,” indicating this is an active area of ongoing development rather than a solved problem.

Industry experts warn of unprecedented security paradigm shift

The cybersecurity community’s assessment reveals deep concern about the implications of prompt injection vulnerabilities. OWASP ranks prompt injection as the #1 threat in their Top 10 for LLM Applications, describing it as “unpatchable” because it exploits fundamental design principles rather than implementation bugs. This classification has remained unchanged from 2023 to 2025, despite significant research investment, suggesting the intractability of the problem.

Leading security researchers characterize indirect prompt injection as a paradigm shift comparable to the introduction of SQL injection or cross-site scripting vulnerabilities, but potentially more dangerous due to AI’s expanding role in critical business functions. The attack surface grows exponentially as organizations integrate AI into more workflows – IBM reports that 84% of CEOs fear catastrophic AI-related attacks, yet only 5% feel highly confident in their security preparedness.

Expert analysis reveals concerning adoption-security gaps. Organizations with high “shadow AI” usage – unauthorized AI tool deployment – experience average breach costs $670,000 higher than those with managed AI adoption. Yet only 37% have policies to detect or manage shadow AI, even as usage increased 594% from 2023 to 2024. This explosive growth outpaces security team capabilities, creating what experts call an “AI security debt” that compounds with each new deployment.

The business impact extends beyond direct financial losses. Legal precedent from the Air Canada chatbot case established corporate liability for AI-generated misinformation, adding regulatory risk to the threat landscape. Reputational damage from AI-related incidents can be severe – Google’s Bard demonstration error caused a $100 billion stock value loss, illustrating how AI vulnerabilities can trigger market-wide consequences.

Broader AI ecosystem reveals systemic vulnerabilities requiring fundamental reimagination

The Gmail prompt injection threat represents just one manifestation of deeper architectural vulnerabilities affecting the entire AI ecosystem. These weaknesses interconnect across platforms, models, and applications, creating cascading risks that traditional security frameworks cannot address.

The supply chain attack surface has expanded dramatically. Research shows 62% of organizations have deployed AI packages containing at least one known CVE, often without awareness. Training data poisoning allows attackers to embed vulnerabilities that persist through model updates and fine-tuning. The Pravda disinformation network created 3.6 million articles specifically to influence AI responses, with chatbots echoing false narratives 33% of the time, demonstrating how information pollution becomes embedded in AI knowledge bases.

Enterprise AI adoption amplifies these risks through integration complexity. Multi-modal AI systems accepting text, image, and audio inputs create cross-modal attack vectors where malicious instructions hide in unexpected formats. Retrieval-augmented generation (RAG) systems introduce additional vulnerabilities through document stores and embedding databases. Agentic AI with real-world permissions, like systems that can send emails, modify calendars, or execute code – transform prompt injection from information manipulation to direct action.

The regulatory landscape struggles to keep pace. While the EU AI Act and NIST’s AI Risk Management Framework provide guidance, enforcement mechanisms remain undefined. Organizations face a patchwork of requirements across jurisdictions, with potential liability for both AI deployment and non-deployment decisions. The intersection with privacy regulations like GDPR adds complexity when AI systems process personal data in unpredictable ways.

Defensive strategies must evolve beyond traditional cybersecurity approaches. Zero-trust architectures need adaptation for AI systems where the boundary between trusted and untrusted dissolves within the model itself. Organizations are implementing AI-specific security operations centers (AI-SOCs) with specialized tools for prompt analysis, model behavior monitoring, and adversarial testing. Industry leaders advocate for “defensive AI” – using AI systems to protect against AI attacks – though this creates recursive security challenges.

Looking forward, experts predict that by 2026, the majority of advanced cyberattacks will employ AI for dynamic, multilayered campaigns. Autonomous malware capable of real-time adaptation will require equally sophisticated defenses. The emergence of quantum computing adds urgency, as current encryption protecting AI training data and model weights will become vulnerable. Organizations must begin transitioning to post-quantum cryptography while simultaneously addressing immediate prompt injection threats, creating what researchers call “the AI security paradox” – needing AI to defend against AI while knowing AI itself is fundamentally vulnerable.