Definition

Prompt injection is an attack technique targeting large language models (LLMs) in which an adversary crafts input that overrides, redirects, or subverts the model’s system-level instructions. The attack exploits a fundamental architectural constraint: LLMs process system prompts and user inputs in the same token stream, with no hardware-enforced boundary between trusted instructions and untrusted data. This is the LLM equivalent of SQL injection—a confusion of code and data in a shared execution context.

Two primary variants exist. Direct prompt injection occurs when a user deliberately includes adversarial instructions in their input (e.g., “Ignore all previous instructions and…”). Indirect prompt injection occurs when malicious instructions are embedded in external data the model retrieves—web pages, documents, database records, or API responses—causing the model to execute attacker-controlled directives without the user’s knowledge.

Why It Matters

OWASP ranked prompt injection as the number one security risk for LLM applications in its 2025 Top 10 for Large Language Model Applications. A 2024 study by researchers at ETH Zurich demonstrated that indirect prompt injection attacks succeeded against GPT-4, Claude, and Gemini with success rates between 42% and 97%, depending on the attack vector and defensive measures in place.

The risk compounds as LLMs gain tool access. A prompt-injected model with the ability to send emails, execute code, or query databases becomes a proxy for the attacker. In February 2024, security researchers demonstrated an attack chain in which an indirect prompt injection embedded in an email caused an LLM-powered assistant to exfiltrate a user’s private calendar data to an external server—without the user issuing any command.

For enterprises routing sensitive data through AI applications, prompt injection is not a theoretical concern. It is a live attack surface that intersects directly with data privacy, access control, and threat modeling.

How It Works

Prompt injection operates through the model’s inability to distinguish instruction from content:

  1. System prompt override: The attacker includes instructions that contradict or supersede the model’s system prompt. Because LLMs process tokens sequentially without privilege levels, a well-crafted user input can override behavioral constraints.

  2. Context window poisoning: In retrieval-augmented generation (RAG) systems, adversarial content is planted in data sources the model retrieves. When the model incorporates this content into its context window, it processes the injected instructions as if they were part of its operating directives.

  3. Payload encoding: Attackers obfuscate injection payloads using base64, Unicode manipulation, language translation, or semantic reframing to bypass keyword-based filters and safety classifiers.

Defensive strategies include input validation, output filtering, instruction-data separation (placing system prompts in privileged positions), canary tokens, and constitutional AI approaches. None provide complete mitigation. The vulnerability is architectural: until LLMs enforce a privilege boundary between instructions and data, prompt injection remains an inherent risk.

Stealth Cloud Relevance

Stealth Cloud addresses prompt injection through a defense-in-depth approach anchored by PII stripping. Before any prompt reaches an LLM provider, the client-side WASM engine tokenizes and removes personally identifiable information. Even if a prompt injection attack succeeds in manipulating model behavior, the model operates on sanitized data—there is no PII to exfiltrate.

This architectural boundary matters. Ghost Chat’s zero-knowledge design ensures that the LLM provider never sees the user’s identity, IP address, or raw personal data. The attack surface for indirect prompt injection is further reduced because Ghost Chat does not implement RAG—there are no external data sources for adversaries to poison.

The threat model for Stealth Cloud treats the LLM provider as a potentially compromised component. This assumption—central to zero trust architecture—means that even successful prompt injection against the model does not compromise user privacy, because the model was never given meaningful data to leak.

The Stealth Cloud Perspective

Prompt injection exploits a boundary that does not exist inside the model. Stealth Cloud compensates by building the boundary outside the model—stripping PII before it enters the context window and treating every LLM response as untrusted output. The model cannot leak what it was never given.