PII Stripping

PII stripping is the automated detection and removal or tokenization of personally identifiable information from text before it is transmitted to a third-party service, preventing identity leakage at the data layer.

Definition

PII stripping is the process of automatically detecting and neutralizing personally identifiable information (PII)—names, email addresses, phone numbers, social security numbers, physical addresses, financial identifiers, and other data that can identify a specific individual—before that data is transmitted to a third-party processor. The PII is either removed entirely or replaced with reversible tokens that allow the original values to be restored on the client side after processing completes.

This is distinct from anonymization, which permanently destroys the link between data and identity, and from conventional pseudonymization, which replaces identifiers with consistent pseudonyms that stay in the processed data. PII stripping with tokenization is a round-trip operation: strip on the way out, re-inject on the way back. The third-party processor never sees the real data, but the user sees the complete, de-tokenized result.
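The difference between irreversible removal and reversible tokenization can be illustrated with a minimal Python sketch. It assumes a single, deliberately simplified email pattern as the detector. The names EMAIL_RE, redact, tokenize, and detokenize are illustrative and not taken from any particular library.

```python
import re

# Simplified email pattern, for illustration only (assumption: real detectors
# cover far more formats and PII categories).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Irreversible removal: the original value is discarded."""
    return EMAIL_RE.sub("[REDACTED]", text)


def tokenize(text: str) -> tuple[str, dict[str, str]]:
    """Reversible replacement: the original value is kept in a local mapping."""
    mapping: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        token = f"[EMAIL_{len(mapping) + 1}]"
        mapping[token] = match.group(0)
        return token

    return EMAIL_RE.sub(_swap, text), mapping


def detokenize(text: str, mapping: dict[str, str]) -> str:
    """Restore original values from the local mapping."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


prompt = "Reply to ana@example.com about the invoice."
print(redact(prompt))              # the address cannot be recovered
safe, mapping = tokenize(prompt)
print(safe)                        # what the third-party processor would see
print(detokenize(safe, mapping))   # restored locally from the mapping
```

Only the tokenized path can reproduce the original prompt, and only for a party that holds the mapping.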

Why It Matters

In 2024, over 17.2 billion records containing PII were exposed in data breaches, according to the IT Governance breach tracker. The average cost per compromised record containing PII reached $169 (IBM Cost of a Data Breach Report, 2024). With billions of prompts sent to AI providers every month, each prompt that carries PII adds to the pool of records that can be exposed in this way.

Every prompt sent to an LLM passes through infrastructure the user does not control. OpenAI’s data retention policy permits storage of API inputs for up to 30 days. Anthropic, Google, and other providers maintain similar windows. During that retention period, a prompt containing a user’s name, email, client details, or medical information sits on a server operated by a company the user has no relationship with, in a jurisdiction the user may not have chosen, subject to subpoenas the user will never learn about.

PII stripping eliminates this exposure at the source. If the PII never leaves the client, it cannot be retained, trained on, subpoenaed, or breached—regardless of the provider’s policies or security posture.

How It Works

PII stripping systems typically operate in four stages:

1. Detection: A named entity recognition (NER) model scans input text and identifies spans matching PII categories: personal names, emails, phone numbers, addresses, dates of birth, SSNs, credit card numbers, IP addresses, and medical record numbers. Modern NER systems use transformer-based architectures fine-tuned on PII-annotated datasets.
2. Tokenization: Each detected PII span is replaced with a unique token (e.g., [NAME_1], [EMAIL_2]). The mapping between tokens and original values is stored exclusively in client-side memory and is never transmitted to the server.
3. Processing: The sanitized text is sent to the LLM provider, which processes the prompt and generates a response referencing the tokens (e.g., “Dear [NAME_1], your appointment is confirmed”).
4. Re-injection: The client replaces all tokens with their original PII values from the local mapping. The user sees the complete, personalized response. The provider saw nothing identifiable.
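
The four stages can be sketched end to end in Python. This is a minimal illustration under stated assumptions: regular expressions stand in for the NER model (so names and street addresses are not detected), fake_provider is a hypothetical placeholder for the LLM call, and every function name is invented for this example.

```python
import re

# Stand-in detectors (assumption): a real system would use an NER model here.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
TOKEN_RE = re.compile(r"\[[A-Z_]+_\d+\]")


def detect(text):
    """Stage 1: return sorted, non-overlapping (start, end, category) spans."""
    spans = []
    for category, pattern in PATTERNS.items():
        spans.extend((m.start(), m.end(), category) for m in pattern.finditer(text))
    spans.sort()
    result, last_end = [], 0
    for start, end, category in spans:
        if start >= last_end:  # drop any span that overlaps an earlier one
            result.append((start, end, category))
            last_end = end
    return result


def strip_pii(text):
    """Stage 2: replace each span with a numbered token; keep the map locally."""
    mapping, pieces, cursor = {}, [], 0
    for start, end, category in detect(text):
        token = f"[{category}_{len(mapping) + 1}]"
        mapping[token] = text[start:end]
        pieces += [text[cursor:start], token]
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


def fake_provider(sanitized_prompt):
    """Stage 3 (hypothetical stand-in for the LLM): it only ever sees tokens."""
    return f"Received: {sanitized_prompt}"


def reinject(response, mapping):
    """Stage 4: restore original values in a single pass; unknown tokens stay."""
    return TOKEN_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), response)


prompt = "My SSN is 123-45-6789 and my email is ana@example.com."
sanitized, mapping = strip_pii(prompt)
response = fake_provider(sanitized)
print(sanitized)
print(reinject(response, mapping))
```

Re-injection uses one regex pass rather than repeated string replacement, so a restored value is never scanned again for tokens, and any token the model invents that is absent from the mapping is left untouched.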

For client-side implementations, the NER model runs on an inference runtime compiled to WebAssembly (WASM) and executes entirely in the browser, so the server never receives unstripped data, even momentarily.

Stealth Cloud Relevance

PII stripping is the first defensive layer in Stealth Cloud’s seven-step message lifecycle. Before a prompt is encrypted, before it reaches an ephemeral worker, before it is processed and shredded—the PII is already gone.

Ghost Chat implements PII stripping as a WebAssembly module running client-side in the browser. The NER model loads with the application, processes prompts in under 50 milliseconds for typical inputs, and maintains a per-session token map that is destroyed when the session ends. No PII mapping is ever transmitted over the network.
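
A per-session token map of this kind might look like the following Python sketch. It is not Ghost Chat's implementation, which the text describes as a WebAssembly module in the browser. The class name SessionTokenMap and its methods are hypothetical, and the caller is assumed to have already detected each PII value and its category.

```python
import re

TOKEN_RE = re.compile(r"\[[A-Z_]+_\d+\]")


class SessionTokenMap:
    """Hypothetical in-memory token map that lives only for one session."""

    def __init__(self):
        self._by_value = {}  # original value -> token
        self._by_token = {}  # token -> original value

    def token_for(self, category, value):
        """Return the existing token for a value, or mint a new one."""
        if value not in self._by_value:
            token = f"[{category}_{len(self._by_token) + 1}]"
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def restore(self, text):
        """Replace every known token in the text with its original value."""
        return TOKEN_RE.sub(
            lambda m: self._by_token.get(m.group(0), m.group(0)), text
        )

    def destroy(self):
        """Drop all mappings; afterwards tokens can no longer be reversed."""
        self._by_value.clear()
        self._by_token.clear()

    # Context-manager support so the map is cleared when the session ends,
    # even if an exception interrupts it.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.destroy()
        return False


with SessionTokenMap() as tokens:
    first = tokens.token_for("NAME", "Ana Silva")
    again = tokens.token_for("NAME", "Ana Silva")  # same value, same token
    reply = tokens.restore(f"Dear {first}, your appointment is confirmed")
    print(reply)

print(tokens.restore(f"Dear {first}"))  # map is gone: the token stays as-is
```

Reusing one token per value keeps references consistent across several prompts in the same session. Clearing the dictionaries removes the references but does not guarantee that the underlying memory is overwritten, which is a limitation of this sketch.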

This is the architectural difference between Stealth Cloud and traditional AI privacy layers. Server-side PII filtering (offered by some cloud providers) still requires the PII to traverse the network and arrive at the server before it can be detected and stripped. Client-side stripping means the PII never leaves the browser. The trust boundary is the user’s own device.

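Combined with wallet-based authentication via Sign-In with Ethereum, which proves control of a wallet address with a cryptographic signature rather than an email address or password, the result is a system where neither the user’s real-world identity nor their personal data needs to touch Stealth Cloud infrastructure.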

The Stealth Cloud Perspective

PII stripping is not a compliance checkbox or a post-hoc filter—it is the first principle of data minimization applied at the exact moment it matters most: before the data leaves the user’s control. Strip first, encrypt second, shred third. In that order. Every time.