Definition

Federated learning (FL) is a distributed machine learning approach in which multiple participants collaboratively train a shared model without transferring their raw data to a central server. Each participant trains a local copy of the model on their own data, computes updates (typically gradient vectors), and sends only those updates to a central aggregator. The aggregator combines the updates, improves the global model, and distributes the new version back to participants. The raw data never leaves its origin device.

The approach was formalized by Google researchers in a 2016 paper by McMahan et al., which coined the term "federated learning" and introduced the Federated Averaging (FedAvg) algorithm. It was initially deployed for next-word prediction in the Gboard keyboard on Android, and has since expanded to healthcare, finance, telecommunications, and other domains where data sensitivity or regulatory constraints prohibit centralization.

Why It Matters

The machine learning industry is built on a contradiction. Model performance scales with data volume, yet the data most valuable for training—medical records, financial transactions, private communications—is the data most regulated and most dangerous to centralize. A 2025 report by Gartner estimated that 60% of large organizations will use privacy-enhancing computation techniques, including federated learning, for processing data in untrusted environments by 2027.

The centralization problem is not theoretical. In 2023, the FTC fined Amazon $25 million for retaining children’s voice recordings collected through Alexa, data that was used for model training without adequate consent or deletion mechanisms. The traditional ML pipeline—collect data, centralize it, train on it—creates honeypots of sensitive information that attract both regulators and attackers.

Federated learning dissolves the honeypot. If data never leaves the device, there is no central repository to breach, no data transfers to audit, and no retention policies to violate. The model improves. The data stays home.

How It Works

A federated learning cycle operates in four phases:

  1. Initialization: The central server distributes a global model (or model architecture) to participating devices or nodes. Each node holds its own local dataset that it will never share.

  2. Local training: Each participant trains the global model on its local data for a fixed number of epochs or steps. This produces a set of model updates—gradients or weight deltas—that encode what the node’s data “teaches” the model.

  3. Secure aggregation: Participants send their model updates (not their data) to the aggregation server. To prevent the server from inferring information about individual participants, updates are often encrypted using secure multi-party computation or perturbed with differential privacy noise.

  4. Global update: The server aggregates all received updates (e.g., by computing a weighted average) to produce an improved global model, which is distributed back to participants for the next round.
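The four phases above can be sketched as a small end-to-end simulation. Everything here is illustrative rather than any particular FL framework's API: the linear-regression "model", the helper names (local_train, fedavg_round), and the hyperparameters are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def local_train(global_w, X, y, lr=0.1, epochs=5):
    # Phase 2: a participant refines the global model on its private data.
    # The "model" is plain linear regression trained by gradient descent.
    w = global_w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    # Phases 3-4: combine local models into a new global model by
    # averaging, weighted by each client's dataset size (FedAvg).
    total = sum(len(y) for _, y in clients)
    agg = np.zeros_like(global_w)
    for X, y in clients:
        agg += (len(y) / total) * local_train(global_w, X, y)
    return agg

# Phase 1: two simulated participants, each holding private samples of y ~ 3x.
# Their raw (X, y) arrays are only ever read inside local_train.
rng = np.random.default_rng(0)
clients = []
for _ in range(2):
    X = rng.normal(size=(50, 1))
    y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

w = np.zeros(1)
for _ in range(20):  # 20 communication rounds of the four-phase cycle
    w = fedavg_round(w, clients)
print(w)  # converges close to [3.]
```

Note that only model weights cross the client boundary; the server in this sketch never touches X or y, which is the entire point of the protocol.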

The key vulnerability in naive federated learning is gradient inversion: an attacker who observes a participant's model updates can sometimes reconstruct the underlying training data. This is why production deployments pair FL with differential privacy (adding calibrated noise to clipped gradients) and secure aggregation (masking or encrypting updates so the server sees only their sum).
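One classic secure-aggregation construction uses pairwise additive masking, in the spirit of Bonawitz et al.'s protocol: each pair of participants agrees on a random mask that one adds and the other subtracts, so every mask cancels in the server-side sum. The sketch below is a toy illustration; the helper name pairwise_masks and the example updates are hypothetical, and a real protocol would derive masks from key agreement and handle participant dropouts.

```python
import numpy as np

def pairwise_masks(n_participants, dim, seed=0):
    # For each pair (i, j) with i < j, draw a shared random mask:
    # participant i adds it, participant j subtracts it.
    # Summed over all participants, every mask cancels exactly.
    rng = np.random.default_rng(seed)
    masks = [np.zeros(dim) for _ in range(n_participants)]
    for i in range(n_participants):
        for j in range(i + 1, n_participants):
            m = rng.normal(size=dim)
            masks[i] += m
            masks[j] -= m
    return masks

# Toy model updates from three participants (e.g., weight deltas).
updates = [np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.5, 0.5])]
masks = pairwise_masks(len(updates), dim=2)
masked = [u + m for u, m in zip(updates, masks)]

# Each masked update looks like noise to the server,
# but the aggregate it computes is still exact.
aggregate = sum(masked)
print(aggregate)  # [4.5, 1.5], the true sum of the updates
```

Masking hides individual contributions from the aggregator; it does not by itself prevent inference from the aggregate, which is why it is typically combined with differential privacy noise as described above.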

Stealth Cloud Relevance

Stealth Cloud does not train models on user data—period. Ghost Chat routes prompts through a PII-stripping layer and forwards sanitized text to third-party LLM providers. No training signal is extracted. No conversation data is retained. This is a deliberate architectural choice documented in the Stealth Cloud Manifesto: zero persistence means zero training.

However, federated learning represents an important design influence on Stealth Cloud’s philosophy. The core insight—that computation should move to the data, not data to the computation—mirrors Stealth Cloud’s client-side architecture. PII detection runs in WebAssembly on the user’s device. Encryption keys are generated and destroyed in the browser. The heaviest privacy-critical operations execute locally, where the user maintains control.

If Stealth Cloud ever incorporates on-device model capabilities, such as improving the local PII-detection model or running personalized inference, federated learning with differential privacy guarantees would be the only acceptable training paradigm. The zero-knowledge principle forbids any alternative.

The Stealth Cloud Perspective

Federated learning proved that AI does not require data centralization. Stealth Cloud extends this principle to its logical endpoint: AI does not require data retention at all—not on the server, not in the training pipeline, not anywhere the user did not explicitly choose.