Every technology choice in a privacy-first architecture is a trust decision. A framework, a database, a runtime, a protocol — each one either preserves or undermines the guarantees the system is designed to provide. An architecture that claims zero persistence but uses a database with write-ahead logging has made a contradictory decision. An architecture that claims zero knowledge but stores encryption keys on the server has defeated its own purpose.

This document explains every major technology decision in the Stealth Cloud stack: what we chose, what we rejected, and the privacy reasoning behind each choice. The decisions are not theoretical — they are the production architecture powering Ghost Chat, our ephemeral AI chat interface.

Frontend: Next.js 15, React 19, Tailwind CSS

The Choice

  • Next.js 15 with App Router for the frontend framework.
  • React 19 for the component model.
  • Tailwind CSS for styling.

The Reasoning

Next.js provides server-side rendering (SSR) and static generation (SSG), but we use those capabilities selectively: the landing pages and documentation are statically generated (zero server-side computation for marketing content), while the Ghost Chat application uses client-side rendering with minimal server interaction.

The critical property is client-side encryption. The React application manages the entire encryption lifecycle in the browser: key generation, payload encryption, response decryption, PII detection, and token mapping. No sensitive operation is delegated to a server component. Server-side rendering of chat content would require the server to possess the decryption keys — which contradicts zero-knowledge.

Rejected alternatives: SvelteKit, Solid, and Qwik offer smaller bundles but smaller ecosystems for cryptographic library compatibility. Remix excels at server-side patterns irrelevant to our client-heavy architecture. Vanilla JS lacks the component model needed for the chat interface’s simultaneous handling of encryption, PII processing, and real-time state.

Tailwind CSS produces utility-class-based styles that are tree-shaken at build time, resulting in minimal CSS payload. For a privacy application, minimizing asset size is not vanity — it reduces page load time, which reduces the window between navigation and encryption initialization.

Edge Runtime: Cloudflare Workers with Hono

The Choice

  • Cloudflare Workers as the compute runtime.
  • Hono as the API routing framework.
  • Cloudflare KV for TTL-based ephemeral storage.
  • Durable Objects for WebSocket session coordination.

The Reasoning

The selection of Cloudflare Workers is detailed in our dedicated analysis. The summary: Workers provide the only commercial serverless runtime with zero filesystem access, zero default logging, sub-5 ms cold start, global edge distribution across 330+ locations, and V8 isolate sandboxing with zero system call surface.

Hono is a lightweight web framework (14 KB) designed for edge runtimes. It provides Express-like routing semantics with middleware support, TypeScript-first design, and zero Node.js dependencies. Unlike Express (which depends on Node.js APIs unavailable in Workers) or Fastify (which requires Node.js’s http module), Hono runs natively on the Workers runtime with no polyfills or compatibility shims.

Our Hono middleware stack:

  1. CORS middleware. Restricts origins to stealthcloud.ai and app.stealthcloud.ai.
  2. Rate limiter. Per-wallet-hash rate limiting using Durable Objects for distributed state.
  3. Auth middleware. Validates JWT from httpOnly cookie, extracts wallet address hash.
  4. Request sanitizer. Strips non-essential headers, validates content type, enforces payload size limits.
  5. Zero-log middleware. Explicitly suppresses any runtime logging. No console.log calls in production. No Logpush configuration.

Cloudflare KV provides globally distributed key-value storage with configurable TTL. We use KV exclusively for ephemeral data with aggressive TTLs:

  • Session nonces: 5-minute TTL. Used once for SIWE verification, then deleted.
  • Rate limit counters: 1-hour TTL. Track request counts per wallet hash.
  • Model configuration cache: 15-minute TTL. Caches available LLM model metadata.

No user data — no prompts, no responses, no conversation history — is written to KV. The architecture enforces this at the application level, but the TTL guarantees that even if a bug wrote data to KV, it would be automatically deleted within the configured TTL.
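
The TTL discipline can be illustrated with an in-memory analogue. Workers KV provides this natively via put(key, value, { expirationTtl }); the class below is only a sketch of the expiry semantics, with an injectable clock so the behavior can be tested without waiting:

```typescript
// In-memory analogue of TTL-bounded storage. Workers KV provides this
// natively; this sketch only demonstrates the expiry semantics.
class TtlStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  // The clock is injectable so tests can advance time artificially.
  constructor(private now: () => number = Date.now) {}

  put(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    // An expired entry is deleted on access and never returned.
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

A session nonce stored with a 300-second TTL, as in the list above, simply ceases to exist once the clock passes its expiry.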

Durable Objects coordinate real-time WebSocket sessions for streaming LLM responses. Each chat session creates a Durable Object that:

  • Accepts the client’s WebSocket connection.
  • Maintains the session’s model selection and burn timer configuration in memory.
  • Relays encrypted SSE chunks from the LLM provider to the client via WebSocket.
  • Destroys itself when the session terminates (explicit burn, burn timer expiry, or WebSocket close).

The Durable Object’s transactional storage is not used for conversation content. It stores only session metadata (model ID, burn timer duration, creation timestamp) with a TTL that matches the session’s maximum lifetime.

Rejected alternatives:

  • AWS Lambda / API Gateway: Lambda has a writable /tmp filesystem, default CloudWatch logging, and cold starts 40-400x slower than Workers. API Gateway adds logging, metrics, and request/response buffering that create persistence surfaces.
  • Google Cloud Functions / Cloud Run: Default Cloud Logging captures invocation data. gVisor provides good isolation but has a filesystem (tmpfs). Cold starts are 20-200x slower than Workers.
  • Vercel Edge Functions: V8 isolate-based (good), but more limited KV and Durable Object equivalents. Vercel’s platform is built around Next.js deployment, not general-purpose API infrastructure.
  • Deno Deploy: V8 isolate-based, strong security model, but smaller edge network (35 regions vs. Cloudflare’s 330+) and less mature persistence primitives.

Authentication: Sign-In with Ethereum (SIWE)

The Choice

  • EIP-4361 (SIWE) for authentication.
  • wagmi / viem for wallet interaction on the client.
  • Hashed wallet addresses in JWTs (never full addresses).

The Reasoning

Traditional authentication requires an identity: an email address, a phone number, a username. Each of these is personally identifiable information (PII). Storing authentication credentials — even hashed passwords — creates a user database that links identities to activity.

SIWE eliminates this entirely. The authentication flow:

  1. Client requests a nonce from the Worker (GET /auth/nonce).
  2. Client constructs a SIWE message containing the nonce, the domain, and the wallet address.
  3. Client signs the message with the wallet’s private key (MetaMask, WalletConnect, Coinbase Wallet, Rainbow).
  4. Client sends the signed message to the Worker (POST /auth/verify).
  5. Worker performs ecrecover to verify the signature, confirming the client controls the claimed wallet address.
  6. Worker issues a JWT (httpOnly, Secure, SameSite=Strict cookie) containing a SHA-256 hash of the wallet address — never the full address.

The JWT has a 1-hour TTL. No refresh tokens. No session database. The JWT is self-contained and verifiable at the edge without a round-trip to a central auth server.

Why hashed wallet addresses: The wallet address itself is a public identifier on the Ethereum blockchain. Storing the full address in the JWT would allow correlation between Stealth Cloud sessions and on-chain activity. The SHA-256 hash preserves the ability to identify the same wallet across sessions (for rate limiting) without revealing the wallet address itself.

Rejected alternatives:

  • OAuth / OpenID Connect: Requires a third-party identity provider (Google, GitHub, Apple). The identity provider knows that the user authenticated with Stealth Cloud. This creates a metadata trail connecting the user’s identity to their use of a privacy service.
  • Email/password: Creates a user database. Requires email storage. Links PII to activity.
  • Magic links / passwordless email: Requires email storage and creates a record of the authentication event in the user’s email inbox.
  • Passkeys / WebAuthn: Strong cryptographic authentication but typically tied to a device-specific credential. Does not provide the pseudonymity of wallet-based auth, and the relying party (our server) must store a credential ID.
  • Anonymous session tokens: No identity at all — but also no rate limiting, no abuse prevention, and no ability for the user to authenticate across devices.

SIWE provides the precise combination required: cryptographic proof of identity without PII, pseudonymous but consistent identity for rate limiting, and no server-side credential storage.

Encryption: AES-256-GCM via Web Crypto API

The Choice

  • AES-256-GCM for symmetric encryption of message content.
  • Web Crypto API for all cryptographic operations.
  • Client-side key generation and management.

The Reasoning

AES-256-GCM provides authenticated encryption with associated data (AEAD). The “authenticated” property is critical: GCM mode produces a 128-bit authentication tag that verifies both the integrity and authenticity of the ciphertext. If an attacker modifies any bit of the ciphertext, the authentication tag verification fails, and the decryption is rejected. This prevents tampering — a man-in-the-middle cannot alter the encrypted message without detection.

The Web Crypto API is a W3C standard implemented in all modern browsers and in the Cloudflare Workers runtime. It provides hardware-accelerated AES-GCM on platforms with AES-NI instruction support (all modern x86 CPUs, Apple Silicon, most ARM server processors). Key operations:

  • crypto.subtle.generateKey(): per-session key generation.
  • crypto.subtle.encrypt() / crypto.subtle.decrypt(): per-message AEAD operations.
  • crypto.subtle.exportKey(): session key exchange during the SIWE handshake.

One deliberate omission matters: the Web Crypto API exposes no key-destruction call. A key is destroyed on session termination by dropping every reference to the CryptoKey object, after which garbage collection reclaims the key material.

The key lifecycle is strict: a new key is generated per session with crypto.subtle.generateKey(), exchanged with the Worker during the SIWE handshake, used with a fresh 96-bit IV per message (generated with crypto.getRandomValues()), and discarded when the session ends by dropping every reference to the CryptoKey. The Worker's copy is simultaneously released from the Durable Object's memory.

Rejected alternatives:

  • RSA encryption: Asymmetric encryption is significantly slower than AES-GCM for message-level encryption. RSA-OAEP encryption of a 1 KB message takes approximately 2 ms vs. approximately 0.1 ms for AES-256-GCM. For real-time chat with streaming responses, the latency difference matters.
  • ChaCha20-Poly1305: A strong alternative to AES-GCM, particularly on devices without AES-NI hardware acceleration (older mobile devices). ChaCha20-Poly1305 is faster in software but slower when AES-NI is available. Since our primary target (modern browsers on modern hardware) universally supports AES-NI, AES-256-GCM is the performance-optimal choice. ChaCha20-Poly1305 may be offered as a fallback for hardware without AES-NI in a future release.
  • Homomorphic encryption: Would allow the server to process encrypted data without decryption. The performance overhead (10,000-1,000,000x for current FHE schemes) makes real-time chat infeasible. This is a Phase 4+ aspiration.

PII Engine: Client-Side WebAssembly

The Choice

  • WebAssembly (WASM) PII detection module running in the browser.
  • Named Entity Recognition (NER) model compiled to WASM for inference.
  • Token replacement with client-side mapping table.

The Reasoning

PII stripping must happen before data leaves the client. This is a non-negotiable architectural requirement: if PII reaches the server, the server has seen PII, and the zero-knowledge guarantee is weakened.

The PII engine operates as follows:

  1. The user types a prompt in the Ghost Chat interface.
  2. Before encryption, the WASM PII engine scans the plaintext for personally identifiable information: names, email addresses, phone numbers, physical addresses, social security numbers, credit card numbers, dates of birth, and other PII categories.
  3. Detected PII entities are replaced with tokens: [NAME_1], [EMAIL_1], [PHONE_1], etc.
  4. The token mapping ([NAME_1] -> actual name) is stored in browser memory — never transmitted to any server.
  5. The tokenized prompt is encrypted with AES-256-GCM and sent to the Worker.
  6. The Worker decrypts the prompt (in the isolate’s memory), verifies that no obvious PII patterns remain (a second-pass regex check as defense-in-depth), and forwards the sanitized prompt to the LLM provider.
  7. The LLM response (which may reference [NAME_1] etc.) is returned to the client.
  8. The client decrypts the response and re-injects the PII: [NAME_1] is replaced with the actual name for display.

The WASM module runs at near-native speed in the browser. NER inference on a typical prompt (50-200 tokens) completes in under 10 ms on modern hardware, adding negligible latency to the user experience.
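
A reduced sketch of the tokenization and re-injection steps (2-4 and 8), using regex patterns only. The production engine described above uses a WASM NER model for unstructured PII; the patterns and helper names here are illustrative:

```typescript
// Regex patterns for structured PII only; a NER model is needed for names
// and free-text addresses. Patterns are illustrative, not exhaustive.
const PATTERNS: Record<string, RegExp> = {
  EMAIL: /[\w.+-]+@[\w-]+\.[\w.]+/g,
  PHONE: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g,
};

function tokenizePII(text: string): { tokenized: string; mapping: Map<string, string> } {
  const mapping = new Map<string, string>(); // lives in browser memory only
  let tokenized = text;
  for (const [label, pattern] of Object.entries(PATTERNS)) {
    let i = 0;
    tokenized = tokenized.replace(pattern, (match) => {
      const token = `[${label}_${++i}]`;
      mapping.set(token, match); // e.g. "[EMAIL_1]" -> "alice@example.com"
      return token;
    });
  }
  return { tokenized, mapping };
}

function reinjectPII(text: string, mapping: Map<string, string>): string {
  // Replace every occurrence of each token in the LLM response for display.
  let out = text;
  for (const [token, value] of mapping) out = out.split(token).join(value);
  return out;
}
```

Only the tokenized string is ever encrypted and sent; the mapping never leaves the browser, so even a fully compromised server sees placeholders, not PII.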

Rejected alternatives:

  • Server-side PII stripping: Requires sending plaintext PII to the server. Defeats zero-knowledge.
  • Regex-only PII detection: Pattern matching catches structured PII (phone numbers, SSNs, email addresses) but misses unstructured PII (names, addresses in free text). NER models provide significantly higher recall for unstructured PII.
  • API-based NER (Presidio, Comprehend): Requires sending text to an external API for PII detection. The external API sees the plaintext including PII — creating the exact exposure the PII engine is designed to prevent.

LLM Provider Adapters: OpenAI, Anthropic, Together, Groq

The Choice

  • Multiple LLM provider support with per-session model selection.
  • Cloudflare AI Gateway as the routing and rate-limiting layer.
  • Provider-agnostic adapter pattern for consistent API translation.

The Reasoning

Stealth Cloud is not an AI company. We do not train models, fine-tune models, or host models. We are an infrastructure layer that interposes a privacy boundary between the user and the AI provider. Supporting multiple providers gives users choice and prevents lock-in to a single provider’s data practices.

Each adapter handles API translation (converting to provider-specific formats), SSE streaming, and error normalization — with zero content logging. Cloudflare AI Gateway provides rate limiting and aggregate analytics (request counts, latency percentiles) between the Worker and the providers, with request caching explicitly disabled.
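
The adapter pattern can be sketched as a small interface with one translation function per provider. The request shapes below follow the providers' public chat APIs in spirit, but the interfaces and field choices are illustrative, not Stealth Cloud's actual adapters:

```typescript
// Provider-agnostic request shape used inside the Worker (illustrative).
interface ChatRequest {
  model: string;
  messages: { role: "user" | "assistant" | "system"; content: string }[];
}

interface ProviderAdapter {
  toProviderBody(req: ChatRequest): unknown;
}

// OpenAI-style APIs accept the messages array as-is, with a stream flag.
const openAIAdapter: ProviderAdapter = {
  toProviderBody: (req) => ({ model: req.model, messages: req.messages, stream: true }),
};

// Anthropic's API takes the system prompt as a top-level field and requires
// max_tokens; the value here is an assumed default.
const anthropicAdapter: ProviderAdapter = {
  toProviderBody: (req) => ({
    model: req.model,
    max_tokens: 1024,
    stream: true,
    system: req.messages.find((m) => m.role === "system")?.content,
    messages: req.messages.filter((m) => m.role !== "system"),
  }),
};
```

Because translation is confined to a pure function per provider, adding a provider never touches the encryption, auth, or logging posture of the Worker.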

Rejected alternatives:

  • Self-hosted models only: Would eliminate the third-party trust dependency but limit model quality. GPT-4, Claude, and other frontier models are not available for self-hosting. Edge-inferenced open models (Llama, Mistral via Cloudflare Workers AI) are offered as an option but are not a substitute for frontier model quality.
  • Single provider: Lock-in to one provider’s data practices, pricing, and availability. Provider diversification is a privacy strategy, not just a business strategy — see our analysis of multi-cloud privacy.

Session Management: Ephemeral by Design

The Choice

  • JWT-based authentication with 1-hour TTL.
  • Durable Object per session for WebSocket coordination.
  • Burn timers with client-configurable TTLs (5 minutes to 24 hours).
  • Cryptographic shredding on session termination.

The Reasoning

Sessions are ephemeral state. They exist for the duration of a conversation and are destroyed when the conversation ends. The destruction is not a soft delete — it is cryptographic shredding: the encryption keys that protected the session are destroyed, making any captured ciphertext permanently undecryptable.

The termination sequence is total: the client discards its session key (dropping every reference to the CryptoKey, since the Web Crypto API has no explicit destroy call), sends a burn command to the Worker, the Durable Object zeroes its memory and deletes all associated KV entries, WebSocket connections close, the DOM is cleared, and the JavaScript heap is garbage-collected. No artifact of the conversation persists — no key, no ciphertext, no plaintext, no session metadata.

Context Bomb (POST /chat/bomb) extends this to all sessions for a wallet hash: every Durable Object associated with the wallet hash is destroyed simultaneously. This is the nuclear option — destroy everything, not just the current session.

No Traditional Database

The Choice

  • No PostgreSQL, MySQL, MongoDB, or any persistent relational/document database.
  • Cloudflare KV (TTL-based key-value) for ephemeral metadata.
  • Cloudflare Durable Objects (in-memory + optional transactional storage) for session coordination.
  • Cloudflare R2 (S3-compatible object storage) for static assets only (landing page images, WASM modules).

The Reasoning

A database is a persistence mechanism. Persistence is the antithesis of zero-persistence architecture. Any traditional database — even one configured for short retention — creates replication logs, write-ahead logs (WAL), point-in-time recovery archives, and backup snapshots. Each of these is a copy of the data that persists beyond the intended retention period.

PostgreSQL’s WAL logs every write operation before data reaches the main files. Deleting a row does not delete it from the WAL. Forensic recovery of “deleted” data from WAL archives is straightforward. By not using a database, we eliminate WAL archives, replication streams, automated backups, and point-in-time recovery — every mechanism that makes deleted data recoverable.

KV with TTL provides the bounded persistence we need: a value stored with a 5-minute TTL is guaranteed to be deleted within 60 seconds of the TTL expiry (Cloudflare’s documented deletion guarantee). There is no WAL. There is no backup. There is no replication log that retains the value after deletion.

The Stealth Cloud Perspective

This tech stack is not a list of fashionable technologies assembled to signal engineering sophistication. It is a set of decisions, each made to serve a single architectural requirement: the system must be structurally incapable of retaining user data, identifying users, or decrypting user content.

Every component was selected for what it cannot do as much as for what it can. Workers cannot write to disk. SIWE cannot collect email addresses. AES-256-GCM cannot be decrypted without the key. The WASM PII engine cannot phone home. KV cannot retain data past its TTL. And once a session's key is discarded, no one can bring it back.

The stack is opinionated. It rejects databases, rejects container runtimes, rejects traditional auth, and rejects centralized compute. Each rejection is a trust decision: we reject the components that would require the user to trust us, and we select the components that make our trustworthiness irrelevant. The user’s privacy does not depend on our good intentions. It depends on our architecture — and the architecture is the stack.

This is the zero-trust principle applied to our own infrastructure. We do not trust ourselves. We built a system that does not need to.