Garbled Circuits: The Foundation of Secure Two-Party Computation

A deep technical breakdown of garbled circuits — Andrew Yao's foundational technique for secure two-party computation. From gate-level garbling and oblivious transfer to free-XOR optimizations, half-gates, and real-world performance in private set intersection and secure auctions.

In 1986, Andrew Yao proposed a thought experiment that became a foundational protocol in cryptography: two millionaires want to determine who is richer without revealing their actual wealth to each other. The solution – garbled circuits – enables two parties to jointly compute any function on their private inputs without revealing those inputs to each other.

Garbled circuits are not a theoretical curiosity. They are the basis of practical secure computation deployed in production systems. Google uses garbled circuits in their Private Join and Compute protocol for cross-organization data analysis. Apple’s Private Relay uses related techniques for anonymous browsing. The IRS and census bureaus have evaluated garbled circuit protocols for computing statistics on tax data without exposing individual records. A 2023 benchmark by the Bristol Cryptography Group demonstrated that modern garbled circuit implementations can evaluate AES-128 encryption in 0.03 seconds – fast enough for real-time applications.

The technique works at the level of Boolean logic gates. Any computable function can be expressed as a circuit of AND, OR, XOR, and NOT gates. Garbled circuits encrypt this circuit so that it can be evaluated without revealing intermediate values.

The Protocol at Gate Level

The Players

Two parties: Alice (the “garbler”) and Bob (the “evaluator”). Alice has input x, Bob has input y. They want to compute f(x, y) without Alice learning y or Bob learning x.

Step 1: Circuit Representation

The function f is expressed as a Boolean circuit – a directed acyclic graph of logic gates. Each wire in the circuit carries a single bit (0 or 1). Each gate takes two input wires and produces one output wire.

For example, comparing two 32-bit numbers requires approximately 200 AND gates and 200 XOR gates. Computing AES-128 requires approximately 6,800 AND gates and 25,000 XOR gates. More complex functions require proportionally larger circuits.

Step 2: Garbling the Circuit (Alice’s Work)

For each wire w in the circuit, Alice generates two random labels: K_w^0 (representing the bit value 0 on wire w) and K_w^1 (representing the bit value 1). These labels are random 128-bit strings that reveal nothing about the bit they represent.

For each gate, Alice constructs a garbled truth table. Consider an AND gate with input wires a, b and output wire c:

a	b	c = a AND b
0	0	0
0	1	0
1	0	0
1	1	1

Alice encrypts each output label under the corresponding pair of input labels:

Enc(K_a^0, K_b^0, K_c^0)  -- row 0,0 -> 0
Enc(K_a^0, K_b^1, K_c^0)  -- row 0,1 -> 0
Enc(K_a^1, K_b^0, K_c^0)  -- row 1,0 -> 0
Enc(K_a^1, K_b^1, K_c^1)  -- row 1,1 -> 1

The encryption uses a double-key cipher: Enc(K_a, K_b, M) = AES(K_a, AES(K_b, M)) XOR H(K_a || K_b || gate_id). This construction ensures that knowing one input label and one output label reveals nothing about the other input or about other rows of the truth table.

Alice randomly permutes the four rows of each garbled gate. This is the “garbling” – the evaluator cannot determine which row corresponds to which input combination without knowing the correct input labels.

Step 3: Sending the Garbled Circuit

Alice sends Bob:

The entire garbled circuit (all garbled gates with permuted truth tables)
The labels for her input bits (K_{a_i}^{x_i} for each of her input wires a_i, where x_i is her actual input bit)
A mapping from output wire labels to bit values (so Bob can interpret the result)

Alice sends only the labels corresponding to her actual input. She does not send both labels for any input wire, so Bob learns nothing about her input.

Step 4: Oblivious Transfer for Bob’s Input

Bob needs the labels corresponding to his input bits. He cannot simply ask Alice for them, because Alice would learn his input. The solution is Oblivious Transfer (OT).

In 1-out-of-2 OT, Alice holds two values (K_w^0, K_w^1), and Bob holds a choice bit b. After the protocol:

Bob learns K_w^b (the label for his actual input bit)
Alice learns nothing about b

This is achieved through public-key cryptographic techniques. The most efficient construction uses the Simplest OT protocol (Chou and Orlandi, 2015), which requires a single elliptic curve scalar multiplication per bit. For a circuit where Bob has n input bits, n OT instances are required.

OT Extension (Ishai et al., 2003) reduces the cost dramatically: using a base of approximately 128 real OT instances, arbitrary numbers of additional OTs can be computed using only symmetric-key operations (hashing and XOR). This makes the OT cost essentially free for large circuits.

Step 5: Evaluation (Bob’s Work)

Bob evaluates the garbled circuit gate by gate, in topological order. For each gate, he holds one label per input wire (either the 0-label or the 1-label, corresponding to the actual bit value on that wire). He attempts to decrypt each of the four garbled rows. Exactly one row will decrypt correctly (the row corresponding to his input labels). The decrypted value is the output label for that gate, which Bob uses as an input label for subsequent gates.

At the output wires, Bob holds output labels that he maps to bit values using Alice’s output mapping. This gives him f(x, y). He can send the result to Alice (or not, depending on the protocol’s requirements).

Security

Alice’s privacy: Bob never learns Alice’s input because he only receives the labels corresponding to Alice’s actual input bits. He never sees both labels for any of Alice’s input wires, so he cannot infer whether any wire carried 0 or 1 beyond what the output reveals.

Bob’s privacy: Alice never learns Bob’s input because the oblivious transfer protocol reveals nothing about Bob’s choice bits to Alice.

Correctness: The garbled circuit computes exactly the agreed-upon function. Alice cannot “cheat” by garbling a different function – but in the basic protocol, Bob cannot verify this. Extensions using commitment schemes and cut-and-choose techniques (described below) address this limitation.

Optimizations: From Theory to Practice

The basic garbled circuit protocol is well-understood but inefficient. Decades of optimization have reduced the cost by orders of magnitude.

Free XOR (Kolesnikov and Schneider, 2008)

The free XOR technique eliminates garbled tables for XOR gates entirely. Alice chooses a global random offset Delta. For each wire w, she sets K_w^1 = K_w^0 XOR Delta. To evaluate an XOR gate with inputs a and b, Bob simply XORs the input labels:

K_c = K_a XOR K_b

This automatically produces the correct output label because:

If a=0, b=0: K_a^0 XOR K_b^0 = K_c^0 (correct: 0 XOR 0 = 0)
If a=0, b=1: K_a^0 XOR K_b^1 = K_a^0 XOR (K_b^0 XOR Delta) = K_c^0 XOR Delta = K_c^1 (correct)
And so on for all four cases.

XOR gates require zero communication and zero computation beyond a single XOR operation. Since many circuits are XOR-dominated (AES is approximately 78% XOR gates), free XOR reduces total cost dramatically.

Half Gates (Zahur, Rosulek, and Evans, 2015)

Half gates reduce the garbled table size for AND gates from 4 ciphertexts to 2 ciphertexts. The technique decomposes each AND gate into two “half gates” – one where the garbler knows one input’s value and one where the evaluator knows one input’s value. Each half gate requires only one ciphertext.

Combined with free XOR, the half-gates technique means a garbled circuit requires communication of 2 * 128 bits = 256 bits = 32 bytes per AND gate, and zero bytes per XOR gate. For AES-128 (6,800 AND gates), the total garbled circuit size is approximately 212 KB.

Point-and-Permute

Each wire label includes a “permutation bit” (the least significant bit of the label is set to match the wire’s semantic value, XOR’d with a random bit known to the garbler). This allows the evaluator to determine which row of the garbled table to decrypt without trial decryption, reducing evaluation from 4 decryption attempts per gate to 1.

Row Reduction

Row reduction techniques (GRR3, GRR2) reduce the garbled table size by deriving one or more entries from the other labels, eliminating the need to transmit them. The half-gates technique subsumes earlier row reduction approaches.

Cut-and-Choose: Preventing Alice from Cheating

In the basic protocol, Alice could garble a different circuit than agreed upon – computing a function that leaks Bob’s input. The cut-and-choose technique addresses this:

Alice generates s garbled versions of the circuit (typically s = 40 or s = 128 for statistical security).
Bob randomly selects a subset of circuits to “cut” (open and verify) and the remaining circuits to evaluate.
For the cut circuits, Alice reveals all labels, and Bob verifies that the circuit correctly implements f.
For the evaluate circuits, Bob evaluates them and takes the majority output.

If Alice cheats in any circuit, she risks it being selected for checking. With s = 40 circuits and half checked, Alice must cheat in all 20 evaluated circuits to succeed – the probability of not being caught is negligible.

Modern protocols (LEGO, authenticated garbling, etc.) reduce the overhead of cut-and-choose using more sophisticated techniques, bringing the practical cost closer to a single garbled circuit evaluation.

Performance: Where We Stand

A 2024 benchmark study comparing garbled circuit implementations found:

AES-128: 0.03 seconds total (offline + online), 212 KB communication
SHA-256: 0.15 seconds, ~1 MB communication
Private set intersection (1 million elements): ~3 seconds, ~200 MB communication
32-bit integer comparison: < 1 millisecond, ~2 KB communication

These benchmarks use the EMP-toolkit (Efficient Multi-Party computation toolkit) on standard hardware. Performance scales linearly with circuit size – a circuit with 10 million AND gates requires approximately 310 MB of communication and processes in under 10 seconds on a gigabit connection.

For comparison, homomorphic encryption on the same operations is typically 100-10,000x slower, though it has the advantage of non-interactivity. Garbled circuits require a round of interaction (Alice sends the garbled circuit, Bob evaluates and returns the result), but the computation itself is extremely efficient.

Applications

Private set intersection. Two companies want to find shared customers without revealing their full customer lists. Each company’s customer IDs are encoded as circuit inputs, and the circuit outputs only the IDs that appear in both lists. Google’s Private Join and Compute uses this for advertising measurement without sharing user data.

Secure auctions. Bidders submit encrypted bids, and the auction circuit determines the winner and winning price without revealing losing bids. The Danish sugar beet auction (2008) was the first large-scale deployment of secure multi-party computation in an economic context, processing bids from over 1,200 farmers.

Private database queries. A user queries a database without revealing the query, and the database responds without revealing non-matching records. This is related to Private Information Retrieval (PIR) and can be constructed from garbled circuits.

Biometric matching. A fingerprint or face template is compared against a database without revealing either the query template or the stored templates. This prevents biometric databases from becoming honeypots.

Garbled Circuits vs. Other MPC Approaches

Approach	Rounds	Communication	Computation	Best For
Garbled Circuits	1 (constant)	O(circuit size)	Fast	Low-latency 2PC
GMW Protocol	O(depth)	O(circuit size)	Fast	Multi-party
SPDZ / Overdrive	O(depth)	O(circuit size)	Moderate	Malicious security
Homomorphic Encryption	0 (non-interactive)	O(input size)	Very slow	Non-interactive

Garbled circuits excel when latency matters and the computation is between exactly two parties. For more than two parties, secret-sharing-based protocols (GMW, SPDZ) are typically more efficient.

The Stealth Cloud Perspective

Stealth Cloud’s PII engine strips personally identifiable information from AI prompts before they reach any LLM provider. This is a client-side operation today: the user’s browser runs a WebAssembly module that detects and tokenizes PII, replacing real names, emails, and addresses with synthetic tokens.

Garbled circuits offer a theoretically stronger alternative: the PII detection could be computed as a secure two-party function between the client (holding the raw prompt) and a sanitization service (holding the PII detection model), where neither party learns the other’s input. The client would not need to trust the detection model’s integrity, and the service would not see the raw prompt.

The practical challenge is performance. Processing a 500-word prompt through a PII detection circuit with NER (Named Entity Recognition) would require millions of AND gates – feasible but slow compared to the millisecond latency targets of the current WASM implementation. As garbled circuit performance continues improving (roughly halving every 3-4 years over the past decade), the crossover point approaches.

For now, Stealth Cloud’s zero-knowledge architecture relies on client-side PII processing with AES-256-GCM encryption of sanitized prompts. Garbled circuits represent a future where even the sanitization step trusts no single party – where the computation itself is distributed, and the only thing that emerges is the result.

Yao’s millionaires solved the wealth comparison problem in 1986. Four decades later, the same technique is solving the privacy computation problem at internet scale.