In 2025, approximately 78% of all large-scale AI model training – defined as training runs consuming more than 10,000 GPU-hours – occurred in data centers located in four US states: Virginia, Oregon, Texas, and Iowa. This geographic concentration is one of the most consequential and least discussed facts about the AI industry. It means that regardless of where an AI model is deployed, regardless of the nationality of its users, and regardless of the legal jurisdiction under which it is marketed, the model’s behavior was shaped by computation that occurred predominantly under US law, on infrastructure owned by a handful of US corporations, powered by energy grids subject to US regulatory authority.
For privacy, the implications are direct. When a European user interacts with an AI chatbot, their query may be processed on a server in Frankfurt or Dublin. But the model itself – the weights that determine what the system knows, what it remembers, and how it responds – was trained in a US data center, on training data assembled under US data handling practices, using infrastructure operated by companies subject to US government surveillance authority (FISA Section 702, CLOUD Act, Executive Order 12333). The privacy of the inference step is a separate question from the privacy of the training process, and the geography of training determines who had access to the data during the most sensitive phase of model development.
This report maps the global geography of AI compute, examines the forces concentrating it, and analyzes the privacy and sovereignty implications of a world where AI capability is forged in a small number of locations controlled by a small number of entities.
The Current Map: Where Compute Lives
United States: The Dominant Node
The US hosts an estimated 60% of global data center capacity by power consumption and approximately 78% of large-scale AI training compute. The concentration is driven by three factors: the early build-out of hyperscale cloud infrastructure (AWS launched in 2006, followed by Azure and Google Cloud), physical proximity to hardware suppliers (Nvidia, AMD, and Intel are all headquartered in Santa Clara, California), and energy economics that have favored certain US regions.
Northern Virginia (Loudoun County) is the densest data center market in the world. The cluster, centered around Ashburn, hosts an estimated 300+ data centers consuming over 3.5 GW of power. The concentration arose from the historical location of early internet exchange points, favorable Virginia tax policy, and the proximity to the US government (a major data center customer). AWS, Microsoft, Google, Meta, and Oracle all operate major facilities in the region. For AI training specifically, Northern Virginia’s role is primarily inference and fine-tuning rather than large-scale pre-training, due to power density constraints.
Oregon (The Dalles, Prineville, Boardman) hosts significant Google and Meta training infrastructure, attracted by abundant hydroelectric power from the Columbia River system. Google’s The Dalles facility has been expanded repeatedly and is one of the primary sites for Gemini model training. The Pacific Northwest’s combination of low electricity costs ($0.03-0.05/kWh wholesale) and cool climate (reducing cooling costs) makes it optimal for sustained, power-intensive training runs.
Texas (Dallas-Fort Worth, San Antonio, Midland-Odessa) has emerged as the fastest-growing AI data center market, driven by deregulated energy markets, available land, and proximity to natural gas generation. Texas added an estimated 1.2 GW of data center capacity in 2025 alone. The ERCOT grid’s deregulated structure allows data centers to negotiate power purchase agreements directly with generators, bypassing the rate structures that constrain expansion in regulated markets.
Iowa (Des Moines, Council Bluffs) hosts Meta’s largest AI training cluster and significant Microsoft and Google presence. The attraction is similar to Oregon: low electricity costs, cool climate, and available land. Microsoft’s Iowa facilities are primary training sites for its partnership with OpenAI.
Europe: The Sovereignty Push
European AI compute capacity has expanded rapidly since 2023, driven by data sovereignty requirements and the EU’s strategic objective to reduce dependence on US hyperscalers. The sovereign cloud movement has catalyzed investment in European-owned and operated AI infrastructure.
Nordics (Sweden, Finland, Norway) have attracted disproportionate investment due to abundant renewable energy (hydroelectric and wind), cool climates, and stable political environments. Microsoft’s planned $3.2 billion investment in Sweden (announced 2024) and Google’s $1 billion expansion in Finland specifically target AI training workloads. Nordic data centers operate at Power Usage Effectiveness (PUE) ratios as low as 1.05, compared to industry averages of 1.3-1.5, making them among the most energy-efficient training locations globally.
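Those PUE figures translate directly into energy overhead at training scale: PUE is total facility energy divided by IT equipment energy, so the overhead fraction is PUE minus 1. A minimal sketch of the arithmetic, using an illustrative 100 GWh training run (the function name and run size are assumptions for illustration):

```python
def overhead_energy_gwh(it_energy_gwh: float, pue: float) -> float:
    """Energy spent on cooling and power distribution beyond the IT load.

    PUE = total facility energy / IT equipment energy,
    so overhead = IT energy * (PUE - 1).
    """
    return it_energy_gwh * (pue - 1)

# Illustrative: a training run with a 100 GWh IT load
print(round(overhead_energy_gwh(100, 1.05), 2))  # Nordic facility: 5.0 GWh overhead
print(round(overhead_energy_gwh(100, 1.40), 2))  # industry-average facility: 40.0 GWh
```

At a 1.40 PUE, an eighth of the total bill buys cooling rather than computation; at 1.05, almost all of it buys compute.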
Germany (Frankfurt, Berlin-Brandenburg) is the largest European data center market by capacity, though its role in AI training is constrained by energy costs (EUR 0.15-0.25/kWh for industrial customers, 3-5x US rates) and the political sensitivity of nuclear energy policy. German cloud providers – SAP, Deutsche Telekom’s T-Systems, and IONOS – have launched sovereign AI cloud offerings aimed at European customers who require data processing within EU jurisdiction.
France has positioned itself aggressively for AI sovereignty. OVHcloud, the largest European-owned cloud provider, has expanded GPU capacity for AI training. The French government has subsidized the Scaleway AI cluster (Iliad Group) and supported national AI training initiatives through France 2030 investment programs.
The European challenge is scale. The combined AI training compute capacity of all European cloud providers is estimated at less than 15% of US hyperscaler capacity. European sovereign AI initiatives are meaningful for fine-tuning, inference, and sensitive-data workloads, but the pre-training of frontier AI models remains economically impractical at European energy prices and infrastructure scale.
China: The Parallel System
China operates what is effectively a parallel AI compute ecosystem, driven by US export controls on advanced semiconductors (implemented October 2022 and expanded in 2023-2024). The restrictions prohibit the export of Nvidia’s A100, H100, and subsequent AI-optimized GPUs to China, forcing Chinese AI companies to rely on domestic alternatives (Huawei Ascend 910B, Biren BR100) and pre-restriction stockpiles.
Chinese AI training occurs primarily in data centers operated by Baidu, Alibaba, Tencent, and Huawei, concentrated in Beijing, Shanghai, Shenzhen, and Guiyang (Guizhou province, where low energy costs have attracted hyperscale facilities). Chinese compute capacity for AI is estimated at 20-25% of global capacity, though direct comparison is complicated by the use of different GPU architectures and the opacity of Chinese infrastructure reporting.
The privacy implications of the Chinese AI compute ecosystem are distinct from those of the US ecosystem. Chinese AI companies operate under the Personal Information Protection Law (PIPL) and the Cybersecurity Law, which impose data localization requirements and grant the Chinese government broad data access authority. For Chinese users, AI training data remains within Chinese jurisdiction, but it is subject to state access authority far broader than anything Western data protection frameworks permit.
Middle East and Southeast Asia: The New Entrants
The UAE (Abu Dhabi, Dubai) and Saudi Arabia (NEOM, Riyadh) have made substantial investments in AI infrastructure, driven by sovereign ambition and sovereign wealth. G42 (UAE) operates the Condor Galaxy AI supercomputer cluster in partnership with Cerebras Systems. Saudi Arabia’s SDAIA (Saudi Data and Artificial Intelligence Authority) has funded multiple AI infrastructure projects, including partnerships with Nvidia and AMD.
Singapore has emerged as the primary AI compute hub for Southeast Asia, with government-supported initiatives (National AI Strategy 2.0) and investments from AWS, Google, and Microsoft in local GPU clusters. Singapore’s data protection framework (PDPA), strategic geographic position, and undersea cable connectivity make it a natural inference and fine-tuning hub for the APAC region.
The Energy Constraint
AI training is extraordinarily energy-intensive. Training GPT-4 consumed an estimated 50 GWh of electricity – equivalent to the annual electricity consumption of approximately 4,600 US households. Training runs for frontier models in 2025-2026 are consuming 100-300 GWh each, and the trajectory is exponential: each generation of frontier model consumes 3-5x the energy of its predecessor.
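The household-equivalence figure follows from simple unit conversion. Assuming roughly 10.8 MWh of electricity per US household per year (an approximate EIA average; an assumption, not a figure from this report), the arithmetic is:

```python
US_HOUSEHOLD_MWH_PER_YEAR = 10.8  # approximate EIA average (assumption)

def household_equivalents(training_gwh: float) -> int:
    """Convert a training run's energy (GWh) into household-year equivalents."""
    return round(training_gwh * 1000 / US_HOUSEHOLD_MWH_PER_YEAR)

print(household_equivalents(50))   # GPT-4 estimate: roughly 4,600 households
print(household_equivalents(300))  # high-end 2025-2026 frontier run: roughly 28,000
```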
Global data center electricity consumption reached an estimated 460 TWh in 2025, approximately 1.7% of global electricity generation. AI training and inference are projected to push this to 800-1,000 TWh by 2028, representing 3-4% of global electricity production. The International Energy Agency (IEA) projects that data center electricity demand will exceed the total electricity consumption of Japan by 2030.
The energy constraint shapes geography. AI training gravitates toward locations with abundant, low-cost, and reliable electricity. This has historically favored US regions with access to natural gas generation (Texas, Virginia), hydroelectric power (Oregon), or low-cost wind generation (Iowa). It increasingly favors Nordic countries (hydroelectric and wind) and Middle Eastern locations (natural gas and solar).
The privacy implication is that energy economics, not privacy considerations, determine where AI training occurs. A company that wants to train a model under Swiss or German jurisdiction must accept electricity costs 3-5x higher than training in Oregon or Texas. This cost differential is a structural barrier to data sovereignty in AI: the jurisdictions with the strongest privacy protections are often not the jurisdictions with the cheapest compute.
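The cost differential can be made concrete with a back-of-the-envelope calculation, using the rates cited above ($0.03-0.05/kWh Pacific Northwest wholesale, EUR 0.15-0.25/kWh German industrial) and a hypothetical 200 GWh frontier training run:

```python
def training_electricity_cost(energy_gwh: float, price_per_kwh: float) -> float:
    """Electricity bill for a training run: convert GWh to kWh, multiply by unit price."""
    return energy_gwh * 1_000_000 * price_per_kwh

RUN_GWH = 200  # hypothetical mid-range 2025-2026 frontier run

print(training_electricity_cost(RUN_GWH, 0.04))  # Oregon wholesale: about $8 million
print(training_electricity_cost(RUN_GWH, 0.20))  # German industrial: about EUR 40 million
```

A $30 million-plus gap per run, before any infrastructure costs, is why jurisdiction-driven relocation of pre-training rarely survives the budget review.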
Concentration Risk and the Hyperscaler Oligopoly
Three companies – AWS, Microsoft Azure, and Google Cloud – control an estimated 67% of global cloud infrastructure revenue and an even higher percentage of AI training compute capacity. Adding Meta (which operates massive AI training infrastructure for internal use) and Oracle (which has expanded aggressively into AI cloud), five companies control approximately 80% of global AI training capacity.
This concentration creates several privacy-relevant dynamics.
Legal jurisdiction exposure. All five companies are US-domiciled and subject to FISA Section 702, which authorizes US intelligence agencies to compel US providers to disclose the communications of non-US persons, including data stored on servers physically located outside the United States. The CLOUD Act explicitly extends US law enforcement access to data stored abroad. A European organization training a model on Azure in Frankfurt is subject to both EU data protection law (GDPR) and US surveillance and data access law, creating a legal conflict that no contractual arrangement can fully resolve.
Infrastructure dependency. Organizations that rely on hyperscaler infrastructure for AI training are dependent on those providers’ technical, commercial, and political decisions. A provider’s decision to discontinue a service, change pricing, or comply with a government data request affects every customer on the platform. The cloud provider lock-in analysis documents the switching costs that make migration difficult once infrastructure dependency is established.
Opacity of training environments. When an organization uses hyperscaler GPU clusters for AI training, it has limited visibility into the physical security, network architecture, and data handling practices of the training environment. The customer controls the software; the provider controls the hardware, the firmware, the hypervisor, and the physical facility. Confidential computing technologies (Intel SGX, AMD SEV, Nvidia H100 Confidential Computing) partially address this gap by encrypting data in use, but adoption for AI training workloads remains limited.
The Sovereign Compute Movement
The concentration of AI compute in US hyperscalers and US jurisdictions has prompted a global sovereign compute movement – government-backed initiatives to ensure national or regional control over AI training infrastructure.
The European Union’s IPCEI-CIS (Important Project of Common European Interest on Cloud Infrastructure and Services) has allocated EUR 1.2 billion in public funding (matched by private investment) to build sovereign cloud and AI infrastructure across EU member states. The initiative aims to ensure that European organizations can train and deploy AI models on European-owned infrastructure, under European jurisdiction, without reliance on non-EU hyperscalers.
France’s Jean Zay supercomputer (expanded in 2024 with 1,456 Nvidia A100 GPUs) and Germany’s Jülich Supercomputing Centre (hosting the JUPITER exascale system with GPU partitions for AI training) represent government-funded AI compute capacity available to academic and commercial users under national jurisdiction.
The UK’s AI Compute Investment (announced 2024, GBP 3.5 billion) includes expansion of the Bristol-based Isambard AI facility and partnerships with Nvidia for sovereign AI training capacity.
Japan, South Korea, and India have each announced national AI compute initiatives ranging from $1-10 billion in investment, driven by the recognition that AI capability is a function of compute access, and compute access is a function of infrastructure ownership.
The sovereign compute movement has privacy implications that are not uniformly positive. Sovereign infrastructure ensures that training data remains within national jurisdiction, but it also ensures that national governments have jurisdictional authority over that data. For countries with strong privacy protections (Switzerland, the Nordic countries), sovereign compute strengthens privacy. For countries with expansive surveillance authority (China, Russia, some Middle Eastern states), sovereign compute strengthens state access to training data and model capabilities.
Implications for AI Privacy
The geography of AI compute creates three structural privacy problems that current regulatory frameworks do not adequately address.
Problem 1: Training jurisdiction disconnect. A model trained on data collected in the EU, processed in a US data center, and deployed for inference in Singapore spans three legal jurisdictions with incompatible privacy requirements. Current regulations focus on the location of data storage and the location of the data subject, but they poorly address the transient processing that occurs during training. The data sovereignty map documents the jurisdictional complexity but does not resolve it.
Problem 2: Training data opacity. The geographic concentration of training in a small number of facilities, operated by a small number of companies, means that the training process – which determines what a model knows, including what personal data it has memorized – occurs in environments with limited external visibility. The AI training tax that organizations pay when their data enters a training pipeline is assessed in facilities they cannot inspect, under conditions they cannot verify.
Problem 3: Inference ≠ training privacy. Organizations and users increasingly understand inference-time privacy (encrypting queries, anonymizing metadata). But training-time privacy – ensuring that the model itself does not contain personal data extracted during training – is a fundamentally different problem that requires visibility into the training process, not just the inference process.
The Stealth Cloud Perspective
The geographic concentration of AI compute is a privacy problem that technical architecture can partially but not fully address. Stealth Cloud’s zero-knowledge architecture protects users at inference time: queries are stripped of PII client-side, encrypted in transit, processed ephemerally, and never retained. This addresses the inference side of Problem 3: the interaction between user and model is private regardless of where the model was trained.
But the training-time privacy problem requires a different approach. The models that Stealth Cloud routes queries to were trained in facilities we do not control, on data we did not curate, under jurisdictional authority we are not subject to. Our architectural response is to minimize the information that reaches the model in the first place. The PII stripping engine ensures that personal data never enters the model’s context window, preventing both real-time data exposure and the possibility that personal data could be captured in model logs or evaluation datasets at the provider’s facility.
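To make the mechanism concrete, here is a deliberately minimal sketch of client-side PII stripping. The patterns, placeholder tokens, and function names are hypothetical illustrations, not Stealth Cloud’s actual engine (which this report does not publish); a production system would use far broader detection (named-entity recognition, locale-specific formats) than three regexes:

```python
import re

# Hypothetical, illustrative patterns only -- a real engine covers many more
# PII categories and formats than these three regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def strip_pii(text: str) -> str:
    """Replace detected PII with typed placeholders before the query leaves
    the client, so the model's context window (and any provider-side logs or
    evaluation datasets) never see the raw values."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

query = "Email jane.doe@example.com or call +1 415 555 0172 about my claim."
print(strip_pii(query))  # -> Email [EMAIL] or call [PHONE] about my claim.
```

The design point is where this runs, not how: because the substitution happens on the client, the sanitized query is the only version that ever crosses into infrastructure outside the user’s jurisdiction.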
Stealth Cloud’s Swiss domicile and Cloudflare Workers edge infrastructure ensure that the proxy layer – the point where user queries are sanitized and routed – operates under Swiss jurisdiction and at the network edge nearest the user. The model may be in Virginia. The user may be in Tokyo. But the privacy-critical processing – PII detection, sanitization, encryption, and metadata stripping – occurs in a jurisdiction and on infrastructure that we control.
The long-term solution to AI compute concentration is not geographic redistribution (which energy economics resist) but architectural decoupling: ensuring that the privacy of AI interaction does not depend on the jurisdiction of AI training. That decoupling – separating inference privacy from training geography – is a foundational design principle of Stealth Cloud’s architecture, and it is the only approach that scales across a world where compute follows energy and energy does not follow privacy law.