The most consequential detail in any AI provider’s terms of service is buried in the data retention section – a paragraph or two that determines whether your conversations exist for 30 days, 36 months, or indefinitely. Yet comparing retention policies across providers is an exercise in parsing deliberately ambiguous language, distinguishing between what’s promised and what’s architecturally enforced, and understanding that “deleted” in cloud infrastructure rarely means what it means in ordinary language.

We examined the data retention policies of 12 major AI providers as of early 2026. The findings reveal a market where retention periods range from zero seconds to indefinite, where “deleted” data persists in backups for months, and where the gap between policy language and technical reality is wide enough to drive a regulatory investigation through.

The Retention Landscape

OpenAI

Conversation content: 30 days minimum for abuse monitoring (all tiers). Enterprise customers with custom agreements may negotiate shorter periods. API with zero-data-retention (ZDR) enabled: no content retention beyond the API call duration.

Metadata: Retained for an unspecified “operational” period. This includes timestamps, token counts, model versions, user identifiers, and session data. OpenAI’s privacy policy does not specify a maximum retention period for metadata.

Abuse monitoring data: Conversations flagged by automated systems are retained for review by human safety teams. The retention period for flagged content is not publicly specified and may exceed the standard 30-day window.

Training data: For tiers where training use is permitted (Free, Plus with opt-in), data that enters the training pipeline is retained indefinitely as part of the model’s training corpus. There is no mechanism to remove specific data points from a trained model.

Backup systems: OpenAI has not publicly disclosed backup retention policies. Standard cloud infrastructure practices suggest that data replicated across availability zones and backup systems persists beyond the stated primary retention period. When OpenAI says “deleted after 30 days,” this likely refers to the primary data store. Backup copies may persist for additional weeks or months.

The critical nuance: OpenAI’s data practices distinguish between content retention (what they promise to delete) and derivative retention (what they extract from your data before deletion). Aggregated analytics, model improvement signals, and safety classifications derived from your conversations may be retained indefinitely even after the source conversation is deleted.

Anthropic

Conversation content (API): 30 days for safety monitoring by default. Enterprise customers can negotiate zero-retention agreements. In February 2025, Anthropic introduced a “zero-retention” API tier that processes requests without storing input or output data.

Conversation content (claude.ai): Retained for the duration of the user’s account plus 30 days after account deletion. Users can delete individual conversations, but Anthropic’s privacy policy notes that “residual copies may remain in backup systems for a limited period.”

Metadata: Retained for operational purposes. Anthropic’s privacy policy specifies “reasonable periods” without defining a specific maximum.

Safety evaluations: Content reviewed by Anthropic’s safety team may be retained for research purposes. The privacy policy permits retention of data “to conduct research to improve our safety techniques.”

Anthropic’s retention practices are generally more conservative than OpenAI’s, but the ambiguity around backup retention and safety research use creates uncertainty about the true data lifecycle.

Google (Gemini)

Conversation content (Gemini Apps): Retained for up to 36 months when Gemini Apps Activity is enabled (the default). When disabled, new conversations are not saved to the user’s Google Activity, but Google retains conversations for up to 72 hours for safety and “to improve the product.”

Conversation content (Vertex AI / API): Configurable retention with a 30-day default for logging. Customers can disable logging entirely, in which case Google processes data without retention.

Metadata: Integrated into Google’s broader data infrastructure. Usage metadata from Gemini may inform Google’s advertising and personalization systems, though Google states that Gemini conversations are not used for ad targeting. The metadata retention period inherits Google’s general data retention framework, which extends to 18 months for most categories.

Cross-service data sharing: Google’s privacy policy permits data sharing across Google services for “product improvement.” The practical implication is that your Gemini interaction metadata may be combined with data from Gmail, Search, Maps, and other Google services to enrich your Google user profile.

Google’s 36-month retention for Gemini conversations represents the longest standard retention period among major AI providers. For users with Gemini Apps Activity enabled (the default configuration), Google maintains a three-year archive of AI interactions linked to their Google identity.

Microsoft (Copilot / Azure OpenAI)

Copilot Consumer: Conversations are retained and linked to the user’s Microsoft account. Microsoft’s privacy policy does not specify a maximum retention period for Copilot conversations.

Azure OpenAI Service: Configurable. With abuse monitoring enabled (the default), prompts and responses are retained for 30 days. Customers who complete a compliance review can apply for an exemption that disables abuse monitoring, reducing prompt-content retention to zero.

Microsoft 365 Copilot: Data processed through M365 Copilot is subject to the organization’s Microsoft 365 data governance policies. Retention inherits the organization’s configured retention labels, which means M365 Copilot conversations may be retained for years under compliance retention policies.

Telemetry: Microsoft collects diagnostic and usage telemetry from Copilot interactions. The retention period for this telemetry is specified as “up to 18 months” in Microsoft’s data protection documentation.

Meta (Llama via Meta AI)

Meta AI (consumer product): Conversations processed through Meta AI on WhatsApp, Instagram, and Facebook are subject to Meta’s general data retention policy, which permits indefinite retention. Meta’s privacy policy states that it retains data “as long as necessary” for its operational purposes.

Llama (self-hosted): When organizations host Llama models on their own infrastructure, Meta has no data access or retention capability. This represents the cleanest retention profile among major AI options, but requires the organization to manage its own infrastructure.

The contrast is stark: Meta’s consumer AI product has the most permissive retention policy in the industry (effectively indefinite), while its open-source model enables the most restrictive (zero, when self-hosted).

Smaller Providers

Together AI: API-only service with a stated 30-day retention for abuse monitoring. Offers a zero-retention option for enterprise customers.

Groq: Claims zero retention for API inference. Groq’s infrastructure processes requests on custom LPU hardware and states that prompt data is not stored after the inference is complete.

Perplexity: Search queries and conversations are retained to improve search quality. The retention period is not publicly specified. Perplexity’s privacy policy permits use of interaction data for “improving and developing our services.”

Mistral AI: API data is retained for 30 days for abuse monitoring under EU data protection frameworks. Mistral’s Paris-based operations subject it to GDPR’s data minimization requirements, which legally constrain retention beyond operational necessity.

The Backup Problem

Every retention comparison must contend with the gap between stated policy and technical reality. Cloud infrastructure operates through redundancy: data is replicated across availability zones, backed up to cold storage, and captured in database snapshots that may persist long after the “primary” copy is deleted.

When a provider says “deleted after 30 days,” they typically mean that the data is removed from the primary data store – the active database that serves the application. But cloud databases maintain transaction logs, point-in-time recovery snapshots, and cross-region replicas that may retain the data for additional weeks or months.

Standard practices at major cloud providers (AWS, Azure, GCP) include:

  • Database transaction logs: Retained for 7-35 days to support point-in-time recovery
  • Automated backups: Retained for 1-35 days depending on configuration
  • Cross-region replicas: May lag behind primary deletion by hours or days
  • Cold storage archives: Some organizations archive database snapshots to cold storage for disaster recovery, with retention periods of months or years

No major AI provider has published a comprehensive accounting of how their retention policies apply to backup and redundancy systems. The 30-day retention period that OpenAI, Anthropic, and others advertise almost certainly understates the true data lifecycle when backup systems are included.

Retention vs. Deletion: What “Deleted” Means

When you delete a conversation in ChatGPT, Claude, or Gemini, the provider marks the data for deletion in the primary data store. This is a logical deletion, not a physical one. The data’s storage sectors are not overwritten; they are freed for future use. Until those sectors are overwritten by new data, the “deleted” information remains technically recoverable.
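A toy model makes the logical/physical distinction concrete. In the sketch below (illustrative only, not any provider's actual storage layer), "delete" removes the index entry that lets the application find the data, but the underlying bytes remain in storage until something overwrites them.

```python
class NaiveStore:
    """Toy model of logical deletion: 'delete' clears the index entry
    but leaves the underlying bytes in place until overwritten."""
    def __init__(self):
        self.index = {}     # key -> sector number
        self.sectors = []   # raw storage

    def put(self, key, data):
        self.index[key] = len(self.sectors)
        self.sectors.append(data)

    def delete(self, key):
        # Logical deletion: the pointer is gone, the data is not.
        self.index.pop(key, None)

store = NaiveStore()
store.put("chat-123", b"sensitive conversation")
store.delete("chat-123")

print("chat-123" in store.index)                   # False: the user sees it as gone
print(b"sensitive conversation" in store.sectors)  # True: the bytes are still there
```

Anyone with access to `store.sectors` (the storage layer) can still read the "deleted" conversation, which is precisely the exposure in the discovery and breach scenarios below.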

The distinction matters for two scenarios:

Legal discovery: Courts can compel production of data that is technically recoverable, even if the provider considers it “deleted.” A subpoena served within the backup retention window – potentially weeks or months after the user deleted the conversation – could produce data the user believed was gone.

Security breaches: An attacker who gains access to a provider’s storage infrastructure can recover logically deleted data from unoverwritten sectors. The time window for this recovery depends on storage utilization rates and overwrite patterns, but for providers with large storage deployments and low utilization rates, the window can extend for months.

Cryptographic deletion – destroying the encryption key rather than the data – is the only deletion method that provides immediate, irreversible data destruction. Under cryptographic deletion, the ciphertext may persist in storage and backups indefinitely, but without the encryption key, it is computationally unrecoverable. Among major AI providers, only enterprise-tier offerings with customer-managed encryption keys support this deletion method.
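The mechanics of crypto-shredding can be sketched in a few lines. The example below uses a deliberately simplified SHA-256 counter-mode keystream for illustration; a real system would use a vetted cipher such as AES-GCM, and the function name and flow here are this article's invention, not any provider's implementation.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode), for illustration only.
    Production systems should use an authenticated cipher like AES-GCM."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # XOR the keystream into the data; zip truncates to len(data).
    return bytes(b ^ k for b, k in zip(data, out))

key = secrets.token_bytes(32)
ciphertext = keystream_xor(key, b"conversation content")

# Crypto-shredding: destroying the key IS the deletion event.
# The ciphertext may linger in backups indefinitely, but without
# the key it is computationally unrecoverable.
key = None
```

Because the same ciphertext copies in backups, replicas, and snapshots all depend on the one destroyed key, deletion takes effect everywhere at once, with no need to chase down every physical copy.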

The Metadata Retention Gap

Every provider’s retention policy focuses on conversation content. Metadata retention is uniformly less defined and typically longer.

The metadata generated by a single AI interaction includes:

  • Timestamp (when you interacted)
  • Duration (how long the session lasted)
  • Token count (how much you wrote and received)
  • Model version (which model processed your request)
  • User identifier (your account or API key identifier)
  • IP address (your network location)
  • Device fingerprint (browser, OS, screen resolution)
  • Session identifier (linking multiple interactions)
  • Error codes (whether the interaction failed and why)
  • Feature flags (which features were active for your request)

Individually, these data points seem innocuous. Collectively, they constitute a behavioral profile of extraordinary specificity. Temporal patterns reveal your work schedule, time zone, and productivity cycles. Token counts reveal whether you’re asking quick questions or engaging in deep analysis. Model selection reveals your willingness to pay and your quality expectations. Session patterns reveal which topics require extended exploration.
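Even the humblest metadata field supports this kind of inference. The sketch below, using fabricated timestamps, shows how interaction times alone reveal a user's working pattern, with no access to conversation content.

```python
from collections import Counter
from datetime import datetime

# Hypothetical interaction timestamps: metadata only, no content.
timestamps = [
    "2026-01-05T09:14:00", "2026-01-05T10:02:00", "2026-01-06T09:45:00",
    "2026-01-06T14:30:00", "2026-01-07T09:05:00", "2026-01-07T09:58:00",
]

# Bucket interactions by hour of day to find the user's peak activity.
hours = Counter(datetime.fromisoformat(t).hour for t in timestamps)
peak_hour, count = hours.most_common(1)[0]

print(peak_hour)  # 9: a morning work block, inferred from timestamps alone
```

Scale this from six timestamps to three years of them (Google's default retention window) and the result is a detailed longitudinal picture of schedule, time zone, and routine.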

No major AI provider specifies a maximum retention period for this metadata. The practical assumption should be that metadata is retained for at least as long as it has operational value – which, for analytics and business intelligence purposes, means indefinitely.

Regulatory Implications

Data retention policies intersect with regulatory requirements in ways that create compliance complexity.

GDPR’s storage limitation principle (Article 5(1)(e)) requires that personal data be kept “for no longer than is necessary for the purposes for which the personal data are processed.” AI providers cite abuse monitoring and safety as the purposes justifying their retention periods, but whether a 30-day (or 36-month) retention period is “necessary” for these purposes has not been definitively tested in European courts.

The right to erasure (Article 17) creates a direct tension with training data retention. If your data has been used to train a model, the opt-out myth applies: there is no practical mechanism to remove your specific data from trained model weights. Providers address this by arguing that model weights are not “personal data” within GDPR’s definition – an argument that European data protection authorities have not yet fully accepted or rejected.

U.S. state privacy laws (CCPA, CPRA, Virginia CDPA, Colorado Privacy Act) provide varying rights to deletion. Unlike GDPR, these laws generally permit retention for “reasonable business purposes,” giving providers significant discretion.

The country-by-country privacy analysis details how retention obligations vary across jurisdictions, but the practical takeaway is consistent: regulatory frameworks constrain retention in theory but have not yet imposed meaningful enforcement in practice.

Comparing the Numbers

A condensed retention comparison across major providers:

Provider    Content (Consumer)          Content (API/Enterprise)   Metadata             Training Data
OpenAI      30 days min                 0-30 days (configurable)   Unspecified          Indefinite
Anthropic   Account duration + 30 days  0-30 days                  "Reasonable period"  Not used (API)
Google      Up to 36 months             0-30 days (configurable)   Up to 18 months      Varies by tier
Microsoft   Unspecified                 0-30 days (configurable)   Up to 18 months      Not used (Azure)
Meta        Indefinite                  N/A (self-hosted)          Indefinite           Varies
Mistral     30 days                     30 days                    Unspecified          GDPR-constrained
Groq        N/A                         0 (claimed)                Unspecified          Not used

The table reveals that consumer tiers retain data at least as long as enterprise tiers and usually longer, that metadata retention is uniformly less defined than content retention, and that "zero retention" claims are available only from smaller providers or premium enterprise tiers.

What Retention Means for Your Risk

The practical risk of data retention depends on two factors: how long the data exists and who might access it during that window.

30-day retention creates exposure to: provider security breaches, employee access for safety review, regulatory subpoenas, and standard legal discovery.

36-month retention (Google Gemini default) creates additional exposure to: longitudinal behavioral profiling, cross-service data enrichment, and extended legal discovery windows that cover multi-year litigation timelines.

Indefinite retention (Meta AI, some consumer tiers) creates perpetual exposure to all of the above, with the added risk that data practices and access policies may change over time – the company that retained your data in 2024 may have different ownership, different policies, and different legal obligations in 2028.

The Stealth Cloud Perspective

The retention comparison above reveals an industry where “how long we keep your data” is the wrong question. The right question is “why does anyone need to keep it at all?”

Every provider justifies retention as necessary for abuse monitoring, safety evaluation, or product improvement. But these justifications assume an architecture where the provider has access to cleartext conversation content – an assumption that zero-knowledge architectures challenge directly.

Stealth Cloud implements zero-persistence by design: conversation data exists only in RAM during active sessions, encrypted with client-held keys, and is cryptographically destroyed when the session ends. There is no 30-day safety retention window because the provider never holds the keys needed to read the data. There is no backup retention gap because encrypted data without keys is computationally worthless. There is no metadata profiling because sessions are authenticated through wallet signatures rather than identity-linked accounts.

The retention debate is ultimately a debate about architecture. In a system designed for surveillance, retention policies are the negotiated limits of that surveillance. In a system designed for privacy, retention is architecturally impossible – and no policy change, no acquisition, and no court order can retroactively create data that was never stored.