Every organization using AI is placing a bet. The bet is that the AI provider handling their data – the prompts their employees write, the documents they upload, the proprietary information they feed into context windows – will treat that data with the care the organization expects. Most organizations lose this bet, not because the provider acts maliciously, but because the organization never defined the terms.
An AI privacy audit is the process of systematically evaluating what an AI provider does with your data, what they are contractually obligated to do, what they are technically capable of doing, and where the gaps lie between their promises and their architecture. It replaces trust with verification. It replaces marketing language with contractual analysis. It replaces assumptions with documented evidence.
This guide provides a structured methodology for conducting that audit. It is designed for engineering leaders, security teams, privacy officers, and any professional who is responsible for the decision of which AI provider to use – and who will bear the consequences if that decision is wrong.
The Audit Framework
The audit evaluates an AI provider across seven domains. Each domain maps to a category of risk that your organization faces when sending data to an external AI system.
- Data Training Practices – Is your data used to improve their models?
- Data Retention – How long is your data stored, and in what form?
- Access Controls – Who at the provider can access your data?
- Contractual Protections – What does the DPA actually guarantee?
- Subprocessor Exposure – Where does your data flow beyond the primary provider?
- Security Architecture – How is your data protected in transit and at rest?
- Incident Response – What happens when things go wrong?
For each domain, this guide provides specific questions to ask, documents to request, and red flags to watch for.
Domain 1: Data Training Practices
This is the domain that receives the most public attention and the least rigorous analysis. The question is simple: does the provider use your data to train or improve their models?
Questions to Ask
1. Are API inputs and outputs used for model training? Most major providers (OpenAI, Anthropic, Google) state that API data is not used for training. But “not used for training” is a phrase that admits multiple interpretations. Does it mean the data is never read by humans? Never used for fine-tuning? Never included in evaluation sets? Never used to inform capability decisions?
Request the specific contractual language. The answer should be in the Data Processing Agreement (DPA), not the marketing FAQ.
2. Are consumer-tier inputs used for training? For providers that offer both consumer and API tiers (ChatGPT vs OpenAI API, Claude.ai vs Anthropic API), the privacy postures are typically different. Consumer-tier data is often used for training by default, with an opt-out mechanism. That opt-out mechanism has well-documented limitations.
If any of your employees use the consumer tier (ChatGPT, Claude.ai, Gemini) for work, their inputs may be entering training pipelines regardless of your API-level DPA. This is the shadow AI problem: your contractual protections apply only to the access channels you control.
3. Does the provider use data for “safety” or “abuse” purposes distinct from training? Many providers carve out exceptions for trust and safety review. Human reviewers may inspect flagged conversations. Prompts may be scanned for policy violations. These processes involve human access to your data and are typically excluded from training opt-outs.
Request specifics: what triggers human review? How is the review data handled? Is it retained after review? Is it aggregated or anonymized? Are reviewers employees or contractors? In what jurisdictions do they operate?
4. Does the provider use aggregate or anonymized data derived from your inputs? A provider might truthfully say they don’t use your data for training while still using statistical properties derived from your data – prompt length distributions, topic categorizations, error patterns – to inform model development. This is a gray area that most DPAs do not address.
Red Flags
- The provider says “we don’t use your data for training” but the DPA says “we may use data for model improvement.”
- Training opt-out is tied to functionality degradation (you lose features by opting out).
- The provider cannot articulate the distinction between training data, evaluation data, and safety-review data.
- No specific contractual language – only marketing statements and blog posts.
Scoring Criteria
| Score | Criteria |
|---|---|
| 5 | Contractual guarantee of no training, no evaluation, no human review of API data. Consumer tier excluded. Verified by independent audit. |
| 4 | Contractual guarantee of no training for API data. Human review only for safety flags. Clear documentation. |
| 3 | No training for API data per ToS (not DPA). Consumer tier trains by default with opt-out. |
| 2 | Ambiguous language. Exceptions for “improvement” or “safety.” Opt-out mechanism with documented deficiencies. |
| 1 | Data used for training by default. No meaningful opt-out. No contractual protections. |
Domain 2: Data Retention
Retention is the shadow companion of training. Even if a provider doesn’t train on your data, retaining it creates exposure. Retained data can be breached, subpoenaed, accessed by insiders, or retroactively included in future training pipelines if policies change.
Questions to Ask
1. What is the retention period for API inputs and outputs? Most providers retain API data for 30 days for abuse monitoring. Some offer zero-retention tiers (OpenAI’s zero data retention option for eligible API customers, for example). The specific retention period should be documented in the DPA.
2. What is retained? Is it the full prompt and response? Metadata only (timestamps, token counts, model version)? Logs of API calls without content? The retention policy should specify the data elements retained, not just the duration.
3. What happens at the end of the retention period? “Deleted” can mean many things. Is the data purged from primary storage, backup storage, and cold archives? Or is it removed from primary storage while persisting in backups for an additional period? Is the deletion verified? Is it logged?
Cryptographic shredding – destroying the encryption key rather than the data itself – is the most reliable deletion mechanism because it doesn’t require locating every copy. Ask whether the provider uses this approach.
4. Can you request early deletion? Under GDPR (Article 17), you have the right to request erasure. Can the provider comply with a per-conversation deletion request? Within what timeframe? Does deletion extend to backups?
5. What is the retention policy for metadata? Even after content is deleted, metadata (timestamps, token counts, model used, IP address, user ID) may persist indefinitely. Metadata retention is often governed by different policies than content retention. Request the specific metadata retention schedule.
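The cryptographic shredding approach described above can be sketched in a few lines. The cipher below is deliberately a toy – a real system would use AES-GCM, not a hand-rolled keystream – but the property it demonstrates is real: each record is encrypted under its own key, the key is stored exactly once, and destroying the key renders every replica of the ciphertext (backups, cold archives) unreadable without having to locate them.

```python
import hashlib
import secrets

def xor_keystream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: SHA-256 keystream in counter mode.
    Illustrative only -- use AES-GCM in any real system."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

class ShreddableStore:
    """Per-record keys make deletion a key-destruction operation."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}        # the only copy of each key
        self.ciphertexts: dict[str, bytes] = {}  # may be replicated freely

    def put(self, record_id: str, plaintext: bytes) -> None:
        key = secrets.token_bytes(32)
        self._keys[record_id] = key
        self.ciphertexts[record_id] = xor_keystream(key, plaintext)

    def get(self, record_id: str) -> bytes:
        return xor_keystream(self._keys[record_id], self.ciphertexts[record_id])

    def shred(self, record_id: str) -> None:
        del self._keys[record_id]  # all ciphertext copies are now unrecoverable
```

Note that `get` works only while the key exists; after `shred`, the ciphertext copies are cryptographically inert no matter where they live.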
Red Flags
- Retention period is vague (“a reasonable period”) or undefined.
- Deletion applies to primary storage only, not backups.
- No mechanism for early deletion or per-conversation deletion.
- Metadata retained indefinitely with no separate policy.
- Retention period has increased over time without customer notification.
Scoring Criteria
| Score | Criteria |
|---|---|
| 5 | Zero retention available. Cryptographic shredding. No backups of customer data. Metadata retention under 30 days. |
| 4 | 30-day retention with verified deletion. Backups purged within 90 days. Metadata retention documented. |
| 3 | 30-day retention. Backup deletion timeline unclear. Metadata retained separately. |
| 2 | Retention period exceeds 30 days. No early deletion mechanism. Backup policy undefined. |
| 1 | Indefinite retention. No deletion mechanism. No distinction between content and metadata. |
Domain 3: Access Controls
Who at the provider can access your data, under what circumstances, and with what oversight?
Questions to Ask
1. Which employee roles have access to customer data? Is access limited to specific teams (trust and safety, engineering on-call) or broadly available to any employee with internal credentials?
2. Is access logged and audited? Every access to customer data should produce an audit log entry identifying who accessed what, when, and why. These logs should be reviewed regularly and available to customers upon request.
3. Is access governed by just-in-time provisioning? Best practice is that no employee has standing access to customer data. Access is requested, approved, time-limited, and automatically revoked. This is the principle of least privilege applied to provider operations.
4. Are there insider threat controls? Background checks for employees with data access? Separation of duties? Technical controls preventing bulk data exfiltration?
5. Do contractors or third-party annotators have access? Many AI providers use contract labor for data annotation, safety review, and quality assessment. These contractors may operate in jurisdictions with weaker data protection laws. Their access should be documented and governed by the same controls as employee access.
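To make the just-in-time model from question 3 concrete, here is a minimal sketch – class and field names are illustrative, not any provider's actual implementation. No grant is standing, every grant expires, and every access attempt, allowed or denied, produces an audit entry:

```python
import time
from dataclasses import dataclass

@dataclass
class Grant:
    employee: str
    resource: str
    reason: str
    approved_by: str
    expires_at: float

class JITAccess:
    """Just-in-time access control: time-limited grants, full audit trail."""

    def __init__(self) -> None:
        self.grants: list[Grant] = []
        self.audit_log: list[dict] = []

    def grant(self, employee: str, resource: str, reason: str,
              approved_by: str, ttl_s: int = 3600) -> None:
        # Access requires a reason and an approver, and auto-expires.
        self.grants.append(Grant(employee, resource, reason,
                                 approved_by, time.time() + ttl_s))
        self.audit_log.append({"event": "grant", "who": employee,
                               "what": resource, "why": reason,
                               "approver": approved_by})

    def check(self, employee: str, resource: str) -> bool:
        # Every access attempt is logged, including denials.
        now = time.time()
        allowed = any(g.employee == employee and g.resource == resource
                      and g.expires_at > now for g in self.grants)
        self.audit_log.append({"event": "access", "who": employee,
                               "what": resource, "allowed": allowed})
        return allowed
```

The important design choice is that the audit log is written by the same code path that makes the access decision, so it cannot be bypassed by policy alone.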
Red Flags
- Provider cannot specify which roles have access.
- No audit logging of data access.
- Contractors have access without equivalent security controls.
- Access controls are policy-based (rules) rather than technically enforced (architecture).
- Provider has experienced an insider incident and has not disclosed remediation.
Scoring Criteria
| Score | Criteria |
|---|---|
| 5 | No human access to customer data by design (zero-knowledge architecture). Access is technically impossible, not merely prohibited. |
| 4 | Just-in-time access with approval workflow. Full audit logging. Background checks. No contractor access to production data. |
| 3 | Role-based access controls. Audit logging exists but is not customer-visible. Contractors may access flagged content. |
| 2 | Broad access for engineering and operations. Limited audit logging. Contractor access poorly documented. |
| 1 | No formal access controls. No audit logging. Widespread employee access to customer data. |
Domain 4: Contractual Protections
Marketing statements, blog posts, and FAQ pages are not enforceable. Only the contract – specifically the Data Processing Agreement (DPA), Terms of Service (ToS), and any negotiated amendments – creates binding obligations.
Documents to Request and Review
1. Data Processing Agreement (DPA). This is the core document governing how the provider processes your data. Under GDPR, a DPA is legally required for any data processor. Even if you’re outside the EU, a DPA provides the strongest contractual framework.
Key DPA elements to verify:
- Purpose limitation. The DPA should state that data is processed only for the purpose of providing the service. Any other processing (training, analytics, improvement) must be explicitly excluded or require separate consent.
- Sub-processor obligations. The DPA should require the provider to impose equivalent data protection obligations on any sub-processor.
- Breach notification. Timeline for notifying you of a breach (GDPR requires 72 hours). Definition of what constitutes a reportable breach.
- Data location. Where is your data processed and stored? Which jurisdictions? Can you restrict processing to specific regions?
- Audit rights. Can you (or a third-party auditor) inspect the provider’s compliance with the DPA? At what frequency? At whose cost?
2. Terms of Service (ToS). The ToS often contains data usage clauses that the DPA doesn’t override. Check for clauses that grant the provider rights to use data for “improving the service,” “developing new features,” or “aggregate analytics.” These clauses can be interpreted broadly.
3. Privacy Policy. While often written for consumers, the privacy policy may contain disclosures about data practices that differ from or supplement the DPA.
4. Security whitepaper or SOC 2 report. Request the provider’s SOC 2 Type II report, ISO 27001 certificate, or equivalent. These documents verify that the provider’s security controls have been independently audited.
Red Flags
- No DPA available, or DPA is “coming soon.”
- DPA language is weaker than marketing claims.
- ToS grants broad data usage rights that the DPA doesn’t restrict.
- No audit rights or audit rights require provider approval.
- Breach notification timeline exceeds 72 hours or is undefined.
- Provider has changed DPA terms retroactively without customer notification.
Scoring Criteria
| Score | Criteria |
|---|---|
| 5 | Comprehensive DPA with explicit training exclusion, audit rights, data location restrictions, 72-hour breach notification, and sub-processor controls. Negotiable terms for enterprise. |
| 4 | Standard DPA meeting GDPR requirements. Training exclusion for API. Audit rights with reasonable conditions. SOC 2 Type II available. |
| 3 | DPA available but limited. Some ambiguous clauses. SOC 2 Type I only. Audit rights conditional. |
| 2 | DPA available but ToS contains conflicting clauses. No audit rights. Limited security documentation. |
| 1 | No DPA. Terms of service govern all data practices. No security certifications available. |
Domain 5: Subprocessor Exposure
Your data relationship with an AI provider is rarely bilateral. The provider uses subprocessors – cloud hosting, content delivery networks, monitoring services, annotation platforms – that may also handle your data. Each subprocessor expands your attack surface and your jurisdictional exposure.
Questions to Ask
1. Who are the subprocessors? Request the complete subprocessor list. Major providers publish these (OpenAI, Anthropic, and Google all maintain public subprocessor lists). Review the list for providers you wouldn’t independently trust with your data.
2. What data does each subprocessor access? Not all subprocessors see the same data. Your cloud hosting provider (typically AWS, GCP, or Azure) sees encrypted data at rest. A monitoring service might see metadata. An annotation service might see conversation content. Map the data elements to each subprocessor.
3. In what jurisdictions do subprocessors operate? If your data is processed by a subprocessor in a jurisdiction without adequate data protection (per EU adequacy decisions, for example), this creates legal risk. Subprocessors in the US are subject to surveillance authorities under Section 702 of FISA. Subprocessors in China are subject to the Data Security Law.
4. What notice is provided when subprocessors change? The provider should notify you before adding a new subprocessor that will handle your data, with sufficient lead time to object or terminate.
5. Can you restrict subprocessor usage? Some enterprise agreements allow customers to object to specific subprocessors or restrict data processing to specific jurisdictions.
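One way to operationalize questions 2 and 3 is to build a small inventory mapping each subprocessor to the data elements it sees and its jurisdiction, then flag the highest-risk combinations. A sketch with entirely hypothetical entries – the names, roles, and scopes below are examples, not real disclosures from any provider:

```python
from dataclasses import dataclass

@dataclass
class Subprocessor:
    name: str
    role: str
    jurisdiction: str
    data_elements: list[str]  # what this entity can actually see

# Hypothetical inventory for illustration.
INVENTORY = [
    Subprocessor("CloudHost Inc", "hosting", "US",
                 ["encrypted blobs at rest", "metadata"]),
    Subprocessor("MetricsCo", "monitoring", "EU", ["metadata"]),
    Subprocessor("AnnotateLtd", "safety review", "PH",
                 ["plaintext conversation content"]),
]

# Illustrative subset of adequate jurisdictions, not the full EU list.
ADEQUATE = {"EU", "UK", "CH", "JP", "CA"}

def highest_risk(inventory: list[Subprocessor]) -> list[str]:
    """Flag subprocessors that see plaintext content and operate in a
    jurisdiction without an adequacy decision."""
    return [s.name for s in inventory
            if s.jurisdiction not in ADEQUATE
            and any("plaintext" in e for e in s.data_elements)]
```

The point of the exercise is the mapping itself: a provider that cannot fill in the `data_elements` column for you has already answered question 2.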
Red Flags
- Subprocessor list is not publicly available.
- Subprocessors include entities in jurisdictions with weak data protection.
- No notification mechanism for subprocessor changes.
- Provider cannot specify which data elements each subprocessor accesses.
- Annotation or safety-review subprocessors are contract labor agencies with limited vetting.
Scoring Criteria
| Score | Criteria |
|---|---|
| 5 | Minimal subprocessors. All in privacy-adequate jurisdictions. 30-day advance notification of changes. Objection rights. Data element mapping per subprocessor available. |
| 4 | Published subprocessor list. Notification of changes. Most subprocessors in adequate jurisdictions. |
| 3 | Subprocessor list available on request. Change notifications via blog or email. Some subprocessors in non-adequate jurisdictions. |
| 2 | Subprocessor list incomplete or outdated. No change notification. Jurisdictional exposure unclear. |
| 1 | No subprocessor transparency. Unknown data flows to unknown entities in unknown jurisdictions. |
Domain 6: Security Architecture
Privacy without security is a promise without enforcement. The provider’s security architecture determines whether their privacy commitments are technically achievable.
Questions to Ask
1. How is data encrypted in transit? Minimum expectation is TLS 1.2+. Better is TLS 1.3 with forward secrecy. Best is mutual TLS (mTLS) for API access.
2. How is data encrypted at rest? Server-side encryption (SSE) with provider-managed keys is the minimum. Customer-managed encryption keys (CMEK) are better. Client-side encryption where the provider never holds keys is best – but rare among AI providers because it conflicts with their need to process plaintext for inference.
3. How is data isolated between customers? In a multi-tenant AI inference system, customer data should be isolated in memory, in processing, and in storage. Ask whether the provider uses dedicated compute instances, shared instances with memory isolation, or shared instances with only logical (software) isolation.
4. What is the inference environment? Does the model run in a standard cloud VM, a container, a V8 isolate, or a Trusted Execution Environment (TEE)? TEEs provide hardware-enforced isolation that prevents even the provider from accessing data during processing. V8 isolates (as used by Cloudflare Workers) provide lightweight in-process isolation with no disk I/O.
5. Is there a bug bounty program? A mature bug bounty program indicates confidence in the security architecture and provides continuous external security testing.
6. Has the provider been breached? Check for disclosed incidents, breach reports, and regulatory actions. A breach history is not automatically disqualifying – the response quality matters more than the incident itself. Providers that disclose promptly, remediate thoroughly, and publish post-mortems demonstrate security maturity.
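If you want to verify the transport claims yourself rather than take the whitepaper's word for them, Python's standard `ssl` module can enforce a TLS 1.3 floor and probe what an endpoint actually negotiates. The client-certificate paths for mTLS are placeholders:

```python
import socket
import ssl

def tls13_context() -> ssl.SSLContext:
    """Client context that refuses anything below TLS 1.3.
    For mutual TLS, additionally load your client identity, e.g.:
    ctx.load_cert_chain(certfile="client.pem", keyfile="client-key.pem")
    (paths are placeholders)."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx

def negotiated_version(host: str, port: int = 443) -> str:
    """Connect to the provider's API endpoint and report the TLS version
    it negotiated; raises if the server cannot meet the TLS 1.3 floor."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with tls13_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

A provider whose endpoint fails this probe while its documentation claims TLS 1.3 has given you a useful data point for Domain 4 as well.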
Scoring Criteria
| Score | Criteria |
|---|---|
| 5 | TLS 1.3 + mTLS. TEE or equivalent hardware isolation. CMEK available. SOC 2 Type II + ISO 27001. Active bug bounty. No material breaches, or breaches with exemplary response. |
| 4 | TLS 1.2+. Strong multi-tenant isolation. SSE with provider-managed keys. SOC 2 Type II. Bug bounty program. |
| 3 | TLS 1.2+. Standard cloud security. SSE. SOC 2 Type I. No bug bounty. |
| 2 | Basic TLS. Limited isolation documentation. Encryption posture unclear. No independent audit. |
| 1 | Security architecture undocumented. No certifications. History of unaddressed vulnerabilities. |
Domain 7: Incident Response
The final domain addresses what happens when things go wrong – because they will.
Questions to Ask
1. What is the breach notification timeline? GDPR mandates notification to supervisory authorities within 72 hours and to affected individuals without undue delay. Your DPA should specify notification to you (as the data controller) within a defined timeframe. Best practice is 24-48 hours for notification to enterprise customers.
2. What constitutes a reportable incident? The definition matters. Does the provider report unauthorized access to systems (even if customer data wasn’t confirmed accessed)? Or only confirmed exfiltration of customer data? A narrow definition means you may not learn about near-misses that indicate systemic risk.
3. What is the incident response process? Request the provider’s incident response plan or a summary of it. Key elements: who leads the response, how is evidence preserved, how are customers notified, and what remediation is provided.
4. Has the provider had incidents, and how did they respond? Transparency about past incidents is a positive signal. Ask directly. A provider that says “we’ve never had an incident” is either lying or hasn’t been looking.
5. What compensation or remediation is available after an incident? Some enterprise agreements include service credits, early termination rights, or indemnification in the event of a breach. These should be negotiated before an incident, not after.
Scoring Criteria
| Score | Criteria |
|---|---|
| 5 | 24-hour customer notification. Broad incident definition. Published incident response plan. Transparent history. Contractual remediation rights. |
| 4 | 48-72 hour notification. Reasonable incident definition. Incident response plan available on request. Post-mortem publication for past incidents. |
| 3 | 72-hour notification per GDPR. Standard incident definition. Process exists but not detailed to customers. |
| 2 | Notification timeline exceeds 72 hours or is undefined. Narrow incident definition. No published incidents (likely underreporting). |
| 1 | No incident response commitments. No transparency about past incidents. No contractual remediation. |
The Composite Scoring Rubric
Weight each domain according to your organization’s priorities. A suggested weighting for most organizations:
| Domain | Weight | Your Score (1-5) | Weighted Score |
|---|---|---|---|
| Data Training Practices | 25% | ___ | ___ |
| Data Retention | 20% | ___ | ___ |
| Access Controls | 15% | ___ | ___ |
| Contractual Protections | 15% | ___ | ___ |
| Subprocessor Exposure | 10% | ___ | ___ |
| Security Architecture | 10% | ___ | ___ |
| Incident Response | 5% | ___ | ___ |
| Total | 100% | | ___/5.00 |
Interpretation
- 4.0-5.0: Strong privacy posture. Suitable for sensitive data with ongoing monitoring.
- 3.0-3.9: Adequate with gaps. Acceptable for non-sensitive workloads. Requires supplementary controls (PII stripping, client-side encryption) for sensitive data.
- 2.0-2.9: Significant deficiencies. Not recommended for any data you cannot afford to have exposed. Use only with aggressive client-side protections.
- Below 2.0: Unacceptable. Do not use for any organizational data.
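The weights and interpretation bands above are straightforward to mechanize; a small script also guards against arithmetic slips when comparing several providers side by side:

```python
# Suggested weights from the rubric above; adjust to your priorities.
WEIGHTS = {
    "Data Training Practices": 0.25,
    "Data Retention": 0.20,
    "Access Controls": 0.15,
    "Contractual Protections": 0.15,
    "Subprocessor Exposure": 0.10,
    "Security Architecture": 0.10,
    "Incident Response": 0.05,
}

def composite_score(scores: dict[str, int]) -> float:
    """Weighted sum of per-domain scores (each 1-5)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("score every domain exactly once")
    if not all(1 <= s <= 5 for s in scores.values()):
        raise ValueError("domain scores must be between 1 and 5")
    return sum(WEIGHTS[d] * s for d, s in scores.items())

def interpret(score: float) -> str:
    """Interpretation bands from the rubric."""
    if score >= 4.0:
        return "Strong privacy posture"
    if score >= 3.0:
        return "Adequate with gaps"
    if score >= 2.0:
        return "Significant deficiencies"
    return "Unacceptable"
```

For example, a provider scoring 4 on training, access controls, contractual protections, and security, and 3 everywhere else, lands at 3.65 – adequate with gaps, so sensitive workloads would need supplementary controls.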
For a current assessment of major providers using this framework, see the AI Provider Privacy Scoreboard.
Conducting the Audit: Process
Step 1: Document Collection (Week 1)
Request the following from each provider under evaluation:
- Data Processing Agreement (DPA)
- Terms of Service (current version with effective date)
- Privacy Policy
- Subprocessor list
- SOC 2 Type II report or ISO 27001 certificate
- Security whitepaper or architecture documentation
- Incident response plan summary
- Data flow diagram (where your data goes, what systems touch it)
If any document is unavailable, note it. Unavailability is itself a data point.
Step 2: Document Analysis (Week 2)
Review each document against the questions in each domain above. Flag ambiguities, contradictions, and missing commitments. Pay particular attention to conflicts between marketing materials and contractual language – marketing materials are aspirational, contracts are binding.
Step 3: Provider Engagement (Week 3)
Send your questions to the provider’s security or privacy team. Evaluate the quality and specificity of their responses. A provider that responds to detailed security questions with marketing materials is signaling that their privacy function is immature.
Enterprise providers should be willing to schedule a security review call. If they refuse, factor that into your assessment.
Step 4: Scoring and Decision (Week 4)
Complete the scoring rubric for each provider. Compare scores. Factor in organizational context: a provider with a lower score might be acceptable if your data is pre-sanitized through PII stripping and the cost savings are significant.
Document your decision, including the rationale and any compensating controls you’re implementing for identified gaps. This documentation protects you if the decision is later questioned – by regulators, auditors, or in the aftermath of an incident.
Ongoing Monitoring
An audit is a point-in-time assessment. Provider practices change. DPAs are updated. Subprocessors are added. Breaches occur. Your audit must be maintained.
Quarterly: Check for DPA or ToS updates. Review the subprocessor list for changes. Monitor security advisories and breach disclosures.
Annually: Re-run the full audit. Re-score all domains. Compare to the previous year’s assessment to identify trends.
On trigger events: Re-audit immediately if the provider announces a breach, changes ownership, enters a new jurisdiction, or makes material changes to its data practices.
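A lightweight way to implement the quarterly check is to hash each audit document (DPA, ToS, subprocessor list) and diff against last quarter's snapshot; anything whose hash changed gets a human review. A minimal sketch – how you fetch the documents is up to you:

```python
import hashlib

def diff_documents(docs: dict[str, bytes],
                   previous: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Hash the current audit documents and report which ones changed
    since the last snapshot. Returns (changed names, new snapshot)."""
    current = {name: hashlib.sha256(body).hexdigest()
               for name, body in docs.items()}
    changed = sorted(n for n in current if previous.get(n) != current[n])
    return changed, current
```

Persist the returned snapshot between quarters; a non-empty `changed` list is your cue to re-read the document and, if the change is material, re-score the affected domain.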
The Architectural Alternative
This audit framework accepts the premise that your data must reach the AI provider in plaintext. That premise is the source of every risk evaluated above. If the provider never sees your data – if prompts are sanitized, encrypted, and processed through zero-knowledge infrastructure – then most of these domains become moot. A provider cannot misuse data it never received. It cannot retain data it never held. It cannot be compelled to produce data it cannot decrypt.
This is the architecture Stealth Cloud is building: an intermediary layer where PII is stripped before prompts leave the client, where encryption keys exist only on the user’s device, and where zero-persistence infrastructure ensures that even the intermediary holds nothing worth auditing.
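As a toy illustration of the first of those layers – client-side PII stripping – the sketch below replaces obvious identifiers with placeholders before a prompt leaves the client. The regex patterns are illustrative only; production scrubbers use NER models and reversible token maps rather than a handful of regexes:

```python
import re

# Illustrative patterns for common US-format identifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def strip_pii(prompt: str) -> str:
    """Replace matched identifiers with typed placeholders so the
    provider receives structure, not the identifying values."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt
```

Even this crude version changes the audit calculus: the lower a provider's composite score, the more of this kind of protection you need on your side of the wire.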
Until that architecture is universally available, the audit framework above is your best tool for making informed decisions about which providers deserve your data – and under what conditions. Use it. Document it. And re-evaluate it regularly, because the providers certainly will not do it for you.