By March 30, 2023 – less than three weeks after Samsung Semiconductor lifted its internal ban on ChatGPT and allowed engineers to use the tool – three separate incidents of confidential data leakage had occurred. Engineers at Samsung’s Device Solutions division in Hwaseong, South Korea, pasted proprietary source code for semiconductor equipment diagnostics, internal test sequences for chip identification, and the transcript of a confidential business meeting directly into ChatGPT prompts.

Within twenty days of permitting AI tool usage, one of the world’s largest technology conglomerates had inadvertently donated proprietary intellectual property to OpenAI’s training pipeline. The incidents, first reported by South Korean media outlet Economist Korea and subsequently confirmed by Samsung, triggered an industry-wide reckoning with a question most enterprises had been ignoring: what happens when your trade secrets enter a system you don’t control?

The Three Incidents

The Samsung leak wasn’t a single event but three discrete failures, each illustrating a different dimension of enterprise AI risk.

Incident 1: Semiconductor Source Code

An engineer in Samsung’s semiconductor division encountered a bug in source code related to equipment measurement and diagnostics. Rather than debugging through internal tools, the engineer pasted the proprietary code directly into ChatGPT and asked it to identify and fix the problem. The code related to Samsung’s semiconductor fabrication processes – among the most closely guarded trade secrets in the global technology industry, where nanometer-level manufacturing advantages translate to billions in revenue.

Incident 2: Yield and Test Data

A second engineer used ChatGPT to optimize test sequences for semiconductor manufacturing. The prompts included proprietary data about chip yield rates and testing procedures – information that competitors would pay enormous sums to access. Semiconductor yield data is considered one of the most sensitive categories of information in the chip industry because it reveals both manufacturing capabilities and current performance limitations.

Incident 3: Meeting Transcript

A third employee converted a recorded internal meeting into text and submitted the full transcript to ChatGPT, requesting a summary for meeting minutes. The meeting content included strategic discussions and internal deliberations – the kind of candid corporate dialogue that is typically protected by the most stringent confidentiality measures.

In each case, the employee was attempting to use AI productively. None intended to leak confidential information. The data breaches were not malicious acts but workflow optimizations that collided with an architecture designed to capture and retain everything users submit.

Samsung’s Response

Samsung’s reaction was swift and, within the corporate world, relatively transparent.

Immediate restrictions: Samsung imposed a 1,024-byte limit on ChatGPT prompt length, severely restricting the tool’s utility for code-related tasks. This was a blunt instrument – like solving a fire hazard by limiting the building to one room – but it could be implemented immediately.

Internal investigation: Samsung’s semiconductor division launched an investigation into the scope of the exposure. The investigation reportedly found that the three known incidents were likely not isolated, and that broader usage patterns suggested additional undetected leakage events.

AI tool development: Samsung announced it would develop an internal AI tool for employee use, built on models running within Samsung’s own infrastructure. This “Samsung AI” approach – essentially self-hosted AI – would allow the company to capture the productivity benefits of large language models without routing proprietary data through third-party systems.

Policy overhaul: Samsung implemented mandatory AI usage training and established clear policies delineating what categories of information could and could not be submitted to external AI tools. The company warned employees that further violations could result in termination.

Eventual ban: By May 2023, Samsung had banned the use of ChatGPT and other external generative AI tools on company-owned devices and internal networks entirely. The company that had lifted its ban just weeks earlier reversed course completely.
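To make concrete how blunt the initial 1,024-byte cap was, a byte-length gate amounts to little more than the following. This is a minimal sketch, not Samsung’s actual implementation; the function name and error handling are illustrative:

```python
MAX_PROMPT_BYTES = 1024  # Samsung's reported cap on outbound prompt size


def enforce_prompt_limit(prompt: str, limit: int = MAX_PROMPT_BYTES) -> str:
    """Reject any prompt whose UTF-8 encoding exceeds the byte limit."""
    size = len(prompt.encode("utf-8"))
    if size > limit:
        raise ValueError(f"Prompt is {size} bytes; limit is {limit}")
    return prompt
```

A single mid-sized source file runs to tens of kilobytes, so nearly any real debugging prompt would be rejected outright – which is precisely why the cap crippled code-related work while leaving short, casual queries untouched.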

Why the Training Pipeline Matters

The Samsung incident’s severity stems not from the immediate disclosure of data to OpenAI (a single company bound by its own confidentiality obligations and privacy policy) but from the downstream implications of that data entering a training pipeline.

Under OpenAI’s default settings at the time of the incidents, user conversations were eligible for use in model training. This means Samsung’s proprietary code, test data, and meeting content potentially entered the dataset used to improve future versions of GPT. The implications cascade:

Memorization risk: As research on model memorization demonstrates, LLMs can memorize and later regurgitate specific training examples. Samsung’s source code fragments could, in theory, appear in responses to other users who prompt the model about semiconductor diagnostics or testing procedures. The probability of verbatim extraction is low for any single training example, but it is nonzero – and for a company whose competitive advantage depends on manufacturing process secrets, nonzero is unacceptable.

Competitive intelligence: Any entity with sufficient knowledge of Samsung’s technology stack could craft prompts designed to probe for memorized content related to semiconductor manufacturing. This represents a novel form of corporate AI espionage – using a shared AI system as an indirect channel for extracting competitor trade secrets.

Irreversibility: Even after Samsung reported the incidents to OpenAI, the opt-out problem applies in full force. If the data had already been used in a training run, it cannot be un-trained. Deleting conversation logs doesn’t reverse gradient updates. Samsung’s proprietary information may persist in model weights indefinitely.

The Industry Shock Wave

The Samsung incident functioned as a proof-of-concept for a risk that security professionals had been warning about since ChatGPT’s launch. The corporate response was immediate and widespread:

JPMorgan Chase restricted employee use of ChatGPT, citing concerns about sharing confidential financial data with third parties. The bank had already been monitoring employee AI usage and found patterns similar to Samsung’s.

Amazon warned employees in January 2023 (before the Samsung incident) not to share confidential information with ChatGPT after detecting instances where model responses closely resembled existing Amazon internal content – suggesting either training data overlap or a concerning coincidence.

Apple restricted use of ChatGPT and GitHub Copilot internally, concerned about both source code leakage and the potential for AI tools trained on Apple code to surface proprietary information to competitors.

Verizon, Deutsche Bank, Goldman Sachs, Citigroup, and numerous other major corporations implemented restrictions ranging from outright bans to usage guidelines with mandatory approval workflows.

A May 2023 survey by Fishbowl (a professional social network) found that 68% of employees using ChatGPT at work did so without their employer’s knowledge. The Samsung incident was not an anomaly – it was the first visible instance of a pattern occurring at scale across every industry.

Quantifying the Enterprise Risk

The Samsung incident prompted several research efforts to quantify the scope of enterprise data leakage through AI tools:

Cyberhaven’s data analysis (covering 1.6 million workers at companies using their data loss prevention platform) found that by March 2024, 4.7% of workers had pasted company data into AI tools at least once. Of the data submitted, 11% was classified as confidential. The volume of sensitive data pasted into AI tools increased 485% year-over-year.

A 2024 report by LayerX Security analyzed browser-based AI usage across enterprise environments and found that 6% of employees had pasted sensitive data into AI tools, with 4% doing so on a recurring basis. The most common categories of leaked data were internal business data (43%), source code (31%), and customer data (12%).

Gartner projected that by 2025, generative AI would be a factor in at least 15% of corporate data breach incidents – up from near zero in 2022. The Samsung incident was an early indicator of this trend, not an outlier.

The Architectural Failure

The Samsung incident reveals an architectural failure, not just a policy failure. Samsung had policies against sharing confidential information externally. Those policies failed because the AI tool’s interface made data sharing the path of least resistance.

The core problem: centralized AI architectures create a single point of data aggregation that is simultaneously the most useful tool in an employee’s workflow and the most dangerous channel for data exfiltration. In moment-to-moment employee decisions, the incentive to use the tool (immediate productivity gains) reliably overwhelms the incentive to protect data (an abstract risk of future leakage).

Policy-based approaches – training programs, usage guidelines, approval workflows – address the symptom, not the cause. They rely on perfect human compliance across every interaction, every day, for every employee. The Samsung incident demonstrates that this expectation is unrealistic. Three separate employees, working independently, all made the same judgment call: the productivity benefit was worth the theoretical risk.

The architectural solution is to remove the risk from the judgment call entirely. If the AI infrastructure cannot access proprietary data in plaintext – if PII stripping and encryption occur before data leaves the corporate environment – then employee behavior becomes irrelevant to data security. The system protects the organization even when individual users make suboptimal choices.
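The pre-submission sanitization described above can be pictured as a gateway transform that runs inside the corporate boundary. The sketch below is a toy illustration, not any vendor’s actual implementation: the patterns and placeholder names are hypothetical, and a production system would pair redaction with encryption of whatever text remains:

```python
import re

# Hypothetical redaction rules applied before a prompt leaves
# the corporate network; real rule sets would be far broader.
REDACTIONS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP_ADDR>"),
]


def strip_pii(prompt: str) -> str:
    """Redact known identifier patterns; only the result crosses the boundary."""
    for pattern, placeholder in REDACTIONS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt


strip_pii("Contact jane.doe@samsung.com at 10.0.4.17")
# -> "Contact <EMAIL> at <IP_ADDR>"
```

Because the transform runs before the external API call, it holds even when an individual employee decides the productivity benefit is worth the risk – the sensitive tokens simply never leave the building.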

Lessons for Enterprise AI Strategy

The Samsung incident distills into several actionable lessons for organizations deploying AI tools:

1. Bans Don’t Scale

Samsung’s initial response – banning external AI tools – is tactically sound but strategically unsustainable. Employees who have experienced a 10x productivity gain from AI tools will find workarounds: personal devices, personal accounts, alternative AI services. A 2024 survey by Salesforce found that 55% of employees using AI at work were using unapproved tools. Prohibition drives usage underground, where it’s invisible to security teams.

2. DLP Alone Is Insufficient

Data Loss Prevention tools can detect some categories of sensitive data in AI prompts (credit card numbers, Social Security numbers, specific code patterns). But they cannot detect strategic information, meeting content, product roadmaps, or competitive intelligence expressed in natural language. The Samsung meeting transcript would not have triggered any standard DLP pattern.

3. Internal Models Are Necessary but Not Sufficient

Samsung’s decision to build internal AI tools addresses the third-party data sharing risk but introduces new challenges: the compute cost of running large models, the ongoing maintenance burden, and the performance gap between internally hosted models and frontier commercial offerings. Many organizations lack the infrastructure or expertise to maintain competitive internal AI deployments.

4. Architecture Is the Only Reliable Control

The lesson of Samsung is that human behavior cannot be relied upon to protect data in AI workflows. The only reliable approach is architectural: systems that make data exposure technically impossible, regardless of user behavior.

Zero-persistence architecture achieves this by ensuring that data processed through AI infrastructure cannot be retained, logged, or trained on. Cryptographic shredding ensures that even in-memory data is destroyed after processing. And zero-knowledge design ensures that the infrastructure provider has no technical capacity to access user data.

Stealth Cloud implements these principles specifically for AI workloads, providing the productivity benefits of frontier language models with the data protection guarantees that enterprises require. The Samsung engineer who pasted source code into a Stealth Cloud-proxied chat would face zero risk of training data contamination – because the architecture makes contamination impossible.

The Stealth Cloud Perspective

The Samsung incident was preventable, not through better policies or more training, but through better architecture. Three engineers acting in good faith compromised trade secrets worth billions because the system they used was designed to capture and retain everything. Stealth Cloud inverts this design: prompts are processed in encrypted, ephemeral memory with zero persistence and zero training data capture. Samsung’s engineers needed a tool that protects the organization from its own users’ well-intentioned productivity – that tool is architecture, not policy.