ChatGPT Data Security Risks Are a Distraction

You're worried about OpenAI training on your data. You should be worried about the sensitive data your employees are carelessly feeding it every day.

Tim O'Neal · May 27, 2026 · 7 min read

Let’s be blunt: obsessing over whether OpenAI trains its models on your ChatGPT prompts is missing the forest for the trees. The real, immediate, and costly ChatGPT data security risk isn’t what a model might learn tomorrow; it’s the sensitive data your employees are copy-pasting into it right now. Every day, teams feed proprietary code, unreleased financials, and private customer PII into public-facing AI tools in the name of productivity. And most organizations are doing nothing to stop it.

The conversation around AI safety in the enterprise has been dominated by a theoretical, long-term fear of models absorbing proprietary information. This has led to a flurry of misguided security policies—outright bans that employees immediately circumvent or privacy settings that provide a false sense of security. These measures fail because they target the wrong problem. The foundational risk isn’t the AI; it’s the chaotic, ungoverned data environment that AI now plugs into.

When an employee pastes a chunk of a sensitive document into ChatGPT, the damage is already done. That data has left your secure environment and now sits with a third-party service that has its own terms, its own security vulnerabilities, and its own potential for misuse. Before you can win the LLM wars, you have to get your own house in order. It’s time to look past the prompt and fix the problem at its source: your data.

The Training Data Opt-Out Is a Red Herring

Much of the early panic surrounding ChatGPT in business centered on its data usage policy. By default, OpenAI may use conversations from consumer ChatGPT accounts to train future models. This created a nightmare scenario for any company with intellectual property: your confidential strategies, code, or customer data could hypothetically be absorbed by the model and later regurgitated to a competitor.

OpenAI responded by offering an opt-out feature and a ChatGPT Team subscription that promises not to train on business data. Problem solved, right? Not even close. These measures create a dangerous level of complacency. They convince leadership they’ve addressed the AI risk, while ignoring the much larger, more immediate threats:

  • Data in Transit: Opting out of training doesn't change the fact that your data is still being processed on third-party servers. It travels outside your firewall, across the public internet, to an external vendor. The moment an employee hits “enter,” you’ve lost control.
  • Accidental Exposure: The most common vector for a data leak is simple human error. An employee, trying to quickly summarize a sensitive report, pastes the entire thing into a public LLM. They get their summary, but a copy of that full report is now logged in their chat history, accessible via their account, and potentially vulnerable to a breach of that third-party service.
  • Lack of Oversight: When employees use personal or unmanaged accounts for work, security teams have zero visibility. You don’t know who is using these tools, what they’re inputting, or how often. It’s the definition of shadow IT, now supercharged with AI.

Focusing on the training policy is like worrying about a leaky faucet when a pipe has burst. It’s a valid concern, but it distracts from the flood of unsecured data leaving your organization every minute.

Your Data Is the Real "Upstream" Risk

The biggest security risk isn’t what you type into ChatGPT; it’s the sensitive data already sitting in your SaaS tools. Think about the terabytes of information sprawling across Google Drive, Slack, Jira, SharePoint, and a dozen other platforms. This is the data that forms the lifeblood of your company—and it’s the same data employees are now using to fuel their prompts.

According to research from Cyberhaven released in late 2025, a staggering 34.8% of what employees paste into ChatGPT is sensitive data. That’s a sharp rise from just 11% in 2023, showing how quickly the behavior has normalized. This isn’t malicious; it’s a side effect of convenience. An engineer wants to debug a code snippet. A marketer wants to rephrase a paragraph from a confidential press release. A sales lead wants to generate an email based on notes from a private customer call. In their workflow, the fastest way to the answer is to copy, paste, and prompt.

From a CISO’s perspective, ChatGPT and other public LLMs act as an accelerant for your existing data governance weaknesses. As one 2026 report from Metomic notes, these AI tools simply amplify the risks of overshared files and inadequate permissioning that have plagued enterprises for years. If your Google Drive is a mess of publicly shared links and folders where anyone can access anything, you can’t be surprised when that data ends up in an AI prompt. The problem starts *upstream*.

Fixing this requires a fundamental shift in focus from blocking AI to controlling data. You must be able to answer: What sensitive data do we have? Where is it? Who can access it? And who is trying to move it outside our secure environment? Without answers to these questions, any AI security policy is pure theater.
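To make the last of those questions concrete, here is a minimal sketch of checking text for sensitive patterns before it leaves your environment. The three patterns and the function names (`find_sensitive`, `redact`) are illustrative assumptions, not a complete rule set; a real deployment would rely on a tuned data loss prevention engine rather than a handful of regexes.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_sensitive(text: str) -> dict[str, list[str]]:
    """Return the matches for every pattern that fires on the text."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


def redact(text: str) -> str:
    """Replace every match with a bracketed label such as [EMAIL]."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[{name.upper()}]", text)
    return text


if __name__ == "__main__":
    prompt = "Summarize the call with jane@example.com, SSN 123-45-6789"
    print(find_sensitive(prompt))
    print(redact(prompt))
```

Whether a hit should block the prompt, redact it, or just raise an alert is a policy decision; the sketch only shows where such a check could sit.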

Beyond Leaks: Prompt Injection and Malicious Code

While data leakage is the most common and clear-cut danger, it’s not the only one. The very nature of large language models opens up novel attack vectors that most security teams are unprepared for.

Prompt Injection Attacks

Prompt injection is an attack where a malicious actor crafts an input designed to trick the LLM into overriding its instructions or safety controls. As described by SentinelOne, these attacks can coerce a model into leaking confidential data it has been fine-tuned on, generating malware, or bypassing content filters. For example, a cleverly worded prompt might convince an AI assistant to reveal sensitive system instructions or user data from a previous conversation.

This is particularly dangerous for companies building their own internal chatbots or agents on top of models like GPT. If your internal "Sales Copilot" is connected to your CRM, a successful prompt injection attack could potentially trick it into outputting customer lists or contract details to an unauthorized user.
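One common mitigation is to enforce permissions in the code the assistant calls, not in the prompt. The sketch below assumes a hypothetical `get_account` tool, a stand-in `CRM` dictionary, and a `User` type populated by your authentication layer. None of these names come from a real product; they only illustrate the shape of the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Identity established by your auth layer, never by the model."""
    user_id: str
    account_ids: frozenset[str]


# Stand-in for a CRM backend (hypothetical data).
CRM = {
    "acct-1": {"name": "Acme Corp", "contract_value": 120_000},
    "acct-2": {"name": "Globex", "contract_value": 95_000},
}


class AccessDenied(Exception):
    pass


def get_account(user: User, account_id: str) -> dict:
    """Tool the model may call.

    The permission check uses the authenticated user, so no instruction
    injected into the prompt can widen what this function returns.
    """
    if account_id not in user.account_ids:
        raise AccessDenied(f"{user.user_id} may not read {account_id}")
    return CRM[account_id]


if __name__ == "__main__":
    rep = User(user_id="rep-7", account_ids=frozenset({"acct-1"}))
    print(get_account(rep, "acct-1"))
    try:
        # Simulates the model being tricked into requesting another account.
        get_account(rep, "acct-2")
    except AccessDenied as exc:
        print("blocked:", exc)
```

The point of the design is that a successful injection can change what the model asks for, but not what the backend is willing to return for that user.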

Insecure Code Generation

One of the most celebrated use cases for LLMs is their ability to write and debug code. Developers are using ChatGPT daily to speed up their work. The risk, however, is that these models have no inherent concept of security. They are trained on vast amounts of public code from sources like GitHub—code that is often outdated, flawed, or outright vulnerable.

An inexperienced developer might ask an LLM to generate a function for, say, handling user file uploads. The AI could produce a perfectly functional but insecure piece of code that doesn’t properly sanitize filenames, opening the door to a path traversal vulnerability. Without rigorous code review and security expertise, teams can easily end up embedding AI-generated vulnerabilities directly into their applications, creating a massive, hidden attack surface.
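As an illustration of what the secure version of that upload handler might look like, here is a minimal sketch that strips directory components from a client-supplied filename and confirms the result stays inside the upload directory. The `UPLOAD_DIR` path and the `safe_upload_path` name are assumptions made for the example, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path

UPLOAD_DIR = Path("/srv/uploads")  # hypothetical location


def safe_upload_path(filename: str, upload_dir: Path = UPLOAD_DIR) -> Path:
    """Map a client-supplied filename to a path inside upload_dir."""
    if "\x00" in filename:
        raise ValueError("filename contains a null byte")

    # Keep only the final component, so "../../etc/passwd" becomes "passwd".
    # Backslashes are normalized first to cover Windows-style input.
    name = Path(filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"invalid filename: {filename!r}")

    base = upload_dir.resolve()
    candidate = (base / name).resolve()

    # Defense in depth: resolve() follows symlinks, so this also rejects a
    # symlink inside the upload directory that points somewhere else.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes upload directory: {filename!r}")
    return candidate
```

A production handler would usually go further, for example by generating its own storage names and validating file type and size, but the containment check is the part that addresses path traversal.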

The Sprawl of "Shadow AI"

The final, critical risk is organizational, not technical. Your company isn’t just dealing with ChatGPT. It’s dealing with dozens of AI-powered tools, services, and browser extensions that employees are adopting on their own. This "Shadow AI" ecosystem means your data isn’t just going to OpenAI; it could be going to a small, unknown startup with questionable security practices.

Without a centralized platform for managing AI access, you have no control. Employees sign up for services with their personal or work emails, and IT has no visibility or ability to enforce policy. You can’t control which models are used, how they are configured, or what data is shared with them. This sprawl makes a cohesive security strategy impossible. It guarantees that, eventually, sensitive data will end up somewhere it shouldn’t be.
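A first step toward visibility can be as simple as counting requests to known AI services in your egress logs. The sketch below assumes a hypothetical log format of one `<user> <host>` pair per line and a short, incomplete hostname list; both would need adapting to your own proxy and to the tools your teams actually use.

```python
from collections import Counter

# Illustrative and incomplete list of AI service hostnames.
AI_DOMAINS = {"chatgpt.com", "chat.openai.com", "claude.ai", "gemini.google.com"}


def is_ai_host(host: str) -> bool:
    """True if host is a listed domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in AI_DOMAINS)


def ai_usage_by_user(log_lines) -> Counter:
    """Count requests to listed AI hosts per user.

    Assumes each line is '<user> <host>' separated by whitespace
    (a hypothetical format); lines that do not fit are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        user, host = parts
        if is_ai_host(host):
            counts[user] += 1
    return counts


if __name__ == "__main__":
    sample = ["alice chatgpt.com", "alice claude.ai", "bob example.com"]
    print(ai_usage_by_user(sample))
```

A report like this will not catch every tool, but it turns "we have no idea" into a list of users and services to follow up on.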

This is precisely why a unified access layer is no longer a luxury but a necessity. Platforms like Backplain give businesses a single, secure gateway to every major LLM—GPT, Claude, Gemini, Llama, and more. Instead of a chaotic free-for-all, all activity is funneled through one workspace. This allows you to enforce consistent security controls, audit usage, and prevent sensitive data from ever leaving your environment, regardless of which model an employee chooses to use. You regain control without sacrificing the productivity benefits your teams demand.

The risks posed by tools like ChatGPT are real, but they are also manageable. The key is to stop focusing on the tool and start focusing on the data and workflows that feed it. Secure your data upstream, control access through a unified platform, and educate your teams on responsible use. Don't let the hype or the fear distract you from where the real work of enterprise AI security needs to happen.
