
Anonymizing Your Data Isn't Free. Here's What It Actually Costs.

Anonymizing column names before sending schema to an AI is standard advice. What most teams miss: the AI still has to understand your data somehow, and anonymization changes what that costs.

Mihir Sanchala · 1 month ago · 7 min read

Most advice about AI and sensitive data stops at the same instruction: anonymize your column names before sending schema to a model. Swap customer_ssn for col_4, send the placeholder, keep the real name out of the request.

That instruction is correct. It is also incomplete.

TL;DR: Anonymizing column names before sending schema to an AI is real protection, not theater - but it isn't free. The AI still has to understand what each field means to answer correctly, and anonymization removes its most direct source of that understanding: the name itself. What replaces it, and how well, determines how much accuracy survives the trade.

Key Takeaways

Anonymizing a column name protects what the AI sees, not what it needs to understand
AskEdi's Private mode sends anonymized identifiers and relies on column descriptions to fill the gap
Balanced and Full Context modes never anonymize at all - the tradeoff is unique to Private mode
The Analysis view shows two queries only in Private mode, because that's the only mode where a rewrite actually happens
Description quality is the lever that determines how much accuracy Private mode keeps

What Does Anonymizing a Column Actually Remove?

A column name is not just a label. It's a piece of context. customer_ssn tells a model what the field contains before it has looked at a single value. col_4 tells it nothing.

When you anonymize column names before sending a schema to an AI, you remove that piece of context on purpose. That's the protection working as intended - the AI should not see that a field is called customer_ssn, credit_score, or churn_risk_flag if those names themselves carry sensitive meaning.

But removing the name doesn't remove the AI's need to understand the field. A business question like "which customers are at risk of churn" still requires the model to know which anonymized column corresponds to churn risk. That understanding has to come from somewhere else now that the name is gone.

Anonymization protects what the AI sees. It does not reduce what the AI needs to understand to answer correctly. Those are two different problems, and most advice about anonymizing data only solves the first one.

Where Does the AI's Understanding Come From Instead?

In AskEdi's Private mode, the AI receives anonymized identifiers - col_1, col_2, col_3 - in place of real column names. Alongside those identifiers, it also receives column descriptions, data types, and data quality statistics like null counts and value ranges.
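
To make the shape of that payload concrete, here is a minimal Python sketch of the anonymization step. It is not AskEdi's implementation. The Column type, the anonymize_schema function, and the payload field names are all hypothetical, and a single null count stands in for the data quality statistics. It illustrates the split the paragraph describes: placeholders, types, descriptions, and statistics go into the payload, and the placeholder-to-name mapping stays behind.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical schema entry; not AskEdi's data model."""
    name: str          # real column name, e.g. "mrr_net"
    dtype: str         # data type
    description: str   # business description written by the team
    null_count: int    # one data quality statistic, for brevity


def anonymize_schema(columns):
    """Replace real names with positional placeholders (col_1, col_2, ...).

    Returns the payload that would go to the model and the private mapping
    from placeholder back to real name, which is never sent.
    """
    payload = []
    mapping = {}
    for i, col in enumerate(columns, start=1):
        placeholder = f"col_{i}"
        mapping[placeholder] = col.name
        payload.append({
            "id": placeholder,
            "type": col.dtype,
            "description": col.description,
            "null_count": col.null_count,
        })
    return payload, mapping


schema = [
    Column("customer_id", "integer", "Unique customer identifier", 0),
    Column("mrr_net", "decimal", "Monthly recurring revenue, net of refunds, in USD", 12),
]
payload, mapping = anonymize_schema(schema)
```

Nothing in payload contains a real column name; only mapping can turn col_2 back into mrr_net.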

The column description is doing the heavy lifting here. With the name gone, the description is the AI's primary source of business context for that field. A well-written description - "monthly recurring revenue, net of refunds, in USD" - gives the model something concrete to reason from even without ever seeing the column called mrr_net. A thin or missing description leaves the model working from data type and statistics alone, which is structural information, not business meaning.

This is exactly why Private mode's own documentation states the dependency directly: accuracy in this mode depends on column description quality. That is not a caveat buried in fine print. It's the actual mechanism. The same column, anonymized, produces a more precise answer when its description is complete than when it isn't - because the description is now carrying weight the column name used to carry for free.
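
A deliberately crude way to see that dependency: the sketch below stands in for the model with a word-overlap score between the question and each column's description. That substitution is an assumption made purely for illustration, since a language model reasons far more flexibly than keyword matching, and best_column and the payload shape are hypothetical. The failure mode has the same shape, though: once names are placeholders, an empty description leaves nothing that connects "churn" to any column.

```python
import re


def tokens(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_column(question, payload):
    """Toy stand-in for the model: return the placeholder whose description
    shares the most words with the question, or None if none shares any."""
    q = tokens(question)
    best_id, best_score = None, 0
    for entry in payload:
        score = len(q & tokens(entry["description"]))
        if score > best_score:
            best_id, best_score = entry["id"], score
    return best_id


described = [
    {"id": "col_1", "type": "decimal",
     "description": "Monthly recurring revenue, net of refunds, in USD"},
    {"id": "col_2", "type": "boolean",
     "description": "Flag set when the customer is at risk of churn"},
]
# Same columns, same types, but no descriptions.
undescribed = [dict(entry, description="") for entry in described]

question = "Which customers are at risk of churn?"
```

With descriptions, the question resolves to col_2. With the same columns undescribed, it resolves to nothing, because the data type alone carries no business meaning.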

This connects directly to the work covered in "Why Your AI Analytics Tool Doesn't Know Your Business": validated column descriptions aren't only about general answer quality. In Private mode specifically, they stand in for the column name itself. A team that has invested in writing and validating real descriptions pays a smaller accuracy cost for choosing Private mode. A team that hasn't pays a larger one.

Why Don't Balanced and Full Context Modes Have This Tradeoff?

Balanced mode sends real column names to the AI, along with the same data quality statistics Private mode sends. Nothing is anonymized. The AI sees mrr_net as mrr_net. There is no gap for a description to fill, because the name itself is still doing its job.

Full Context mode goes further - same real column names, plus the most frequent values in each column and the result of the previous analysis in the conversation, so follow-up questions build on prior context automatically. The exact breakdown of what each mode sends is in AskEdi's privacy modes documentation.

Neither mode involves a tradeoff between privacy and accuracy, because neither one withholds the column name in the first place. The tradeoff this post describes is specific to Private mode. It exists because Private mode solves a different problem: keeping the column names themselves out of the request, for cases where the names carry sensitive meaning, like national_id or credit_score. That problem carries a cost that Balanced and Full Context never incur.

None of the three modes ever sends raw data rows to the AI. That protection holds constant across every mode. The difference between them is how much structural and semantic context the AI receives beyond that floor, and Private mode is the only one of the three that removes a piece of context (the column name) rather than simply choosing not to add more.
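
The difference between the modes can be sketched as one function that builds the model's context from a table. This is a hypothetical illustration, not AskEdi's code. The mode strings, build_context, the field names, and the choice to send descriptions in all three modes are assumptions, and the weather columns are illustrative. The structure mirrors the text: only Private mode swaps in placeholders, only Full Context adds frequent values and the previous result, and no mode copies raw rows into the context.

```python
from collections import Counter


def column_stats(values):
    """Data quality statistics: null count and value range of non-null values."""
    non_null = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def build_context(mode, table, descriptions, previous_result=None):
    """Hypothetical context builder. table maps real column name -> values;
    the values themselves are summarized, never copied into the context."""
    if mode not in ("private", "balanced", "full_context"):
        raise ValueError(f"unknown mode: {mode}")
    columns = []
    for i, (name, values) in enumerate(table.items(), start=1):
        entry = {
            # Only Private mode withholds the real column name.
            "name": f"col_{i}" if mode == "private" else name,
            # Assumption: descriptions are sent in every mode.
            "description": descriptions.get(name, ""),
            **column_stats(values),
        }
        if mode == "full_context":
            # Most frequent values only, not rows.
            entry["top_values"] = [
                v for v, _ in Counter(values).most_common(3) if v is not None
            ]
        columns.append(entry)
    context = {"columns": columns}
    if mode == "full_context" and previous_result is not None:
        context["previous_result"] = previous_result
    return context


table = {
    "temp_max_c": [31.0, 29.5, None, 31.0],
    "temp_min_c": [18.0, 17.5, 18.0, None],
}
descriptions = {
    "temp_max_c": "Daily maximum temperature in Celsius",
    "temp_min_c": "Daily minimum temperature in Celsius",
}
private = build_context("private", table, descriptions)
balanced = build_context("balanced", table, descriptions)
full = build_context("full_context", table, descriptions,
                     previous_result={"rows_returned": 4})
```

Comparing private and balanced shows the one thing Private mode removes. Comparing balanced and full shows what Full Context adds.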

Why Does the Analysis View Show Two Queries Only in Private Mode?

This is where the tradeoff becomes visible, not just theoretical.

[Image: AskEdi Analysis view in Private mode, showing two query panels side by side: the Anthropic Generated Query, which uses the anonymized identifiers col_5, col_6, and col_7, and the AskEdi Processed Query, with the real column names date, temp_max_c, and temp_min_c substituted in.]

Private mode is the only mode where the Analysis view shows two queries: what the AI generated against anonymized identifiers, and what AskEdi actually ran against your real columns.

In Balanced and Full Context modes, the AI writes its query directly against your real column names. Whatever it produces is what runs. One query, one panel, because no translation step exists.

In Private mode, the AI writes its query against col_1, col_2, and so on - the only identifiers it ever saw. Before that query runs against your actual source, AskEdi rewrites it, substituting the real column names back in. The Analysis view then shows both versions side by side: the query the AI generated using anonymized identifiers, and the query that actually executed with real names restored.
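
A minimal sketch of that rewrite step. It assumes the generated query is plain SQL text and that the placeholder mapping from anonymization is available server-side. restore_column_names is a hypothetical helper, and the weather table name is invented. The mapping mirrors the one in the Analysis view (col_5, col_6, col_7 to date, temp_max_c, temp_min_c).

```python
import re

# Matches placeholders such as col_5; \b stops col_1 from matching inside col_10.
PLACEHOLDER = re.compile(r"\bcol_\d+\b")


def restore_column_names(generated_sql, mapping):
    """Hypothetical helper: swap anonymized placeholders back to real names."""

    def substitute(match):
        placeholder = match.group(0)
        if placeholder not in mapping:
            # The model referenced a column it was never given; refuse to guess.
            raise KeyError(f"unknown column placeholder: {placeholder}")
        return mapping[placeholder]

    return PLACEHOLDER.sub(substitute, generated_sql)


mapping = {"col_5": "date", "col_6": "temp_max_c", "col_7": "temp_min_c"}
generated = "SELECT col_5, col_6 - col_7 AS temp_range FROM weather ORDER BY col_5"
processed = restore_column_names(generated, mapping)

# The two panes, in this sketch: what the model wrote, and what would run.
print("Generated:", generated)
print("Processed:", processed)
```

A regex is the simplest approach that works for a sketch, but it would also rewrite a placeholder that happened to appear inside a string literal. A production rewrite would more plausibly operate on a parsed query. It might also need to quote restored names that collide with SQL keywords, such as date in some dialects.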

The dual-pane view in Private mode isn't a UI flourish. It exists because a real rewrite happens in that mode and nowhere else. Showing both queries is the only way for you to verify that the correct column mapping was applied.

That rewrite step is the literal mechanism of the tradeoff this post describes. The AI reasoned about col_3 using its description, decided what col_3 should mean in the context of your question, and wrote a query accordingly. AskEdi's job afterward is to map col_3 back to the column it actually represents, which is not necessarily the column the AI assumed it was. The clearer the description, the less room there is for that assumption to be wrong in the first place.

What This Means for Choosing a Privacy Mode

None of this is an argument against Private mode. Some column names genuinely carry sensitive meaning that should never reach an AI provider, regardless of how good your descriptions are. For those cases, Private mode is the correct choice and the accuracy tradeoff is a reasonable price for that protection.

The point is narrower: anonymization is not a free safety toggle. It's a real architectural tradeoff, and the variable that determines how expensive it is sits entirely on your side - in how complete and specific your column descriptions are before you ever ask a question.
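
Because that variable sits on your side, it can be checked before any chat exists. The sketch below is a hypothetical pre-flight audit, not a feature of Integrate or AskEdi. It flags descriptions that are missing or shorter than an arbitrary word threshold (min_words=4 is an assumption). Word count is only a proxy for specificity: it cannot tell whether a description is correct, which is what validation is for.

```python
def audit_descriptions(descriptions, min_words=4):
    """Hypothetical audit: return (column, reason) pairs for descriptions that
    are likely too thin to stand in for an anonymized column name."""
    findings = []
    for column, text in descriptions.items():
        words = text.split() if text else []
        if not words:
            findings.append((column, "missing"))
        elif len(words) < min_words:
            findings.append((column, "thin"))
    return findings


descriptions = {
    "mrr_net": "Monthly recurring revenue, net of refunds, in USD",
    "churn_risk_flag": "Churn flag",
    "credit_score": "",
}
findings = audit_descriptions(descriptions)
```

Here mrr_net passes, while churn_risk_flag and credit_score are flagged. These are exactly the columns whose meaning a model would struggle to recover once their names became placeholders.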

A team that has already done the work of validating column descriptions pays very little for choosing Private mode. A team that hasn't is choosing privacy and accuracy loss at the same time, without necessarily realizing the second part was optional.

Why Does the Mode Decision Happen Before You Ask Anything?

Privacy mode is set once, when a chat is created, and it applies to every question asked in that conversation. It cannot be changed mid-chat. To use a different mode, you start a new chat.
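
One way to model that rule in code, purely as an illustration: make the mode an immutable field of the chat, so changing it is an error and the only path to a different mode is a new object. PrivacyMode, Chat, and ask are hypothetical names, not AskEdi's API.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from enum import Enum


class PrivacyMode(Enum):
    PRIVATE = "private"
    BALANCED = "balanced"
    FULL_CONTEXT = "full_context"


@dataclass(frozen=True)
class Chat:
    mode: PrivacyMode  # fixed at creation; applies to every question
    questions: list = field(default_factory=list)

    def ask(self, question):
        # Appending mutates the list, not the frozen field, so asking still works.
        self.questions.append(question)
        return self.mode  # every question in this chat uses the same mode


chat = Chat(PrivacyMode.PRIVATE)
chat.ask("Which customers are at risk of churn?")

try:
    chat.mode = PrivacyMode.BALANCED  # rejected: the mode is locked
except FrozenInstanceError:
    # Switching modes means starting a new chat.
    chat = Chat(PrivacyMode.BALANCED)
```

In this sketch the replacement chat starts with an empty question history, so the earlier question has to be asked again.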

This makes the mode choice a decision you have to make deliberately, not something to default into. If a column's name itself is sensitive, decide that before the first question, not after seeing how the first answer turns out. If accuracy on a follow-up question matters more than hiding a column name, that's a reason to start a fresh chat in a different mode rather than continuing in one that's already locked.

There's no way to test your way out of a wrong choice mid-conversation. The only way to compare how a question performs across two modes is to ask it twice, in two separate chats. That makes the decision worth a moment of thought before the first message, rather than a cost you discover three questions in.

Integrate is where column descriptions get written and validated, before any privacy mode decision matters. AskEdi is where that work pays off across all three modes, Private included.

Start a free 14-day evaluation.

Written by

Mihir Sanchala

Co-Founder, Edilitics. Engineers the systems that bring Edilitics to production. Writes about the technical reality of building governed AI analytics - the decisions, the tradeoffs, and what building for trust actually requires.
