Why Your AI Analytics Tool Doesn't Know Your Business

49% of organizations don't trust AI-generated insights. The reason isn't the model - it's that the AI has never been told what your columns actually mean. Here's how to fix it.

Mihir Sanchala·1 month ago·7 min read

According to a 2026 survey of 114 data and analytics leaders by insightsoftware, 49% of organizations do not trust AI-generated insights. Nearly half. And this is after years of AI analytics investment.

The trust gap is not a model problem. The models are capable. The gap is an architecture problem - and it starts with a question most teams never ask when they deploy an AI analytics tool: what does the AI actually know about your data?

TL;DR: 49% of organizations don't trust AI-generated analytics output. The cause isn't the model - it's that the AI has never been told what your columns mean. A governed semantic layer, validated by humans who know the data, is the only architecture that fixes this.

Key Takeaways

49% of organizations don't trust AI-generated analytics output - the root cause is missing business context, not model capability
AI models generate queries from column names alone when no semantic layer exists - ambiguous names produce inconsistent answers
An AI-generated semantic layer gives the model a starting point, but human validation is what makes answers trustworthy
Every answer should show the query that ran - if it doesn't, you cannot verify what the AI actually did

What Does the AI Actually See When It Reads Your Database?

When an AI analytics tool connects to your database, it reads your schema. Table names. Column names. Data types. Relationships between tables where they are declared.

That is everything it knows about your business.

It does not know that rev_net in your accounting system means revenue net of refunds and chargebacks, not net of tax. It does not know that cust_seg_flag uses a two-character code your sales team invented in 2019 that maps to six customer tiers in a lookup table nobody has documented. It does not know that ord_created_dt is the order creation timestamp in UTC+5:30, not UTC, and that your finance team has a standing rule to exclude same-day orders from weekly cohort calculations.

This is what most real production databases look like. Column names that carry institutional knowledge - knowledge that lives in the heads of the three people who built the original schema, not in the schema itself.

The AI can read every column in your database. It has no idea what any of them mean in your specific business context. That distinction is the difference between an answer that is probably right and an answer you can defend.

When a business user asks "what was our net revenue last week?", the AI translates that question into a query against your schema. Which column it picks for "net revenue" depends entirely on what it can infer from the column name. If the name is ambiguous - and in most production databases, it is - the model picks the most statistically probable interpretation. That interpretation can and does change depending on how the question is phrased.

This is why two people asking the same question get different answers. Not because the data is different. Because the AI made a different inference from the same ambiguous column name. A February 2026 accuracy review of AI data analyst tools by Kaelio identified schema ambiguity as the primary accuracy failure mode across all tested tools - where column names carry institutional knowledge the model cannot infer, answer consistency breaks down regardless of model capability.
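A toy sketch makes the failure mode concrete. It assumes a hypothetical four-column schema and a hypothetical abbreviation table, and it uses plain token overlap as a stand-in for the model's inference. A real model is far more sophisticated, but it is working from the same raw material: names.

```python
import re

# Hypothetical schema: column names are all the "model" gets to see.
SCHEMA_COLUMNS = ["rev_net", "rev_gross", "net_amt", "ord_created_dt"]

# Hypothetical abbreviation guesses, standing in for what a model might infer.
ABBREVIATIONS = {"rev": "revenue", "amt": "amount", "ord": "order", "dt": "date"}


def tokens(name):
    """Split a column name or phrase into lowercase tokens, expanding abbreviations."""
    parts = re.split(r"[_\s]+", name.lower())
    return {ABBREVIATIONS.get(p, p) for p in parts if p}


def candidate_columns(phrase):
    """Return every column that ties for the best token overlap with the phrase."""
    wanted = tokens(phrase)
    scored = [(len(wanted & tokens(col)), col) for col in SCHEMA_COLUMNS]
    best = max(score for score, _ in scored)
    return [col for score, col in scored if score == best and score > 0]


print(candidate_columns("net revenue"))  # a single best match
print(candidate_columns("net sales"))    # a tie: the name alone cannot decide
```

In this sketch "net revenue" resolves to rev_net, while "net sales" ties between rev_net and net_amt. Same data, same intent, different phrasing, and nothing in the schema says which column is correct.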

Why Can't the AI Model Fix This on Its Own?

The instinct when AI analytics gives wrong answers is to prompt-engineer your way out. Add more context to the question. Be more specific. Specify which column. Specify which table.

This does not scale and it does not solve the underlying problem.

When a business user has to specify the column name in their question, the tool has failed at its core purpose. "What was our net revenue last week using the rev_net column excluding same-day orders?" is not self-serve analytics. It is SQL with a natural language wrapper.

The model cannot fix this on its own because the problem is not in the model. It is in the absence of a governed layer between the question and the schema. Without that layer, the model is making the best inference it can from incomplete information. Better models make better inferences. But no model can infer context that has never been provided. The deeper reason is architectural: the companion post on why one AI model isn't enough for trustworthy analytics covers the two-stage pipeline that separates intent classification from query synthesis.

According to the same insightsoftware 2026 survey, 53% of data leaders cite audit trails for AI-generated answers as a top governance need. The demand is not for smarter AI. It is for AI whose reasoning can be verified. That is a fundamentally different requirement - and it starts at the semantic layer, not the model layer.

What Does a Governed Semantic Layer Actually Do?

A semantic layer is not documentation. It is not a glossary in a wiki that nobody reads. It is a validated set of machine-readable descriptions, locked into the data pipeline, that the AI reasons from before it generates any query.

For every column the AI has access to, the semantic layer answers three questions:

What does this column contain? Not the column name - the actual definition. rev_net contains revenue after refunds and chargebacks have been applied, before tax deductions. Values are in USD. Populated daily by the accounting ETL job.

What business rule applies? Same-day orders are excluded from weekly cohort calculations per finance policy effective March 2023. This applies to any query involving weekly revenue aggregation.

What does this column mean in context? In the context of growth reporting, cust_seg_flag = 'EN' means Enterprise tier, minimum contract value $50,000 ARR. In the context of support reporting, cust_seg_flag is not used - use support_tier instead.
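The three questions above map naturally onto a small record per column. The sketch below is one possible shape, not the Edilitics schema: the class name, field names, and the reviewer value are all hypothetical, and the definitions are the examples from this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ColumnDefinition:
    """One semantic-layer entry (hypothetical shape, for illustration only)."""
    column: str
    definition: str                                                  # what the column contains
    business_rules: List[str] = field(default_factory=list)          # rules that apply to it
    context_meanings: Dict[str, str] = field(default_factory=dict)   # meaning per reporting context
    validated_by: Optional[str] = None                               # None = unconfirmed AI draft


def render_context(entries: List[ColumnDefinition]) -> str:
    """Build the text block a model would read before writing a query."""
    lines = []
    for e in entries:
        status = "validated" if e.validated_by else "draft"
        lines.append(f"{e.column} [{status}]: {e.definition}")
        lines.extend(f"  rule: {r}" for r in e.business_rules)
        lines.extend(f"  in {ctx}: {m}" for ctx, m in e.context_meanings.items())
    return "\n".join(lines)


LAYER = [
    ColumnDefinition(
        column="rev_net",
        definition="Revenue after refunds and chargebacks, before tax deductions. USD.",
        business_rules=["Exclude same-day orders from weekly cohort calculations."],
        validated_by="finance_lead",  # hypothetical reviewer
    ),
    ColumnDefinition(
        column="cust_seg_flag",
        definition="Two-character customer segment code.",
        context_meanings={
            "growth reporting": "'EN' = Enterprise tier, minimum contract value $50,000 ARR",
            "support reporting": "not used; use support_tier instead",
        },
    ),
]

print(render_context(LAYER))
```

Keeping the validation status on the entry itself means the model, and anyone auditing it, can tell a confirmed definition from a draft at a glance.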

When the AI generates a query from this, it is not inferring. It is reasoning from governed definitions that a human who knows the business has validated. The query it produces is the query your data team would have written.

That is the architecture difference between AI analytics that produces inconsistent answers and AI analytics that produces answers you can defend in a board meeting. To see what that defense actually looks like - the visible query, the confidence level, the Decision Intelligence panel - this walkthrough shows a verified answer end to end.

How Does AI Generate the Layer - and Why Do Humans Have to Validate It?

Generating descriptions for every column in a large database manually is not realistic. A production database with 40 tables and 300 columns would take weeks to document properly - and by the time it is done, three tables have changed.

This is where AI-assisted semantic layer generation is genuinely useful. The model reads the schema, samples representative values, and drafts a description for every column. It is fast. For straightforward columns - IDs, timestamps, status flags with clear value patterns - it is often accurate enough to be a good starting point.

But "good starting point" is not the same as "validated."

The AI drafts from pattern recognition. It does not know your finance team's exclusion rules. It does not know the historical context behind a column that was renamed in 2021. It does not know that ord_status = 4 means "fulfilled" in your legacy system but "pending shipment" in the new one.

A human who knows the data has to confirm each description before the AI can reason from it with confidence. That confirmation step is what separates a semantic layer that makes answers trustworthy from one that makes them confidently wrong.

Anthropic's own engineering team reached the same conclusion building their internal analytics system on Claude: auto-generated semantic definitions "encoded the very ambiguities we were trying to eliminate." Without a structured, human-validated semantic layer, their accuracy on analytics queries didn't exceed 21%. With one, it reached 95%+. The full architecture is documented in Anthropic's published write-up.

At Edilitics, this is surfaced as two separate signals, not one combined gate. Data Quality measures whether the data itself is sound - completeness, uniqueness, type compliance. The AIR Score measures something different: whether each column's meaning has actually been validated, not just guessed at by an AI reading column names. A dataset can have perfect data quality and still score poorly on AIR if nobody has confirmed what the columns mean - and AskEdi tells you which one is the problem, because they fail differently. Low data quality risks unreliable or hallucinated answers. Low AIR means the AI is mostly guessing at business context from names alone. Neither blocks AskEdi from answering. Both visibly lower how much you should trust the answer it gives.

An integration at Grade A on both has clean data and fully validated descriptions. An integration at Grade D on AIR has clean data but no validated descriptions - AskEdi will still answer, but it's reasoning from guesses about what your columns mean, not confirmed definitions. Both produce answers. Only one produces answers you can trust without checking.

What Goes Into the AIR Score - and What Does the AI See While It Drafts?

"AI Readiness" is not a vibe. It is two scores, weighted and combined - the full mechanics are covered in the documentation on how the AIR score is calculated.

[Figure: The Edilitics Score Breakdown drawer, showing the DQ score (Completeness, Uniqueness, Type Compliance) and the AIR score (Human Validation percentage) per integration, updated on every profile run.]

The first half is data quality. Every column is profiled for completeness, uniqueness, and type compliance, then graded A through F using a weighted formula: completeness counts for 50% of the score, uniqueness for 25%, compliance for 25%. ID and timestamp columns are weighted three times heavier than other columns in this calculation - because a corrupted primary key or a broken date field breaks every query that touches it, not just one.
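As a rough sketch of that weighting, assume each metric is a fraction between 0 and 1 and that column scores roll up into an integration score as a weighted average. Both are assumptions made for illustration, as are the function and field names. The letter-grade boundaries are not stated here, so the sketch stops at the numeric score.

```python
# Weights stated in the article.
COMPLETENESS_WEIGHT = 0.50
UNIQUENESS_WEIGHT = 0.25
COMPLIANCE_WEIGHT = 0.25
KEY_COLUMN_WEIGHT = 3.0  # ID and timestamp columns count three times heavier


def column_dq_score(completeness, uniqueness, compliance):
    """Weighted score for one column; inputs assumed to be fractions in [0, 1]."""
    return (COMPLETENESS_WEIGHT * completeness
            + UNIQUENESS_WEIGHT * uniqueness
            + COMPLIANCE_WEIGHT * compliance)


def integration_dq_score(columns):
    """Weighted average of column scores (the averaging step is an assumption)."""
    if not columns:
        raise ValueError("at least one profiled column is required")
    weighted_sum = 0.0
    total_weight = 0.0
    for col in columns:
        weight = KEY_COLUMN_WEIGHT if col["kind"] in ("id", "timestamp") else 1.0
        score = column_dq_score(col["completeness"], col["uniqueness"], col["compliance"])
        weighted_sum += weight * score
        total_weight += weight
    return weighted_sum / total_weight


profile = [
    {"kind": "id", "completeness": 1.0, "uniqueness": 1.0, "compliance": 1.0},
    {"kind": "other", "completeness": 0.5, "uniqueness": 0.5, "compliance": 1.0},
]
print(integration_dq_score(profile))  # 0.90625
```

Swap the two "kind" values and the same metrics give 0.71875 instead: a half-empty ID column drags the integration down far more than a half-empty free-text column.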

The second half is semantic health - how well each column is described for the AI to reason from. The AIR score combines these two halves equally: 50% data quality, 50% semantic documentation. A column can have perfect data quality and still cap the integration's AIR grade if nobody has validated what that column means.
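The 50/50 combination is simple arithmetic. The sketch below assumes both halves are expressed as fractions between 0 and 1, and it approximates semantic health as the share of columns with a human-validated description. That approximation is an assumption for illustration; the actual semantic-health calculation may include more than this.

```python
def semantic_health(validated_flags):
    """Share of columns whose description a human has confirmed (assumed proxy)."""
    if not validated_flags:
        raise ValueError("at least one column is required")
    return sum(1 for flag in validated_flags if flag) / len(validated_flags)


def air_score(dq_score, semantic_score):
    """Equal-weight combination stated in the article: 50% DQ, 50% semantic."""
    return 0.5 * dq_score + 0.5 * semantic_score


# Perfect data, nothing validated: the score cannot rise above the halfway mark.
print(air_score(1.0, semantic_health([False, False, False, False])))  # 0.5

# Perfect data, three of four columns validated.
print(air_score(1.0, semantic_health([True, True, True, False])))     # 0.875
```

Under these assumptions, clean data alone tops out at half the available score, which is why validation is the lever that moves an integration's AIR grade.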

That second half is where the privacy mode you choose for metadata generation matters - and it is worth being precise that this is a different setting from the privacy mode AskEdi uses when answering a question. Privacy and context modes behave differently per module: one controls what the AI sees while drafting a column description, before any business user has asked anything. The other controls what AskEdi sends to the model while resolving a live query. They are separate decisions, made at different points in the pipeline. The accuracy tradeoff that comes with AskEdi's own Private mode - and why it depends on the same column descriptions discussed here - is covered in the post on why anonymizing your data isn't free.

For metadata generation specifically, there are three modes - and in all three, zero raw data rows are ever transmitted:

Private - sends table names, column names, data types, and DQ statistics. Does not send your organization's Focus Sector or any value-level data. The draft description is structurally accurate but has no industry context - useful when column names themselves carry sensitive meaning whose exposure you want to minimize.

Balanced - sends everything in Private, plus your organization's configured Focus Sector. A column named amt in a Finance workspace gets described as a transaction or revenue measure. The same column name in a Healthcare workspace gets described as a dosage or billing amount. Still no value-level data leaves your environment - the difference is industry awareness, not column visibility.

Full Context - sends everything in Balanced, plus the most frequently occurring values per column, drawn from the DQ profiling sample. If a column named status has frequent values like pending_review, escalated, and resolved, Full Context lets the draft description use that actual vocabulary instead of guessing at what a generic "status" column might contain. The frequent values are statistical summaries from the profiling sample, not individual records.
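The three modes differ only in which fields are added to the drafting request. A minimal sketch, assuming a dictionary payload: the enum, function name, and field names are hypothetical and do not describe the actual Edilitics request format.

```python
from enum import Enum


class MetadataMode(Enum):
    PRIVATE = "private"
    BALANCED = "balanced"
    FULL_CONTEXT = "full_context"


def build_draft_payload(mode, table, columns, focus_sector, frequent_values):
    """Assemble what the drafting model sees. Raw rows are never included in any mode."""
    payload = {
        "table": table,
        "columns": [
            {"name": c["name"], "data_type": c["data_type"], "dq_stats": c["dq_stats"]}
            for c in columns
        ],
    }
    # Balanced and Full Context add industry awareness.
    if mode in (MetadataMode.BALANCED, MetadataMode.FULL_CONTEXT):
        payload["focus_sector"] = focus_sector
    # Full Context adds frequent values from the profiling sample, not individual records.
    if mode is MetadataMode.FULL_CONTEXT:
        for col in payload["columns"]:
            col["frequent_values"] = frequent_values.get(col["name"], [])
    return payload


columns = [{"name": "status", "data_type": "text", "dq_stats": {"completeness": 0.98}}]
frequent = {"status": ["pending_review", "escalated", "resolved"]}

for mode in MetadataMode:
    print(mode.value, build_draft_payload(mode, "tickets", columns, "Finance", frequent))
```

Each mode is a strict superset of the one before it, so moving up a mode only ever adds signal for the draft.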

More signal at draft time does not mean less governance. It means a better starting point for the human who still has to confirm what the column means. Full Context narrows the gap between the AI's first guess and the validated answer - it does not skip the validation step.

This is also why the privacy mode chosen for metadata generation has a downstream effect on query scoping, not just on the description's wording. A description drafted in Full Context mode that correctly names a status column's actual values - pending_review, escalated, resolved - gives the Classifier real vocabulary to match against when a business user later asks about "escalated cases," rather than the model inferring a mapping it was never told. A description drafted in Private mode has no value-level signal to offer at that point - it can describe the column's structure, not its contents. The privacy mode is a tradeoff between how much the AI sees up front and how much inference it has to do later, every time someone asks a question.

What "Verifiable" Actually Means

The other half of the trust problem - beyond context - is verifiability.

Even if the AI has a complete semantic layer and generates the right query, the business user has no way to know that without seeing the query. They are being asked to trust the output of a system whose reasoning is invisible to them.

This is why the answer alone is not enough. Every AI analytics answer should show the exact query that ran. Not as a log buried in a settings panel - as a first-class part of the answer interface. The business user sees the answer, sees the query, and can verify that the query matches their intent before acting on the result.

AskEdi makes the SQL or aggregation query available on every answer via the Analysis View - one click on the code icon in the action bar. Not because users always check it - most don't. But because the option to verify is what makes the answer trustworthy. An answer you cannot check is an answer you cannot defend. An answer with a visible, auditable query behind it is one a CFO can take to a board meeting.
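In code terms, the principle is that the query travels with the answer rather than living in a separate log. The sketch below uses Python's built-in sqlite3 module and a hypothetical Answer type and orders table. It illustrates the idea only and says nothing about how AskEdi is implemented.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """An answer that always carries the exact query that produced it."""
    value: float
    query: str


def answer_with_query(conn, sql):
    """Run the query and return the result together with the SQL that ran."""
    row = conn.execute(sql).fetchone()
    return Answer(value=row[0], query=sql)


# Hypothetical in-memory table for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (rev_net REAL)")
conn.executemany("INSERT INTO orders (rev_net) VALUES (?)", [(120.0,), (80.0,)])

result = answer_with_query(conn, "SELECT SUM(rev_net) FROM orders")
print(result.value)  # 200.0
print(result.query)  # SELECT SUM(rev_net) FROM orders
```

Because the type has no way to hold a value without its query, an unverifiable answer cannot be constructed in the first place.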

What Architecture Decision Actually Determines Trust?

Every AI analytics tool makes an architectural choice about where business context lives.

The first choice: connect the AI directly to the raw schema and let the model infer context from column names. Fast to deploy. Zero setup. And the answers are only as good as the model's inferences - which means they are inconsistent, hard to reproduce, and impossible to verify in any meaningful way.

The second choice: build a governed semantic layer between the AI and the schema. Slower to set up. Requires human validation of column descriptions. And the answers are grounded in validated business definitions that produce consistent, auditable results.

The 49% of organizations that don't trust AI analytics output are mostly running the first architecture. The fix is not a different model. It is a different layer - one that gives the AI the business context it needs before it is ever asked a question.

That is the difference between an AI analytics tool that knows your database and one that actually knows your business.

Integrate builds the DQ-scored, AIR-graded semantic foundation AskEdi reasons from. AskEdi returns verified, decision-ready answers with the query visible for anyone who needs to check.

Sources

insightsoftware, Why Don't Data Leaders Trust AI? And Other Insights From Our 2026 AI Survey, 2026. Retrieved June 11, 2026.
Kaelio, How Accurate Are AI Data Analyst Tools?, February 2026. Retrieved June 11, 2026.

Written by

Mihir Sanchala

Co-Founder, Edilitics. Engineers the systems that bring Edilitics to production. Writes about the technical reality of building governed AI analytics - the decisions, the tradeoffs, and what building for trust actually requires.
