Data Quality Profiling
Understand your DQ score, what each grade means, how scoring works across columns, tables, and integrations, and how to fix a low score.
Data quality is the foundation of reliable analytics. Before you build a pipeline, run an AI session, or create a visualisation, you need to know whether the underlying data is populated, consistent, and structurally sound. Poor data quality produces misleading results in traditional reporting and incorrect answers in AI analysis.
Every integration in Edilitics is profiled automatically when it is first connected. The profiling engine scans every table, measures every column, and produces a Data Quality (DQ) Score with no manual setup required.
How Scoring Works
The DQ score operates at three levels, each derived from the one below it.
Column Score
Every column in every table receives a score from 0 to 100 based on three dimensions:
| Dimension | Weight | What It Measures |
|---|---|---|
| Completeness | 50% | How many values are filled in. A column where half the rows are blank scores 50 on this dimension. |
| Uniqueness | 25% | How much variety exists in the data. A column where every row has a different value scores higher than one where most rows repeat the same handful of values. |
| Type Compliance | 25% | How many values are in the right format for their column type. A word in a number column, or a blank placeholder date, counts as a violation. |
Completeness carries the most weight because blank data is the most common and most damaging quality problem. A column where half the rows are empty cannot give useful answers in a report or an AI query, regardless of how well-formatted the values that do exist are.
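The weighting above can be restated as a quick sketch. This is illustrative only, not Edilitics' actual implementation; the function name and the 0-to-1 dimension inputs are assumptions:

```python
# Hypothetical restatement of the column-level DQ formula.
# Each dimension is a fraction in [0, 1]; weights match the table:
# Completeness 50%, Uniqueness 25%, Type Compliance 25%.
def column_score(completeness: float, uniqueness: float, compliance: float) -> float:
    return 100 * (0.5 * completeness + 0.25 * uniqueness + 0.25 * compliance)

# A column that is half blank but otherwise varied and well-typed
# still loses 25 points, reflecting the heavy completeness weight:
print(column_score(0.5, 1.0, 1.0))  # 75.0
```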
Table Score
The table score is a weighted average of all its column scores. Not every column carries equal weight. ID columns, relationship keys, and date columns are structural anchors in most datasets. They are the columns that connect tables together and establish when things happened. If they contain blanks or broken values, everything that depends on them breaks too. Edilitics recognises these columns automatically and weights them three times more heavily than standard columns:
- A column named exactly `id` (for example, a primary key)
- Any column whose name ends in `_id` (for example, `customer_id`, `order_id`, `product_id`)
- Any column whose name ends in `_at` (for example, `created_at`, `updated_at`)
- Any column whose name ends in `_date` (for example, `order_date`, `delivery_date`)
System or internal columns whose names begin with `_` are weighted at half the standard weight, since they rarely affect analytical output.
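The naming rules above can be sketched as a small weighting function. The function names and the rule ordering (anchor patterns checked before the `_` prefix) are assumptions for illustration, not the documented implementation:

```python
def column_weight(name: str) -> float:
    # Structural anchors (ids and timestamps) weigh 3x; internal
    # columns beginning with "_" weigh 0.5x; everything else 1x.
    if name == "id" or name.endswith(("_id", "_at", "_date")):
        return 3.0
    if name.startswith("_"):
        return 0.5
    return 1.0

def table_score(columns: dict) -> float:
    # Weighted average of column scores (each on a 0-100 scale).
    total_weight = sum(column_weight(name) for name in columns)
    return sum(column_weight(name) * score for name, score in columns.items()) / total_weight

# A blank-heavy key column drags the table down harder than a standard column:
print(table_score({"customer_id": 50.0, "notes": 100.0}))  # 62.5
```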
Integration Score
The integration score is the simple average of all table scores in the connected source. This is the number shown on the integration card.
Grade Scale
The score maps to a letter grade and a colour badge, matching exactly what you see in the app:
| Grade | Score | What It Signals |
|---|---|---|
| A | 90 – 100% | Data is well-filled, varied, and correctly formatted. Ready for AI analysis and reporting. |
| B | 75 – 89.9% | Strong data with minor gaps. Unlikely to affect most queries and reports. |
| C | 60 – 74.9% | Usable but noticeably incomplete. Summaries and totals may be understated due to missing values. |
| D | 45 – 59.9% | Many blanks or formatting problems. AI answers will be less reliable. Consider cleaning before analysis. |
| F | 0 – 44.9% | Data is mostly empty or structurally broken. AI output on this data will be unreliable. |
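The thresholds in the grade table map to a simple lookup. This sketch assumes half-open bands exactly as printed (anything at or above 90 is an A, and so on); the function name is illustrative:

```python
def grade(score: float) -> str:
    # Thresholds follow the grade table: A >= 90, B >= 75, C >= 60, D >= 45, F below.
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    if score >= 45:
        return "D"
    return "F"

print(grade(89.9), grade(45.0))  # B D
```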
Before every AskEdi session and Auto Generate Charts run, an AI Advisory is shown. It reflects the current DQ and AIR grades for that table. The tone adjusts by grade: a positive confirmation for A or B, an informational note for C, and a clear warning for D or F. It never blocks either feature.
The Score Breakdown Drawer
Click the DQ score pill on any integration card to open the Score Breakdown Drawer. This shows the connection-level score broken into its three dimensions, each with its own progress bar, so you can see exactly which dimension is pulling the score down.
Completeness shows the percentage of non-null cells across all tables in the connection.
Uniqueness shows how much variety exists in the data across all tables in the connection.
Type Compliance shows the percentage of values that conform to their declared column type across the connection.
The drawer also shows the AI Readiness (AIR) Score breakdown directly below the DQ section. See the next section for how the two scores relate.
The footer of the drawer shows the direct next action based on your current state: Review Metadata if AI insights have been generated, or Generate Metadata if they have not. Both actions target the fastest path to improving your overall readiness.
How DQ Affects the AIR Score
The DQ Score is not just a health indicator. It is 50% of your AI Readiness (AIR) Score.
The AIR Score measures how well-prepared your data is for AI analysis. It combines two things equally: how clean and complete your data is (the DQ half), and how well your columns are described so the AI understands what each one means (the documentation half).
This creates a ceiling effect that matters in practice:
- An integration with a perfect DQ Score but no column descriptions reaches only Grade D on AIR. The AI has clean data but no context to interpret it.
- Once column descriptions are generated by AI, the score can reach Grade C. Better, but the AI is still working from unverified descriptions it wrote itself.
- Grade A requires human validation. When you review a column description and mark it as confirmed, you are telling the AI exactly what that column means in your business. That verified context is what separates a reliable AI session from a generic one.
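The ceiling effect is plain arithmetic. Assuming both halves sit on the same 0-100 scale as the DQ Score (an illustration, not the published formula):

```python
def air_score(dq: float, documentation: float) -> float:
    # Equal halves: data quality and schema documentation.
    return 0.5 * dq + 0.5 * documentation

# Perfect data with no column descriptions caps at 50,
# which lands in Grade D (45-59.9) on the grade scale.
print(air_score(100, 0))  # 50.0
```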
Improving DQ is the highest-leverage first step because it lifts both scores at once. Once DQ is solid, the next gain comes from reviewing and confirming column descriptions in AI Column Insights.
Where DQ Scores Appear
| Location | What You See |
|---|---|
| Integration card | Connection-level DQ score pill. Click to open the Score Breakdown Drawer. |
| Score Breakdown Drawer | Connection-level completeness, uniqueness, and compliance progress bars. |
| View Tables | Per-table scores. Expand any table to see per-column scores alongside null count, distinct count, noncompliant count, and min/max values. |
| Transform pipeline | DQ score recalculated after every Save and Preview step. A delta badge on each operation card shows the immediate impact on data health. |
| Run History | Source and target DQ scores recorded at execution time for every pipeline run. |
When Profiling Runs
| Trigger | Detail |
|---|---|
| On connection creation | A background job fires immediately after a new integration is saved and profiles the entire source. |
| On integration edit | Any change to connection credentials or configuration triggers a full re-profile. |
| Manual refresh | Use Refresh DQ and AIR Scores in the integration hover menu. Rate-limited to once per 24 hours per integration. |
| Daily background job | Targets integrations that have not been profiled for 7 or more days and where the underlying data has grown by 5% or more since the last run. |
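The daily job's two conditions combine with a logical AND: an integration is picked up only when it is both stale and grown. A hypothetical restatement (the function name, row-count inputs, and growth calculation are assumptions):

```python
from datetime import datetime, timedelta

def needs_reprofile(last_profiled, old_rows, new_rows, now=None):
    # Condition 1: not profiled for 7 or more days.
    # Condition 2: underlying data grown by 5% or more since the last run.
    now = now or datetime.utcnow()
    stale = now - last_profiled >= timedelta(days=7)
    grown = old_rows > 0 and (new_rows - old_rows) / old_rows >= 0.05
    return stale and grown
```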
For detail on how manual refresh works and how schema drift is tracked between runs, see DQ Refresh and Schema Drift.
How to Improve Your Score
Your score is only useful if it tells you what to do next. The path forward is always the same: diagnose first, then fix.
Start in the Score Breakdown Drawer. Click the DQ score pill on your integration card. The three progress bars (Completeness, Uniqueness, and Type Compliance) tell you exactly which dimension is pulling your score down. That is where you focus first.
Then open View Tables. This shows you the score at table level and, within each table, at column level. You will quickly see whether the problem is concentrated in one or two columns or spread across the whole dataset. A single key column like customer_id or order_date with a high blank rate can drag the entire integration score down significantly due to its higher weight.
Once you know what is wrong, use Transform to fix it without touching your source data.
Fix blank values (low Completeness)
Blank values are the most common cause of a low score and have the biggest impact when fixed. In Transform, you have three options depending on the column:
- Fill Null Values: replace blanks with a sensible default. For a revenue column, this might be zero. For a category column, it might be "Unknown". For a date column, it might be a known start date.
- Filter Rows: if a row is missing a critical value like an order ID or a customer ID, it may not be worth keeping at all. Filter it out before writing to your destination.
- Drop or Rename Columns: if an entire column is mostly empty and not used in any report or AI session, removing it improves the score and reduces noise.
If blanks are appearing because the source system itself is not capturing the data, Transform can work around it but cannot fix the root cause. In that case, the most durable fix is upstream: ensure the source captures the value before it reaches Edilitics.
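Transform itself is no-code, but the three options above are conceptually equivalent to a few pandas operations. The column names and defaults here are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2, None, 4],
    "revenue": [100.0, None, 50.0, None],
    "category": ["retail", None, None, "online"],
})

df["revenue"] = df["revenue"].fillna(0)            # Fill Null Values: numeric default
df["category"] = df["category"].fillna("Unknown")  # Fill Null Values: category bucket
df = df[df["order_id"].notna()]                    # Filter Rows: drop rows missing a key
```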
Fix formatting problems (low Type Compliance)
Type compliance drops when values are in the wrong format for their column: a word stored in a number column, a blank placeholder stored in a date column, or a zero stored where a date is expected. These often come from how data was originally entered or exported.
In Transform, use Cast Column Type to convert the column to the correct type. For example, if a date column was imported as text, casting it to a date type resolves the violation and improves the compliance score immediately.
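As a rough pandas analogy for that cast (the column name and the coerce-to-null handling of invalid values are assumptions, not a statement of how Transform behaves):

```python
import pandas as pd

df = pd.DataFrame({"order_date": ["2024-01-05", "2024-02-10", "not a date"]})

# Cast a text column to a date type; values that cannot be parsed
# become nulls instead of failing the whole conversion.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
```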
Each Save and Preview step in Transform shows you the updated DQ score and a delta badge so you can see the exact improvement before committing.
Fix duplicate rows (low Uniqueness)
A low uniqueness score often means the same records are appearing multiple times, a common side effect of joins, exports, or syncs that run more than once. In Transform, use Drop Duplicate Rows to deduplicate the dataset before it reaches a destination or an AI session.
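The pandas equivalent of that operation, with made-up sample data, is a one-liner:

```python
import pandas as pd

# A sync that ran twice produced an exact duplicate of order 1.
df = pd.DataFrame({"order_id": [1, 1, 2], "amount": [10, 10, 25]})

df = df.drop_duplicates()  # Drop Duplicate Rows equivalent: keep first occurrence
```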
Note that some columns are intentionally repetitive. A status column with values like "active" and "inactive" will always have low uniqueness. A low uniqueness score on its own is less critical than low completeness or low compliance. Focus here only after the first two are addressed.
Improve your AIR Score (once DQ is solid)
Once your DQ score is at Grade B or above, your data is structurally ready. The next step is making it AI-ready.
Open View AI Insights from the integration hover menu. This shows you the AI-generated description for every column in your schema. Read each one. If it accurately describes what the column means in your business, mark it as Human Validated. If it is wrong or generic, edit it first, then validate it.
Every column you validate upgrades its contribution to the AIR Score and locks the description so it cannot be overwritten by future automated runs. The more columns you validate, the more reliably AskEdi can interpret your data and the more accurate your AI sessions become.
See AI Column Insights for the full process.
Next Steps
AIR Score
How DQ combines with schema documentation to produce the AI Readiness Score, and what Grade A requires.
AI Column Insights
Generate and validate column descriptions to improve the semantic half of your AIR Score.
Transform
Fix blank values, type violations, and duplicates using no-code operations without touching your source data.
DQ Refresh and Schema Drift
How manual and scheduled profile refreshes work, and how schema drift is tracked between runs.
Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.