Manage Nulls
Drop rows with missing values or impute nulls using mean, median, mode, min, max, or constant values. Apply different strategies per column in one operation.
Manage Nulls resolves missing values column by column. For each selected column, choose a strategy: drop rows where that column is null, or impute the null with a computed or fixed value. Each column gets its own strategy. You can configure multiple columns in a single operation.
The operation updates columns in place. No new column is created.
When to Use Manage Nulls
Use this operation when nulls in your dataset will break downstream logic or distort results.
- Removing unrecoverable gaps. A row with a null
order_idorcustomer_idcannot be joined or identified. Drop it. - Filling numeric gaps before aggregation. Nulls in SUM produce null output. Replace them with the column mean or median before running Group By.
- Standardizing categorical columns. A null
product_tiergroups differently from"Unknown". Replace with a constant to make it explicit. - Preserving row count. Use statistical imputation (Mean, Median, Mode) instead of Drop when losing rows shrinks sample size below a usable threshold.
- Filling with domain-known values. Some nulls have a known correct replacement from business rules: a null discount should be
0, a null country code should be"Unknown".
Sample Dataset
The examples in this doc use the Edilitics sample orders dataset. Download it to follow along in your own workspace.
edilitics_sample_orders.csv
Sample orders dataset for hands-on practice · 500 rows
Relevant columns for Manage Nulls examples:
Prop
Type
Null Handling Strategies
Each column gets one strategy. Available strategies depend on the column's data type.
Numeric columns: Drop, Mean, Median, Mode, Min, Max, Std Dev, Variance, Constant
Categorical (string) columns: Drop, Mode, Constant
Drop removes every row where the selected column is null. All other rows are unaffected.
Use Drop when a null in this column makes the row meaningless: a missing primary key, a missing join field, or a missing required dimension.
Input revenue | Input product_tier | Action |
|---|---|---|
780.62 | Growth | kept |
null | Starter | dropped (revenue is null) |
4703.29 | null | kept (only revenue is the drop key) |
Drop is scoped to the column you select. A row is dropped only if that specific column is null. Other null columns in the same row are unaffected.
Available for: all column types.
Mean replaces each null with the arithmetic average of all non-null values in that column.
| Non-null values | Mean | Null replaced with |
|---|---|---|
780.62, 4703.29, 7347.74 | 4277.22 | 4277.22 |
Use Mean for normally distributed numeric fields where the average is a sensible central estimate. Avoid Mean when the column has significant outliers. Use Median instead.
Available for: numeric columns only.
Median replaces each null with the middle value of all non-null values in that column, sorted in order.
| Non-null values (sorted) | Median | Null replaced with |
|---|---|---|
700.58, 780.62, 4703.29, 7347.74 | 2741.96 | 2741.96 |
Use Median when the column has outliers. The median is resistant to extreme values and gives a more representative fill for skewed distributions.
Available for: numeric columns only.
Mode replaces each null with the most frequently occurring value in that column.
| Non-null values | Most frequent | Null replaced with |
|---|---|---|
Growth, Starter, Starter, Growth, Starter | Starter | Starter |
Use Mode when the most common value is a reasonable default. For example, filling a null product_tier with the most common tier in your dataset.
If two values tie in frequency, Edilitics uses the first value encountered in the sorted tie group.
Available for: numeric and categorical columns.
Min replaces each null with the smallest non-null value in the column. Max replaces with the largest.
Use Min when a missing value should conservatively default to the floor of the range. Use Max when a missing value should default to the ceiling. Examples: imputing a null discount_pct with Min (0) when no discount means no discount, or imputing a null score with Max as an optimistic default.
Available for: numeric columns only.
Std Dev replaces each null with the sample standard deviation of all non-null values in that column. Variance replaces with the sample variance.
These are specialist strategies for statistical pipelines where you need to fill a missing spread metric with the column's measured spread. Rarely used for general data cleaning.
Available for: numeric columns only.
Constant replaces each null with a fixed value you type in manually.
| Column | Constant entered | Null replaced with |
|---|---|---|
product_tier (string) | Unknown | Unknown |
discount_pct (float) | 0 | 0.0 |
Use Constant when the replacement value is known from business logic rather than statistical inference. Common examples: "Unknown", "N/A", 0, -1.
The constant must be compatible with the column's data type. Entering a string for a numeric column will cause an error on submit.
Available for: all column types.
How to Apply Manage Nulls
Open the Manage Nulls operation
In your Transform pipeline, click Add Operation and select Manage Nulls from the operation list.
Select a column
Choose a column from the dropdown. The selector shows all columns in your dataset.
Choose Drop or Impute
Select Drop Null values to remove rows, or select an imputation method. The imputation dropdown is disabled until you select a column.
Available imputation methods depend on the column type:
- Numeric: Mean, Median, Mode, Min, Max, Std Dev, Variance, Constant
- Categorical (string): Mode, Constant
If you select Constant, type the fill value directly into the field that appears.
Add more columns (optional)
Click Add Column to add another row for a different column. Each row is independent with its own strategy. The Add Column button is disabled until the current row has both a column and a strategy selected.
Remove a row (optional)
Click the remove icon on any row to delete it. The icon only appears when more than one row exists.
Submit the operation
Click Submit. Edilitics applies each strategy to its column in place. The preview updates with the imputed or reduced data.
Verify in the preview
Check that null counts dropped to zero for imputed columns, or that row count decreased for dropped columns. If results look wrong, remove the operation and reconfigure.
Before and After
Three strategies applied in one operation:
revenue: Mean (fill nulls with column average)discount_pct: Constant0product_tier: Mode (fill nulls with most common tier)
Input:
| revenue | discount_pct | product_tier |
|---|---|---|
780.62 | 0.25 | Growth |
null | null | Starter |
4703.29 | 0.15 | null |
null | 0.25 | Starter |
Column stats: revenue mean = 2741.96, discount_pct constant = 0, product_tier mode = Starter
Output:
| revenue | discount_pct | product_tier |
|---|---|---|
780.62 | 0.25 | Growth |
2741.96 | 0.0 | Starter |
4703.29 | 0.15 | Starter |
2741.96 | 0.0 | Starter |
Code Equivalent
-- Mean imputation
SELECT
COALESCE(revenue, AVG(revenue) OVER()) AS revenue,
COALESCE(discount_pct, 0) AS discount_pct,
COALESCE(product_tier, 'Starter') AS product_tier
FROM orders;
-- Drop rows where revenue is null
SELECT * FROM orders WHERE revenue IS NOT NULL;import polars as pl
df = df.with_columns([
# Mean imputation
pl.col("revenue").fill_null(pl.col("revenue").mean()),
# Constant fill
pl.col("discount_pct").fill_null(0),
# Mode fill
pl.col("product_tier").fill_null(
pl.col("product_tier").mode().first()
),
])
# Drop rows where revenue is null
df = df.drop_nulls(subset=["revenue"])After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.
After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.
Operation Reference
Prop
Type
Frequently Asked Questions
Next Steps
Cast Data Types
Convert column types after resolving nulls: numeric strings to integers, objects to datetimes.
Filter
Filter rows by value after nulls are resolved or removed.
Group By
Aggregate columns now that nulls won't distort your sums and averages.
Find & Replace
Replace empty strings stored as values instead of true nulls.
Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.
Last updated on
Split Columns
Split a column into multiple new string columns by delimiter or regex. Define output column names and configure multiple split rules in one operation.
Find & Replace
Search for string or regex patterns inside any column and replace matches with new values. Run multiple find-and-replace rules across columns in one step.