Edilitics | Data to Decisions

Manage Nulls

Drop rows with missing values or impute nulls using mean, median, mode, min, max, or constant values. Apply different strategies per column in one operation.

Manage Nulls resolves missing values column by column. For each selected column, choose a strategy: drop rows where that column is null, or impute the null with a computed or fixed value. Each column gets its own strategy. You can configure multiple columns in a single operation.

The operation updates columns in place. No new column is created.


When to Use Manage Nulls

Use this operation when nulls in your dataset will break downstream logic or distort results.

  • Removing unrecoverable gaps. A row with a null order_id or customer_id cannot be joined or identified. Drop it.
  • Filling numeric gaps before aggregation. Nulls in SUM produce null output. Replace them with the column mean or median before running Group By.
  • Standardizing categorical columns. A null product_tier groups differently from "Unknown". Replace with a constant to make it explicit.
  • Preserving row count. Use statistical imputation (Mean, Median, Mode) instead of Drop when losing rows shrinks sample size below a usable threshold.
  • Filling with domain-known values. Some nulls have a known correct replacement from business rules: a null discount should be 0, a null country code should be "Unknown".

Sample Dataset

The examples in this doc use the Edilitics sample orders dataset. Download it to follow along in your own workspace.

edilitics_sample_orders.csv

Sample orders dataset for hands-on practice · 500 rows

Download

Relevant columns for Manage Nulls examples:

Prop

Type


Null Handling Strategies

Each column gets one strategy. Available strategies depend on the column's data type.

Numeric columns: Drop, Mean, Median, Mode, Min, Max, Std Dev, Variance, Constant

Categorical (string) columns: Drop, Mode, Constant

Drop removes every row where the selected column is null. All other rows are unaffected.

Use Drop when a null in this column makes the row meaningless: a missing primary key, a missing join field, or a missing required dimension.

Input revenueInput product_tierAction
780.62Growthkept
nullStarterdropped (revenue is null)
4703.29nullkept (only revenue is the drop key)

Drop is scoped to the column you select. A row is dropped only if that specific column is null. Other null columns in the same row are unaffected.

Available for: all column types.

Mean replaces each null with the arithmetic average of all non-null values in that column.

Non-null valuesMeanNull replaced with
780.62, 4703.29, 7347.744277.224277.22

Use Mean for normally distributed numeric fields where the average is a sensible central estimate. Avoid Mean when the column has significant outliers. Use Median instead.

Available for: numeric columns only.

Median replaces each null with the middle value of all non-null values in that column, sorted in order.

Non-null values (sorted)MedianNull replaced with
700.58, 780.62, 4703.29, 7347.742741.962741.96

Use Median when the column has outliers. The median is resistant to extreme values and gives a more representative fill for skewed distributions.

Available for: numeric columns only.

Mode replaces each null with the most frequently occurring value in that column.

Non-null valuesMost frequentNull replaced with
Growth, Starter, Starter, Growth, StarterStarterStarter

Use Mode when the most common value is a reasonable default. For example, filling a null product_tier with the most common tier in your dataset.

If two values tie in frequency, Edilitics uses the first value encountered in the sorted tie group.

Available for: numeric and categorical columns.

Min replaces each null with the smallest non-null value in the column. Max replaces with the largest.

Use Min when a missing value should conservatively default to the floor of the range. Use Max when a missing value should default to the ceiling. Examples: imputing a null discount_pct with Min (0) when no discount means no discount, or imputing a null score with Max as an optimistic default.

Available for: numeric columns only.

Std Dev replaces each null with the sample standard deviation of all non-null values in that column. Variance replaces with the sample variance.

These are specialist strategies for statistical pipelines where you need to fill a missing spread metric with the column's measured spread. Rarely used for general data cleaning.

Available for: numeric columns only.

Constant replaces each null with a fixed value you type in manually.

ColumnConstant enteredNull replaced with
product_tier (string)UnknownUnknown
discount_pct (float)00.0

Use Constant when the replacement value is known from business logic rather than statistical inference. Common examples: "Unknown", "N/A", 0, -1.

The constant must be compatible with the column's data type. Entering a string for a numeric column will cause an error on submit.

Available for: all column types.


How to Apply Manage Nulls

Open the Manage Nulls operation

In your Transform pipeline, click Add Operation and select Manage Nulls from the operation list.

Select a column

Choose a column from the dropdown. The selector shows all columns in your dataset.

Choose Drop or Impute

Select Drop Null values to remove rows, or select an imputation method. The imputation dropdown is disabled until you select a column.

Available imputation methods depend on the column type:

  • Numeric: Mean, Median, Mode, Min, Max, Std Dev, Variance, Constant
  • Categorical (string): Mode, Constant

If you select Constant, type the fill value directly into the field that appears.

Add more columns (optional)

Click Add Column to add another row for a different column. Each row is independent with its own strategy. The Add Column button is disabled until the current row has both a column and a strategy selected.

Remove a row (optional)

Click the remove icon on any row to delete it. The icon only appears when more than one row exists.

Submit the operation

Click Submit. Edilitics applies each strategy to its column in place. The preview updates with the imputed or reduced data.

Verify in the preview

Check that null counts dropped to zero for imputed columns, or that row count decreased for dropped columns. If results look wrong, remove the operation and reconfigure.


Before and After

Three strategies applied in one operation:

  • revenue: Mean (fill nulls with column average)
  • discount_pct: Constant 0
  • product_tier: Mode (fill nulls with most common tier)

Input:

revenuediscount_pctproduct_tier
780.620.25Growth
nullnullStarter
4703.290.15null
null0.25Starter

Column stats: revenue mean = 2741.96, discount_pct constant = 0, product_tier mode = Starter

Output:

revenuediscount_pctproduct_tier
780.620.25Growth
2741.960.0Starter
4703.290.15Starter
2741.960.0Starter

Code Equivalent

-- Mean imputation
SELECT
  COALESCE(revenue, AVG(revenue) OVER())  AS revenue,
  COALESCE(discount_pct, 0)               AS discount_pct,
  COALESCE(product_tier, 'Starter')       AS product_tier
FROM orders;

-- Drop rows where revenue is null
SELECT * FROM orders WHERE revenue IS NOT NULL;
import polars as pl

df = df.with_columns([
    # Mean imputation
    pl.col("revenue").fill_null(pl.col("revenue").mean()),
    # Constant fill
    pl.col("discount_pct").fill_null(0),
    # Mode fill
    pl.col("product_tier").fill_null(
        pl.col("product_tier").mode().first()
    ),
])

# Drop rows where revenue is null
df = df.drop_nulls(subset=["revenue"])

After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.


After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.


Operation Reference

Prop

Type


Frequently Asked Questions


Next Steps

Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.

Last updated on

On this page