Edilitics | Data to Decisions

Window Functions

Apply rank, lag, lead, rolling and cumulative aggregations as new columns. Partition by any categorical column without writing code.

Window Functions add new columns computed via row-level calculations across the dataset or within partitioned groups. Unlike Group By, window functions preserve every original row - the new column appears alongside the existing data. You can configure multiple window definitions in one operation, each targeting any column.

Window functions add new columns. Original rows and columns are unchanged. Row count stays the same.


When to Use Window Functions

  • Rank within group. Rank revenue within each region to identify top performers per territory.
  • Lag for period-over-period comparison. Shift revenue by 1 row to compare each order against the previous one.
  • 7-day rolling average. Apply Rolling Mean with window size 7 on revenue to smooth daily volatility.
  • Running total. Cumulative Sum on revenue ordered by order_date to track year-to-date revenue.
  • Lead for next-row preview. Shift delivery_date forward by 1 to show the next scheduled delivery alongside each order.

Sample Dataset

The examples in this doc use the Edilitics sample orders dataset. Download it to follow along in your own workspace.

edilitics_sample_orders.csv

Sample orders dataset for hands-on practice · 500 rows

Download

Relevant columns for Window Functions examples:

Prop

Type


Supported Functions

11 window functions across 4 groups:

GroupFunctionLabelSource column typeExtra config
RankingrankRankAllRank method (optional)
OffsetlagLagNumericalOffset (rows to shift back)
OffsetleadLeadNumericalOffset (rows to shift forward)
Rollingrolling_meanRolling MeanNumericalWindow size
Rollingrolling_sumRolling SumNumericalWindow size
Rollingrolling_minRolling MinNumericalWindow size
Rollingrolling_maxRolling MaxNumericalWindow size
Rollingrolling_stdRolling Std DevNumericalWindow size
CumulativecumsumCumulative SumNumericalNone
CumulativecummaxCumulative MaxNumericalNone
CumulativecumminCumulative MinNumericalNone

Lag, Lead, Rolling, and Cumulative functions require a numerical source column. Rank accepts any column type.

Rank methods

MethodBehaviourExample (tied values at position 2)
AverageTied rows share the mean rank1, 2.5, 2.5, 4
MinTied rows all get the lowest rank1, 2, 2, 4
MaxTied rows all get the highest rank1, 3, 3, 4
DenseNo gaps between ranks1, 2, 2, 3
OrdinalEvery row gets a unique rank1, 2, 3, 4

How to Apply Window Functions

Open the Window Functions operation

In your Transform pipeline, click Add Operation and select Window Functions from the operation list.

One blank definition appears (Definition 1).

Select the Function

In the Function dropdown, choose one of the 11 functions. The Source Column dropdown updates to show only compatible column types: numerical-only for Lag, Lead, Rolling, and Cumulative functions; all columns for Rank.

Select the Source Column

Choose the column to compute the window over. Only columns compatible with the chosen function appear.

Configure function-specific parameters

  • Rolling functions (Rolling Mean, Rolling Sum, Rolling Min, Rolling Max, Rolling Std Dev): Enter a Window Size. Minimum 1. Example: 7 for a 7-row rolling window.
  • Lag / Lead: Enter an Offset. Minimum 1. Lag shifts values back (previous rows). Lead shifts values forward (next rows). Offset 1 means one row back or forward.
  • Rank: Optionally select a Rank Method. Default is Average.

Set Partition By (optional)

In the Partition By multi-select, choose one or more categorical columns. The window calculation resets independently for each unique combination of partition values.

Example: partition by region to rank revenue separately within each territory.

Leave empty to apply the function across the full dataset.

Name the output column

Type a name in the Output Column Name field. No auto-generated name - you must enter one.

The name must:

  • Contain only letters, numbers, and underscores
  • Not start with __ or a number
  • Not already exist in the dataset

Add more definitions (optional)

Click Add another window definition to add a second definition block. Each definition is independent - different function, source column, partition, and output name. All definitions are applied together when you save.

Click Save & Preview

Click Save & Preview. Edilitics applies all window definitions and adds the new columns. The success toast confirms: "Window functions applied. New columns added."

Verify in the preview

Check the new columns. Rolling functions produce null for the first N-1 rows (insufficient preceding rows for the window). Lag produces null for the first N rows. Lead produces null for the last N rows.


Before and After

Rolling Mean on revenue with window size 3, no partition. Result column: revenue_rolling_3.

Input:

order_idregionrevenue
ORD-2024-0001North America12450.00
ORD-2024-0047Europe8320.50
ORD-2024-0099North America83841.00
ORD-2024-0312Latin America3750.75
ORD-2024-0401Europe15200.00

Output:

order_idregionrevenuerevenue_rolling_3
ORD-2024-0001North America12450.00null
ORD-2024-0047Europe8320.50null
ORD-2024-0099North America83841.0034870.50
ORD-2024-0312Latin America3750.7531970.75
ORD-2024-0401Europe15200.0034264.00

Original columns unchanged. One new column added. First 2 rows null (window size 3 requires 3 rows).


Code Equivalent

-- Rank revenue within region (PostgreSQL / BigQuery / Snowflake / DuckDB)
SELECT
  *,
  RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS revenue_rank
FROM orders;

-- 7-row rolling mean on revenue
SELECT
  *,
  AVG(revenue) OVER (
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS revenue_rolling_7
FROM orders;

-- Lag by 1 row
SELECT
  *,
  LAG(revenue, 1) OVER (ORDER BY order_date) AS revenue_lag_1
FROM orders;
import polars as pl

# Rank within partition
df = df.with_columns(
    pl.col("revenue")
    .rank(method="average", descending=True)
    .over("region")
    .alias("revenue_rank")
)

# 7-row rolling mean
df = df.with_columns(
    pl.col("revenue")
    .rolling_mean(window_size=7)
    .alias("revenue_rolling_7")
)

# Lag by 1
df = df.with_columns(
    pl.col("revenue")
    .shift(1)
    .alias("revenue_lag_1")
)

# Cumulative sum
df = df.with_columns(
    pl.col("revenue")
    .cum_sum()
    .alias("revenue_cumsum")
)

After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.


After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.


Operation Reference

Prop

Type


Frequently Asked Questions


Next Steps

Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.

Last updated on

On this page