Is there a limit on how many code steps I can add?

No. A pipeline can contain any number of Code Editor steps.

Can a code step use columns added by a previous no-code step?

Yes. Every step receives the full output of the previous step as df, regardless of step type.

Do hybrid pipelines support scheduling?

Yes. Hybrid pipelines are scheduled and executed identically to no-code-only pipelines.

Hybrid Pipelines | Edilitics Docs

Combine no-code operations and Python Polars scripts in a single pipeline. Each step passes its output to the next regardless of type.

A hybrid pipeline lets you mix point-and-click operations with Python scripts in a single workflow. Each step feeds into the next automatically. Use the no-code operations for standard tasks like filtering, joining, and cleaning. Add a script step when the logic is too custom for a point-and-click operation.

Everything runs in the same pipeline - same preview, same run history, same schedule.

When to Use a Hybrid Pipeline

Add a script step to your pipeline when you need to:

Read values out of a JSON or free-text column
Calculate a score that combines several columns with custom weights
Find the first row where a running total crosses a threshold
Group by more than one column at the same time

Everything else - stacking tables, filtering rows, joining datasets, cleaning nulls, renaming columns - is covered by the point-and-click operations.

Sample Datasets

This tutorial uses three sample files. Download all three before you begin.

edilitics_sample_orders.csv

H1 orders - 500 rows, 14 columns including metadata JSON · 500 rows

Download

edilitics_sample_orders_h2.csv

H2 orders - 250 rows, 16 columns (includes return_flag and delivery_partner) · 250 rows

Download

edilitics_sample_products.csv

Product catalog - 4 rows, join on product_id to add pricing and support tier · 4 rows

Download

What You Start With

Three separate raw files with several data quality problems to fix before analysis:

File	Rows	Issues
`edilitics_sample_orders.csv`	500	H1 orders. Nulls in `discount_pct`, `product_tier`, and `notes`. Contains 10 orders referencing a product that does not exist in the catalog.
`edilitics_sample_orders_h2.csv`	250	H2 orders. Same core columns as H1, plus two extra columns (`return_flag`, `delivery_partner`) that H1 does not have.
`edilitics_sample_products.csv`	4	Product catalog. No issues.

What You End With

A single clean summary table with one row per sales rep per region - ready for reporting or dashboard use.

Column	Description
`sales_rep`	Sales representative name
`region`	Region
`total_revenue`	Total revenue across all valid orders
`order_count`	Number of orders
`avg_deal_score`	Average deal quality score
`sla_breach_count`	Number of orders that took more than 30 days to deliver
`avg_delivery_days`	Average days from order to delivery
`milestone_orders`	Number of orders where the rep first crossed 100k in cumulative revenue

No nulls. No raw JSON. No unmatched products. No schema mismatches. Around 40 rows.

Worked Example: Full-Year Sales Performance Pipeline

Goal: combine two order exports and a product catalog into a per-rep, per-region performance summary - cleaning data quality issues, enriching with product details, scoring deal quality, and flagging SLA breaches.

10 steps. 4 code steps. 6 no-code steps.

Start your pipeline with edilitics_sample_orders.csv as the source table.

Concat (no-code)

Stack the H2 order export onto the H1 source data to build a full-year dataset.

Operation: Concat

Concat type: Vertical

Database: select the integration where you uploaded the sample files

Table: edilitics_sample_orders_h2

Manage duplicates: Keep All

When you select the H2 table, Edilitics checks whether the two tables share the same columns. Because H2 has two extra columns (return_flag and delivery_partner) that H1 does not, the status will show Concatenation not allowed and open a conflict modal listing those columns under Extra Columns. Click Drop Columns to continue. This removes the extra columns from the H2 rows so both tables match on the same 14-column schema before stacking.

Result: 750 rows, 14 columns

Filter (no-code)

Remove orders that reference a product not in the catalog. The H1 data contains 10 orders with product_id set to PROD-999, which does not exist in the products table. Keeping these rows would result in empty product columns after the join in Step 4, making them useless for analysis. Remove them now.

Operation: Filter

Filter Column: product_id

Condition: Not Equal to

Target Value(s): PROD-999

Result: 740 rows - 10 orders removed

Null Values Handling (no-code)

Drop rows where product_tier is null. Orders without a product tier cannot be grouped or segmented accurately downstream.

Operation: Null Values Handling

Column: product_tier

Impute/Drop: Drop Null values

Result: 593 rows - 147 incomplete records removed

Joins (no-code)

Enrich each order with product details from the catalog. After Step 2, every remaining order has a valid product_id, so the join will fully populate the new columns with no unmatched rows.

Operation: Joins

Join type: Left Join

Left Key: product_id (orders)

Right Key: product_id (products)

Added columns: product_name, base_price_usd, support_level, max_users, storage_gb

Result: 593 rows, 5 new columns added

Code Editor: Extract lead source and campaign from metadata

The metadata column stores a small JSON object on every row, for example {"source": "web", "campaign": "q1_inbound"}. The point-and-click operations cannot read values out of a JSON string. This script pulls source and campaign into their own plain-text columns so they can be used later in grouping and reporting.

Operation: Code Editor

df = df.with_columns([
    pl.col("metadata").map_elements(
        lambda x: re.search(r'"source":\s*"([^"]+)"', x).group(1)
        if x and re.search(r'"source":\s*"([^"]+)"', x) else None,
        return_dtype=pl.String
    ).alias("lead_source"),
    pl.col("metadata").map_elements(
        lambda x: re.search(r'"campaign":\s*"([^"]+)"', x).group(1)
        if x and re.search(r'"campaign":\s*"([^"]+)"', x) else None,
        return_dtype=pl.String
    ).alias("campaign"),
])

Added columns: lead_source, campaign

Why code: extracting values from a JSON string is not available as a point-and-click operation

Date Time Delta (no-code)

Calculate how many days passed between the order date and the delivery date for each order.

Operation: Date Time Delta

Start column: order_date

End column: delivery_date

Unit: Days

Result column name: days_to_delivery

Result: new column added with the delivery duration for every order

Conditional Column (no-code)

Flag whether each order met or missed the 30-day SLA. Any order that took more than 30 days to deliver is marked as Breached.

Operation: Conditional Column

New column name: sla_status

Rule: IF days_to_delivery > 30 THEN Breached

Else value: On Time

Result: 268 orders marked Breached, 325 marked On Time

Code Editor: Deal score

Give each order a score based on three things: how strong its revenue was relative to other orders in the same region, how lean the discount was, and how fast it was delivered. The final score blends all three using fixed weights.

To build this score, the script first calculates the average and spread of revenue within each region, uses those to position each order, then combines the result with the discount and delivery figures. The working values are dropped at the end - only deal_score remains.

Operation: Code Editor

df = df.with_columns([
    pl.col("revenue").mean().over("region").alias("region_rev_mean"),
    pl.col("revenue").std().over("region").alias("region_rev_std"),
])

df = df.with_columns(
    ((pl.col("revenue") - pl.col("region_rev_mean")) / pl.col("region_rev_std"))
    .alias("revenue_zscore")
)

df = df.with_columns(
    (
        pl.col("revenue_zscore") * 0.5
        + (1 - pl.col("discount_pct").fill_null(0.0)) * 0.3
        + (1 / (pl.col("days_to_delivery") + 1)) * 0.2
    ).alias("deal_score")
)

df = df.drop(["region_rev_mean", "region_rev_std", "revenue_zscore"])

Added column: deal_score

Removed: 3 working columns used only for the calculation

Why code: this score needs several intermediate values that are calculated and used in sequence - not possible with a single point-and-click operation

Drop / Rename Columns (no-code)

Remove columns that are no longer needed and rename one for clarity before the final aggregation.

Operation: Drop / Rename Columns

In the column list, click the delete icon next to each of the following columns to remove them:

metadata
notes
email
max_users
storage_gb

Then rename base_price_usd to list_price by editing the name field inline.

Click Save & Preview when done.

Result: a leaner dataset with no raw JSON, no personal data, and clear column names

Code Editor: Milestone flag and final aggregation

This step produces the final summary table. It does two things that the point-and-click operations cannot handle:

Milestone flag - sorts each rep's orders by date, builds a running revenue total, and marks the specific order where each rep first crossed 100,000. This requires stepping through rows in order, which is not available as a point-and-click operation.
Group by two columns - summarises the data by both sales_rep and region at the same time. The Group By operation supports one group column only.

Operation: Code Editor

df = df.sort(["sales_rep", "order_date"])

df = df.with_columns(
    pl.col("revenue").cum_sum().over("sales_rep").alias("cumulative_rev_by_rep")
)

df = df.with_columns(
    (
        (pl.col("cumulative_rev_by_rep") >= 100000)
        & (pl.col("cumulative_rev_by_rep").shift(1).over("sales_rep") < 100000)
    ).cast(pl.Int32).alias("milestone_row")
)

df = df.with_columns(
    (pl.col("sla_status") == pl.lit("Breached")).cast(pl.Int32).alias("sla_breach")
)

df = df.group_by(["sales_rep", "region"]).agg([
    pl.col("revenue").sum().alias("total_revenue"),
    pl.col("deal_score").mean().alias("avg_deal_score"),
    pl.col("order_id").count().alias("order_count"),
    pl.col("sla_breach").sum().alias("sla_breach_count"),
    pl.col("days_to_delivery").mean().alias("avg_delivery_days"),
    pl.col("milestone_row").sum().alias("milestone_orders"),
]).sort(["sales_rep", "region"])

Result: approximately 40 rows, 8 columns - one row per sales rep per region, ready for reporting

Pipeline Summary

Step	Operation	Rows	Columns	What changed
Start	Source load	500	14	-
1	Concat	750	14	H2 rows added, extra H2 columns dropped
2	Filter	740	14	10 unmatched product orders removed
3	Null Values Handling	593	14	147 rows with no product tier removed
4	Joins	593	19	5 product columns added
5	Code: metadata extraction	593	21	`lead_source` and `campaign` added
6	Date Time Delta	593	22	`days_to_delivery` added
7	Conditional Column	593	23	`sla_status` added
8	Code: deal score	593	21	`deal_score` added, 3 working columns removed
9	Drop / Rename Columns	593	16	5 columns removed, 1 renamed
10	Code: milestone + group by	~40	8	Aggregated to final summary table

Key Principles

Point-and-click first. Use the no-code operations for everything they can handle. Add a script step only when a point-and-click operation cannot do the job.
Test before saving. Every script step has a Test button. Click it to check the script against the preview data before saving.
Script steps are not special. They sit in the pipeline like any other step. You can reorder, remove, or duplicate them freely.
Clean up in the same step. If a script creates working columns that are only needed mid-calculation, remove them before the step ends so they do not carry forward.

Hybrid Pipelines

When to Use a Hybrid Pipeline

Sample Datasets

What You Start With

What You End With

Worked Example: Full-Year Sales Performance Pipeline

Concat (no-code)

Filter (no-code)

Null Values Handling (no-code)

Joins (no-code)

Code Editor: Extract lead source and campaign from metadata

Date Time Delta (no-code)

Conditional Column (no-code)

Code Editor: Deal score

Drop / Rename Columns (no-code)

Code Editor: Milestone flag and final aggregation

Pipeline Summary

Key Principles

Frequently Asked Questions

Next Steps

Code Editor

Concat

Joins

Null Values Handling

On this page

Hybrid Pipelines

Why does the Concat step show Concatenation not allowed?

Can a script step use columns added by a previous step?

What happens if a script step fails?

Can I reorder steps after building the pipeline?

Do hybrid pipelines support scheduling?

Is there a limit on how many script steps I can add?

Code Editor

Concat

Joins

Null Values Handling

On this page