
Hybrid Pipelines

Combine no-code operations and Python Polars scripts in a single pipeline. Each step passes its output to the next regardless of type.

A hybrid pipeline mixes no-code operations and Code Editor steps in any order. The output of each step - regardless of type - becomes the input df for the next. No configuration is needed to switch between the two. Code steps handle logic that no single no-code operation can express. No-code steps handle everything else.

Both step types share the same pipeline: same preview, same run history, same scheduling.
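
Inside a Code Editor step, the previous step's output is available as a Polars DataFrame named df; whatever df holds when the step finishes is handed to the next step. A minimal sketch of that contract (the derived column below is hypothetical, purely for illustration):

# `df` arrives holding the previous step's output (a Polars DataFrame)
df = df.with_columns(
    (pl.col("revenue") * 0.9).alias("discounted_revenue")  # hypothetical column, illustration only
)
# whatever `df` holds at the end of the step becomes the next step's input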


When to Use a Hybrid Pipeline

Use a hybrid pipeline when at least one transformation in your sequence requires:

  • Parsing structured content out of a free-text or JSON column
  • Computing a derived score or metric using partitioned window logic plus cross-column arithmetic
  • Detecting state transitions across an ordered sequence of rows (e.g. first time a threshold is crossed)
  • Any multi-step calculation where intermediate columns are needed and then discarded

Everything else - filtering, joining, casting, cleaning nulls, grouping - stays as no-code.


Sample Datasets

The worked example below uses two sample files. Download both.

  • edilitics_sample_orders.csv - primary orders dataset: 13 columns including metadata JSON and nested notes · 500 rows
  • edilitics_sample_products.csv - product catalog: join on product_id to add pricing and support tier · 4 rows

Worked Example: Sales Rep Performance Pipeline

Goal: produce a per-rep, per-region summary table with revenue totals, a composite deal score, SLA breach counts, and milestone flags - from raw orders data.

10 steps. 3 code steps. 7 no-code steps.


Step 1 - Filter (no-code)

Remove rows where revenue is null. These are incomplete order records that would corrupt downstream aggregations.

Operation: Filter
Condition: revenue is not null
Result: dataset reduced from 500 rows to roughly 490 with valid revenue values
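
For reference, this is what the step does, expressed as the equivalent Polars filter (you never write this yourself - the no-code operation handles it):

df = df.filter(pl.col("revenue").is_not_null())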


Step 2 - Joins (no-code)

Left join edilitics_sample_products on product_id to add product_name, base_price_usd, and support_level to every order row. Orders with no matching product retain null for the added columns.

Operation: Joins
Join type: Left Join
Left Key: product_id (orders)
Right Key: product_id (products)
Added columns: product_name, base_price_usd, support_level, max_users, storage_gb
Result: same row count, 5 new columns from product catalog
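
The equivalent Polars join, shown for orientation only - here products stands in for the loaded catalog DataFrame, a name assumed for this sketch:

df = df.join(
    products.select(["product_id", "product_name", "base_price_usd",
                     "support_level", "max_users", "storage_gb"]),
    on="product_id",
    how="left",  # unmatched orders keep nulls in the added columns
)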


Step 3 - Code Editor: Extract lead source and campaign from JSON metadata

The metadata column contains JSON strings like {"source": "web", "campaign": "q1_inbound"}. No no-code operation can parse structured fields out of a JSON string. This step extracts both fields into their own columns.

Operation: Code Editor

import re  # standard-library import; `pl` and `df` are already provided by the Code Editor

df = df.with_columns([
    pl.col("metadata").map_elements(
        # pull the "source" value out of the raw JSON string; None when absent
        lambda x: re.search(r'"source":\s*"([^"]+)"', x).group(1)
        if x and re.search(r'"source":\s*"([^"]+)"', x) else None,
        return_dtype=pl.String
    ).alias("lead_source"),
    pl.col("metadata").map_elements(
        # same pattern for the "campaign" value
        lambda x: re.search(r'"campaign":\s*"([^"]+)"', x).group(1)
        if x and re.search(r'"campaign":\s*"([^"]+)"', x) else None,
        return_dtype=pl.String
    ).alias("campaign"),
])

Added columns: lead_source, campaign
Why code: row-level regex parsing of a semi-structured string column - not available in String Extract or any no-code operation
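
A fully vectorised alternative uses Polars' native str.extract instead of row-level map_elements - same result, typically faster on large samples:

df = df.with_columns([
    pl.col("metadata").str.extract(r'"source":\s*"([^"]+)"', 1).alias("lead_source"),
    pl.col("metadata").str.extract(r'"campaign":\s*"([^"]+)"', 1).alias("campaign"),
])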


Step 4 - Datetime Delta (no-code)

Compute the number of days between order_date and delivery_date.

Operation: Datetime Delta
Start column: order_date
End column: delivery_date
Unit: Days
Output column: days_to_delivery
Result: new integer column with delivery duration per order
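
Under the hood this is a date subtraction. The Polars equivalent, assuming both columns are already date-typed:

df = df.with_columns(
    (pl.col("delivery_date") - pl.col("order_date"))
    .dt.total_days()  # Duration -> whole days
    .alias("days_to_delivery")
)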


Step 5 - Conditional Column (no-code)

Flag each order's SLA status based on delivery duration.

Operation: Conditional Column
Rule: IF days_to_delivery > 30 THEN "Breached" ELSE "On Time"
Output column: sla_status
Result: new string column categorising every order
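
For reference, the same rule as a Polars when/then expression:

df = df.with_columns(
    pl.when(pl.col("days_to_delivery") > 30)
    .then(pl.lit("Breached"))
    .otherwise(pl.lit("On Time"))
    .alias("sla_status")
)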


Step 6 - Code Editor: Composite deal score

Compute a weighted score per order combining revenue position within region, discount efficiency, and delivery penalty. This requires partitioned window aggregation (mean and std within region) combined with cross-column arithmetic in a single expression chain - impossible to reproduce with any combination of no-code operations.

Operation: Code Editor

df = df.with_columns([
    # per-region revenue statistics as partitioned window aggregates
    pl.col("revenue").mean().over("region").alias("region_rev_mean"),
    pl.col("revenue").std().over("region").alias("region_rev_std"),
])

df = df.with_columns(
    # z-score: how far each order's revenue sits from its region's mean
    ((pl.col("revenue") - pl.col("region_rev_mean")) / pl.col("region_rev_std"))
    .alias("revenue_zscore")
)

df = df.with_columns(
    (
        pl.col("revenue_zscore") * 0.5                  # revenue position within region
        + (1 - pl.col("discount_pct")) * 0.3            # discount efficiency
        + (1 / (pl.col("days_to_delivery") + 1)) * 0.2  # delivery penalty
    ).alias("deal_score")
)

# intermediates served their purpose; drop them so the schema stays clean
df = df.drop(["region_rev_mean", "region_rev_std", "revenue_zscore"])

Added column: deal_score (float)
Dropped: 3 intermediate columns
Why code: partitioned z-score normalisation combined with multi-weight arithmetic across 3 columns - no no-code operation supports this


Step 7 - Drop / Rename Columns (no-code)

Clean up columns no longer needed downstream and rename for clarity.

Operation: Drop / Rename Columns
Drop: metadata, notes, email, max_users, storage_gb
Rename: base_price_usd to list_price
Result: leaner dataset with clear column names
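
The equivalent Polars calls, for reference:

df = df.drop(["metadata", "notes", "email", "max_users", "storage_gb"]).rename(
    {"base_price_usd": "list_price"}
)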


Step 8 - Null Values Handling (no-code)

Some orders have a null product_tier (the column from the orders source, not the joined products table). Fill with a literal so downstream grouping doesn't drop these rows.

Operation: Null Values Handling
Column: product_tier
Strategy: Fill with literal value "Unknown"
Result: no null values in product_tier
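
In Polars terms, the step amounts to a single fill_null:

df = df.with_columns(pl.col("product_tier").fill_null("Unknown"))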


Step 9 - Code Editor: Cumulative revenue milestone flag per sales rep

Sort orders by rep and date, compute the running revenue total per rep, then flag the single row where each rep's cumulative revenue first crosses 100,000. This requires ordered cumulative sum partitioned by rep and a state-transition detection across rows - not available in Window Functions or any other no-code operation.

Operation: Code Editor

df = df.sort(["sales_rep", "order_date"])

df = df.with_columns(
    pl.col("revenue").cum_sum().over("sales_rep").alias("cumulative_rev_by_rep")
)

df = df.with_columns(
    (
        (pl.col("cumulative_rev_by_rep") >= 100000)
        # fill_null(0) handles each rep's first row, where shift(1) yields null -
        # without it, a rep whose very first order crosses 100k would be flagged null, not 1
        & (pl.col("cumulative_rev_by_rep").shift(1).over("sales_rep").fill_null(0) < 100000)
    ).cast(pl.Int32).alias("crossed_100k")
)

Added columns: cumulative_rev_by_rep, crossed_100k (1 = milestone row, 0 otherwise)
Why code: detecting the first row where a running total crosses a threshold within a partition - no no-code operation supports state-transition detection


Step 10 - Group By (no-code)

Aggregate the enriched dataset by sales rep and region to produce the final summary table.

Operation: Group By
Group columns: sales_rep, region
Aggregations:

  • revenue - Sum → total_revenue
  • deal_score - Mean → avg_deal_score
  • order_id - Count → order_count
  • crossed_100k - Sum → milestone_orders
  • days_to_delivery - Mean → avg_delivery_days

Result: one row per rep-region combination with all performance metrics
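
The equivalent Polars aggregation, for reference:

df = df.group_by(["sales_rep", "region"]).agg([
    pl.col("revenue").sum().alias("total_revenue"),
    pl.col("deal_score").mean().alias("avg_deal_score"),
    pl.col("order_id").count().alias("order_count"),
    pl.col("crossed_100k").sum().alias("milestone_orders"),
    pl.col("days_to_delivery").mean().alias("avg_delivery_days"),
])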


Final Output Shape

The pipeline left panel shows a DQ delta badge on each completed step - green if the table score improved vs the previous step, red if it dropped. The table DQ score is recalculated after every Save & Preview on a 16,600-row sample.

Step | Operation | Shape change | Expected DQ delta | Why
1 | Filter (remove null revenue) | ~490 rows, 14 cols | positive | Null rows removed - completeness rises across all columns
2 | Joins (left join products) | same rows, +5 cols | slight drop | New product columns may have nulls for unmatched rows
3 | Code: JSON extraction | same rows, +2 cols | positive | lead_source, campaign add clean string columns with high completeness
4 | Datetime Delta | same rows, +1 col | neutral / slight | days_to_delivery is fully populated - high completeness, adds to weighted score
5 | Conditional Column | same rows, +1 col | neutral / slight | sla_status fully populated from existing columns - no nulls introduced
6 | Code: deal score | same rows, +1 col, -3 intermediate | neutral | New float column; dropping intermediates removes low-signal columns
7 | Drop/Rename Columns | same rows, -5 cols | positive | Dropping sparse or metadata columns removes low-completeness weight from table score
8 | Null Values (fill product_tier) | same rows | positive | Null values replaced - completeness on product_tier rises to 100%
9 | Code: milestone flag | same rows, +2 cols | neutral | Integer columns fully populated - no nulls, adds completeness weight
10 | Group By | ~20 rows, 7 cols | significant | Aggregation collapses to dense summary rows - high completeness, high distinctness on key columns

Key Principles

  • Code steps are not special. They sit in the pipeline like any other operation. Reorder, remove, or duplicate them freely.
  • Test before save. Every Code Editor step has a Test button that validates the script against the preview sample before the step is committed.
  • No-code first. Build as much as possible with no-code operations. Add code steps only where no-code cannot express the logic.
  • Drop intermediates in the same code step. If a code step creates columns only needed for a calculation, drop them before the step ends to keep the schema clean.


Next Steps

Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.
