Hybrid Pipelines
Combine no-code operations and Python Polars scripts in a single pipeline. Each step passes its output to the next regardless of type.
A hybrid pipeline mixes no-code operations and Code Editor steps in any order. The output of each step - regardless of type - becomes the input df for the next. No configuration is needed to switch between the two. Code steps handle logic that no single no-code operation can express. No-code steps handle everything else.
Both step types share the same pipeline: same preview, same run history, same scheduling.
When to Use a Hybrid Pipeline
Use a hybrid pipeline when at least one transformation in your sequence requires:
- Parsing structured content out of a free-text or JSON column
- Computing a derived score or metric using partitioned window logic plus cross-column arithmetic
- Detecting state transitions across an ordered sequence of rows (e.g. first time a threshold is crossed)
- Any multi-step calculation where intermediate columns are needed and then discarded
Everything else - filtering, joining, casting, cleaning nulls, grouping - stays as no-code.
Sample Datasets
The worked example below uses two sample files. Download both.
edilitics_sample_orders.csv
Primary orders dataset - 13 columns including metadata JSON and nested notes · 500 rows
edilitics_sample_products.csv
Product catalog - join on product_id to add pricing and support tier · 4 rows
Worked Example: Sales Rep Performance Pipeline
Goal: produce a per-rep, per-region summary table with revenue totals, a composite deal score, milestone counts, and average delivery times - flagging each order's SLA status along the way - all from raw orders data.
10 steps. 3 code steps. 7 no-code steps.
Step 1 - Filter (no-code)
Remove rows where revenue is null. These are incomplete order records that would corrupt downstream aggregations.
Operation: Filter
Condition: revenue is not null
Result: dataset reduced from 500 rows to the ~490 with valid revenue values
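For orientation, the same filter written as a Polars expression (a sketch of the logic the no-code step expresses, not code you need to write):

```python
# Keep only rows where revenue is populated.
df = df.filter(pl.col("revenue").is_not_null())
```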
Step 2 - Joins (no-code)
Left join edilitics_sample_products on product_id to add product_name, base_price_usd, and support_level to every order row. Orders with no matching product retain null for the added columns.
Operation: Joins
Join type: Left Join
Left Key: product_id (orders)
Right Key: product_id (products)
Added columns: product_name, base_price_usd, support_level, max_users, storage_gb
Result: same row count, 5 new columns from product catalog
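For reference, the equivalent join in Polars, runnable locally against the downloaded samples (inside a pipeline, the Joins operation supplies the second table, since each step receives only the previous step's output as df):

```python
# Sketch: load the catalog, keep the join key plus the five added columns,
# and left-join onto the orders frame.
products = pl.read_csv("edilitics_sample_products.csv")
catalog_cols = [
    "product_id", "product_name", "base_price_usd",
    "support_level", "max_users", "storage_gb",
]
df = df.join(products.select(catalog_cols), on="product_id", how="left")
```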
Step 3 - Code Editor: Extract lead source and campaign from JSON metadata
The metadata column contains JSON strings like {"source": "web", "campaign": "q1_inbound"}. No no-code operation can parse structured fields out of a JSON string. This step extracts both fields into their own columns.
Operation: Code Editor
```python
# pl, re, NumPy, SciPy, and math are preloaded in the Code Editor; no imports needed.

def extract_field(raw, key):
    # Return the quoted string value for `key` from the raw JSON text, or None.
    if raw is None:
        return None
    match = re.search(rf'"{key}":\s*"([^"]+)"', raw)
    return match.group(1) if match else None

df = df.with_columns([
    pl.col("metadata").map_elements(
        lambda x: extract_field(x, "source"), return_dtype=pl.String
    ).alias("lead_source"),
    pl.col("metadata").map_elements(
        lambda x: extract_field(x, "campaign"), return_dtype=pl.String
    ).alias("campaign"),
])
```

Added columns: lead_source, campaign
Why code: row-level regex parsing of a semi-structured string column - not available in String Extract or any no-code operation
Step 4 - Datetime Delta (no-code)
Compute the number of days between order_date and delivery_date.
Operation: Datetime Delta
Start column: order_date
End column: delivery_date
Unit: Days
Output column: days_to_delivery
Result: new integer column with delivery duration per order
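As a sketch, the equivalent Polars expression (assuming both columns are already typed as dates):

```python
# Subtracting two date columns yields a Duration; convert it to whole days.
df = df.with_columns(
    (pl.col("delivery_date") - pl.col("order_date"))
    .dt.total_days()
    .alias("days_to_delivery")
)
```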
Step 5 - Conditional Column (no-code)
Flag each order's SLA status based on delivery duration.
Operation: Conditional Column
Rule: IF days_to_delivery > 30 THEN "Breached" ELSE "On Time"
Output column: sla_status
Result: new string column categorising every order
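Expressed in Polars, the same rule is a when/then/otherwise chain:

```python
# Orders delivered in more than 30 days breach the SLA; all others are on time.
df = df.with_columns(
    pl.when(pl.col("days_to_delivery") > 30)
    .then(pl.lit("Breached"))
    .otherwise(pl.lit("On Time"))
    .alias("sla_status")
)
```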
Step 6 - Code Editor: Composite deal score
Compute a weighted score per order combining revenue position within region, discount efficiency, and delivery penalty. This requires partitioned window aggregation (mean and std within region) combined with cross-column arithmetic in a single expression chain - impossible to reproduce with any combination of no-code operations.
Operation: Code Editor
```python
# Per-region revenue statistics via partitioned window aggregation.
df = df.with_columns([
    pl.col("revenue").mean().over("region").alias("region_rev_mean"),
    pl.col("revenue").std().over("region").alias("region_rev_std"),
])

# Z-score of each order's revenue within its region
# (null if a region has a single order, since std is undefined there).
df = df.with_columns(
    ((pl.col("revenue") - pl.col("region_rev_mean")) / pl.col("region_rev_std"))
    .alias("revenue_zscore")
)

# Weighted composite: revenue position, discount efficiency, delivery penalty.
df = df.with_columns(
    (
        pl.col("revenue_zscore") * 0.5
        + (1 - pl.col("discount_pct")) * 0.3
        + (1 / (pl.col("days_to_delivery") + 1)) * 0.2
    ).alias("deal_score")
)

# The intermediate columns served their purpose; drop them to keep the schema clean.
df = df.drop(["region_rev_mean", "region_rev_std", "revenue_zscore"])
```

Added column: deal_score (float)
Dropped: 3 intermediate columns
Why code: partitioned z-score normalisation combined with multi-weight arithmetic across 3 columns - no no-code operation supports this
Step 7 - Drop / Rename Columns (no-code)
Clean up columns no longer needed downstream and rename for clarity.
Operation: Drop / Rename Columns
Drop: metadata, notes, email, max_users, storage_gb
Rename: base_price_usd to list_price
Result: leaner dataset with clear column names
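For reference, the equivalent Polars calls:

```python
# Drop columns no longer needed downstream, then rename for clarity.
df = df.drop(["metadata", "notes", "email", "max_users", "storage_gb"]).rename(
    {"base_price_usd": "list_price"}
)
```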
Step 8 - Null Values Handling (no-code)
Some orders have a null product_tier (the column from the orders source, not the joined products table). Fill with a literal so downstream grouping doesn't drop these rows.
Operation: Null Values Handling
Column: product_tier
Strategy: Fill with literal value "Unknown"
Result: no null values in product_tier
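The equivalent fill in Polars, as a sketch:

```python
# Replace nulls in product_tier with a literal so grouping keeps these rows.
df = df.with_columns(pl.col("product_tier").fill_null("Unknown"))
```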
Step 9 - Code Editor: Cumulative revenue milestone flag per sales rep
Sort orders by rep and date, compute the running revenue total per rep, then flag the single row where each rep's cumulative revenue first crosses 100,000. This requires ordered cumulative sum partitioned by rep and a state-transition detection across rows - not available in Window Functions or any other no-code operation.
Operation: Code Editor
```python
# Ordered cumulative revenue per rep requires a deterministic sort first.
df = df.sort(["sales_rep", "order_date"])

df = df.with_columns(
    pl.col("revenue").cum_sum().over("sales_rep").alias("cumulative_rev_by_rep")
)

# Flag the first row where the running total crosses 100,000: the current
# value is at or above the threshold while the previous row's was below it.
# fill_null(0) covers each rep's first row, where shift(1) has no prior value;
# without it, a rep whose first order alone crosses 100k would get null, not 1.
df = df.with_columns(
    (
        (pl.col("cumulative_rev_by_rep") >= 100000)
        & (
            pl.col("cumulative_rev_by_rep")
            .shift(1)
            .fill_null(0)
            .over("sales_rep")
            < 100000
        )
    ).cast(pl.Int32).alias("crossed_100k")
)
```

Added columns: cumulative_rev_by_rep, crossed_100k (1 = milestone row, 0 otherwise)
Why code: detecting the first row where a running total crosses a threshold within a partition - no no-code operation supports state-transition detection
Step 10 - Group By (no-code)
Aggregate the enriched dataset by sales rep and region to produce the final summary table.
Operation: Group By
Group columns: sales_rep, region
Aggregations:
- revenue · Sum → total_revenue
- deal_score · Mean → avg_deal_score
- order_id · Count → order_count
- crossed_100k · Sum → milestone_orders
- days_to_delivery · Mean → avg_delivery_days
Result: one row per rep-region combination with all performance metrics
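For orientation, the same aggregation as a Polars expression (a sketch of what the no-code step configures):

```python
# One row per (sales_rep, region) with the five summary metrics.
summary = df.group_by(["sales_rep", "region"]).agg(
    pl.col("revenue").sum().alias("total_revenue"),
    pl.col("deal_score").mean().alias("avg_deal_score"),
    pl.col("order_id").count().alias("order_count"),
    pl.col("crossed_100k").sum().alias("milestone_orders"),
    pl.col("days_to_delivery").mean().alias("avg_delivery_days"),
)
```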
Shape and DQ Changes per Step
The pipeline left panel shows a DQ delta badge on each completed step - green (▲) if the table score improved vs the previous step, red (▼) if it dropped. The table DQ score is recalculated after every Save & Preview on a 16,600-row sample.
| Step | Operation | Shape change | Expected DQ delta | Why |
|---|---|---|---|---|
| 1 | Filter (remove null revenue) | ~490 rows, 13 cols | ▲ positive | Null rows removed - completeness rises across all columns |
| 2 | Joins (left join products) | same rows, +5 cols | ▼ slight drop | New product columns may have nulls for unmatched rows |
| 3 | Code: JSON extraction | same rows, +2 cols | ▲ positive | lead_source, campaign add clean string columns with high completeness |
| 4 | Datetime Delta | same rows, +1 col | neutral / ▲ slight | days_to_delivery is fully populated - high completeness, adds to weighted score |
| 5 | Conditional Column | same rows, +1 col | neutral / ▲ slight | sla_status fully populated from existing columns - no nulls introduced |
| 6 | Code: deal score | same rows, +1 col, -3 intermediate | neutral | New float column; dropping intermediates removes low-signal columns |
| 7 | Drop/Rename Columns | same rows, -5 cols | ▲ positive | Dropping sparse or metadata columns removes low-completeness weight from table score |
| 8 | Null Values (fill product_tier) | same rows | ▲ positive | Null values replaced - completeness on product_tier rises to 100% |
| 9 | Code: milestone flag | same rows, +2 cols | neutral | Integer columns fully populated - no nulls, adds completeness weight |
| 10 | Group By | ~20 rows, 7 cols | ▲ significant | Aggregation collapses to dense summary rows - high completeness, high distinctness on key columns |
Key Principles
- Code steps are not special. They sit in the pipeline like any other operation. Reorder, remove, or duplicate them freely.
- Test before save. Every Code Editor step has a Test button that validates the script against the preview sample before the step is committed.
- No-code first. Build as much as possible with no-code operations. Add code steps only where no-code cannot express the logic.
- Drop intermediates in the same code step. If a code step creates columns only needed for a calculation, drop them before the step ends to keep the schema clean.
Next Steps
Code Editor
Full reference for writing Python Polars scripts inside a pipeline step.
Joins
Merge two tables on a key column. Cross-database supported.
Window Functions
Rank, lag, lead, and rolling aggregates without leaving no-code.
Group By
Aggregate rows by one or more columns with multiple aggregation functions.
Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.