Hybrid Pipelines
Combine no-code operations and Python Polars scripts in a single pipeline. Each step passes its output to the next regardless of type.
A hybrid pipeline lets you mix point-and-click operations with Python scripts in a single workflow. Each step feeds into the next automatically. Use the no-code operations for standard tasks like filtering, joining, and cleaning. Add a script step when the logic is too custom for a point-and-click operation.
Everything runs in the same pipeline - same preview, same run history, same schedule.
When to Use a Hybrid Pipeline
Add a script step to your pipeline when you need to:
- Read values out of a JSON or free-text column
- Calculate a score that combines several columns with custom weights
- Find the first row where a running total crosses a threshold
- Group by more than one column at the same time
Everything else - stacking tables, filtering rows, joining datasets, cleaning nulls, renaming columns - is covered by the point-and-click operations.
Sample Datasets
This tutorial uses three sample files. Download all three before you begin.
edilitics_sample_orders.csv
H1 orders - 500 rows, 14 columns including metadata JSON · 500 rows
edilitics_sample_orders_h2.csv
H2 orders - 250 rows, 16 columns (includes return_flag and delivery_partner) · 250 rows
edilitics_sample_products.csv
Product catalog - 4 rows, join on product_id to add pricing and support tier · 4 rows
What You Start With
Three separate raw files with several data quality problems to fix before analysis:
| File | Rows | Issues |
|---|---|---|
edilitics_sample_orders.csv | 500 | H1 orders. Nulls in discount_pct, product_tier, and notes. Contains 10 orders referencing a product that does not exist in the catalog. |
edilitics_sample_orders_h2.csv | 250 | H2 orders. Same core columns as H1, plus two extra columns (return_flag, delivery_partner) that H1 does not have. |
edilitics_sample_products.csv | 4 | Product catalog. No issues. |
What You End With
A single clean summary table with one row per sales rep per region - ready for reporting or dashboard use.
| Column | Description |
|---|---|
sales_rep | Sales representative name |
region | Region |
total_revenue | Total revenue across all valid orders |
order_count | Number of orders |
avg_deal_score | Average deal quality score |
sla_breach_count | Number of orders that took more than 30 days to deliver |
avg_delivery_days | Average days from order to delivery |
milestone_orders | Number of orders where the rep first crossed 100k in cumulative revenue |
No nulls. No raw JSON. No unmatched products. No schema mismatches. Around 40 rows.
Worked Example: Full-Year Sales Performance Pipeline
Goal: combine two order exports and a product catalog into a per-rep, per-region performance summary - cleaning data quality issues, enriching with product details, scoring deal quality, and flagging SLA breaches.
10 steps. 4 code steps. 6 no-code steps.
Start your pipeline with edilitics_sample_orders.csv as the source table.
Concat (no-code)
Stack the H2 order export onto the H1 source data to build a full-year dataset.
Operation: Concat
Concat type: Vertical
Database: select the integration where you uploaded the sample files
Table: edilitics_sample_orders_h2
Manage duplicates: Keep All
When you select the H2 table, Edilitics checks whether the two tables share the same columns. Because H2 has two extra columns (return_flag and delivery_partner) that H1 does not, the status will show Concatenation not allowed and open a conflict modal listing those columns under Extra Columns. Click Drop Columns to continue. This removes the extra columns from the H2 rows so both tables match on the same 14-column schema before stacking.
Result: 750 rows, 14 columns
Filter (no-code)
Remove orders that reference a product not in the catalog. The H1 data contains 10 orders with product_id set to PROD-999, which does not exist in the products table. Keeping these rows would result in empty product columns after the join in Step 4, making them useless for analysis. Remove them now.
Operation: Filter
Filter Column: product_id
Condition: Not Equal to
Target Value(s): PROD-999
Result: 740 rows - 10 orders removed
Null Values Handling (no-code)
Drop rows where product_tier is null. Orders without a product tier cannot be grouped or segmented accurately downstream.
Operation: Null Values Handling
Column: product_tier
Impute/Drop: Drop Null values
Result: 593 rows - 147 incomplete records removed
Joins (no-code)
Enrich each order with product details from the catalog. After Step 2, every remaining order has a valid product_id, so the join will fully populate the new columns with no unmatched rows.
Operation: Joins
Join type: Left Join
Left Key: product_id (orders)
Right Key: product_id (products)
Added columns: product_name, base_price_usd, support_level, max_users, storage_gb
Result: 593 rows, 5 new columns added
Code Editor: Extract lead source and campaign from metadata
The metadata column stores a small JSON object on every row, for example {"source": "web", "campaign": "q1_inbound"}. The point-and-click operations cannot read values out of a JSON string. This script pulls source and campaign into their own plain-text columns so they can be used later in grouping and reporting.
Operation: Code Editor
df = df.with_columns([
pl.col("metadata").map_elements(
lambda x: re.search(r'"source":\s*"([^"]+)"', x).group(1)
if x and re.search(r'"source":\s*"([^"]+)"', x) else None,
return_dtype=pl.String
).alias("lead_source"),
pl.col("metadata").map_elements(
lambda x: re.search(r'"campaign":\s*"([^"]+)"', x).group(1)
if x and re.search(r'"campaign":\s*"([^"]+)"', x) else None,
return_dtype=pl.String
).alias("campaign"),
])Added columns: lead_source, campaign
Why code: extracting values from a JSON string is not available as a point-and-click operation
Date Time Delta (no-code)
Calculate how many days passed between the order date and the delivery date for each order.
Operation: Date Time Delta
Start column: order_date
End column: delivery_date
Unit: Days
Result column name: days_to_delivery
Result: new column added with the delivery duration for every order
Conditional Column (no-code)
Flag whether each order met or missed the 30-day SLA. Any order that took more than 30 days to deliver is marked as Breached.
Operation: Conditional Column
New column name: sla_status
Rule: IF days_to_delivery > 30 THEN Breached
Else value: On Time
Result: 268 orders marked Breached, 325 marked On Time
Code Editor: Deal score
Give each order a score based on three things: how strong its revenue was relative to other orders in the same region, how lean the discount was, and how fast it was delivered. The final score blends all three using fixed weights.
To build this score, the script first calculates the average and spread of revenue within each region, uses those to position each order, then combines the result with the discount and delivery figures. The working values are dropped at the end - only deal_score remains.
Operation: Code Editor
df = df.with_columns([
pl.col("revenue").mean().over("region").alias("region_rev_mean"),
pl.col("revenue").std().over("region").alias("region_rev_std"),
])
df = df.with_columns(
((pl.col("revenue") - pl.col("region_rev_mean")) / pl.col("region_rev_std"))
.alias("revenue_zscore")
)
df = df.with_columns(
(
pl.col("revenue_zscore") * 0.5
+ (1 - pl.col("discount_pct").fill_null(0.0)) * 0.3
+ (1 / (pl.col("days_to_delivery") + 1)) * 0.2
).alias("deal_score")
)
df = df.drop(["region_rev_mean", "region_rev_std", "revenue_zscore"])Added column: deal_score
Removed: 3 working columns used only for the calculation
Why code: this score needs several intermediate values that are calculated and used in sequence - not possible with a single point-and-click operation
Drop / Rename Columns (no-code)
Remove columns that are no longer needed and rename one for clarity before the final aggregation.
Operation: Drop / Rename Columns
In the column list, click the delete icon next to each of the following columns to remove them:
metadatanotesemailmax_usersstorage_gb
Then rename base_price_usd to list_price by editing the name field inline.
Click Save & Preview when done.
Result: a leaner dataset with no raw JSON, no personal data, and clear column names
Code Editor: Milestone flag and final aggregation
This step produces the final summary table. It does two things that the point-and-click operations cannot handle:
- Milestone flag - sorts each rep's orders by date, builds a running revenue total, and marks the specific order where each rep first crossed 100,000. This requires stepping through rows in order, which is not available as a point-and-click operation.
- Group by two columns - summarises the data by both
sales_repandregionat the same time. The Group By operation supports one group column only.
Operation: Code Editor
df = df.sort(["sales_rep", "order_date"])
df = df.with_columns(
pl.col("revenue").cum_sum().over("sales_rep").alias("cumulative_rev_by_rep")
)
df = df.with_columns(
(
(pl.col("cumulative_rev_by_rep") >= 100000)
& (pl.col("cumulative_rev_by_rep").shift(1).over("sales_rep") < 100000)
).cast(pl.Int32).alias("milestone_row")
)
df = df.with_columns(
(pl.col("sla_status") == pl.lit("Breached")).cast(pl.Int32).alias("sla_breach")
)
df = df.group_by(["sales_rep", "region"]).agg([
pl.col("revenue").sum().alias("total_revenue"),
pl.col("deal_score").mean().alias("avg_deal_score"),
pl.col("order_id").count().alias("order_count"),
pl.col("sla_breach").sum().alias("sla_breach_count"),
pl.col("days_to_delivery").mean().alias("avg_delivery_days"),
pl.col("milestone_row").sum().alias("milestone_orders"),
]).sort(["sales_rep", "region"])Result: approximately 40 rows, 8 columns - one row per sales rep per region, ready for reporting
Pipeline Summary
| Step | Operation | Rows | Columns | What changed |
|---|---|---|---|---|
| Start | Source load | 500 | 14 | - |
| 1 | Concat | 750 | 14 | H2 rows added, extra H2 columns dropped |
| 2 | Filter | 740 | 14 | 10 unmatched product orders removed |
| 3 | Null Values Handling | 593 | 14 | 147 rows with no product tier removed |
| 4 | Joins | 593 | 19 | 5 product columns added |
| 5 | Code: metadata extraction | 593 | 21 | lead_source and campaign added |
| 6 | Date Time Delta | 593 | 22 | days_to_delivery added |
| 7 | Conditional Column | 593 | 23 | sla_status added |
| 8 | Code: deal score | 593 | 21 | deal_score added, 3 working columns removed |
| 9 | Drop / Rename Columns | 593 | 16 | 5 columns removed, 1 renamed |
| 10 | Code: milestone + group by | ~40 | 8 | Aggregated to final summary table |
Key Principles
- Point-and-click first. Use the no-code operations for everything they can handle. Add a script step only when a point-and-click operation cannot do the job.
- Test before saving. Every script step has a Test button. Click it to check the script against the preview data before saving.
- Script steps are not special. They sit in the pipeline like any other step. You can reorder, remove, or duplicate them freely.
- Clean up in the same step. If a script creates working columns that are only needed mid-calculation, remove them before the step ends so they do not carry forward.
Frequently Asked Questions
Next Steps
Code Editor
Full reference for writing Python Polars scripts inside a pipeline step.
Concat
Stack tables vertically, horizontally, or diagonally with built-in schema validation.
Joins
Merge two tables on a key column. Cross-database supported.
Null Values Handling
Drop or impute null values column by column.
Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.
Last updated on
Code Editor
Write Python scripts directly inside your Transform pipeline. Polars, NumPy, SciPy, math, and re preloaded. Test before saving. No imports required.
Scheduling
Schedule Edilitics Transform pipelines to run once, daily, or on custom cron expressions. Pause, resume, or edit schedules anytime with timezone-awareness.