Edilitics | Data to Decisions

Code Editor

Write Python scripts directly inside your Transform pipeline. Polars, NumPy, SciPy, math, and re preloaded. Test before saving. No imports required.

Code Editor (Advanced Mode) lets you write Python scripts directly inside your Transform pipeline. The dataframe is available as df. Polars, NumPy, SciPy, math, and re are preloaded - no imports required. Write your transformation below the read-only preamble, test it, then save.

Code Editor operations can be placed anywhere in a pipeline alongside no-code operations. The output of each step becomes the input for the next, regardless of whether the step is a code or no-code operation.


When to Use Code Editor

  • Multi-condition logic. Classify rows using pl.when().then().otherwise() chains that no-code Filter cannot express.
  • Custom column derivation. Create columns from arithmetic, string manipulation, or pattern matching across multiple existing columns.
  • Regex extraction. Extract substrings, domains, or codes from text columns using re directly on string series.
  • Numerical operations. Apply NumPy or SciPy functions to columns for statistical scoring, normalization, or scientific computation.
  • Edge cases beyond no-code. Any transformation that requires logic branching, cross-column calculation, or library functions not available in the no-code operations.

Sample Dataset

The examples in this doc use the Edilitics sample orders dataset. Download it to follow along in your own workspace.

edilitics_sample_orders.csv

Sample orders dataset for hands-on practice · 500 rows


Editor Setup

When the Code Editor opens, a read-only preamble block sits at the top of the editor and cannot be edited:

import re
import math
import scipy
import numpy as np
import polars as pl

# Transform your dataframe using Polars syntax.
# The dataframe object is available as: df
# Example: df = df.with_columns([df["column_name"].cast(pl.Int64)])

# Notes:
# - No imports are required or supported - Polars is preloaded.
# - Only whitelisted packages/functions are allowed.
# - Code runs on a 16,600-row preview sample only.

# Begin your script below:
#dataframe object = df

Write your transformation code below this block. The preamble handles all library loading. Reassign df to apply changes.


Restrictions

The editor blocks a set of Python keywords and patterns on every keystroke. Blocked terms trigger a "Not Allowed" toast and the change is rejected.

Blocked                                              Reason
def, class, return                                   Function and class definitions not permitted
import                                               All imports handled by preamble
os, exec, eval, open, subprocess                     System access not permitted
print, assert, raise, yield                          Output and control flow keywords blocked
to_csv, to_json, to_parquet, and other write methods Data export not permitted from editor
to_pandas                                            Pandas conversion not permitted

Write expression-level code using df = assignments. Do not define functions or use control flow keywords.

The editor runs on a 16,600-row preview sample only. Production pipeline execution uses the full dataset.


How to Use Code Editor

Open the Code Editor operation

In your Transform pipeline, click Add Operation and select Code Editor (Advanced Mode) from the operation list.

Select a language

The language dropdown at the top left shows the available runtime. Select Python - polars (the default and only active option).

Write your script

Click below the read-only preamble block and write your transformation. Use df = to reassign the dataframe with your changes. All preloaded libraries (pl, np, scipy, math, re) are available without importing.

Example - add a revenue band column:

df = df.with_columns(
    pl.when(pl.col("revenue") > 1000)
      .then(pl.lit("High"))
      .when(pl.col("revenue") > 500)
      .then(pl.lit("Medium"))
      .otherwise(pl.lit("Low"))
      .alias("revenue_band")
)

Click Test

Click the Test button (top right of the editor) to run the script against the 16,600-row preview sample.

  • Success: Toast confirms "Script executed successfully. Preview has been refreshed." The output pane closes. Save & Preview becomes enabled.
  • Error: Toast shows "Script error detected. Check the right pane for details." An output pane opens on the right showing the error message. Fix the script and Test again.

Click Save & Preview

Save & Preview is enabled only after a successful Test. Click it to apply the script to the pipeline. The success toast confirms: "Script saved. Preview has been updated with the latest transformation output."


Before and After

Rep performance score with percentile rank

Score each sales rep by combining their revenue share within region and discount efficiency - a multi-step calculation requiring window functions, cross-column arithmetic, and percentile ranking in a single pass. No combination of no-code operations can produce this in one step.

Input (first 3 rows of edilitics_sample_orders):

order_id        sales_rep        region          revenue   discount_pct   units_sold
ORD-2024-0001   Marcus Williams  North America    780.62           0.25            3
ORD-2024-0002   Yuki Tanaka      North America   4703.29           0.25           10
ORD-2024-0003   Tomás Herrera    Europe          7347.74           0.15           27

Script:

df = df.with_columns([
    (pl.col("revenue") / pl.col("revenue").sum().over("region")).alias("revenue_share_in_region"),
    (pl.col("revenue") / pl.col("units_sold")).alias("revenue_per_unit"),
])

df = df.with_columns(
    (
        pl.col("revenue_share_in_region") * 0.6
        + (1 - pl.col("discount_pct")) * 0.4
    ).alias("rep_score_raw")
)

df = df.with_columns(
    (pl.col("rep_score_raw").rank("average") / pl.col("rep_score_raw").len())
    .alias("rep_score_pct_rank")
)

Output (first 3 rows):

order_id        sales_rep        region          revenue   rep_score_raw   rep_score_pct_rank
ORD-2024-0001   Marcus Williams  North America    780.62           0.487                 0.31
ORD-2024-0002   Yuki Tanaka      North America   4703.29           0.612                 0.58
ORD-2024-0003   Tomás Herrera    Europe          7347.74           0.754                 0.82

Output: 500 rows, 17 columns (13 original + 4 derived).


Example Scripts

Z-score normalisation per region

Normalise revenue within each region to identify outliers. Requires per-partition mean and standard deviation - not available in any no-code operation.

df = df.with_columns([
    pl.col("revenue").mean().over("region").alias("region_revenue_mean"),
    pl.col("revenue").std().over("region").alias("region_revenue_std"),
])

df = df.with_columns(
    ((pl.col("revenue") - pl.col("region_revenue_mean")) / pl.col("region_revenue_std"))
    .alias("revenue_zscore")
)

df = df.filter(pl.col("revenue_zscore").abs() < 3)

Extract structured fields from a JSON metadata column

The metadata column contains JSON strings like {"source": "web", "campaign": "q1_inbound"}. Extract both fields into separate columns. Requires row-level string parsing - not possible with String Extract or any no-code operation.

df = df.with_columns([
    pl.col("metadata").map_elements(
        lambda x: re.search(r'"source":\s*"([^"]+)"', x).group(1) if x and re.search(r'"source":\s*"([^"]+)"', x) else None,
        return_dtype=pl.String
    ).alias("lead_source"),
    pl.col("metadata").map_elements(
        lambda x: re.search(r'"campaign":\s*"([^"]+)"', x).group(1) if x and re.search(r'"campaign":\s*"([^"]+)"', x) else None,
        return_dtype=pl.String
    ).alias("campaign"),
])

Days to delivery and SLA breach flag

Compute the number of days between order_date and delivery_date, then flag rows where delivery took longer than 30 days. Requires date casting, cross-column arithmetic, and a derived boolean - not achievable in a single no-code step.

df = df.with_columns([
    pl.col("order_date").cast(pl.Date).alias("order_date_dt"),
    pl.col("delivery_date").cast(pl.Date).alias("delivery_date_dt"),
])

df = df.with_columns(
    (pl.col("delivery_date_dt") - pl.col("order_date_dt"))
    .dt.total_days()
    .alias("days_to_delivery")
)

df = df.with_columns(
    (pl.col("days_to_delivery") > 30).alias("sla_breach")
)

Cumulative revenue milestone flag per sales rep

Compute a running total of revenue per sales rep ordered by date, then flag the row where cumulative revenue first crosses 50,000. Requires an ordered cumulative sum partitioned by rep plus first-crossing detection - beyond what the Window Functions no-code operation supports.

df = df.sort(["sales_rep", "order_date"])

df = df.with_columns(
    pl.col("revenue").cum_sum().over("sales_rep").alias("cumulative_revenue_by_rep")
)

df = df.with_columns(
    (
        (pl.col("cumulative_revenue_by_rep") >= 50000)
        # fill_null(0) so each rep's first row compares against 0 rather than null
        & (
            pl.col("cumulative_revenue_by_rep").shift(1).over("sales_rep").fill_null(0)
            < 50000
        )
    ).alias("crossed_50k_milestone")
)

After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.




Next Steps

Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.
