
Split Columns

Split a column into multiple new string columns by delimiter or regex. Define output column names and configure multiple split rules in one operation.

Split Columns divides a single column into multiple new string columns by separating values at a delimiter or regex pattern. You name each output column. The source column is unchanged. You can configure multiple split rules in one operation, each targeting a different source column.


When to Use Split Columns

  • Composite ID decomposition. Split ORD-2024-0001 on hyphen to extract order_prefix, order_year, and order_number as separate columns.
  • Full name to first and last. Split a full_name column on space to produce first_name and last_name.
  • Date string parsing. Split 2024-01-15 on hyphen to produce year, month, day columns without a cast operation.
  • Email decomposition. Use regex /^([^@]+)@(.+)$/ on an email column to separate username and domain.
  • Key-value extraction. Use regex capture groups to pull specific parts of a structured string pattern.

Sample Dataset

The examples in this doc use the Edilitics sample orders dataset. Download it to follow along in your own workspace.

edilitics_sample_orders.csv

Sample orders dataset for hands-on practice · 500 rows


Relevant columns for Split Columns examples:

  • order_id (string) - composite IDs like ORD-2024-0001
  • region (string)
  • revenue (numeric)

Split Modes

Delimiter

Splits the column value at each occurrence of the chosen character or string. The value is divided into at most as many parts as there are output columns. If a value produces fewer parts than expected, the remaining output columns receive null.

Delimiter options:

Option       Character
Space        (a single space)
Tab          \t
Pipe         |
Hyphen       -
Underscore   _
Custom       any character or string you type

The delimiter dropdown is a creatable select - type any character or string directly to use a custom separator.

Overflow behaviour: if the value has more parts than output columns, everything after the last split point is captured in the final output column. "ORD-2024-0001" split on hyphen into two columns gives part_1 = "ORD" and part_2 = "2024-0001".

Underflow behaviour: "Kim" split on space into two columns gives first_name = "Kim" and last_name = null.
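
The overflow and underflow rules above can be sketched in plain Python. This is an illustration only, not the Edilitics implementation; the helper name split_into is ours:

```python
def split_into(value, n_cols, sep="-"):
    """Split `value` into exactly n_cols parts.

    Mirrors the behaviour described above: extras are captured in the
    last column (overflow), missing parts become None (underflow).
    """
    parts = value.split(sep, n_cols - 1)  # at most n_cols parts
    return parts + [None] * (n_cols - len(parts))

split_into("ORD-2024-0001", 2)   # overflow:  ['ORD', '2024-0001']
split_into("Kim", 2, sep=" ")    # underflow: ['Kim', None]
```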

Regex

Splits using a regular expression with capture groups. Each capture group maps to one output column, in order.

Enter the pattern wrapped in forward slashes: /pattern/

Example: /^([^@]+)@(.+)$/ applied to contact@edilitics.com produces two groups: contact and edilitics.com.

If a group does not match for a row, that output column receives null. The pattern must be a valid regex or the rule cannot be submitted.

Use Regex when:

  • The separator varies in width (e.g. one or more spaces: /(\S+)\s+(.+)/)
  • You need to match on structure rather than a fixed character
  • You want to extract specific parts using positional capture groups
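
How capture groups map to output columns can be sketched with Python's re module. This is illustrative only; regex_split is a hypothetical helper, not the Edilitics engine:

```python
import re

# The email pattern from the example above
EMAIL = re.compile(r"^([^@]+)@(.+)$")

def regex_split(value, pattern, n_cols):
    """Map each capture group to one output column, in order.
    A non-matching row yields None for every output column."""
    m = pattern.match(value)
    if m is None:
        return [None] * n_cols
    return [m.group(i + 1) for i in range(n_cols)]

regex_split("contact@edilitics.com", EMAIL, 2)  # ['contact', 'edilitics.com']
regex_split("not-an-email", EMAIL, 2)           # [None, None]
```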

If you want to extract a single value using a regex pattern - not split into multiple columns - use String Extract instead. String Extract is purpose-built for single-group extraction.


How to Apply Split Columns

Open the Split Columns operation

In your Transform pipeline, click Add Operation and select Split Columns from the operation list.

Select the column to split

Choose a source column from the Select Column To Split dropdown. All column types are available - string, numeric, and datetime columns all appear. Non-string values are converted to string internally before splitting.

Choose the split mode and separator

Use the type selector to choose Delimiter or Regex.

  • Delimiter: select Space, Tab, Pipe, Hyphen, or Underscore from the dropdown, or type a custom separator directly.
  • Regex: type a pattern wrapped in forward slashes (e.g. /^([^@]+)@(.+)$/). Each capture group (...) maps to one output column in order.

Name the output columns

In the New Columns section, name each output column. Two name fields appear by default. Click Add Column to add more output columns.

Each name must:

  • Contain only letters, numbers, and underscores
  • Not start with __ or a number
  • Not already exist in the dataset or in another split rule configured in this operation

Remove a column slot with its remove icon. A minimum of two output columns is required.
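
The naming rules above amount to a simple check, sketched here for illustration (the real validation happens in the Edilitics UI; valid_output_name is a hypothetical helper):

```python
import re

# Letters, numbers, and underscores only; may not start with a digit
NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def valid_output_name(name, existing_names):
    if name.startswith("__"):
        return False                # reserved prefix
    if not NAME_RE.match(name):
        return False                # illegal characters or leading digit
    return name not in existing_names

valid_output_name("order_year", {"order_id"})  # True
valid_output_name("2024_part", set())          # False: starts with a number
valid_output_name("__tmp", set())              # False: starts with __
valid_output_name("order_id", {"order_id"})    # False: already exists
```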

Click Split Column

Click Split Column to submit this rule. It appears in the table above the configuration panel.

To configure another split on a different source column, click Add Split Rule and repeat the steps above.

Save the operation

Click Save & Preview. Edilitics applies all split rules and adds the new columns to the dataset. A success toast confirms: "Column split applied. Preview reflects the updated columns."

Save & Preview is disabled until at least one split rule has been submitted.

Verify in the preview

Check the new columns. Look for unexpected nulls - these indicate rows where the value had fewer parts than output columns, or where a regex group did not match.
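
A quick way to spot these nulls, sketched in plain Python over toy preview rows (the data here is invented to match the full_name example above):

```python
# Toy preview after splitting full_name on space into two columns;
# "Kim" produced only one part, so last_name is None for that row.
rows = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Kim", "last_name": None},
]

null_counts = {
    col: sum(row[col] is None for row in rows)
    for col in ("first_name", "last_name")
}
print(null_counts)  # {'first_name': 0, 'last_name': 1}
```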


Before and After

Split rule: order_id on Hyphen delimiter into order_prefix, order_year, order_number.

Input:

order_id        region          revenue
ORD-2024-0001   North America   12450.00
ORD-2024-0047   Europe          8320.50
ORD-2024-0312   Asia Pacific    3750.75

Output:

order_id        region          revenue    order_prefix   order_year   order_number
ORD-2024-0001   North America   12450.00   ORD            2024         0001
ORD-2024-0047   Europe          8320.50    ORD            2024         0047
ORD-2024-0312   Asia Pacific    3750.75    ORD            2024         0312

The source column order_id is unchanged; three new string columns are added.


Code Equivalent

-- Delimiter split (PostgreSQL / Snowflake / Redshift)
SELECT
  *,
  SPLIT_PART(order_id, '-', 1) AS order_prefix,
  SPLIT_PART(order_id, '-', 2) AS order_year,
  SPLIT_PART(order_id, '-', 3) AS order_number
FROM orders;

-- Regex capture groups (PostgreSQL)
SELECT
  *,
  (REGEXP_MATCH(email, '^([^@]+)@(.+)$'))[1] AS email_user,
  (REGEXP_MATCH(email, '^([^@]+)@(.+)$'))[2] AS email_domain
FROM orders;

import polars as pl

# Delimiter split - splitn returns a struct of at most 3 fields,
# with extras captured in the last field
df = df.with_columns([
    pl.col("order_id").str.splitn("-", 3).struct.field("field_0").alias("order_prefix"),
    pl.col("order_id").str.splitn("-", 3).struct.field("field_1").alias("order_year"),
    pl.col("order_id").str.splitn("-", 3).struct.field("field_2").alias("order_number"),
])

# Regex capture groups
df = df.with_columns([
    pl.col("email").str.extract(r"^([^@]+)@(.+)$", 1).alias("email_user"),
    pl.col("email").str.extract(r"^([^@]+)@(.+)$", 2).alias("email_domain"),
])

After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.






Next Steps

Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.
