Split Columns
Split a column into multiple new string columns by delimiter or regex. Define output column names and configure multiple split rules in one operation.
Split Columns divides a single column into multiple new string columns by separating values at a delimiter or regex pattern. You name each output column. The source column is unchanged. You can configure multiple split rules in one operation, each targeting a different source column.
When to Use Split Columns
- Composite ID decomposition. Split
ORD-2024-0001on hyphen to extractorder_prefix,order_year, andorder_numberas separate columns. - Full name to first and last. Split a
full_namecolumn on space to producefirst_nameandlast_name. - Date string parsing. Split
2024-01-15on hyphen to produceyear,month,daycolumns without a cast operation. - Email decomposition. Use regex
/^([^@]+)@(.+)$/on anemailcolumn to separate username and domain. - Key-value extraction. Use regex capture groups to pull specific parts of a structured string pattern.
Sample Dataset
The examples in this doc use the Edilitics sample orders dataset. Download it to follow along in your own workspace.
edilitics_sample_orders.csv
Sample orders dataset for hands-on practice · 500 rows
Relevant columns for Split Columns examples:
Prop
Type
Split Modes
Splits the column value at each occurrence of the chosen character or string. The value is divided into at most as many parts as there are output columns. If a value produces fewer parts than expected, the remaining output columns receive null.
Delimiter options:
| Option | Character |
|---|---|
| Space | |
| Tab | \t |
| Pipe | | |
| Hyphen | - |
| Underscore | _ |
| Custom | type any character or string |
The delimiter dropdown is a creatable select - type any character or string directly to use a custom separator.
Overflow behaviour: if the value has more parts than output columns, everything after the last split point is captured in the final output column. "ORD-2024-0001" split on hyphen into two columns gives part_1 = "ORD" and part_2 = "2024-0001".
Underflow behaviour: "Kim" split on space into two columns gives first_name = "Kim" and last_name = null.
Splits using a regular expression with capture groups. Each capture group maps to one output column in order.
Enter the pattern wrapped in forward slashes: /pattern/
Example: /^([^@]+)@(.+)$/ applied to contact@edilitics.com produces two groups: contact and edilitics.com.
If a group does not match for a row, that output column receives null. The pattern must be a valid regex or the rule cannot be submitted.
Use Regex when:
- The separator varies in width (e.g. one or more spaces:
/(\S+)\s+(.+)/) - You need to match on structure rather than a fixed character
- You want to extract specific parts using positional capture groups
If you want to extract a single value using a regex pattern - not split into multiple columns - use String Extract instead. String Extract is purpose-built for single-group extraction.
How to Apply Split Columns
Open the Split Columns operation
In your Transform pipeline, click Add Operation and select Split Columns from the operation list.
Select the column to split
Choose a source column from the Select Column To Split dropdown. All column types are available - string, numeric, and datetime columns all appear. Non-string values are converted to string internally before splitting.
Choose the split mode and separator
Use the type selector to choose Delimiter or Regex.
- Delimiter: select Space, Tab, Pipe, Hyphen, or Underscore from the dropdown, or type a custom separator directly.
- Regex: type a pattern wrapped in forward slashes (e.g.
/^([^@]+)@(.+)$/). Each capture group(...)maps to one output column in order.
Name the output columns
In the New Columns section, name each output column. Two name fields appear by default. Click Add Column to add more output columns.
Each name must:
- Contain only letters, numbers, and underscores
- Not start with
__or a number - Not already exist in the dataset or in another split rule configured in this operation
Remove a column slot with its remove icon. A minimum of two output columns is required.
Click Split Column
Click Split Column to submit this rule. It appears in the table above the configuration panel.
To configure another split on a different source column, click Add Split Rule and repeat the steps above.
Save the operation
Click Save & Preview. Edilitics applies all split rules and adds the new columns to the dataset. A success toast confirms: "Column split applied. Preview reflects the updated columns."
Save & Preview is disabled until at least one split rule has been submitted.
Verify in the preview
Check the new columns. Look for unexpected nulls - these indicate rows where the value had fewer parts than output columns, or where a regex group did not match.
Before and After
Split rule: order_id on Hyphen delimiter into order_prefix, order_year, order_number.
Input:
| order_id | region | revenue |
|---|---|---|
| ORD-2024-0001 | North America | 12450.00 |
| ORD-2024-0047 | Europe | 8320.50 |
| ORD-2024-0312 | Asia Pacific | 3750.75 |
Output:
| order_id | region | revenue | order_prefix | order_year | order_number |
|---|---|---|---|---|---|
| ORD-2024-0001 | North America | 12450.00 | ORD | 2024 | 0001 |
| ORD-2024-0047 | Europe | 8320.50 | ORD | 2024 | 0047 |
| ORD-2024-0312 | Asia Pacific | 3750.75 | ORD | 2024 | 0312 |
Source column order_id unchanged. Three new string columns added.
Code Equivalent
-- Delimiter split (PostgreSQL / BigQuery / Snowflake)
SELECT
*,
SPLIT_PART(order_id, '-', 1) AS order_prefix,
SPLIT_PART(order_id, '-', 2) AS order_year,
SPLIT_PART(order_id, '-', 3) AS order_number
FROM orders;
-- Regex capture groups (PostgreSQL)
SELECT
*,
(REGEXP_MATCH(email, '^([^@]+)@(.+)$'))[1] AS email_user,
(REGEXP_MATCH(email, '^([^@]+)@(.+)$'))[2] AS email_domain
FROM orders;import polars as pl
# Delimiter split - at most 3 parts, extras captured in last column
df = df.with_columns([
pl.col("order_id").str.splitn("-", 3).list.get(0).alias("order_prefix"),
pl.col("order_id").str.splitn("-", 3).list.get(1).alias("order_year"),
pl.col("order_id").str.splitn("-", 3).list.get(2).alias("order_number"),
])
# Regex capture groups
df = df.with_columns([
pl.col("email").str.extract(r"^([^@]+)@(.+)$", 1).alias("email_user"),
pl.col("email").str.extract(r"^([^@]+)@(.+)$", 2).alias("email_domain"),
])After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.
After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.
Operation Reference
Prop
Type
Frequently Asked Questions
Next Steps
String Extract
Extract a single pattern match from a column into one new column using regex.
Merge Columns
Reverse a split by combining multiple columns back into one with a delimiter.
Cast Data Types
Cast the new string output columns to numeric or date types if needed.
Drop / Rename Columns
Rename the new columns or drop the source column if it is no longer needed.
Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.
Last updated on
Merge Columns
Combine two or more columns into a single string column using a delimiter. Configure multiple merge rules in one operation while preserving source columns.
Manage Nulls
Drop rows with missing values or impute nulls using mean, median, mode, min, max, or constant values. Apply different strategies per column in one operation.