Frequently Asked Questions

What is the difference between Equal Width and Quantile binning?

Equal Width divides the min–max range into intervals of identical size regardless of record distribution. Quantile creates intervals so each bin holds roughly the same number of records. Use Equal Width to analyze distribution shape; use Quantile to build balanced population segments.

Why did Quantile binning produce fewer bins than I requested?

Quantile calculates breakpoints from your data's actual distribution. Duplicate or tightly clustered values produce coincident breakpoints that are automatically removed, reducing the final bin count. Request more bins than you need or switch to Custom Breaks to work around this.

How do I add Custom Breakpoints in Edilitics?

Enter each boundary value in the breakpoint input field, then click Add Breakpoint; repeat for every boundary. Values must be in ascending order. The number of bins equals the number of breakpoints plus one.

What output column type does Bin / Discretize produce?

The output column is always String (categorical), regardless of the source column's numeric type. The source column is preserved unchanged.

How are null values handled during binning?

Null values in the source column are not assigned to any bin and remain null in the output. They are excluded from all bin boundary calculations.

What is the maximum number of bins I can create?

For Equal Width and Quantile strategies, the bin count range is 2 to 100. Custom Breaks has no enforced limit on the number of breakpoints.

Bin / Discretize Numerical Data

Group continuous numerical values into labeled buckets with Equal Width, Quantile, or Custom Breaks, configured visually in Edilitics Transform without writing code.

Binning, also called discretization, converts a continuous numerical column into labeled categories called bins. Instead of raw values like $6,353 or $41,200, your analysis sees labels such as SMB or Enterprise that carry direct business meaning.

Edilitics handles boundary math, null safety, and label assignment visually. No formulas, no code.

When to Use Bin / Discretize

Use this operation when you need to turn a numerical column into a category that people can act on:

Customer segmentation: group revenue into SMB / Mid-Market / Enterprise tiers to route accounts to the right sales team
Performance grading: convert test scores or satisfaction ratings into grade bands (A / B / C / D) for reporting
Age or tenure bands: turn account_age_days into cohorts like New / Growing / Mature for lifecycle analysis
Risk classification: bin discount_pct into Low / Medium / High discount groups to flag margin risk
Feature preparation: convert continuous numerical features into categorical labels before building dashboards or feeding downstream ML pipelines

If your column is already categorical (text values, true/false, fixed codes), this operation does not apply. Use Filter or Conditional Column instead.

Sample Dataset

edilitics_sample_orders.csv

500 B2B SaaS orders, H1 2024. Key columns for this operation: revenue (Float), units_sold (Integer), discount_pct (Float, 17% null).


Relevant columns:

Column	Type
revenue	Float
units_sold	Integer
discount_pct	Float (17% null)

The examples below all use revenue from this dataset. Download it to follow along in Edilitics.

Supported Strategies

Equal Width

Divides the full min–max range into a specified number of intervals of identical size.

Best for: Understanding value distribution across a uniform scale, histograms, age bands, score ranges where the interval size has inherent meaning.

Example using revenue (range $506–$83,841, 4 bins):

Bin	Range	Interval width
Bin 1	$506 – $21,339	~$20,834
Bin 2	$21,340 – $42,173	~$20,834
Bin 3	$42,174 – $63,007	~$20,834
Bin 4	$63,008 – $83,841	~$20,834

Every bin spans an interval of the same width, even when it holds only a few records, which is why this strategy reveals distribution shape. For example, if Bin 1 has 420 records and Bin 4 has 8, the data is heavily skewed toward lower revenue.
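The boundary arithmetic behind Equal Width can be sketched in a few lines of Python. This is a minimal sketch, not Edilitics' implementation: it assumes bins are left-closed with the maximum value placed in the last bin, and it skips nulls (None) before taking min and max, in line with the null handling described in the FAQ. Both function names are hypothetical.

```python
# Hypothetical helpers; not an Edilitics API.
def equal_width_edges(values, bins):
    """Return bins + 1 evenly spaced edges spanning min..max, ignoring None."""
    present = [v for v in values if v is not None]  # nulls excluded from boundary math
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    # Use the true max as the last edge instead of lo + bins * width,
    # so floating-point drift can never leave the max outside every bin.
    return [lo + i * width for i in range(bins)] + [float(hi)]


def equal_width_bin(value, edges):
    """Return the 1-based bin number for value, or None for a null."""
    if value is None:
        return None
    bins = len(edges) - 1
    # Assumption: each bin is [left, right); the maximum falls in the last bin.
    for i in range(bins - 1):
        if value < edges[i + 1]:
            return i + 1
    return bins


# Only min and max matter for the edges; the None is ignored.
edges = equal_width_edges([506, None, 83_841, 12_000], bins=4)
print(edges)  # [506.0, 21339.75, 42173.5, 63007.25, 83841.0]
print(equal_width_bin(30_000, edges), equal_width_bin(83_841, edges), equal_width_bin(None, edges))  # 2 4 None
```

The edges come out exactly (83,841 - 506) / 4 = 20,833.75 apart.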

Quantile

Creates bins so each contains approximately the same number of records, regardless of value range.

Best for: Balanced customer segments, percentile rankings, any grouping where equal representation per segment matters more than equal interval size.

Example using revenue (4 bins = quartiles, 500 rows):

Bin	Approximate range	Records
Q1	$506 – $3,114	~125
Q2	$3,115 – $6,353	~125
Q3	$6,354 – $19,842	~125
Q4	$19,843 – $83,841	~125

Intervals are unequal in width because the revenue data is right-skewed, meaning most orders cluster at lower values.

Quantile deduplication: If your data has many identical or tightly clustered values, multiple calculated breakpoints may coincide and are automatically removed. This can produce fewer bins than requested. If you request 10 bins and receive 7, your data lacks sufficient distinct quantile boundaries. Increase the requested count or switch to Custom Breaks.
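The deduplication effect can be reproduced with Python's standard `statistics.quantiles`. This is a sketch under assumptions: Edilitics may use a different percentile interpolation than the `"inclusive"` method chosen here, and `quantile_edges` is a hypothetical helper. It drops nulls, builds the edge list (min, interior cut points, max), and collapses coincident edges, which is how a request for 4 bins can yield 3.

```python
from statistics import quantiles


# Hypothetical helper; not an Edilitics API.
def quantile_edges(values, bins):
    """Return deduplicated bin edges: min, interior cut points, max."""
    present = sorted(v for v in values if v is not None)  # nulls are excluded
    # method="inclusive" treats the data as the whole population,
    # so every cut point lies within min..max.
    cuts = quantiles(present, n=bins, method="inclusive")
    edges = [present[0], *cuts, present[-1]]
    # Coincident edges collapse into one; each collapse removes a bin.
    return sorted(set(edges))


spread = [1, 2, 3, 4, 5, 6, 7, 8]
clustered = [1, 2, 5, 5, 5, 5, 5, 9, None]

print(len(quantile_edges(spread, 4)) - 1)     # 4 bins
print(len(quantile_edges(clustered, 4)) - 1)  # 3 bins: two cut points both land on 5
```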

Custom Breaks

You define exact numerical boundaries. Edilitics creates one bin below the lowest breakpoint, one bin for each interval between consecutive breakpoints, and one bin above the highest breakpoint.

Best for: Aligning data to established business thresholds, including revenue tiers, SLA durations, and credit score bands, where boundaries must match external definitions precisely.

Example using revenue (breakpoints: 10,000 and 40,000):

Bin	Range	Label
1	Below $10,000	SMB
2	$10,000 – $40,000	Mid-Market
3	Above $40,000	Enterprise

Breakpoints must be entered in ascending order. The number of bins equals the number of breakpoints plus one.
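Custom Breaks assignment amounts to a sorted lookup. The sketch below uses Python's `bisect` module and assumes left-closed bins (a value exactly equal to a breakpoint lands in the bin above it), which matches the SQL and Polars code later on this page but is not confirmed for Edilitics itself. `assign_custom_bin` is a hypothetical helper.

```python
from bisect import bisect_right


# Hypothetical helper; not an Edilitics API.
def assign_custom_bin(value, breakpoints, labels):
    """Map value to a label; len(labels) must equal len(breakpoints) + 1."""
    # Assumption: breakpoints must be strictly ascending (a repeat would create an empty bin).
    if any(a >= b for a, b in zip(breakpoints, breakpoints[1:])):
        raise ValueError("breakpoints must be strictly ascending")
    if len(labels) != len(breakpoints) + 1:
        raise ValueError("need exactly one label per bin (breakpoints + 1)")
    if value is None:
        return None  # nulls are never assigned to a bin
    # bisect_right counts the breakpoints <= value, which is the 0-based bin index.
    return labels[bisect_right(breakpoints, value)]


breaks = [10_000, 40_000]
tiers = ["SMB", "Mid-Market", "Enterprise"]
for revenue in [6_420.50, 10_000, 38_915.00, 40_000, 72_300.75, None]:
    print(revenue, assign_custom_bin(revenue, breaks, tiers))
```

Two breakpoints require three labels, which is the breakpoints-plus-one rule enforced by the second check.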

How to Bin Data in Edilitics

Open Transform and load your dataset

Open the Transform module. Create or open a transformation using edilitics_sample_orders.csv as the source. The column list and data preview appear automatically.

Add the Bin / Discretize operation

In the left panel under All Transformations, click Bin / Discretize. The configuration panel opens on the right.

Select the Source Column

Under Source Column, select revenue (or any Integer or Float column). This is the column whose values will be grouped into bins. Categorical and string columns are not valid here.

Choosing revenue means every order record gets assigned a deal-size tier based on its revenue value.

Choose a Strategy and configure bins

Select Equal Width, Quantile, or Custom Breaks:

Equal Width / Quantile: enter a bin count between 2 and 100. Start with 4 if you're unsure; you can re-run with a different count after seeing the preview.
Custom Breaks: click Add Breakpoint, enter a value, and repeat for each boundary. Values must be in ascending order.

Choosing your breakpoints for Custom Breaks: use thresholds that already exist in your business, such as CRM tier boundaries, pricing plan limits, or SLA bands. For example, if your CRM classifies deals under $10,000 as SMB, use 10000 as your first breakpoint. For this example, add 10000 and then 40000 to produce three tiers: SMB / Mid-Market / Enterprise.

Assign labels (optional)

Label fields appear below the bin count or breakpoint inputs, one field per bin, as soon as a valid count or at least one breakpoint is set. Leave them blank to use the defaults (Bin 1, Bin 2, …). For this example, enter SMB in the first field, Mid-Market in the second, and Enterprise in the third.
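The labeling rule (one label per bin, with blanks falling back to Bin 1, Bin 2, and so on) can be sketched as follows. `resolve_labels` is a hypothetical helper, and it assumes the fallback applies field by field; Edilitics might instead apply defaults only when every field is left blank.

```python
# Hypothetical helper; not an Edilitics API.
def resolve_labels(bin_count, entered=None):
    """Return one label per bin, using 'Bin N' wherever no label was entered."""
    entered = list(entered or [])
    labels = []
    for i in range(bin_count):
        # Missing, None, or whitespace-only fields count as blank (assumption).
        text = entered[i].strip() if i < len(entered) and entered[i] else ""
        labels.append(text or f"Bin {i + 1}")  # default labels are 1-based
    return labels


# Breakpoints 10000 and 40000 give 3 bins, so 3 label fields.
print(resolve_labels(3, ["SMB", "Mid-Market", "Enterprise"]))
print(resolve_labels(3, ["SMB", "", None]))
print(resolve_labels(4))
```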

Name the output column

Enter a name in Output Column Name, for example, revenue_tier. Rules: letters, numbers, and underscores only; cannot start with a number or __.
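The naming rules can be expressed as a single regular expression. This sketch assumes "letters" means ASCII letters and that a single leading underscore is allowed, since the rules only forbid a leading digit or `__`. `is_valid_output_name` is hypothetical, and Edilitics may apply further checks not listed here.

```python
import re

# Assumption: ASCII letters only; "_x" is allowed, "__x" is not.
# (?!__) rejects a leading double underscore; [A-Za-z_] rejects a leading digit.
_NAME_RULE = re.compile(r"(?!__)[A-Za-z_][A-Za-z0-9_]*")


# Hypothetical helper; not an Edilitics API.
def is_valid_output_name(name):
    """Check a name against the Output Column Name rules."""
    return _NAME_RULE.fullmatch(name) is not None


for candidate in ["revenue_tier", "_tier", "2024_tier", "__tier", "revenue-tier", ""]:
    print(repr(candidate), is_valid_output_name(candidate))
```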

Preview and apply

The data preview shows the new revenue_tier column alongside original revenue. Verify bin assignments look correct, then click Apply. The original column is preserved.

Before & After

Input (5 rows from edilitics_sample_orders.csv):

order_id	revenue
ORD-2024-0001	6,420.50
ORD-2024-0042	38,915.00
ORD-2024-0087	72,300.75
ORD-2024-0113	9,850.20
ORD-2024-0201	null

After Bin / Discretize (Custom Breaks: 10,000 / 40,000 · Labels: SMB / Mid-Market / Enterprise):

order_id	revenue	revenue_tier
ORD-2024-0001	6,420.50	SMB
ORD-2024-0042	38,915.00	Mid-Market
ORD-2024-0087	72,300.75	Enterprise
ORD-2024-0113	9,850.20	SMB
ORD-2024-0201	null	null

Null revenue → null tier. Original revenue column unchanged.

What Binning Replaces in Code

Already writing this in SQL or Python? Here is what Bin / Discretize replaces, so you can delete it once you've migrated to Edilitics.

-- Works in PostgreSQL, BigQuery, Snowflake, Redshift, DuckDB
SELECT
  order_id,
  revenue,
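  CASE
    -- Without this branch, NULL < 10000 is unknown and falls through to ELSE ('Enterprise')
    WHEN revenue IS NULL THEN NULL
    WHEN revenue < 10000 THEN 'SMB'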
    WHEN revenue < 40000 THEN 'Mid-Market'
    ELSE 'Enterprise'
  END AS revenue_tier
FROM edilitics_sample_orders;

import polars as pl

df = pl.read_csv("edilitics_sample_orders.csv")

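# A null condition counts as false in when/then, so null revenue would fall
# through to .otherwise() as "Enterprise"; handle it first to keep it null.
df = df.with_columns(
    pl.when(pl.col("revenue").is_null()).then(pl.lit(None, dtype=pl.Utf8))
    .when(pl.col("revenue") < 10_000).then(pl.lit("SMB"))
    .when(pl.col("revenue") < 40_000).then(pl.lit("Mid-Market"))
    .otherwise(pl.lit("Enterprise"))
    .alias("revenue_tier")
)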

Both require pre-calculating breakpoints, handling nulls separately, and rewriting logic every time thresholds change. In Edilitics, adjust a number in the UI and re-run.

After Save & Preview, the pipeline shows a data quality (DQ) delta badge on this step: green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.

Next Steps

Group By

Filter

Conditional Column

Pivot / Unpivot
