Edilitics | Data to Decisions

Bin / Discretize Numerical Data

Group continuous numerical values into labeled buckets in Edilitics Transform using Equal Width, Quantile, or Custom Breaks, visually and without writing code.

Binning, also called discretization, converts a continuous numerical column into labeled categories called bins. Instead of raw values like $6,353 or $41,200, your analysis sees labels like SMB or Enterprise that carry direct business meaning.

Edilitics handles boundary math, null safety, and label assignment visually. No formulas, no code.


When to Use Bin / Discretize

Use this operation when you need to turn a numerical column into a category that people can act on:

  • Customer segmentation: group revenue into SMB / Mid-Market / Enterprise tiers to route accounts to the right sales team
  • Performance grading: convert test scores or satisfaction ratings into grade bands (A / B / C / D) for reporting
  • Age or tenure bands: turn account_age_days into cohorts like New / Growing / Mature for lifecycle analysis
  • Risk classification: bin discount_pct into Low / Medium / High discount groups to flag margin risk
  • Feature preparation: convert continuous numerical features into categorical labels before building dashboards or feeding downstream ML pipelines

If your column is already categorical (text values, true/false, fixed codes), this operation does not apply. Use Filter or Conditional Column instead.


Sample Dataset

edilitics_sample_orders.csv

500 B2B SaaS orders, H1 2024. Key columns for this operation: revenue (Float), units_sold (Integer), discount_pct (Float, 17% null).

The examples below all use revenue from this dataset. Download it to follow along in Edilitics.


Supported Strategies

Equal Width

Divides the full min–max range into a specified number of intervals of identical size.

Best for: Understanding value distribution across a uniform scale, histograms, age bands, score ranges where the interval size has inherent meaning.

Example using revenue (range $506–$83,841, 4 bins):

| Bin | Approximate range | Interval width |
|---|---|---|
| Bin 1 | $506 – $21,340 | ~$20,834 |
| Bin 2 | $21,340 – $42,174 | ~$20,834 |
| Bin 3 | $42,174 – $63,007 | ~$20,834 |
| Bin 4 | $63,007 – $83,841 | ~$20,834 |

Bins with fewer records still span the full interval. This strategy reveals distribution shape. If Bin 1 has 420 records and Bin 4 has 8, the data is heavily skewed toward lower revenue.
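
The boundary arithmetic behind Equal Width can be sketched in a few lines of Python. This is an illustration only, not Edilitics' internal implementation: the function name `equal_width_edges` is hypothetical, and the inputs are the revenue minimum and maximum quoted in the example.

```python
def equal_width_edges(lo: float, hi: float, bins: int) -> list[float]:
    """Return bins + 1 edges that split [lo, hi] into equal-width intervals."""
    if bins < 1 or hi <= lo:
        raise ValueError("need bins >= 1 and hi > lo")
    width = (hi - lo) / bins
    # Compute each edge from lo so rounding error does not accumulate
    return [lo + i * width for i in range(bins)] + [hi]


edges = equal_width_edges(506, 83_841, 4)
width = edges[1] - edges[0]  # (83,841 - 506) / 4 = 20833.75
print(edges)
print(width)
```

Every interval has the same width by construction, so any difference in record counts between bins reflects the shape of the data rather than the bin definition.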

Quantile

Creates bins so each contains approximately the same number of records, regardless of value range.

Best for: Balanced customer segments, percentile rankings, any grouping where equal representation per segment matters more than equal interval size.

Example using revenue (4 bins = quartiles, 500 rows):

| Bin | Approximate range | Records |
|---|---|---|
| Q1 | $506 – $3,114 | ~125 |
| Q2 | $3,115 – $6,353 | ~125 |
| Q3 | $6,354 – $19,842 | ~125 |
| Q4 | $19,843 – $83,841 | ~125 |

Intervals are unequal in width because the revenue data is right-skewed, meaning most orders cluster at lower values.

Quantile deduplication: If your data has many identical or tightly clustered values, multiple calculated breakpoints may coincide and are automatically removed. This can produce fewer bins than requested. If you request 10 bins and receive 7, your data lacks sufficient distinct quantile boundaries. Increase the requested count or switch to Custom Breaks.
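
The deduplication effect can be illustrated with Python's standard library. This is a sketch under assumptions: the quantile method Edilitics uses is not specified here, so `statistics.quantiles` with the inclusive method stands in for it, and `quantile_breakpoints` is a hypothetical helper.

```python
from statistics import quantiles


def quantile_breakpoints(values: list[float], bins: int) -> list[float]:
    """Return the distinct interior cut points for a quantile split."""
    cuts = quantiles(values, n=bins, method="inclusive")  # bins - 1 cut points
    # Coinciding cut points collapse into one, which lowers the bin count
    return sorted(set(cuts))


spread = list(range(1, 101))     # 100 distinct values
tied = [1, 1, 1, 1, 1, 1, 2, 3]  # heavily tied values

print(len(quantile_breakpoints(spread, 4)) + 1)  # 4 bins, as requested
print(len(quantile_breakpoints(tied, 4)) + 1)    # 3 bins: two cut points coincide
```

With the tied data, two of the three quartile cut points land on the same value, so only three bins survive even though four were requested.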

Custom Breaks

You define exact numerical boundaries. Edilitics creates one bin below the lowest breakpoint, one bin per interval between consecutive breakpoints, and one bin above the highest breakpoint.

Best for: Aligning data to established business thresholds, including revenue tiers, SLA durations, and credit score bands, where boundaries must match external definitions precisely.

Example using revenue (breakpoints: 10,000 and 40,000):

| Bin | Range | Label |
|---|---|---|
| 1 | Below $10,000 | SMB |
| 2 | $10,000 – $40,000 | Mid-Market |
| 3 | Above $40,000 | Enterprise |

Breakpoints must be entered in ascending order. The number of bins equals the number of breakpoints plus one.
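
As a rough sketch of these rules (ascending breakpoints, one more bin than breakpoints), the assignment can be written with Python's `bisect` module. The helper name `assign_bin` is hypothetical. Placing a value that equals a breakpoint in the higher bin, and passing missing values through as `None`, are assumptions made for the illustration, not a statement of how Edilitics implements the operation.

```python
from bisect import bisect_right
from typing import Optional


def assign_bin(value: Optional[float], breaks: list[float], labels: list[str]) -> Optional[str]:
    """Map a value to a label using ascending breakpoints; None stays None."""
    if any(a >= b for a, b in zip(breaks, breaks[1:])):
        raise ValueError("breakpoints must be in ascending order")
    if len(labels) != len(breaks) + 1:
        raise ValueError("need exactly one more label than breakpoints")
    if value is None:
        return None
    # Assumption: a value equal to a breakpoint falls into the higher bin
    return labels[bisect_right(breaks, value)]


breaks = [10_000, 40_000]
labels = ["SMB", "Mid-Market", "Enterprise"]
for revenue in (6_420.50, 38_915.00, 72_300.75, 9_850.20, None):
    print(revenue, "->", assign_bin(revenue, breaks, labels))
```

Because the breakpoints live in a list, changing a threshold means editing one number rather than rewriting a chain of conditions.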


How to Bin Data in Edilitics

Open Transform and load your dataset

Open the Transform module. Create or open a transformation using edilitics_sample_orders.csv as the source. The column list and data preview appear automatically.

Add the Bin / Discretize operation

In the left panel under All Transformations, click Bin / Discretize. The configuration panel opens on the right.

Select the Source Column

Under Source Column, select revenue (or any Integer or Float column). This is the column whose values will be grouped into bins. Categorical and string columns are not valid here.

Choosing revenue means every order record gets assigned a deal-size tier based on its revenue value.

Choose a Strategy and configure bins

Select Equal Width, Quantile, or Custom Breaks:

  • Equal Width / Quantile: enter a bin count between 2 and 100. Start with 4 if you're unsure. You can re-run with a different count after seeing the preview.
  • Custom Breaks: click Add Breakpoint, enter a value, repeat for each boundary. Values must be in ascending order.

Choosing your breakpoints for Custom Breaks: Use thresholds that already exist in your business. Use your CRM tier boundaries, pricing plan limits, or SLA bands. For example, if deals under $10,000 are classified as SMB in your CRM, use 10000 as your first breakpoint. For this example, add 10000 then 40000 to produce three tiers: SMB / Mid-Market / Enterprise.

Assign labels (optional)

Label fields appear automatically below the bin count or breakpoint inputs, one field per bin, as soon as a valid count or at least one breakpoint is set. Leave them blank to use the defaults (Bin 1, Bin 2, …). For this example, enter SMB in the first field, Mid-Market in the second, and Enterprise in the third.

Name the output column

Enter a name in Output Column Name, for example, revenue_tier. Rules: letters, numbers, and underscores only; cannot start with a number or __.
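
To check candidate names in bulk before entering them, the naming rule can be approximated with a regular expression. This is one reading of the rule as stated, not Edilitics' own validator: `is_valid_output_name` is a hypothetical helper, and restricting letters to ASCII and accepting a single leading underscore are both assumptions.

```python
import re

# Letters, digits, and underscores only; must not start with a digit or "__"
_NAME_RE = re.compile(r"(?!__)[A-Za-z_][A-Za-z0-9_]*")


def is_valid_output_name(name: str) -> bool:
    """Return True if the name satisfies the stated output-column rule."""
    return _NAME_RE.fullmatch(name) is not None


for candidate in ("revenue_tier", "1st_tier", "__tier", "revenue-tier"):
    print(candidate, is_valid_output_name(candidate))
```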

Preview and apply

The data preview shows the new revenue_tier column alongside the original revenue column. Verify that the bin assignments look correct, then click Apply. The original column is preserved.


Before & After

Input (5 rows from edilitics_sample_orders.csv):

| order_id | revenue |
|---|---|
| ORD-2024-0001 | 6,420.50 |
| ORD-2024-0042 | 38,915.00 |
| ORD-2024-0087 | 72,300.75 |
| ORD-2024-0113 | 9,850.20 |
| ORD-2024-0201 | null |

After Bin / Discretize (Custom Breaks: 10,000 / 40,000 · Labels: SMB / Mid-Market / Enterprise):

| order_id | revenue | revenue_tier |
|---|---|---|
| ORD-2024-0001 | 6,420.50 | SMB |
| ORD-2024-0042 | 38,915.00 | Mid-Market |
| ORD-2024-0087 | 72,300.75 | Enterprise |
| ORD-2024-0113 | 9,850.20 | SMB |
| ORD-2024-0201 | null | null |

Null revenue → null tier. Original revenue column unchanged.


What Binning Replaces in Code

Already writing this in SQL or Python? Here is what Bin / Discretize replaces, so you can delete it once you've migrated to Edilitics.

SQL

-- Works in PostgreSQL, BigQuery, Snowflake, Redshift, DuckDB
SELECT
  order_id,
  revenue,
  CASE
    WHEN revenue IS NULL THEN NULL  -- keep nulls null instead of falling through to ELSE
    WHEN revenue < 10000 THEN 'SMB'
    WHEN revenue < 40000 THEN 'Mid-Market'
    ELSE 'Enterprise'
  END AS revenue_tier
FROM edilitics_sample_orders;

Python (Polars)

import polars as pl

df = pl.read_csv("edilitics_sample_orders.csv")

df = df.with_columns(
    # Null revenue stays null instead of falling through to "Enterprise"
    pl.when(pl.col("revenue").is_null()).then(pl.lit(None, dtype=pl.Utf8))
    .when(pl.col("revenue") < 10_000).then(pl.lit("SMB"))
    .when(pl.col("revenue") < 40_000).then(pl.lit("Mid-Market"))
    .otherwise(pl.lit("Enterprise"))
    .alias("revenue_tier")
)

Both require pre-calculating breakpoints, handling nulls separately, and rewriting logic every time thresholds change. In Edilitics, adjust a number in the UI and re-run.


After Save & Preview, the pipeline shows a DQ delta badge on this step: green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.


Next Steps

Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.
