Edilitics | Data to Decisions

Joins

Merge two tables using a matching key column. Left, Right, Inner, and Outer joins supported. Join across any connected database. No code needed.

Joins merges the current dataset with a table from any connected integration using a shared key column. Four join types: Left, Right, Inner, and Outer. Select the Left Key from the current dataset, choose a join type, then select the database, table, and Right Key from the second table. Conflicting column names in the joined table get a _right suffix automatically.

The second table can come from a different database or integration than the current pipeline source. No need to land both datasets in a shared warehouse first.

Only one table can be joined per operation. Chain multiple Join operations to join more than two tables.


Cross-Database Joins

Most ETL and transformation tools require both tables to exist in the same warehouse before a join is possible. Edilitics does not.

Join any two tables across any two connected integrations in a single operation - PostgreSQL to BigQuery, MySQL to Snowflake, MongoDB to DuckDB. No staging required, no intermediate landing zone, no SQL to write.

What it enablesDetails
Join across integrationsCombine tables from any two connected databases in one step
No shared warehouse requiredData does not need to be co-located before joining
Same workflow, any sourceCross-database joins use the same UI as single-source joins

This is a meaningful capability gap versus warehouse-first tools like dbt, Fivetran, Airbyte, and Matillion - all of which require data to be landed in a common warehouse before any join can be performed.


When to Use Joins

  • Enrich orders with customer data. Left join orders to customers on customer_id to add name, email, and region to every order row.
  • Find matching records only. Inner join transactions to accounts to return only transactions with a valid matching account.
  • Retain all records from a reference table. Right join orders to a product_catalog to keep every product, even those with no orders.
  • Build a full combined view. Outer join two regional sales tables to see all rows from both, with nulls where no match exists.
  • Join across databases. Left join a PostgreSQL orders table to a BigQuery metadata table without moving either dataset.

Sample Dataset

The examples in this doc use two sample datasets. Download both to follow along in your own workspace.

edilitics_sample_orders.csv

Primary orders dataset - use as the current pipeline source · 500 rows

Download

edilitics_sample_products.csv

Product catalog - use as the second table to join on product_id · 4 rows

Download

Join Types

TypeRows returnedUse when
Left JoinAll rows from the current dataset. Matched columns from the second table added; nulls where no match.Enrich the current dataset while keeping all existing rows.
Right JoinAll rows from the second table. Matched columns from the current dataset added; nulls where no match.The second table is the authoritative list; current dataset is supplementary.
Inner JoinOnly rows with a matching key in both tables.Return only confirmed matches between both datasets.
Outer JoinAll rows from both tables. Nulls where no match on either side.Build a complete view combining all rows from both sources.

How to Apply a Join

Open the Join operation

In your Transform pipeline, click Add Operation and select Joins from the operation list.

Select the Left Key

In the left panel, select the key column from the current dataset to join on. Columns with object or list datatypes are excluded from the dropdown.

Select a join type

In the centre panel, click one of the four join type buttons:

  • Left Join - Keep all rows from the current dataset (default).
  • Right Join - Keep all rows from the second table.
  • Inner Join - Keep only rows with a match in both tables.
  • Outer Join - Keep all rows from both tables.

Left Join is selected by default.

Select the Database

In the right panel, select any connected integration from the Database dropdown. This can be the same integration as the current source or a different one. Tables already used by previous Join or Concat operations in this pipeline are excluded.

Select the Table

In the right panel, select the table to join from the Table dropdown.

Select the Right Key

Select the key column from the second table to join on. This column must match in value with the Left Key selected in the left panel.

Click Save & Preview

Save & Preview is enabled when all four fields are set: Left Key, join type, Database, Table, and Right Key. Click it to run the join. The success toast confirms: "Join applied. Sample preview reflects merged dataset."


Column Conflict Handling

When the second table contains a column with the same name as a column in the current dataset (other than the join key), the column from the second table is renamed automatically with a _right suffix.

The join key column from the second table (Right Key) is dropped from the output. Only the Left Key column is retained.

ScenarioOutput
Column revenue exists in both tablesCurrent dataset: revenue. Second table: revenue_right.
Join key order_id selected as Right Keyorder_id dropped from second table output. Left table order_id retained.
No overlapping column namesAll columns from both tables included as-is.

Before and After

Left Join

Left join edilitics_sample_orders (500 rows, 13 columns) to edilitics_sample_products (4 rows, 7 columns) on product_id. All 500 order rows retained. Matching product columns added. Orders with a product_id not in the products table return null for product columns.

Left Key: product_id (orders). Right Key: product_id (products).

Input (current dataset, first 3 rows):

order_idcustomer_nameproduct_idrevenueproduct_tier
ORD-2024-0001Topaz AnalyticsPROD-001780.62Growth
ORD-2024-0002Bravo AnalyticsPROD-0024703.29null
ORD-2024-0003Silverstone TechPROD-0017347.74Starter

Second table (edilitics_sample_products):

product_idproduct_nameproduct_tierbase_price_usdsupport_levelmax_usersstorage_gb
PROD-001Edilitics StarterStarter299.00Community510
PROD-002Edilitics GrowthGrowth799.00Standard25100
PROD-003Edilitics EnterpriseEnterprise2499.00Priority100500
PROD-004Edilitics Enterprise PlusEnterprise4999.00Dedicated99992000

Output (first 3 rows):

order_idcustomer_nameproduct_idrevenueproduct_tierproduct_namebase_price_usdsupport_levelmax_usersstorage_gb
ORD-2024-0001Topaz AnalyticsPROD-001780.62GrowthEdilitics Starter299.00Community510
ORD-2024-0002Bravo AnalyticsPROD-0024703.29nullEdilitics Growth799.00Standard25100
ORD-2024-0003Silverstone TechPROD-0017347.74StarterEdilitics Starter299.00Community510

Output: 500 rows, 19 columns (13 original + 6 from products; product_id from right table dropped, product_tier from products renamed product_tier_right to avoid conflict).


Inner Join

Same tables, Inner Join on product_id. Only order rows with a matching product_id in edilitics_sample_products are returned. All 500 orders reference PROD-001 through PROD-004 - all match - so output remains 500 rows. Orders referencing a discontinued or unknown product_id would be excluded.


Code Equivalent

-- Left Join
SELECT o.*, p.product_name, p.category, p.list_price
FROM orders o
LEFT JOIN product_catalog p ON o.product_id = p.product_id;

-- Inner Join
SELECT o.*, p.product_name, p.category, p.list_price
FROM orders o
INNER JOIN product_catalog p ON o.product_id = p.product_id;

-- Right Join
SELECT o.order_id, o.revenue, p.*
FROM orders o
RIGHT JOIN product_catalog p ON o.product_id = p.product_id;

-- Outer Join
SELECT o.*, p.product_name, p.category, p.list_price
FROM orders o
FULL OUTER JOIN product_catalog p ON o.product_id = p.product_id;
import polars as pl

# Left Join
df_result = df_orders.join(
    df_catalog,
    left_on="product_id",
    right_on="product_id",
    how="left"
)

# Inner Join
df_result = df_orders.join(
    df_catalog,
    left_on="product_id",
    right_on="product_id",
    how="inner"
)

# Right Join (flip left join for full right-table retention)
df_result = df_catalog.join(
    df_orders,
    left_on="product_id",
    right_on="product_id",
    how="left"
)

# Outer Join
df_result = df_orders.join(
    df_catalog,
    left_on="product_id",
    right_on="product_id",
    how="outer"
)

After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.


After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.


Operation Reference

Prop

Type


Frequently Asked Questions


Next Steps

Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.

Last updated on

On this page