Joins
Merge two tables using a matching key column. Left, Right, Inner, and Outer joins supported. Join across any connected database. No code needed.
Joins merges the current dataset with a table from any connected integration using a shared key column. Four join types: Left, Right, Inner, and Outer. Select the Left Key from the current dataset, choose a join type, then select the database, table, and Right Key from the second table. Conflicting column names in the joined table get a _right suffix automatically.
The second table can come from a different database or integration than the current pipeline source. No need to land both datasets in a shared warehouse first.
Only one table can be joined per operation. Chain multiple Join operations to join more than two tables.
Cross-Database Joins
Most ETL and transformation tools require both tables to exist in the same warehouse before a join is possible. Edilitics does not.
Join any two tables across any two connected integrations in a single operation - PostgreSQL to BigQuery, MySQL to Snowflake, MongoDB to DuckDB. No staging required, no intermediate landing zone, no SQL to write.
| What it enables | Details |
|---|---|
| Join across integrations | Combine tables from any two connected databases in one step |
| No shared warehouse required | Data does not need to be co-located before joining |
| Same workflow, any source | Cross-database joins use the same UI as single-source joins |
This is a meaningful capability gap versus warehouse-first tools like dbt, Fivetran, Airbyte, and Matillion - all of which require data to be landed in a common warehouse before any join can be performed.
When to Use Joins
- Enrich orders with customer data. Left join
orderstocustomersoncustomer_idto add name, email, and region to every order row. - Find matching records only. Inner join
transactionstoaccountsto return only transactions with a valid matching account. - Retain all records from a reference table. Right join
ordersto aproduct_catalogto keep every product, even those with no orders. - Build a full combined view. Outer join two regional sales tables to see all rows from both, with nulls where no match exists.
- Join across databases. Left join a PostgreSQL
orderstable to a BigQuerymetadatatable without moving either dataset.
Sample Dataset
The examples in this doc use two sample datasets. Download both to follow along in your own workspace.
edilitics_sample_orders.csv
Primary orders dataset - use as the current pipeline source · 500 rows
edilitics_sample_products.csv
Product catalog - use as the second table to join on product_id · 4 rows
Join Types
| Type | Rows returned | Use when |
|---|---|---|
| Left Join | All rows from the current dataset. Matched columns from the second table added; nulls where no match. | Enrich the current dataset while keeping all existing rows. |
| Right Join | All rows from the second table. Matched columns from the current dataset added; nulls where no match. | The second table is the authoritative list; current dataset is supplementary. |
| Inner Join | Only rows with a matching key in both tables. | Return only confirmed matches between both datasets. |
| Outer Join | All rows from both tables. Nulls where no match on either side. | Build a complete view combining all rows from both sources. |
How to Apply a Join
Open the Join operation
In your Transform pipeline, click Add Operation and select Joins from the operation list.
Select the Left Key
In the left panel, select the key column from the current dataset to join on. Columns with object or list datatypes are excluded from the dropdown.
Select a join type
In the centre panel, click one of the four join type buttons:
- Left Join - Keep all rows from the current dataset (default).
- Right Join - Keep all rows from the second table.
- Inner Join - Keep only rows with a match in both tables.
- Outer Join - Keep all rows from both tables.
Left Join is selected by default.
Select the Database
In the right panel, select any connected integration from the Database dropdown. This can be the same integration as the current source or a different one. Tables already used by previous Join or Concat operations in this pipeline are excluded.
Select the Table
In the right panel, select the table to join from the Table dropdown.
Select the Right Key
Select the key column from the second table to join on. This column must match in value with the Left Key selected in the left panel.
Click Save & Preview
Save & Preview is enabled when all four fields are set: Left Key, join type, Database, Table, and Right Key. Click it to run the join. The success toast confirms: "Join applied. Sample preview reflects merged dataset."
Column Conflict Handling
When the second table contains a column with the same name as a column in the current dataset (other than the join key), the column from the second table is renamed automatically with a _right suffix.
The join key column from the second table (Right Key) is dropped from the output. Only the Left Key column is retained.
| Scenario | Output |
|---|---|
Column revenue exists in both tables | Current dataset: revenue. Second table: revenue_right. |
Join key order_id selected as Right Key | order_id dropped from second table output. Left table order_id retained. |
| No overlapping column names | All columns from both tables included as-is. |
Before and After
Left Join
Left join edilitics_sample_orders (500 rows, 13 columns) to edilitics_sample_products (4 rows, 7 columns) on product_id. All 500 order rows retained. Matching product columns added. Orders with a product_id not in the products table return null for product columns.
Left Key: product_id (orders). Right Key: product_id (products).
Input (current dataset, first 3 rows):
| order_id | customer_name | product_id | revenue | product_tier |
|---|---|---|---|---|
| ORD-2024-0001 | Topaz Analytics | PROD-001 | 780.62 | Growth |
| ORD-2024-0002 | Bravo Analytics | PROD-002 | 4703.29 | null |
| ORD-2024-0003 | Silverstone Tech | PROD-001 | 7347.74 | Starter |
Second table (edilitics_sample_products):
| product_id | product_name | product_tier | base_price_usd | support_level | max_users | storage_gb |
|---|---|---|---|---|---|---|
| PROD-001 | Edilitics Starter | Starter | 299.00 | Community | 5 | 10 |
| PROD-002 | Edilitics Growth | Growth | 799.00 | Standard | 25 | 100 |
| PROD-003 | Edilitics Enterprise | Enterprise | 2499.00 | Priority | 100 | 500 |
| PROD-004 | Edilitics Enterprise Plus | Enterprise | 4999.00 | Dedicated | 9999 | 2000 |
Output (first 3 rows):
| order_id | customer_name | product_id | revenue | product_tier | product_name | base_price_usd | support_level | max_users | storage_gb |
|---|---|---|---|---|---|---|---|---|---|
| ORD-2024-0001 | Topaz Analytics | PROD-001 | 780.62 | Growth | Edilitics Starter | 299.00 | Community | 5 | 10 |
| ORD-2024-0002 | Bravo Analytics | PROD-002 | 4703.29 | null | Edilitics Growth | 799.00 | Standard | 25 | 100 |
| ORD-2024-0003 | Silverstone Tech | PROD-001 | 7347.74 | Starter | Edilitics Starter | 299.00 | Community | 5 | 10 |
Output: 500 rows, 19 columns (13 original + 6 from products; product_id from right table dropped, product_tier from products renamed product_tier_right to avoid conflict).
Inner Join
Same tables, Inner Join on product_id. Only order rows with a matching product_id in edilitics_sample_products are returned. All 500 orders reference PROD-001 through PROD-004 - all match - so output remains 500 rows. Orders referencing a discontinued or unknown product_id would be excluded.
Code Equivalent
-- Left Join
SELECT o.*, p.product_name, p.category, p.list_price
FROM orders o
LEFT JOIN product_catalog p ON o.product_id = p.product_id;
-- Inner Join
SELECT o.*, p.product_name, p.category, p.list_price
FROM orders o
INNER JOIN product_catalog p ON o.product_id = p.product_id;
-- Right Join
SELECT o.order_id, o.revenue, p.*
FROM orders o
RIGHT JOIN product_catalog p ON o.product_id = p.product_id;
-- Outer Join
SELECT o.*, p.product_name, p.category, p.list_price
FROM orders o
FULL OUTER JOIN product_catalog p ON o.product_id = p.product_id;import polars as pl
# Left Join
df_result = df_orders.join(
df_catalog,
left_on="product_id",
right_on="product_id",
how="left"
)
# Inner Join
df_result = df_orders.join(
df_catalog,
left_on="product_id",
right_on="product_id",
how="inner"
)
# Right Join (flip left join for full right-table retention)
df_result = df_catalog.join(
df_orders,
left_on="product_id",
right_on="product_id",
how="left"
)
# Outer Join
df_result = df_orders.join(
df_catalog,
left_on="product_id",
right_on="product_id",
how="outer"
)After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.
After Save & Preview, the pipeline shows a DQ delta badge on this step - green if the table score improved, red if it dropped. See Data Quality Scoring for how scores are calculated.
Operation Reference
Prop
Type
Frequently Asked Questions
Next Steps
Drop / Rename Columns
Remove the _right suffix columns or rename joined columns after merging.
Filter
Filter the joined dataset to keep only matching or relevant rows.
Manage Nulls
Handle nulls introduced from Left, Right, or Outer joins.
Concat
Stack rows or align columns from another table instead of joining on a key.
Need help? Email support@edilitics.com with your workspace, job ID, and context. We reply within one business day.
Last updated on
No-Code Operations
25 operations for cleaning, shaping, enriching, and aggregating data without writing code. Each operation includes a live preview before saving.
Concat
Concatenate tables vertically, horizontally, or diagonally. Schema compatibility is checked automatically with built-in duplicate handling. No code needed.