Of course! Merging tables (or more accurately, DataFrames) is a fundamental operation in data analysis, especially when using the Pandas library in Python. It's similar to SQL's JOIN operations.

Let's break down the different ways to merge tables in Python, focusing on Pandas, which is the standard for this task.
The Core Concept: Keys and Joins
Merging combines two DataFrames based on the values in one or more columns. These columns are called the "key" or "on" columns. The way you specify how to combine the rows based on these keys is called the "join type".
Setup: Creating Sample DataFrames
First, let's create two simple DataFrames to work with.
import pandas as pd
# DataFrame 1: Employees
employees = pd.DataFrame({
'employee_id': [1, 2, 3, 4],
'name': ['Alice', 'Bob', 'Charlie', 'David'],
'department_id': [101, 102, 101, 103]
})
# DataFrame 2: Departments
departments = pd.DataFrame({
'department_id': [101, 102, 104], # Note: dept 103 is missing, dept 104 is new
'department_name': ['Sales', 'Engineering', 'HR']
})
print("--- Employees ---")
print(employees)
print("\n--- Departments ---")
print(departments)
Output:

--- Employees ---
employee_id name department_id
0 1 Alice 101
1 2 Bob 102
2 3 Charlie 101
3 4 David 103
--- Departments ---
department_id department_name
0 101 Sales
1 102 Engineering
2 104 HR
Now, let's explore the different merge types using these DataFrames.
pd.merge() - The Main Function
The primary function for merging is pd.merge(). The most important parameters are:
left: The left DataFrame.right: The right DataFrame.on: The column name(s) to join on. If not specified, Pandas will look for common columns.how: The type of join to perform. This is the most critical parameter.
The how Parameter: Different Types of Joins
This is where you control how the tables are combined.
a) Inner Join (how='inner')
This is the default type of merge. It returns only the rows where the key exists in both DataFrames.

- Result: Departments 101 and 102 are present in both tables. Department 103 (only in employees) and 104 (only in departments) are dropped.
inner_merge = pd.merge(employees, departments, on='department_id', how='inner')
print("--- Inner Merge ---")
print(inner_merge)
Output:
--- Inner Merge ---
employee_id name department_id department_name
0 1 Alice 101 Sales
1 3 Charlie 101 Sales
2 2 Bob 102 Engineering
b) Left Join (how='left')
This returns all rows from the left DataFrame (employees), and matched rows from the right DataFrame (departments). If there's no match in the right DataFrame, the result is NaN (Not a Number).
- Result: All employees are kept. David (dept 103) gets
NaNfordepartment_name. Department 104 is dropped.
left_merge = pd.merge(employees, departments, on='department_id', how='left')
print("\n--- Left Merge ---")
print(left_merge)
Output:
--- Left Merge ---
employee_id name department_id department_name
0 1 Alice 101 Sales
1 2 Bob 102 Engineering
2 3 Charlie 101 Sales
3 4 David 103 NaN
c) Right Join (how='right')
This returns all rows from the right DataFrame (departments), and matched rows from the left DataFrame (employees). If there's no match in the left DataFrame, the result is NaN.
- Result: All departments are kept. HR (dept 104) gets
NaNfor the employee columns. David (dept 103) is dropped.
right_merge = pd.merge(employees, departments, on='department_id', how='right')
print("\n--- Right Merge ---")
print(right_merge)
Output:
--- Right Merge ---
employee_id name department_id department_name
0 1 Alice 101 Sales
1 3 Charlie 101 Sales
2 2 Bob 102 Engineering
3 NaN NaN 104 HR
d) Outer Join (how='outer')
This returns all rows from both DataFrames. Where there is no match, the result is NaN.
- Result: All employees and all departments are included. David (dept 103) and HR (dept 104) both get
NaNin the columns from the other table.
outer_merge = pd.merge(employees, departments, on='department_id', how='outer')
print("\n--- Outer Merge ---")
print(outer_merge)
Output:
--- Outer Merge ---
employee_id name department_id department_name
0 1 Alice 101 Sales
1 3 Charlie 101 Sales
2 2 Bob 102 Engineering
3 4 David 103 NaN
4 NaN NaN 104 HR
Handling Different Key Column Names
What if the key columns have different names in the two DataFrames? You can use left_on and right_on.
Let's change the departments DataFrame to use dept_id as its key.
departments_renamed = departments.rename(columns={'department_id': 'dept_id'})
print("\n--- Renamed Departments ---")
print(departments_renamed)
Output:
--- Renamed Departments ---
dept_id department_name
0 101 Sales
1 102 Engineering
2 104 HR
Now, we merge using left_on and right_on:
# Merge employees with the renamed departments
different_key_merge = pd.merge(
employees,
departments_renamed,
left_on='department_id',
right_on='dept_id',
how='left'
)
print("\n--- Merge with Different Key Names ---")
print(different_key_merge)
Output:
--- Merge with Different Key Names ---
employee_id name department_id dept_id department_name
0 1 Alice 101 101 Sales
1 2 Bob 102 102 Engineering
2 3 Charlie 101 101 Sales
3 4 David 103 103 NaN
Merging on Multiple Keys (Composite Keys)
Sometimes you need to merge on more than one column. For example, if you have order lines and products, you might need to match on both order_id and product_sku.
# DataFrame 1: Order Lines
order_lines = pd.DataFrame({
'order_id': [101, 101, 102, 103],
'product_sku': ['A-123', 'B-456', 'A-123', 'C-789'],
'quantity': [2, 1, 5, 1]
})
# DataFrame 2: Products
products = pd.DataFrame({
'product_sku': ['A-123', 'B-456', 'C-789'],
'price': [10.00, 15.50, 8.99]
})
# Merge on both order_id and product_sku
# (Note: In this simple case, product_sku is enough, but this demonstrates the concept)
# Let's assume a more complex scenario where we also have order_date
order_lines_complex = pd.DataFrame({
'order_id': [101, 101, 102, 103],
'order_date': ['2025-01-01', '2025-01-01', '2025-01-02', '2025-01-03'],
'product_sku': ['A-123', 'B-456', 'A-123', 'C-789'],
'quantity': [2, 1, 5, 1]
})
products_complex = pd.DataFrame({
'product_sku': ['A-123', 'B-456', 'C-789'],
'price': [10.00, 15.50, 8.99]
})
# We can merge on product_sku, and Pandas will keep the order_id and order_date from the left
multi_key_merge = pd.merge(
order_lines_complex,
products_complex,
on='product_sku', # We can pass a list of keys: on=['order_id', 'product_sku']
how='left'
)
print("\n--- Multi-Key Merge (conceptual) ---")
print(multi_key_merge)
Output:
--- Multi-Key Merge (conceptual) ---
order_id order_date product_sku quantity price
0 101 2025-01-01 A-123 2 10.00
1 101 2025-01-01 B-456 1 15.50
2 102 2025-01-02 A-123 5 10.00
3 103 2025-01-03 C-789 1 8.99
Alternative: DataFrame.join()
Pandas also has a .join() method, which is a convenience function. It's most useful when you want to join on the index of one DataFrame to the column(s) of another.
Key difference:
merge()is more general and flexible. You join on any column(s).join()is syntactic sugar formerge(). It defaults to joining on the index of the left DataFrame and the index of the right DataFrame, but you can change this with theonandlsuffix/rsuffixparameters.
# Set 'employee_id' as the index for the employees DataFrame
employees_indexed = employees.set_index('employee_id')
# DataFrame with a different index
salaries = pd.DataFrame({
'salary': [70000, 80000, 120000, 95000]
}, index=[1, 2, 3, 5]) # Note: David (id 4) is missing, and id 5 is new
# Use join to combine employees and their salaries
# We need suffixes because 'department_id' is in both DataFrames
joined_df = employees_indexed.join(salaries, how='left', lsuffix='_emp', rsuffix='_sal')
print("\n--- Using DataFrame.join() ---")
print(joined_df)
Output:
--- Using DataFrame.join() ---
name department_id salary
employee_id
1 Alice 101 70000.0
2 Bob 102 80000.0
3 Charlie 101 120000.0
4 David 103 NaN
5 NaN NaN 95000.0
Summary: merge() vs. join()
| Feature | pd.merge() |
DataFrame.join() |
|---|---|---|
| Primary Use | General-purpose merging on columns. | Joining an index to a column(s). |
| Keys | Explicitly defined with on, left_on, right_on. |
Defaults to indexes. on can specify a column in the right DataFrame. |
| Flexibility | High. Can join on any combination of columns/indexes. | Less flexible. Designed for index-to-column joins. |
| Suffixes | Handled automatically if column names clash. | Requires lsuffix and rsuffix if column names clash. |
| Recommendation | Use this 90% of the time. It's the standard and most powerful tool. | Use for specific cases, like joining a DataFrame to its own summary statistics (which are often indexed). |
Final Recommendation
For almost all table merging tasks in Python with Pandas, start with pd.merge(). It is the most explicit, flexible, and widely understood method. Learn the how parameter (inner, left, right, outer) well, as it controls the logic of your merge.
