Combining DataFrames based on the values in common columns or indices is a frequent requirement, analogous to joining tables in a relational database like SQL. While concatenation, often performed using pd.concat, is useful for stacking datasets with similar structures, these value-based combinations require a different approach. Pandas provides the pd.merge() function for these operations.pd.merge() allows you to combine rows from two DataFrames based on shared values in specified columns, often called "keys". Think of it like looking up information in one table based on an identifier found in another table.The Core Idea: Merging on KeysLet's imagine we have two simple DataFrames: one containing employee information and another containing department information.import pandas as pd # Employee data employees = pd.DataFrame({ 'employee_id': [101, 102, 103, 104], 'name': ['Alice', 'Bob', 'Charlie', 'David'], 'dept_id': [10, 20, 10, 30] }) # Department data departments = pd.DataFrame({ 'dept_id': [10, 20, 40], 'dept_name': ['Engineering', 'Marketing', 'Sales'] }) print("Employees DataFrame:") print(employees) print("\nDepartments DataFrame:") print(departments)Output:Employees DataFrame: employee_id name dept_id 0 101 Alice 10 1 102 Bob 20 2 103 Charlie 10 3 104 David 30 Departments DataFrame: dept_id dept_name 0 10 Engineering 1 20 Marketing 2 40 SalesBoth DataFrames share a dept_id column. We can use this column as the "key" to merge the two DataFrames, adding the dept_name to the employee information.The simplest way to merge is to specify the DataFrames and the column(s) to merge on using the on parameter:# Merge employees and departments based on 'dept_id' merged_df = pd.merge(employees, departments, on='dept_id') print("\nMerged DataFrame (Default - Inner Join):") print(merged_df)Output:Merged DataFrame (Default - Inner Join): employee_id name dept_id dept_name 0 101 Alice 10 Engineering 1 103 Charlie 10 Engineering 2 102 Bob 20 MarketingNotice a few things:The dept_name is now associated with the corresponding employees based on dept_id.David (employee_id 104, dept_id 30) is missing from the result. Why? Because dept_id 30 doesn't exist in the departments DataFrame.The Sales department (dept_id 40) is also missing. Why? Because no employee in the employees DataFrame has dept_id 40.By default, pd.merge() performs an inner join. This means it only keeps rows where the (dept_id in this case) exists in both DataFrames. We'll explore other join types shortly.Specifying Different Column NamesWhat if the columns have different names in the two DataFrames? For example, suppose the department key in the employees DataFrame was named department_code instead of dept_id.# Modified employee data with different name employees_alt = pd.DataFrame({ 'employee_id': [101, 102, 103, 104], 'name': ['Alice', 'Bob', 'Charlie', 'David'], 'department_code': [10, 20, 10, 30] # Renamed column }) print("Employees DataFrame (Alternative):") print(employees_alt) print("\nDepartments DataFrame:") print(departments)Output:Employees DataFrame (Alternative): employee_id name department_code 0 101 Alice 10 1 102 Bob 20 2 103 Charlie 10 3 104 David 30 Departments DataFrame: dept_id dept_name 0 10 Engineering 1 20 Marketing 2 40 SalesIn this situation, you can't use the on parameter directly. Instead, you use left_on and right_on to specify the column name for the left (employees_alt) and right (departments) DataFrames, respectively.# Merge using left_on and right_on merged_alt_keys = pd.merge(employees_alt, departments, left_on='department_code', right_on='dept_id') print("\nMerged DataFrame:") print(merged_alt_keys)Output:Merged DataFrame (Different Names): employee_id name department_code dept_id dept_name 0 101 Alice 10 10 Engineering 1 103 Charlie 10 10 Engineering 2 102 Bob 20 20 MarketingThe result is the same as before, but notice that both columns (department_code and dept_id) are included in the merged DataFrame. You might want to drop one of them after the merge if it's redundant.Merging on Multiple KeysYou can also merge based on multiple columns. Simply provide a list of column names to the on parameter (or use left_on and right_on with lists of column names). The merge operation will then only combine rows where all specified columns match between the DataFrames.# Example DataFrames with multiple potential keys df_left = pd.DataFrame({ 'key1': ['A', 'B', 'B', 'C'], 'key2': [1, 2, 1, 2], 'left_val': [10, 20, 30, 40] }) df_right = pd.DataFrame({ 'key1': ['B', 'C', 'C', 'D'], 'key2': [1, 2, 3, 1], 'right_val': [100, 200, 300, 400] }) print("Left DataFrame:") print(df_left) print("\nRight DataFrame:") print(df_right) # Merge on both key1 and key2 merged_multi = pd.merge(df_left, df_right, on=['key1', 'key2']) print("\nMerged on Multiple Keys (Inner Join):") print(merged_multi)Output:Left DataFrame: key1 key2 left_val 0 A 1 10 1 B 2 20 2 B 1 30 3 C 2 40 Right DataFrame: key1 key2 right_val 0 B 1 100 1 C 2 200 2 C 3 300 3 D 1 400 Merged on Multiple Keys (Inner Join): key1 key2 left_val right_val 0 B 1 30 100 1 C 2 40 200Only the rows where both key1 and key2 matched in df_left and df_right were kept.The pd.merge() function is a fundamental tool for integrating data from different sources based on shared information. Understanding how to specify keys is the first step. Next, we'll look at the different types of merges you can perform.