What is the difference between Join and Merge in Pandas?
In Pandas, a Series or DataFrame can be easily joined or combined with another using operations such as join and merge. These operations combine two DataFrames based on the index and column names. Both the join and merge methods can combine two DataFrames. The main difference between them is that the join method combines two DataFrames based on their indices, whereas in the merge method, we need to specify the columns on which to combine the two DataFrames.
This article will discuss the difference between the join and merge methods in pandas.
Pandas DataFrame.join Method
The join method joins two DataFrames on their indices. Let's take an example to show how the join method works. We took two DataFrames, df_left and df_right, both indexed by Name. Using the df_left.join(df_right) call, we combined the two DataFrames.
Sample code:
import pandas as pd
# create two DataFrames indexed by "Name"
df_left = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]}).set_index("Name")
df_right = pd.DataFrame({"Name": ["X", "Y", "Z"], "Steals": [4, 5, 2]}).set_index("Name")
print(df_left)
print(df_right)
# join the two DataFrames on their indices
print(df_left.join(df_right))
Output:
      Score
Name
X        10
Y         8
Z         9
      Steals
Name
X          4
Y          5
Z          2
      Score  Steals
Name
X        10       4
Y         8       5
Z         9       2
If both DataFrames have overlapping column names, join expects you to add a suffix to the overlapping names through the lsuffix or rsuffix parameter. The example below adds a suffix to the column from the left DataFrame. In the following DataFrames, the overlapping column name is C.
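To see why the suffix is needed, here is a minimal sketch of what happens without one. It uses the same two frames as the sample that follows; the variable names error_message and joined_r and the "_right" suffix are only illustrative.

```python
import pandas as pd

# Both frames share the column name "C".
df_left = pd.DataFrame([["x", 1], ["y", 2]], index=list("AB"), columns=list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], index=list("AB"), columns=list("CF"))

try:
    df_left.join(df_right)  # neither lsuffix nor rsuffix is given
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# rsuffix renames the overlapping column from the right frame instead.
joined_r = df_left.join(df_right, rsuffix="_right")
print(list(joined_r.columns))
```

Either suffix resolves the clash; which side to rename is a matter of preference.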
Sample code:
import pandas as pd
# Creating the two dataframes
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))
print(df_left)
print(df_right)
# join two dataframes
joined_df = df_left.join(df_right, lsuffix="_")
print(joined_df)
Output:
   C  D
A  x  1
B  y  2
   C  F
A  u  3
B  v  4
   C_  D  C  F
A   x  1  u  3
B   y  2  v  4
As you can see in the above output, the index is retained and the result has four columns. We can also use a specific column of the left DataFrame as the join key by passing its name to the on parameter.
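The on parameter can be sketched as follows. The frames players and steals are hypothetical: the key is an ordinary column in the left frame and the index in the right frame, which is the situation on is meant for.

```python
import pandas as pd

# Hypothetical data: "Name" is a regular column on the left...
players = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]})
# ...and the index on the right.
steals = pd.DataFrame(
    {"Steals": [4, 5, 2]}, index=pd.Index(["X", "Y", "Z"], name="Name")
)

# on="Name" matches players["Name"] against the index of steals.
joined_on = players.join(steals, on="Name")
print(joined_on)
```

The result keeps the left frame's own index and gains the Steals column.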
Pandas DataFrame.merge Method
The merge method is also used to combine two DataFrames. However, the merge method matches rows on a merge key, which is normally one or more column names. In the following example, we perform a simple merge of two DataFrames with on="Name". Here Name is the name of the index in both DataFrames, which merge also accepts as a key.
Sample code:
import pandas as pd
# create two DataFrames indexed by "Name"
df_left = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]}).set_index("Name")
df_right = pd.DataFrame({"Name": ["X", "Y", "Z"], "Steals": [4, 5, 2]}).set_index("Name")
print(df_left)
print(df_right)
# merge the two DataFrames on the shared key "Name"
print(df_left.merge(df_right, on="Name"))
Output:
      Score
Name
X        10
Y         8
Z         9
      Steals
Name
X          4
Y          5
Z          2
      Score  Steals
Name
X        10       4
Y         8       5
Z         9       2
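When the key is an ordinary column rather than the index, merge can also be called without on, in which case it uses the column names the two frames have in common. A minimal sketch, assuming hypothetical frames left and right that share only the Name column:

```python
import pandas as pd

# Hypothetical frames with "Name" as a regular column in both.
left = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]})
right = pd.DataFrame({"Name": ["X", "Y", "Z"], "Steals": [4, 5, 2]})

merged_default = left.merge(right)              # key inferred from common columns
merged_explicit = left.merge(right, on="Name")  # same key, stated explicitly

print(merged_default)
print(merged_default.equals(merged_explicit))
```

Stating on explicitly is usually the safer choice, since it does not change if another shared column is added later.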
We can specify overlapping column names using the on parameter in the merge method. In the following example, we specify the overlapping column name C to perform an outer merge on two DataFrames. Because the two DataFrames share no values in C, every row is kept and the missing cells are filled with NaN.
Sample code:
import pandas as pd
# Creating the two dataframes
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))
print(df_left)
print(df_right)
# merge dataframes
merged_df = df_left.merge(df_right, on="C", how="outer")
print(merged_df)
Output:
   C  D
A  x  1
B  y  2
   C  F
A  u  3
B  v  4
   C    D    F
0  x  1.0  NaN
1  y  2.0  NaN
2  u  NaN  3.0
3  v  NaN  4.0
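The effect of how is easier to see when the key values partly overlap. The sketch below uses hypothetical frames in which only the value "y" appears on both sides, and compares the default inner merge with an outer merge. The indicator=True argument adds a _merge column that records where each row came from.

```python
import pandas as pd

# Hypothetical data: only "y" is present in both frames.
left = pd.DataFrame({"C": ["x", "y"], "D": [1, 2]})
right = pd.DataFrame({"C": ["y", "z"], "F": [3, 4]})

# how="inner" is the default: keep only keys found in both frames.
inner = left.merge(right, on="C")

# how="outer": keep every key; indicator=True labels each row's origin.
outer = left.merge(right, on="C", how="outer", indicator=True)

print(inner)
print(outer)
```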
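When the key columns have different names in the two DataFrames, they are specified using the left_on and right_on parameters, respectively. The merge method can also match on the indices through left_index=True and right_index=True. The following example merges on the indices and uses the suffixes parameter to rename the overlapping column C, which reproduces the result of the earlier join example.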
Sample code:
import pandas as pd
# Creating the two dataframes
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))
print(df_left)
print(df_right)
# merge on the indices; suffixes renames the overlapping column "C"
merged_df = df_left.merge(
    df_right, left_index=True, right_index=True, suffixes=["_", ""]
)
print(merged_df)
Output:
   C  D
A  x  1
B  y  2
   C  F
A  u  3
B  v  4
   C_  D  C  F
A   x  1  u  3
B   y  2  v  4
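The left_on and right_on parameters mentioned above can be sketched as follows. The frames and the column name Player are hypothetical; the point is only that the key is called Name on one side and Player on the other.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
players = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]})
steals = pd.DataFrame({"Player": ["X", "Y", "Z"], "Steals": [4, 5, 2]})

# Match players["Name"] against steals["Player"].
merged_keys = players.merge(steals, left_on="Name", right_on="Player")
print(merged_keys)

# Both key columns are kept in the result; drop one if it is redundant.
tidy = merged_keys.drop(columns="Player")
print(tidy)
```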
Conclusion
We have demonstrated the difference between join and merge in pandas with some examples. We have seen that these two methods, join and merge, are used for a similar purpose: to combine DataFrames in pandas. However, the difference is that the join method combines two DataFrames on their indices, whereas in the merge method, we specify the column names (or, with left_index and right_index, the indices) on which to combine the two DataFrames.
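As a closing illustration, the relationship between the two methods can be checked directly. This sketch assumes two small frames indexed by Name, like the ones used in this article, and compares an index-based join with the corresponding merge call. how="left" is passed to merge because join performs a left join by default.

```python
import pandas as pd

names = pd.Index(["X", "Y", "Z"], name="Name")
df_left = pd.DataFrame({"Score": [10, 8, 9]}, index=names)
df_right = pd.DataFrame({"Steals": [4, 5, 2]}, index=names)

# join matches on the indices and defaults to a left join.
via_join = df_left.join(df_right)

# The same operation written with merge.
via_merge = df_left.merge(df_right, left_index=True, right_index=True, how="left")

print(via_join.equals(via_merge))
```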