What is the difference between Join and Merge in Pandas?

Author: JIYIK Last Updated: 2025/05/03

In Pandas, two DataFrames (or a DataFrame and a Series) can be easily combined using operations such as join and merge. These operations combine two DataFrames based on their index and column names. The main difference between them is that the join method combines two DataFrames on their indices by default, whereas the merge method combines them on columns: either the columns we specify or, by default, the columns the two DataFrames share.

This article will discuss the difference between the join and merge methods in pandas.


Pandas DataFrame .join Method

The join method joins two DataFrames on their indices. Let's take an example to show how the join method works. We take two DataFrames, df_left and df_right, both indexed by Name, and combine them with df_left.join(df_right).

Sample code:

import pandas as pd

# create two DataFrames, both indexed by Name
df_left = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]}).set_index("Name")
df_right = pd.DataFrame({"Name": ["X", "Y", "Z"], "Steals": [4, 5, 2]}).set_index(
    "Name"
)
print(df_left)
print(df_right)

# join the two DataFrames on their shared index
print(df_left.join(df_right))

Output:

      Score
Name       
X        10
Y         8
Z         9
      Steals
Name        
X          4
Y          5
Z          2
      Score  Steals
Name               
X        10       4
Y         8       5
Z         9       2

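If both DataFrames contain a column with the same name, join cannot tell the two columns apart and raises an error unless we pass a suffix for the overlapping names: lsuffix for the left DataFrame, rsuffix for the right DataFrame, or both. In the following DataFrames, the overlapping column name is C, so we pass lsuffix="_" to rename the left copy to C_.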

Sample code:

import pandas as pd

# Creating the two dataframes
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))
print(df_left)
print(df_right)
# join two dataframes
joined_df = df_left.join(df_right, lsuffix="_")
print(joined_df)

Output:

   C  D
A  x  1
B  y  2
   C  F
A  u  3
B  v  4
  C_  D  C  F
A  x  1  u  3
B  y  2  v  4
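To see why the suffix matters, the following minimal sketch reuses the same two DataFrames, first calling join without any suffix and then with rsuffix instead of lsuffix. The variable names error_message and joined_r and the suffix "_right" are chosen here for illustration only; the exact wording of the error can vary between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

# Same DataFrames as in the example above: both contain a column named C
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))

# Without lsuffix/rsuffix, join refuses to produce two columns named C
try:
    df_left.join(df_right)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# rsuffix renames the overlapping column coming from the right DataFrame
joined_r = df_left.join(df_right, rsuffix="_right")
print(joined_r)
```

With rsuffix, the left column keeps the name C and the right column becomes C_right, the mirror image of the lsuffix example.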

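In joined_df, the index (A, B) is retained and the result has four columns: C_, D, C and F. We can also pass the on parameter to join. It names a column of the left DataFrame to use as the join key in place of the left index; that column is matched against the index of the right DataFrame.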
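As a rough illustration of the on parameter of join, the sketch below keeps Name as an ordinary column in the left DataFrame and as the index of the right one. The names players, steals and joined_on are hypothetical and only serve this example.

```python
import pandas as pd

# Left DataFrame: Name is a regular column, the index is the default 0..2
players = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]})

# Right DataFrame: the player names form the index
steals = pd.DataFrame({"Steals": [4, 5, 2]}, index=["X", "Y", "Z"])

# Match the left column Name against the right index
joined_on = players.join(steals, on="Name")
print(joined_on)
```

The result keeps the left DataFrame's index and its Name column, and gains the Steals column from the right DataFrame.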


Pandas DataFrame .merge Method

The merge method is also used to combine two DataFrames. However, merge matches rows on key columns (by default, the columns that both DataFrames share) rather than on the index. In the following example, we perform a simple merge of two DataFrames, passing only the on parameter. Here Name is the index of both DataFrames; on accepts index level names as well as column names.

Sample code:

import pandas as pd

# create two DataFrames, both indexed by Name
df_left = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]}).set_index("Name")
df_right = pd.DataFrame({"Name": ["X", "Y", "Z"], "Steals": [4, 5, 2]}).set_index(
    "Name"
)
print(df_left)
print(df_right)

# merge the two DataFrames on the index level Name
print(df_left.merge(df_right, on="Name"))

Output:

      Score
Name       
X        10
Y         8
Z         9
      Steals
Name        
X          4
Y          5
Z          2
      Score  Steals
Name               
X        10       4
Y         8       5
Z         9       2
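For comparison, here is a small sketch of merge called with no key at all. It assumes Name is kept as a regular column (no set_index), so that merge can infer the key from the columns the two DataFrames have in common. The names merged_default and merged_explicit are illustrative.

```python
import pandas as pd

# Name stays a regular column in both DataFrames
df_left = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]})
df_right = pd.DataFrame({"Name": ["X", "Y", "Z"], "Steals": [4, 5, 2]})

# No "on" given: merge uses the columns common to both frames (here, Name)
merged_default = df_left.merge(df_right)

# Spelling the key out gives the same result
merged_explicit = df_left.merge(df_right, on="Name")

print(merged_default)
print(merged_default.equals(merged_explicit))
```

Naming the key explicitly is usually the safer choice, because an unexpected shared column would otherwise silently become part of the merge key.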

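We can name the overlapping column to merge on with the on parameter of the merge method. In the following example, we specify the overlapping column C as the key and perform an outer merge of the two DataFrames. Because no value of C appears in both DataFrames, no rows match, and the missing values of D and F are filled with NaN. The row order of an outer merge may differ between pandas versions.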

Sample code:

import pandas as pd

# Creating the two dataframes
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))
print(df_left)
print(df_right)

# merge dataframes
merged_df = df_left.merge(df_right, on="C", how="outer")
print(merged_df)

Output:

   C  D
A  x  1
B  y  2
   C  F
A  u  3
B  v  4
   C    D    F
0  x  1.0  NaN
1  y  2.0  NaN
2  u  NaN  3.0
3  v  NaN  4.0
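The effect of the how parameter is easier to see side by side. The sketch below, which is an illustration rather than part of the original example, merges the same two DataFrames on C with each join type and records how many rows come back; row_counts is a name introduced here.

```python
import pandas as pd

# Same DataFrames as above: the values of C never coincide
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))

# Count the rows produced by each join type
row_counts = {}
for how in ("inner", "left", "right", "outer"):
    result = df_left.merge(df_right, on="C", how=how)
    row_counts[how] = len(result)

print(row_counts)
```

Since the two DataFrames share no value of C, the inner merge is empty, the left and right merges keep the two rows of their own side, and the outer merge keeps all four.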

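When the key columns have different names in the two DataFrames, they are specified with the left_on and right_on parameters respectively. To merge on the indices instead, as join does, we set left_index=True and right_index=True. The following example merges on the indices and uses the suffixes parameter to rename the overlapping column C from the left DataFrame to C_.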

Sample code:

import pandas as pd

# Creating the two dataframes
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))
print(df_left)
print(df_right)
merged_df = df_left.merge(
    df_right, left_index=True, right_index=True, suffixes=["_", ""]
)
print(merged_df)

Output:

   C  D
A  x  1
B  y  2
   C  F
A  u  3
B  v  4
  C_  D  C  F
A  x  1  u  3
B  y  2  v  4
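The left_on and right_on parameters mentioned earlier apply when the key column is named differently on each side. The following is a minimal sketch with hypothetical DataFrames scores and steals, whose key columns are called Player and Name.

```python
import pandas as pd

# The key column has a different name in each DataFrame
scores = pd.DataFrame({"Player": ["X", "Y", "Z"], "Score": [10, 8, 9]})
steals = pd.DataFrame({"Name": ["X", "Y", "Z"], "Steals": [4, 5, 2]})

# Match scores.Player against steals.Name
merged_keys = scores.merge(steals, left_on="Player", right_on="Name")
print(merged_keys)
```

Both key columns, Player and Name, appear in the result; one of them can be dropped afterwards if it is redundant.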

Conclusion

We have demonstrated the difference between join and merge in pandas with some examples. Both methods serve a similar purpose: combining DataFrames. The difference is that the join method combines two DataFrames on their indices by default, whereas in the merge method we specify the column names (or ask for the indices explicitly) to combine the two DataFrames.
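As a closing check, the sketch below compares the two approaches directly on the DataFrames with the overlapping column C. It assumes how="left" is passed to merge so that it matches the default behaviour of join; the names joined and merged are illustrative.

```python
import pandas as pd

df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))

# Index-based combination with join (a left join by default)
joined = df_left.join(df_right, lsuffix="_")

# The same combination spelled out with merge
merged = df_left.merge(
    df_right,
    how="left",
    left_index=True,
    right_index=True,
    suffixes=("_", ""),
)

print(joined.equals(merged))
```

In other words, join can be read as a shorthand for an index-based merge, while merge is the more general tool when the keys live in columns.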

