Merge Pandas DataFrame based on index

Author: JIYIK Last Updated: 2025/05/02

In the world of data science and machine learning, it is imperative to master operations that organize, maintain, and clean data for further analysis. Merging two DataFrames is an example of such an operation. It turns out that merging two DataFrames is very easy using the Pandas library in Python.

Pandas provides us with two useful functions, merge() and join(), to merge two DataFrames. The two methods are very similar, but merge() is considered more general and flexible. It also provides many parameters to change the behavior of the final DataFrame. join() merges two DataFrames on their indexes, while merge() allows us to specify columns that can be used as keys to merge two DataFrames.
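To make the difference concrete, here is a minimal sketch of the same match done both ways. The frames `left` and `right` and the column name `key` are made up for illustration and are not used elsewhere in this article.

```python
import pandas as pd

# Hypothetical data: "key" is an ordinary column, not the index.
left = pd.DataFrame({"key": [1, 2, 4], "C1": ["a", "b", "d"]})
right = pd.DataFrame({"key": [1, 2, 3], "C2": ["AA", "BB", "CC"]})

# merge() can match rows directly on a named column.
by_column = left.merge(right, on="key")

# join() matches on the index, so the key is moved into the index first.
by_index = left.set_index("key").join(right.set_index("key"), how="inner")

print(by_column)
print(by_index)
```

Both results contain the rows for keys 1 and 2. The difference is that merge() keeps `key` as a column, while the join() version carries it as the index.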

A common parameter for both functions is how, which defines the type of join. By default, the how parameter is inner for merge() and left for join(), but for both functions it can be changed to left, right, inner, or outer. It is useful to understand the difference between them.

When merging two Pandas DataFrames, we assume that one is the left DataFrame and the other is the right DataFrame. Both merge() and join() match records on a key, which can be a column or the index. An inner join returns a DataFrame consisting of the matching records from both DataFrames. An outer join produces a merged DataFrame containing all elements from both DataFrames, filling missing values with NaN on both sides. A left join contains all elements of the left DataFrame, but only matching records from the right DataFrame. Conversely, a right join contains all elements of the right DataFrame, and only matching records from the left DataFrame. All of this will be clearer in the following example code, where we will combine two DataFrames.

import pandas as pd

df1 = pd.DataFrame(["a", "b", "d", "e", "h"], index=[1, 2, 4, 5, 7], columns=["C1"])
df2 = pd.DataFrame(
    ["AA", "BB", "CC", "EE", "FF"], index=[1, 2, 3, 5, 6], columns=["C2"]
)

print(df1)
print(df2)

Output:

  C1
1  a
2  b
4  d
5  e
7  h
   C2
1  AA
2  BB
3  CC
5  EE
6  FF
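As a quick check of the defaults described earlier, the sketch below calls both functions without a how argument on the same two DataFrames. The variable names `merged_default` and `joined_default` are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(["a", "b", "d", "e", "h"], index=[1, 2, 4, 5, 7], columns=["C1"])
df2 = pd.DataFrame(
    ["AA", "BB", "CC", "EE", "FF"], index=[1, 2, 3, 5, 6], columns=["C2"]
)

# No how argument: merge() falls back to an inner join.
merged_default = df1.merge(df2, left_index=True, right_index=True)

# No how argument: join() falls back to a left join.
joined_default = df1.join(df2)

print(merged_default.index.tolist())
print(joined_default.index.tolist())
```

The merge() result keeps only the index labels present in both DataFrames, while the join() result keeps every label of df1.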

Merge two Pandas DataFrames on an index using `merge()`

When merging two DataFrames by their indices, the left_index and right_index parameters of the merge() function should both be set to True. The following code example merges the two DataFrames with an inner join.

import pandas as pd

df1 = pd.DataFrame(["a", "b", "d", "e", "h"], index=[1, 2, 4, 5, 7], columns=["C1"])
df2 = pd.DataFrame(
    ["AA", "BB", "CC", "EE", "FF"], index=[1, 2, 3, 5, 6], columns=["C2"]
)

df_inner = df1.merge(df2, how="inner", left_index=True, right_index=True)

print(df_inner)

Output:

  C1  C2
1  a  AA
2  b  BB
5  e  EE

The following code merges the DataFrames with an outer join.

import pandas as pd

df1 = pd.DataFrame(["a", "b", "d", "e", "h"], index=[1, 2, 4, 5, 7], columns=["C1"])
df2 = pd.DataFrame(
    ["AA", "BB", "CC", "EE", "FF"], index=[1, 2, 3, 5, 6], columns=["C2"]
)

df_outer = df1.merge(df2, how="outer", left_index=True, right_index=True)

print(df_outer)

Output:

    C1   C2
1    a   AA
2    b   BB
3  NaN   CC
4    d  NaN
5    e   EE
6  NaN   FF
7    h  NaN

As you can see, the merged DataFrame with the inner join type has only the matching records from both DataFrames, while the DataFrame with the outer join type has all the elements, filling the missing records with NaN. Now use a left join.

import pandas as pd

df1 = pd.DataFrame(["a", "b", "d", "e", "h"], index=[1, 2, 4, 5, 7], columns=["C1"])
df2 = pd.DataFrame(
    ["AA", "BB", "CC", "EE", "FF"], index=[1, 2, 3, 5, 6], columns=["C2"]
)

df_left = df1.merge(df2, how="left", left_index=True, right_index=True)

print(df_left)

Output:

  C1   C2
1  a   AA
2  b   BB
4  d  NaN
5  e   EE
7  h  NaN

The merged DataFrame above has all the elements of the left DataFrame and only the matching records from the right DataFrame. The exact opposite is the right join, as shown in the code below.

import pandas as pd

df1 = pd.DataFrame(["a", "b", "d", "e", "h"], index=[1, 2, 4, 5, 7], columns=["C1"])
df2 = pd.DataFrame(
    ["AA", "BB", "CC", "EE", "FF"], index=[1, 2, 3, 5, 6], columns=["C2"]
)

df_right = df1.merge(df2, how="right", left_index=True, right_index=True)

print(df_right)

Output:

    C1  C2
1    a  AA
2    b  BB
3  NaN  CC
5    e  EE
6  NaN  FF
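One way to see where each row of a merged result came from is the indicator parameter of merge(), which adds a `_merge` column. The following is a small sketch on the same DataFrames, not part of the original walkthrough.

```python
import pandas as pd

df1 = pd.DataFrame(["a", "b", "d", "e", "h"], index=[1, 2, 4, 5, 7], columns=["C1"])
df2 = pd.DataFrame(
    ["AA", "BB", "CC", "EE", "FF"], index=[1, 2, 3, 5, 6], columns=["C2"]
)

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only", or "both" for every row.
merged = df1.merge(
    df2, how="outer", left_index=True, right_index=True, indicator=True
)

print(merged)
```

Rows marked both are the ones an inner join would keep. A left join keeps both plus left_only, and a right join keeps both plus right_only.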

Merge two Pandas DataFrames on an index using `join()`

The join() method merges two DataFrames based on their indices. By default, the join type is left. It always uses the index of the right DataFrame, but we can provide a key column for the left DataFrame through the on parameter. We can specify the join type for the join() function just like we did for the merge() function.
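As a brief illustration of supplying a key for the left DataFrame, the sketch below assumes a hypothetical left frame whose lookup values live in a regular column named `k`. Neither `left` nor `k` comes from the article's data.

```python
import pandas as pd

# Hypothetical left frame: the key is a regular column "k", not the index.
left = pd.DataFrame({"k": [1, 2, 4], "C1": ["a", "b", "d"]})
df2 = pd.DataFrame(
    ["AA", "BB", "CC", "EE", "FF"], index=[1, 2, 3, 5, 6], columns=["C2"]
)

# on="k" matches the values of left["k"] against the index of df2.
# The default join type is left, so every row of `left` is kept.
joined = left.join(df2, on="k")

print(joined)
```

The row with k equal to 4 has no match in the index of df2, so its C2 value is NaN. The result keeps the original index of `left`.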

The following example merges the two DataFrames with an outer join using join().

import pandas as pd

df1 = pd.DataFrame(["a", "b", "d", "e", "h"], index=[1, 2, 4, 5, 7], columns=["C1"])
df2 = pd.DataFrame(
    ["AA", "BB", "CC", "EE", "FF"], index=[1, 2, 3, 5, 6], columns=["C2"]
)
df_outer = df1.join(df2, how="outer")
print(df_outer)

Output:

    C1   C2
1    a   AA
2    b   BB
3  NaN   CC
4    d  NaN
5    e   EE
6  NaN   FF
7    h  NaN
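Since this output matches the outer merge() result shown earlier, it may help to compare the two functions across all four join types. The sketch below does that on the same DataFrames. The dictionary name `same` is chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(["a", "b", "d", "e", "h"], index=[1, 2, 4, 5, 7], columns=["C1"])
df2 = pd.DataFrame(
    ["AA", "BB", "CC", "EE", "FF"], index=[1, 2, 3, 5, 6], columns=["C2"]
)

same = {}
for how in ["inner", "outer", "left", "right"]:
    via_join = df1.join(df2, how=how)
    via_merge = df1.merge(df2, how=how, left_index=True, right_index=True)
    # DataFrame.equals treats NaN values in the same position as equal.
    same[how] = via_join.equals(via_merge)

print(same)
```

For index-on-index merges like these, join() is mainly a shorter way to write the equivalent merge() call.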
