Pandas DataFrame DataFrame.drop_duplicates() function

Author：JIYIK Last Updated：2025/04/30

The Python Pandas `DataFrame.drop_duplicates()` function removes all duplicate rows from a `DataFrame`.

`pandas.DataFrame.drop_duplicates()` Syntax

DataFrame.drop_duplicates(subset: Union[Hashable, Sequence[Hashable], NoneType]=None,
                          keep: Union[str, bool]='first',
                          inplace: bool=False,
                          ignore_index: bool=False)

Parameters

`subset`	Column label or sequence of labels. The columns to consider when identifying duplicates; by default, all columns are used.
`keep`	`'first'`, `'last'`, or `False`. Drop duplicates except for the first occurrence (`keep='first'`), drop duplicates except for the last occurrence (`keep='last'`), or drop every row that has a duplicate (`keep=False`).
`inplace`	Boolean. If `True`, modify the caller `DataFrame` in place instead of returning a new one.
`ignore_index`	Boolean. If `True`, ignore the index of the original `DataFrame` and label the resulting rows 0, 1, ..., n - 1. The default is `False`, which keeps the original index labels.

Return Value

If `inplace` is `True`, the function modifies the caller and returns `None`; otherwise it returns a new `DataFrame` with all duplicate rows removed.
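The return behavior can be checked directly. The following is a minimal sketch on a small made-up `DataFrame`; the column names `a` and `b` are illustrative assumptions and do not come from the examples in this article.

```python
import pandas as pd

# Small illustrative frame; the second row repeats the first.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# Default (inplace=False): a new DataFrame is returned and df is left untouched.
result = df.drop_duplicates()
print(type(result).__name__)  # DataFrame
print(len(df), len(result))   # 3 2

# inplace=True: df itself is modified and the call returns None.
returned = df.drop_duplicates(inplace=True)
print(returned)  # None
print(len(df))   # 2
```

Because the in-place call returns `None`, assigning its result back to `df` would discard the data, which is a common mistake.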

Example Code: Remove Duplicate Rows Using the Pandas `DataFrame.drop_duplicates()` Method

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','XYZ' ) ,
             ('banana', 14, 'No','BCD' ) ,
            ('Orange', 34, 'Yes' ,'ABC') ]

df = pd.DataFrame(fruit_list, 
                  columns = ['Name',
                             'Price',
                             'In_Stock',
                             'Supplier'])

print("DataFrame:")
print(df)

df_unique=df.drop_duplicates() 

print("DataFrame with Unique Rows:")
print(df_unique)

Output:

DataFrame:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ
2  banana     14       No      BCD
3  Orange     34      Yes      ABC
DataFrame with Unique Rows:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ
2  banana     14       No      BCD

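The first and fourth rows of the original `DataFrame` (index 0 and 3) are identical.

You can remove all duplicate rows from a `DataFrame` by using the `drop_duplicates()` method. By default, the first occurrence of each duplicated row is kept.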
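To see which rows would be dropped before removing them, the companion method `DataFrame.duplicated()` can be used. The sketch below assumes the same fruit data as the example above and is only meant to illustrate how the two methods relate.

```python
import pandas as pd

fruit_list = [("Orange", 34, "Yes", "ABC"),
              ("Mango", 24, "No", "XYZ"),
              ("banana", 14, "No", "BCD"),
              ("Orange", 34, "Yes", "ABC")]

df = pd.DataFrame(fruit_list,
                  columns=["Name", "Price", "In_Stock", "Supplier"])

# duplicated() flags each row that repeats an earlier row (keep="first" by default).
mask = df.duplicated()
print(mask.tolist())  # [False, False, False, True]

# Filtering with the inverted mask keeps the same rows as drop_duplicates().
filtered = df[~mask]
print(filtered.equals(df.drop_duplicates()))  # True
```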

Example Code: Set the `subset` Parameter in the Pandas `DataFrame.drop_duplicates()` Method

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','XYZ' ) ,
             ('banana', 14, 'No','ABC' ) ,
            ('Orange', 34, 'Yes' ,'ABC') ]

df = pd.DataFrame(fruit_list, 
                  columns = ['Name',
                             'Price',
                             'In_Stock',
                             'Supplier'])

print("DataFrame:")
print(df)

df_unique=df.drop_duplicates(subset ="Supplier") 

print("DataFrame with Unique values of Supplier Column:")
print(df_unique)

Output:

DataFrame:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ
2  banana     14       No      ABC
3  Orange     34      Yes      ABC
DataFrame with Unique values of Supplier Column:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ

This method removes every row whose `Supplier` value has already appeared in an earlier row of the `DataFrame`.

Here, the `Supplier` column has the same value, `ABC`, in the first, third, and fourth rows (index 0, 2, and 3). Therefore, the third and fourth rows are dropped from the `DataFrame`; by default, the first occurrence is kept.
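Since `subset` also accepts a sequence of labels, several columns can be combined into one duplicate key. The following sketch reuses the fruit data from this example; the choice of `In_Stock` and `Supplier` as the key is an assumption made purely for illustration.

```python
import pandas as pd

fruit_list = [("Orange", 34, "Yes", "ABC"),
              ("Mango", 24, "No", "XYZ"),
              ("banana", 14, "No", "ABC"),
              ("Orange", 34, "Yes", "ABC")]

df = pd.DataFrame(fruit_list,
                  columns=["Name", "Price", "In_Stock", "Supplier"])

# A row is a duplicate only if BOTH In_Stock and Supplier match an earlier row.
df_unique = df.drop_duplicates(subset=["In_Stock", "Supplier"])

print(df_unique)
print(df_unique.index.tolist())  # [0, 1, 2]
```

With the two-column key, the banana row is kept because its `In_Stock` value differs from the Orange rows, even though all three share the supplier `ABC`.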

Example Code: Set the `keep` Parameter in the Pandas `DataFrame.drop_duplicates()` Method

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','XYZ' ) ,
             ('banana', 14, 'No','ABC' ) ,
            ('Orange', 34, 'Yes' ,'ABC') ]

df = pd.DataFrame(fruit_list, 
                  columns = ['Name',
                             'Price',
                             'In_Stock',
                             'Supplier'])

print("DataFrame:")
print(df)

df_unique=df.drop_duplicates(subset ="Supplier",keep="last") 

print("DataFrame with Unique values of Supplier Column:")
print(df_unique)

Output:

DataFrame:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ
2  banana     14       No      ABC
3  Orange     34      Yes      ABC
DataFrame with Unique values of Supplier Column:
     Name  Price In_Stock Supplier
1   Mango     24       No      XYZ
3  Orange     34      Yes      ABC

With `keep="last"`, the method removes every row whose `Supplier` value is duplicated, keeping only the last occurrence of each value.

Here, the first, third, and fourth rows (index 0, 2, and 3) share the `Supplier` value `ABC`. So the first and third rows are dropped from the `DataFrame`, and the fourth is kept.
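The third option, `keep=False`, is not shown in the examples above. As a minimal sketch on the same fruit data, it drops every row whose `Supplier` value occurs more than once, so no representative of a duplicated value survives.

```python
import pandas as pd

fruit_list = [("Orange", 34, "Yes", "ABC"),
              ("Mango", 24, "No", "XYZ"),
              ("banana", 14, "No", "ABC"),
              ("Orange", 34, "Yes", "ABC")]

df = pd.DataFrame(fruit_list,
                  columns=["Name", "Price", "In_Stock", "Supplier"])

# keep=False removes all rows that share a Supplier value with another row.
df_unique = df.drop_duplicates(subset="Supplier", keep=False)

print(df_unique)
print(df_unique.index.tolist())  # [1]
```

Only the Mango row remains, because `XYZ` is the only supplier that appears exactly once.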

Example Code: Set the `ignore_index` Parameter in the Pandas `DataFrame.drop_duplicates()` Method

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','XYZ' ) ,
             ('banana', 14, 'No','ABC' ) ,
            ('Orange', 34, 'Yes' ,'ABC') ]

df = pd.DataFrame(fruit_list, 
                  columns = ['Name',
                             'Price',
                             'In_Stock',
                             'Supplier'])

print("DataFrame:")
print(df)

df.drop_duplicates(subset ="Supplier",keep="last",inplace=True,ignore_index=True) 

print("DataFrame with Unique values of Supplier Column:")
print(df)

Output:

DataFrame:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ
2  banana     14       No      ABC
3  Orange     34      Yes      ABC
DataFrame with Unique values of Supplier Column:
     Name  Price In_Stock Supplier
0   Mango     24       No      XYZ
1  Orange     34      Yes      ABC

Here, because `ignore_index` is set to `True`, the index of the original `DataFrame` is ignored and the remaining rows are given a new index starting from 0.

Because `inplace=True` is also set, the original `DataFrame` itself is modified when `drop_duplicates()` is called.
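`ignore_index` does not require `inplace=True`. The sketch below, again on the same fruit data, uses it on a returned copy and compares the result with chaining `reset_index(drop=True)`, which should give the same relabeled frame. It assumes pandas 1.0 or later, where the `ignore_index` parameter was introduced.

```python
import pandas as pd

fruit_list = [("Orange", 34, "Yes", "ABC"),
              ("Mango", 24, "No", "XYZ"),
              ("banana", 14, "No", "ABC"),
              ("Orange", 34, "Yes", "ABC")]

df = pd.DataFrame(fruit_list,
                  columns=["Name", "Price", "In_Stock", "Supplier"])

# Return a deduplicated copy whose rows are relabeled 0, 1, ...
renumbered = df.drop_duplicates(subset="Supplier", keep="last",
                                ignore_index=True)
print(renumbered.index.tolist())  # [0, 1]

# Equivalent two-step form: drop duplicates, then discard the old index.
two_step = df.drop_duplicates(subset="Supplier",
                              keep="last").reset_index(drop=True)
print(renumbered.equals(two_step))  # True

# The original DataFrame is unchanged because inplace defaults to False.
print(len(df))  # 4
```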
