Pandas DataFrame.drop_duplicates() Function

Author: JIYIK  Last Updated: 2025/04/30

The Python Pandas DataFrame.drop_duplicates() function removes all duplicate rows from a DataFrame.


pandas.DataFrame.drop_duplicates() Syntax

DataFrame.drop_duplicates(subset: Union[Hashable, Sequence[Hashable], NoneType]=None,
                          keep: Union[str, bool]='first',
                          inplace: bool=False,
                          ignore_index: bool=False)

Parameters

subset: Column label or sequence of labels. The columns to consider when identifying duplicates. By default, all columns are used.
keep: 'first', 'last', or False. Drop all duplicates except the first occurrence (keep='first'), drop all duplicates except the last occurrence (keep='last'), or drop every duplicated row (keep=False).
inplace: Boolean. If True, modify the caller DataFrame in place.
ignore_index: Boolean. If True, the index of the original DataFrame is ignored and the resulting rows are relabeled 0, 1, ..., n-1. The default is False, which keeps the original index labels.
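
The three keep settings are easiest to compare side by side. The following is a minimal sketch on a made-up single-column DataFrame (the data is an assumption chosen for illustration, not taken from the examples below); it prints the index labels that survive under each setting.

```python
import pandas as pd

# Illustrative data: "ABC" appears at index 0, 2, and 3.
df = pd.DataFrame({"Supplier": ["ABC", "XYZ", "ABC", "ABC"]})

kept_first = df.drop_duplicates(keep="first")  # keep the first occurrence
kept_last = df.drop_duplicates(keep="last")    # keep the last occurrence
kept_none = df.drop_duplicates(keep=False)     # drop every duplicated row

print(kept_first.index.tolist())  # [0, 1]
print(kept_last.index.tolist())   # [1, 3]
print(kept_none.index.tolist())   # [1]
```

Note that keep=False leaves only the rows whose values occur exactly once.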

Return Value

If inplace is False (the default), it returns a new DataFrame with all duplicate rows removed. If inplace is True, the caller DataFrame is modified and None is returned.
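
As a rough illustration of the two calling styles, the sketch below uses a small made-up DataFrame (an assumption for illustration). It shows that the default call leaves the caller untouched, while inplace=True changes the caller itself.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango", "Orange"],
                   "Price": [34, 24, 34]})

# Default (inplace=False): a new DataFrame is returned, df is unchanged.
result = df.drop_duplicates()
print(len(df), len(result))  # 3 2

# inplace=True: df itself loses the duplicate row.
# The return value is documented above as None.
ret = df.drop_duplicates(inplace=True)
print(ret)
print(len(df))  # 2
```

Assigning the result of an inplace=True call back to a variable is a common mistake, since the variable then no longer holds the deduplicated data one expects.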


Example Code: Remove Duplicate Rows Using the Pandas DataFrame.drop_duplicates() Method

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','XYZ' ) ,
             ('banana', 14, 'No','BCD' ) ,
            ('Orange', 34, 'Yes' ,'ABC') ]

df = pd.DataFrame(fruit_list, 
                  columns = ['Name',
                             'Price',
                             'In_Stock',
                             'Supplier'])

print("DataFrame:")
print(df)

df_unique=df.drop_duplicates() 

print("DataFrame with Unique Rows:")
print(df_unique)

Output:

DataFrame:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ
2  banana     14       No      BCD
3  Orange     34      Yes      ABC
DataFrame with Unique Rows:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ
2  banana     14       No      BCD

Rows 1 and 4 of the original DataFrame are identical.

You can remove all duplicate rows from a DataFrame by using the drop_duplicates() method.
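
To see which rows would be dropped before actually dropping them, one option is the related DataFrame.duplicated() method, which returns a Boolean mask. The sketch below rebuilds the same fruit data as above and checks that filtering with the inverted mask gives the same result as drop_duplicates(); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [("Orange", 34, "Yes", "ABC"),
     ("Mango", 24, "No", "XYZ"),
     ("banana", 14, "No", "BCD"),
     ("Orange", 34, "Yes", "ABC")],
    columns=["Name", "Price", "In_Stock", "Supplier"],
)

# True marks a row that repeats an earlier row (default keep="first").
mask = df.duplicated()
print(mask.tolist())  # [False, False, False, True]

# Keeping the rows that are not marked matches drop_duplicates().
same = df[~mask].equals(df.drop_duplicates())
print(same)  # True
```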


Example Code: Set the subset Parameter in the Pandas DataFrame.drop_duplicates() Method

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','XYZ' ) ,
             ('banana', 14, 'No','ABC' ) ,
            ('Orange', 34, 'Yes' ,'ABC') ]

df = pd.DataFrame(fruit_list, 
                  columns = ['Name',
                             'Price',
                             'In_Stock',
                             'Supplier'])

print("DataFrame:")
print(df)

df_unique=df.drop_duplicates(subset ="Supplier") 

print("DataFrame with Unique values of Supplier Column:")
print(df_unique)

Output:

DataFrame:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ
2  banana     14       No      ABC
3  Orange     34      Yes      ABC
DataFrame with Unique values of Supplier Column:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ

This removes every row whose value in the Supplier column repeats a value that has already appeared in an earlier row.

Here, rows 1, 3, and 4 (index 0, 2, and 3) share the same Supplier value. Therefore, rows 3 and 4 are removed from the DataFrame; by default, the first occurrence is kept.
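
The subset parameter also accepts a sequence of labels, in which case rows count as duplicates only when they match on all the listed columns. A minimal sketch on the same fruit data (the choice of In_Stock and Supplier as the column pair is an assumption for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [("Orange", 34, "Yes", "ABC"),
     ("Mango", 24, "No", "XYZ"),
     ("banana", 14, "No", "ABC"),
     ("Orange", 34, "Yes", "ABC")],
    columns=["Name", "Price", "In_Stock", "Supplier"],
)

# One column: any repeated Supplier value is a duplicate.
by_supplier = df.drop_duplicates(subset="Supplier")
print(by_supplier.index.tolist())  # [0, 1]

# Two columns: a row is a duplicate only if BOTH values repeat.
by_pair = df.drop_duplicates(subset=["In_Stock", "Supplier"])
print(by_pair.index.tolist())  # [0, 1, 2]
```

The banana row survives the second call because its (No, ABC) pair has not appeared before, even though ABC has.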


Example Code: Set the keep Parameter in the Pandas DataFrame.drop_duplicates() Method

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','XYZ' ) ,
             ('banana', 14, 'No','ABC' ) ,
            ('Orange', 34, 'Yes' ,'ABC') ]

df = pd.DataFrame(fruit_list, 
                  columns = ['Name',
                             'Price',
                             'In_Stock',
                             'Supplier'])

print("DataFrame:")
print(df)

df_unique=df.drop_duplicates(subset ="Supplier",keep="last") 

print("DataFrame with Unique values of Supplier Column:")
print(df_unique)

Output:

DataFrame:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ
2  banana     14       No      ABC
3  Orange     34      Yes      ABC
DataFrame with Unique values of Supplier Column:
     Name  Price In_Stock Supplier
1   Mango     24       No      XYZ
3  Orange     34      Yes      ABC

This removes every row in the DataFrame whose Supplier value is duplicated, keeping only the last occurrence of each value.

Here, rows 1, 3, and 4 (index 0, 2, and 3) share the same Supplier value, so rows 1 and 3 are removed from the DataFrame.
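
Because keep only looks at row order, sorting first is one way to control which duplicate survives. The sketch below is an illustrative idiom rather than something from the example above: it sorts by Price so that keep="first" retains the cheapest row for each supplier.

```python
import pandas as pd

df = pd.DataFrame(
    [("Orange", 34, "Yes", "ABC"),
     ("Mango", 24, "No", "XYZ"),
     ("banana", 14, "No", "ABC"),
     ("Orange", 34, "Yes", "ABC")],
    columns=["Name", "Price", "In_Stock", "Supplier"],
)

# After sorting by Price, the first row seen per supplier is the cheapest.
cheapest = df.sort_values("Price").drop_duplicates(subset="Supplier", keep="first")
print(cheapest["Name"].tolist())   # ['banana', 'Mango']
print(cheapest["Price"].tolist())  # [14, 24]
```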


Example Code: Set the ignore_index Parameter in the Pandas DataFrame.drop_duplicates() Method

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','XYZ' ) ,
             ('banana', 14, 'No','ABC' ) ,
            ('Orange', 34, 'Yes' ,'ABC') ]

df = pd.DataFrame(fruit_list, 
                  columns = ['Name',
                             'Price',
                             'In_Stock',
                             'Supplier'])

print("DataFrame:")
print(df)

df.drop_duplicates(subset ="Supplier",keep="last",inplace=True,ignore_index=True) 

print("DataFrame with Unique values of Supplier Column:")
print(df)

Output:

DataFrame:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      XYZ
2  banana     14       No      ABC
3  Orange     34      Yes      ABC
DataFrame with Unique values of Supplier Column:
     Name  Price In_Stock Supplier
0   Mango     24       No      XYZ
1  Orange     34      Yes      ABC

Here, because ignore_index is set to True, the index of the original DataFrame is ignored and the remaining rows are given a new index starting from 0.

Because inplace=True is also set, the original DataFrame itself is modified by the call to drop_duplicates().
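
For comparison, the same relabeling can be obtained without inplace=True by chaining reset_index(drop=True) onto the result. The sketch below, using the same fruit data, suggests that the two approaches produce equal DataFrames while leaving the original untouched; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [("Orange", 34, "Yes", "ABC"),
     ("Mango", 24, "No", "XYZ"),
     ("banana", 14, "No", "ABC"),
     ("Orange", 34, "Yes", "ABC")],
    columns=["Name", "Price", "In_Stock", "Supplier"],
)

# Relabel the surviving rows directly with ignore_index=True.
a = df.drop_duplicates(subset="Supplier", keep="last", ignore_index=True)

# Alternative: drop first, then reset the index in a second step.
b = df.drop_duplicates(subset="Supplier", keep="last").reset_index(drop=True)

print(a.index.tolist())   # [0, 1]
print(a.equals(b))        # True
print(df.index.tolist())  # [0, 1, 2, 3]
```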
