Pandas DataFrame DataFrame.drop_duplicates() function
Python Pandas DataFrame.drop_duplicates() function DataFrame
removes all duplicate rows from .
pandas.DataFrame.drop_duplicates()
Syntax
DataFrame.drop_duplicates(subset: Union[Hashable, Sequence[Hashable], NoneType]=None,
keep: Union[str, bool]='first',
inplace: bool=False,
ignore_index: bool=False)
parameter
subset |
Column label or sequence of labels. Columns to consider when identifying duplicates |
keep |
first , last or False . Delete all duplicates except the first one ( keep=first ), Delete all duplicates except the last one ( keep=last ), or Delete all duplicates ( keep=False ) |
inplace |
Boolean. If true True , modify the caller's DataFrame . |
ignore_index |
Boolean. If yes True , ignore DataFrame the index in the original. The default value is yes False , which means use the index. The default value is yes False , which means use the index. |
Return Value
If inplace
is True
, then DataFrame
remove all duplicate rows from , otherwise is None
.
Example Code: DataFrame.set_index()
Remove Duplicate Rows Using Pandas Methods
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
('Mango', 24, 'No','XYZ' ) ,
('banana', 14, 'No','BCD' ) ,
('Orange', 34, 'Yes' ,'ABC') ]
df = pd.DataFrame(fruit_list,
columns = ['Name',
'Price',
'In_Stock',
'Supplier'])
print("DataFrame:")
print(df)
df_unique=df.drop_duplicates()
print("DataFrame with Unique Rows:")
print(df_unique)
Output:
DataFrame:
Name Price In_Stock Supplier
0 Orange 34 Yes ABC
1 Mango 24 No XYZ
2 banana 14 No BCD
3 Orange 34 Yes ABC
DataFrame with Unique Rows:
Name Price In_Stock Supplier
0 Orange 34 Yes ABC
1 Mango 24 No XYZ
2 banana 14 No BCD
DataFrame
Rows 1 and 4 of the original are identical.
You can drop_duplicates()
remove all duplicate rows from a DataFrame by using the method.
Example code Pandas method to set subset
parametersDataFrame.set_index()
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
('Mango', 24, 'No','XYZ' ) ,
('banana', 14, 'No','ABC' ) ,
('Orange', 34, 'Yes' ,'ABC') ]
df = pd.DataFrame(fruit_list,
columns = ['Name',
'Price',
'In_Stock',
'Supplier'])
print("DataFrame:")
print(df)
df_unique=df.drop_duplicates(subset ="Supplier")
print("DataFrame with Unique vales of Supplier Column:")
print(df_unique)
Output:
DataFrame:
Name Price In_Stock Supplier
0 Orange 34 Yes ABC
1 Mango 24 No XYZ
2 banana 14 No ABC
3 Orange 34 Yes ABC
DataFrame with Unique vales of Supplier Column:
Name Price In_Stock Supplier
0 Orange 34 Yes ABC
1 Mango 24 No XYZ
This method removes all Supplier
rows in the DataFrame that do not have unique values for the column.
Here, Supplier
the column has a common value for rows 1, 3, and 4. Therefore, rows 3 and 4 will be DataFrame
deleted from ; by default, the first duplicate row will not be deleted.
Example Code: Setting keep
Parameters Pandas DataFrame.set_index()
Method
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
('Mango', 24, 'No','XYZ' ) ,
('banana', 14, 'No','ABC' ) ,
('Orange', 34, 'Yes' ,'ABC') ]
df = pd.DataFrame(fruit_list,
columns = ['Name',
'Price',
'In_Stock',
'Supplier'])
print("DataFrame:")
print(df)
df_unique=df.drop_duplicates(subset ="Supplier",keep="last")
print("DataFrame with Unique vales of Supplier Column:")
print(df_unique)
Output:
DataFrame:
Name Price In_Stock Supplier
0 Orange 34 Yes ABC
1 Mango 24 No XYZ
2 banana 14 No ABC
3 Orange 34 Yes ABC
DataFrame with Unique vales of Supplier Column:
Name Price In_Stock Supplier
1 Mango 24 No XYZ
3 Orange 34 Yes ABC
This method removes DataFrame
all Supplier
rows in that do not have a unique value in the column, keeping only the last duplicate row.
Here, Supplier
the column of rows 1, 3, and 4 has a common value. So rows 1 and 3 will DataFrame
be deleted from .
Example Code: Pandas Method for Setting ignore_index
ParametersDataFrame.set_index()
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
('Mango', 24, 'No','XYZ' ) ,
('banana', 14, 'No','ABC' ) ,
('Orange', 34, 'Yes' ,'ABC') ]
df = pd.DataFrame(fruit_list,
columns = ['Name',
'Price',
'In_Stock',
'Supplier'])
print("DataFrame:")
print(df)
df.drop_duplicates(subset ="Supplier",keep="last",inplace=True,ignore_index=True)
print("DataFrame with Unique vales of Supplier Column:")
print(df)
Output:
DataFrame:
Name Price In_Stock Supplier
0 Orange 34 Yes ABC
1 Mango 24 No XYZ
2 banana 14 No ABC
3 Orange 34 Yes ABC
DataFrame with Unique vales of Supplier Column:
Name Price In_Stock Supplier
0 Mango 24 No XYZ
1 Orange 34 Yes ABC
Here, because ignore_index
is set to True
, DataFrame
the index in the original is ignored and a new index is set for the row.
Due to the effect of the function, the original is modified after inplace=True
calling the function.ignore_index()
DataFrame
For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.
Related Articles
Pandas DataFrame DataFrame.query() function
Publish Date:2025/04/30 Views:107 Category:Python
-
The pandas.DataFrame.query() method filters the rows of the caller DataFrame using the given query expression. pandas.DataFrame.query() grammar DataFrame . query(expr, inplace = False , ** kwargs) parameter expr Filter rows based on query e
Pandas DataFrame DataFrame.min() function
Publish Date:2025/04/30 Views:162 Category:Python
-
Python Pandas DataFrame.min() function gets the minimum value of the DataFrame object along the specified axis. pandas.DataFrame.min() grammar DataFrame . mean(axis = None , skipna = None , level = None , numeric_only = None , ** kwargs) pa
Pandas DataFrame DataFrame.mean() function
Publish Date:2025/04/30 Views:85 Category:Python
-
Python Pandas DataFrame.mean() function calculates the mean of the values of the DataFrame object over the specified axis. pandas.DataFrame.mean() grammar DataFrame . mean(axis = None , skipna = None , level = None , numeric_only = No
Pandas DataFrame DataFrame.isin() function
Publish Date:2025/04/30 Views:133 Category:Python
-
The pandas.DataFrame.isin(values) function checks whether each element in the caller DataFrame contains values the value specified in the input . pandas.DataFrame.isin(values) grammar DataFrame . isin(values) parameter values iterable - lis
Pandas DataFrame DataFrame.groupby() function
Publish Date:2025/04/30 Views:161 Category:Python
-
pandas.DataFrame.groupby() takes a DataFrame as input and divides the DataFrame into groups based on a given criterion. We can use groupby() the method to easily process large datasets. pandas.DataFrame.groupby() grammar DataFrame . groupby
Pandas DataFrame DataFrame.fillna() function
Publish Date:2025/04/30 Views:60 Category:Python
-
The pandas.DataFrame.fillna() function replaces the values DataFrame in NaN with a certain value. pandas.DataFrame.fillna() grammar DataFrame . fillna( value = None , method = None , axis = None , inplace = False , limit = None , down
Pandas DataFrame DataFrame.dropna() function
Publish Date:2025/04/30 Views:181 Category:Python
-
The pandas.DataFrame.dropna() function removes null values (missing values) from a DataFrame by dropping rows or columns that contain null values DataFrame . NaN ( Not a Number ) and NaT ( Not a Time ) represent null values. DataFrame
Pandas DataFrame DataFrame.assign() function
Publish Date:2025/04/30 Views:55 Category:Python
-
Python Pandas DataFrame.assign() function assigns new columns to DataFrame . pandas.DataFrame.assign() grammar DataFrame . assign( ** kwargs) parameter **kwargs Keyword arguments, DataFrame the column names to be assigned to are passed as k
Pandas DataFrame DataFrame.transform() function
Publish Date:2025/04/30 Views:120 Category:Python
-
Python Pandas DataFrame.transform() DataFrame applies a function on and transforms DataFrame . The function to be applied is passed as an argument to the function. The axis length of transform() the transformed should be the same as the ori