Pandas DataFrame DataFrame.groupby() function
pandas.DataFrame.groupby() splits a DataFrame into groups based on a given criterion. We can use the groupby() method to easily process large datasets.
Syntax of pandas.DataFrame.groupby()
DataFrame.groupby(
by=None,
axis=0,
level=None,
as_index=True,
sort=True,
group_keys=True,
squeeze: bool=False,
observed: bool=False)
Parameters
by | A mapping, function, label, or list of labels that determines the groups
axis | Group by rows (axis=0) or columns (axis=1)
level | Integer or level name. Groups by a particular level of a MultiIndex
as_index | Boolean. If True, the returned object is indexed by the group labels
sort | Boolean. If True, the group keys are sorted
group_keys | Boolean. If True, the group keys are added to the index to identify the groups
squeeze | Boolean. When possible, it reduces the dimensionality of the returned object. Deprecated in pandas 1.1 and removed in pandas 2.0
observed | Boolean. Only applies when a grouper is categorical. If True, only the observed values of the categorical grouper are shown
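As a rough illustration of the by and sort parameters, the sketch below reuses the fruit data from this article's examples. The variable names keys_sorted, keys_unsorted, and by_length are chosen only for this illustration.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# sort=True (the default) orders the group keys; sort=False keeps
# them in the order in which they first appear in the data.
keys_sorted = df.groupby("In_Stock", sort=True)["Price"].sum().index.tolist()
keys_unsorted = df.groupby("In_Stock", sort=False)["Price"].sum().index.tolist()
print(keys_sorted)    # ['No', 'Yes']
print(keys_unsorted)  # ['Yes', 'No']

# When by is a function, it is called on each index label.
# Here the index holds the names, so rows are grouped by name length.
by_length = df.set_index("Name").groupby(len)["Price"].sum()
print(by_length)
```

Turning off sorting can be useful when the original row order carries meaning. It does not change which rows end up in which group.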
Return Value
It returns a DataFrameGroupBy object that contains information about the groups.
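To make the returned object more concrete, here is a minimal sketch, using the same fruit data as the examples in this article, that counts the groups and iterates over them. The variable name grouped is only illustrative.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

grouped = df.groupby("In_Stock")

# Number of distinct groups found in the In_Stock column.
print(grouped.ngroups)  # 2

# Iterating yields (group key, sub-DataFrame) pairs.
for key, group in grouped:
    print(key, group["Name"].tolist())
# No ['Mango', 'banana', 'Pineapple']
# Yes ['Orange', 'Apple', 'Kiwi']

# size() reports how many rows fall into each group.
group_sizes = grouped.size()
print(group_sizes)
```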
Example Code: Grouping a DataFrame Based on the Values of a Single Column Using pandas.DataFrame.groupby()
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
('Mango', 24, 'No' ) ,
('banana', 14, 'No' ) ,
('Apple', 44, 'Yes' ) ,
('Pineapple', 64, 'No') ,
('Kiwi', 84, 'Yes') ]
df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock'])
grouped_df = df.groupby('In_Stock')
print(grouped_df)
print(type(grouped_df))
Output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f73cc992d30>
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>
It groups the DataFrame based on the values in the In_Stock column and returns a DataFrameGroupBy object.
To get detailed information about the DataFrameGroupBy object returned by groupby(), we can use the first() method of the DataFrameGroupBy object to get the first element of each group.
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
('Mango', 24, 'No' ) ,
('banana', 14, 'No' ) ,
('Apple', 44, 'Yes' ) ,
('Pineapple', 64, 'No') ,
('Kiwi', 84, 'Yes') ]
df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock'])
grouped_df = df.groupby('In_Stock')
print(grouped_df.first())
Output:
Name Price
In_Stock
No Mango 24
Yes Orange 34
It prints the DataFrame consisting of the first elements of the two groups separated from df.
We can also print the entire group using the get_group() method.
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
('Mango', 24, 'No' ) ,
('banana', 14, 'No' ) ,
('Apple', 44, 'Yes' ) ,
('Pineapple', 64, 'No') ,
('Kiwi', 84, 'Yes') ]
df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock'])
grouped_df = df.groupby('In_Stock')
print(grouped_df.get_group('Yes'))
Output:
Name Price In_Stock
0 Orange 34 Yes
3 Apple 44 Yes
5 Kiwi 84 Yes
It prints all the rows of df that have the value Yes in the In_Stock column. We first use the groupby() method to divide the rows with different values in the In_Stock column into different groups, and then use the get_group() method to access a specific group.
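Besides inspecting individual groups, a common next step is to compute a summary per group. The following is a small sketch, not part of the original examples, that aggregates the Price column of the same fruit data. The name price_stats is illustrative.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# Select the Price column of each group, then compute several
# statistics at once. The result has one row per group.
price_stats = df.groupby("In_Stock")["Price"].agg(["min", "max", "mean"])
print(price_stats)
```

Each row of price_stats is labelled by a value of In_Stock, and each column holds one of the requested statistics.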
Example Code: Grouping a DataFrame Based on Multiple Conditions Using pandas.DataFrame.groupby()
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
('Mango', 24, 'No','ABC' ) ,
('banana', 14, 'No','ABC' ) ,
('Apple', 44, 'Yes',"XYZ" ) ,
('Pineapple', 64, 'No',"XYZ") ,
('Kiwi', 84, 'Yes',"XYZ") ]
df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock',"Supplier"])
grouped_df = df.groupby(['In_Stock', 'Supplier'])
print(grouped_df.first())
Output:
                        Name  Price
In_Stock Supplier
No       ABC           Mango     24
         XYZ       Pineapple     64
Yes      ABC          Orange     34
         XYZ           Apple     44
It groups df according to the values in the In_Stock and Supplier columns and returns a DataFrameGroupBy object.
We use the first() method to get the first element of each group. It returns a DataFrame consisting of the first elements of the following four groups.
A group with the value No in the In_Stock column and ABC in the Supplier column.
A group with the value No in the In_Stock column and XYZ in the Supplier column.
A group with the value Yes in the In_Stock column and ABC in the Supplier column.
A group with the value Yes in the In_Stock column and XYZ in the Supplier column.
When we pass multiple labels to the groupby() function, the DataFrame returned by the methods of the GroupBy object has a MultiIndex.
print(grouped_df.first().index)
Output:
MultiIndex(levels=[['No', 'Yes'], ['ABC', 'XYZ']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['In_Stock', 'Supplier'])
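A MultiIndex result can be addressed with a tuple of group labels, or flattened back into ordinary columns. The sketch below shows both under the same data as above. The names first_rows, no_xyz, and flat are illustrative only.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

first_rows = df.groupby(["In_Stock", "Supplier"]).first()

# Select one group by passing a tuple with one label per index level.
no_xyz = first_rows.loc[("No", "XYZ")]
print(no_xyz["Name"], no_xyz["Price"])  # Pineapple 64

# reset_index() moves the index levels back into regular columns.
flat = first_rows.reset_index()
print(flat.columns.tolist())  # ['In_Stock', 'Supplier', 'Name', 'Price']
```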
Example Code: Set as_index=False in pandas.DataFrame.groupby()
The as_index parameter in the DataFrame.groupby() method defaults to True. When methods such as first() are applied to the GroupBy object, the group label is the index of the returned DataFrame.
import pandas as pd
fruit_list = [
("Orange", 34, "Yes"),
("Mango", 24, "No"),
("banana", 14, "No"),
("Apple", 44, "Yes"),
("Pineapple", 64, "No"),
("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
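# as_index=True (the default): the group labels become the index
grouped_df = df.groupby("In_Stock", as_index=True)
first_group = grouped_df.first()
print(first_group)
print(first_group.index)
print("---------")
# as_index=False: the group labels stay in a regular column
grouped_df = df.groupby("In_Stock", as_index=False)
first_group = grouped_df.first()
print(first_group)
print(first_group.index)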
Output:
Name Price
In_Stock
No Mango 24
Yes Orange 34
Index(['No', 'Yes'], dtype='object', name='In_Stock')
---------
In_Stock Name Price
0 No Mango 24
1 Yes Orange 34
Int64Index([0, 1], dtype='int64')
As you can see, the index of the generated DataFrame defaults to the group label, i.e. as_index=True.
When we set as_index=False, the index becomes an automatically generated numeric index.
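One way to think about as_index=False is that it gives a flat table similar to calling reset_index() on the default result. The sketch below compares the two for a per-group sum of Price. The names totals and totals_reset are illustrative.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# With as_index=False the group label is kept as a regular column.
totals = df.groupby("In_Stock", as_index=False)["Price"].sum()
print(totals.columns.tolist())  # ['In_Stock', 'Price']

# The default result is indexed by In_Stock; reset_index() turns
# that index back into a column.
totals_reset = df.groupby("In_Stock")["Price"].sum().reset_index()
print(totals_reset.columns.tolist())  # ['In_Stock', 'Price']
```

The flat form is often more convenient when the result is going to be merged with other tables or written to a file.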