Pandas DataFrame DataFrame.groupby() function
pandas.DataFrame.groupby() takes a DataFrame as input and divides it into groups based on a given criterion. We can use the groupby() method to process large datasets easily.
pandas.DataFrame.groupby() Syntax
DataFrame.groupby(
by=None,
axis=0,
level=None,
as_index=True,
sort=True,
group_keys=True,
squeeze: bool=False,
observed: bool=False)
Parameters
by | A mapping, function, string, label, or iterable of labels that determines the groups |
axis | Group by rows (axis=0) or columns (axis=1) |
level | Integer or level name. It groups by a specific level of the index. |
as_index | Boolean. If True, the returned object is indexed by the group labels. |
sort | Boolean. If True, the group keys are sorted. |
group_keys | Boolean. It adds the group keys to the index to identify the groups. |
squeeze | Boolean. When possible, it reduces the dimensions of the returned object. |
observed | Boolean. It only applies when a grouper is categorical. If set to True, only the observed values of the categorical grouper are shown. |
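The effect of the sort and observed parameters is easier to see on a small example. The following is a minimal sketch, not part of the original article: the three-row DataFrame and the extra Stock_Cat column (a categorical with an unused "Unknown" category) are assumptions made for illustration. It avoids axis and squeeze, because recent pandas releases deprecate axis and no longer accept squeeze.

```python
import pandas as pd

# Hypothetical three-row sample, chosen so that "Yes" appears before "No".
df = pd.DataFrame(
    [("Orange", 34, "Yes"), ("Mango", 24, "No"), ("Apple", 44, "Yes")],
    columns=["Name", "Price", "In_Stock"],
)

# sort=True (the default): group keys come back in sorted order.
sorted_keys = df.groupby("In_Stock", sort=True).first().index.tolist()

# sort=False: group keys keep the order in which they first appear.
unsorted_keys = df.groupby("In_Stock", sort=False).first().index.tolist()

# observed only matters for categorical groupers. "Unknown" is a declared
# category that never occurs in the data (an assumption for this sketch).
df["Stock_Cat"] = pd.Categorical(
    df["In_Stock"], categories=["Yes", "No", "Unknown"]
)
all_sizes = df.groupby("Stock_Cat", observed=False).size()
seen_sizes = df.groupby("Stock_Cat", observed=True).size()

print(sorted_keys)
print(unsorted_keys)
print(all_sizes)
print(seen_sizes)
```

With observed=False the unused category is reported with a size of 0, while observed=True leaves it out.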
Return Value
It returns a DataFrameGroupBy object that contains information about the groups.
Example Code: Group a DataFrame Based on the Values of a Single Column Using pandas.DataFrame.groupby()
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
('Mango', 24, 'No' ) ,
('banana', 14, 'No' ) ,
('Apple', 44, 'Yes' ) ,
('Pineapple', 64, 'No') ,
('Kiwi', 84, 'Yes') ]
df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock'])
grouped_df = df.groupby('In_Stock')
print(grouped_df)
print(type(grouped_df))
Output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f73cc992d30>
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>
It groups the DataFrame based on the values in the In_Stock column and returns a DataFrameGroupBy object.
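Printing the DataFrameGroupBy object only shows its memory address, so it can help to ask the object about its groups directly. The following is a minimal sketch, assuming the same fruit data as above. It uses the ngroups, size(), and groups members of the GroupBy object, which the original article does not cover.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# Number of distinct groups.
print(grouped_df.ngroups)

# Number of rows in each group, as a Series indexed by group label.
print(grouped_df.size())

# Mapping from each group label to the row labels that belong to it.
print(grouped_df.groups)
```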
To get detailed information about the DataFrameGroupBy object returned by groupby(), we can use the object's first() method to get the first element of each group.
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
('Mango', 24, 'No' ) ,
('banana', 14, 'No' ) ,
('Apple', 44, 'Yes' ) ,
('Pineapple', 64, 'No') ,
('Kiwi', 84, 'Yes') ]
df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock'])
grouped_df = df.groupby('In_Stock')
print(grouped_df.first())
Output:
Name Price
In_Stock
No Mango 24
Yes Orange 34
It prints the DataFrame consisting of the first elements of the two groups separated from df.
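first() is only one of the methods that can be applied to each group. As a minimal sketch, assuming the same fruit data, the example below selects the Price column of the grouped object and computes its mean for each group. The choice of mean() is an illustration and does not come from the original article.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# Select one column of the grouped object, then aggregate it per group.
mean_price = grouped_df["Price"].mean()
print(mean_price)
```

The result is a Series indexed by the In_Stock labels, with one mean price for each group.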
We can also print an entire group using the get_group() method.
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
('Mango', 24, 'No' ) ,
('banana', 14, 'No' ) ,
('Apple', 44, 'Yes' ) ,
('Pineapple', 64, 'No') ,
('Kiwi', 84, 'Yes') ]
df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock'])
grouped_df = df.groupby('In_Stock')
print(grouped_df.get_group('Yes'))
Output:
Name Price In_Stock
0 Orange 34 Yes
3 Apple 44 Yes
5 Kiwi 84 Yes
It prints all the rows in df that have the value Yes in the In_Stock column. We first use the groupby() method to divide the rows with different values in the In_Stock column into different groups, and then use the get_group() method to access a specific group.
Example Code: Group a DataFrame Based on Multiple Columns Using pandas.DataFrame.groupby()
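When every group is needed rather than a single one, the GroupBy object can also be iterated directly. The following is a minimal sketch, assuming the same fruit data. Each iteration step yields a group label together with the sub-DataFrame that get_group() would return for that label. The groups_by_key dictionary is a hypothetical helper name used only for this illustration.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# Collect every group into a plain dict keyed by its group label.
groups_by_key = {}
for key, group in grouped_df:
    groups_by_key[key] = group
    print(key)
    print(group)
```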
import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
('Mango', 24, 'No','ABC' ) ,
('banana', 14, 'No','ABC' ) ,
('Apple', 44, 'Yes',"XYZ" ) ,
('Pineapple', 64, 'No',"XYZ") ,
('Kiwi', 84, 'Yes',"XYZ") ]
df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock',"Supplier"])
grouped_df = df.groupby(['In_Stock', 'Supplier'])
print(grouped_df.first())
Output:
Name Price
In_Stock Supplier
No ABC Mango 24
XYZ Pineapple 64
Yes ABC Orange 34
XYZ Apple 44
It groups df according to the values in the In_Stock and Supplier columns and returns a DataFrameGroupBy object.
We use the first() method to get the first element of each group. It returns a DataFrame consisting of the first elements of the following four groups.
The group with the value No in the In_Stock column and ABC in the Supplier column.
The group with the value No in the In_Stock column and XYZ in the Supplier column.
The group with the value Yes in the In_Stock column and ABC in the Supplier column.
The group with the value Yes in the In_Stock column and XYZ in the Supplier column.
When we pass multiple labels to the groupby() function, the DataFrame returned by the methods of the GroupBy object has a MultiIndex.
print(grouped_df.first().index)
Output:
MultiIndex(levels=[['No', 'Yes'], ['ABC', 'XYZ']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['In_Stock', 'Supplier'])
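A MultiIndex result can be read by passing one label per level, or flattened back into ordinary columns. The following is a minimal sketch, assuming the same fruit data with the Supplier column. The use of loc with a tuple and of reset_index() is an illustration and does not come from the original article.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
first_rows = df.groupby(["In_Stock", "Supplier"]).first()

# Look up one group by giving a label for each index level as a tuple.
no_xyz = first_rows.loc[("No", "XYZ")]
print(no_xyz)

# Turn the two index levels back into regular columns.
flat = first_rows.reset_index()
print(flat)
```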
Example Code: Set as_index=False in pandas.DataFrame.groupby()
The as_index parameter in the DataFrame.groupby() method defaults to True. When methods such as first() are applied to the GroupBy object, the group label is the index of the returned DataFrame.
import pandas as pd
fruit_list = [
("Orange", 34, "Yes"),
("Mango", 24, "No"),
("banana", 14, "No"),
("Apple", 44, "Yes"),
("Pineapple", 64, "No"),
("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
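# Default behaviour: the group labels become the index of the result.
grouped_df = df.groupby("In_Stock", as_index=True)
first_group = grouped_df.first()
print(first_group)
print(first_group.index)
print("---------")
# as_index=False: the group labels stay in a regular column.
grouped_df = df.groupby("In_Stock", as_index=False)
first_group = grouped_df.first()
print(first_group)
print(first_group.index)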
Output:
Name Price
In_Stock
No Mango 24
Yes Orange 34
Index(['No', 'Yes'], dtype='object', name='In_Stock')
---------
In_Stock Name Price
0 No Mango 24
1 Yes Orange 34
Int64Index([0, 1], dtype='int64')
As you can see, the index of the generated DataFrame defaults to the group label, i.e. as_index=True.
When we set as_index=False, the index becomes an automatically generated numeric index.
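One way to picture as_index=False is as the default result followed by a call to reset_index(). The following is a minimal sketch, assuming the same fruit data. The comparison with reset_index() is an illustration added here, not a statement from the original article about how pandas implements the parameter.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# Group labels kept as a regular column, with a numeric index.
flat_first = df.groupby("In_Stock", as_index=False).first()

# Group labels used as the index, then moved back into a column.
reset_first = df.groupby("In_Stock").first().reset_index()

print(flat_first)
print(reset_first)
```

For this data, both DataFrames hold the same rows and the same columns.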

