Pandas DataFrame DataFrame.groupby() function

Author: JIYIK Last Updated: 2025/04/30

pandas.DataFrame.groupby() takes a DataFrame as input and splits it into groups based on a given criterion. The groupby() method makes it easy to process large datasets group by group.

`pandas.DataFrame.groupby()` Syntax

DataFrame.groupby(
    by=None,
    axis=0,
    level=None,
    as_index=True,
    sort=True,
    group_keys=True,
    squeeze: bool=False,
    observed: bool=False)

Parameters

`by`	A mapping, function, string, label, or iterable of these. It determines the groups.
`axis`	Group by rows (`axis=0`) or columns (`axis=1`). Deprecated in recent pandas releases.
`level`	Integer or level name. If the axis is a MultiIndex, group by one or more specific levels.
`as_index`	Boolean. For aggregated output, it returns an object indexed by the group labels.
`sort`	Boolean. It sorts the group keys.
`group_keys`	Boolean. When calling apply, it adds the group keys to the index to identify each group.
`squeeze`	Boolean. When possible, it reduces the dimensionality of the returned object. Deprecated in pandas 1.1 and removed in pandas 2.0.
`observed`	Boolean. It applies only when a grouper is categorical. If `True`, only the observed values of the categorical grouper are shown.
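The effect of `sort` and of passing a function as `by` can be seen in a small sketch. The four-row DataFrame below is made up for illustration and is only a shortened version of the fruit data used in the examples.

```python
import pandas as pd

# Illustrative data only: a shortened version of the fruit table.
df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple"],
        "Price": [34, 24, 14, 44],
        "In_Stock": ["Yes", "No", "No", "Yes"],
    }
)

# sort=True (the default) orders the group keys.
sorted_keys = list(df.groupby("In_Stock", sort=True)["Price"].sum().index)

# sort=False keeps the keys in order of first appearance.
unsorted_keys = list(df.groupby("In_Stock", sort=False)["Price"].sum().index)

print(sorted_keys)    # ['No', 'Yes']
print(unsorted_keys)  # ['Yes', 'No']

# `by` can also be a function; it is called on each index label.
# Rows with an even index label form one group, odd labels the other.
parity_sum = df.groupby(lambda idx: idx % 2)["Price"].sum()
print(parity_sum.tolist())  # [48, 68]
```

Leaving `sort=True` is usually the convenient choice; `sort=False` may be worth trying when the original order of appearance matters.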

Return Value

It returns a DataFrameGroupBy object that contains information about the groups.

Example Code: Grouping a DataFrame Based on the Values of a Single Column Using `pandas.DataFrame.groupby()`

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
             ('Mango', 24, 'No' ) ,
             ('banana', 14, 'No' ) ,
             ('Apple', 44, 'Yes' ) ,
             ('Pineapple', 64, 'No') ,
             ('Kiwi', 84, 'Yes')  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock']) 
grouped_df = df.groupby('In_Stock')
print(grouped_df)
print(type(grouped_df))

Output:

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f73cc992d30>
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>

It groups the DataFrame based on the values in the In_Stock column and returns a DataFrameGroupBy object.

Printing the DataFrameGroupBy object returned by groupby() shows only its type and memory address. To see what it contains, we can use the first() method of the DataFrameGroupBy object to get the first element of each group.

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
             ('Mango', 24, 'No' ) ,
             ('banana', 14, 'No' ) ,
             ('Apple', 44, 'Yes' ) ,
             ('Pineapple', 64, 'No') ,
             ('Kiwi', 84, 'Yes')  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock']) 
grouped_df = df.groupby('In_Stock')
print(grouped_df.first())

Output:

            Name  Price
In_Stock               
No         Mango     24
Yes       Orange     34

It prints a DataFrame consisting of the first elements of the two groups that df was split into.
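first() is only one of the methods available on the grouped object. As a minimal sketch on the same fruit data, size() counts the rows in each group and mean() on a selected column averages it per group.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# Number of rows in each group, ordered by group key ('No', 'Yes').
group_sizes = grouped_df.size()

# Average price within each group.
mean_price = grouped_df["Price"].mean()

print(group_sizes.tolist())  # [3, 3]
print(mean_price.tolist())   # [34.0, 54.0]
```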

We can also print an entire group using the get_group() method.

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
             ('Mango', 24, 'No' ) ,
             ('banana', 14, 'No' ) ,
             ('Apple', 44, 'Yes' ) ,
             ('Pineapple', 64, 'No') ,
             ('Kiwi', 84, 'Yes')  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock']) 
grouped_df = df.groupby('In_Stock')
print(grouped_df.get_group('Yes'))

Output:

     Name  Price In_Stock
0  Orange     34      Yes
3   Apple     44      Yes
5    Kiwi     84      Yes

It prints all the rows of df that have the value Yes in the In_Stock column. We first use the groupby() method to divide the rows with different values in the In_Stock column into different groups, and then use the get_group() method to access a specific group.
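When every group is needed rather than a single one, iterating over the grouped object is a possible alternative to calling get_group() repeatedly. The sketch below assumes the same fruit data; each iteration yields the group key and the matching sub-DataFrame, and the groups attribute maps each key to the row labels it contains.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# Each iteration yields (group key, DataFrame holding that group's rows).
names_by_key = {}
for key, group in grouped_df:
    names_by_key[key] = group["Name"].tolist()
    print(key, names_by_key[key])

# .groups maps each key to the index labels of its rows.
yes_labels = grouped_df.groups["Yes"].tolist()
print(yes_labels)  # [0, 3, 5]
```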

Example Code: Grouping a DataFrame Based on Multiple Conditions Using `pandas.DataFrame.groupby()`

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','ABC' ) ,
             ('banana', 14, 'No','ABC' ) ,
             ('Apple', 44, 'Yes',"XYZ" ) ,
             ('Pineapple', 64, 'No',"XYZ") ,
             ('Kiwi', 84, 'Yes',"XYZ")  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock',"Supplier"]) 
grouped_df = df.groupby(['In_Stock', 'Supplier']) 
  
print(grouped_df.first())

Output:

                        Name  Price
In_Stock Supplier                  
No       ABC           Mango     24
         XYZ       Pineapple     64
Yes      ABC          Orange     34
         XYZ           Apple     44

It groups df according to the values in the In_Stock and Supplier columns and returns a DataFrameGroupBy object.

We use the first() method to get the first element of each group. It returns a DataFrame consisting of the first elements of the following four groups.

The group with the value No in the In_Stock column and ABC in the Supplier column.
The group with the value No in the In_Stock column and XYZ in the Supplier column.
The group with the value Yes in the In_Stock column and ABC in the Supplier column.
The group with the value Yes in the In_Stock column and XYZ in the Supplier column.

When we pass multiple labels to the groupby() function, the DataFrame returned by the methods of the GroupBy object has a MultiIndex.

print(grouped_df.first().index)

Output:

MultiIndex(levels=[['No', 'Yes'], ['ABC', 'XYZ']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['In_Stock', 'Supplier'])
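The exact text printed for a MultiIndex varies between pandas versions; newer releases list the key tuples instead of levels and labels. Whatever the display, rows can be selected by key. The sketch below, which reuses the supplier data from this example, shows one way to look up a single group, select everything under one first-level key, and flatten the index back into ordinary columns with reset_index().

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
first_rows = df.groupby(["In_Stock", "Supplier"]).first()

# A full key tuple selects exactly one row of the result.
one_name = first_rows.loc[("No", "XYZ"), "Name"]
print(one_name)  # Pineapple

# A first-level key alone selects every row under it.
yes_names = first_rows.loc["Yes"]["Name"].tolist()
print(yes_names)  # ['Orange', 'Apple']

# reset_index() turns the index levels back into regular columns.
flat = first_rows.reset_index()
print(list(flat.columns))  # ['In_Stock', 'Supplier', 'Name', 'Price']
```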

Example Code: Set `as_index=False` in `pandas.DataFrame.groupby()`

The as_index parameter of the DataFrame.groupby() method defaults to True. When a method such as first() is applied to the GroupBy object, the group labels become the index of the returned DataFrame.

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

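# Default behaviour: the group labels become the index of the result.
grouped_df = df.groupby("In_Stock", as_index=True)

first_group = grouped_df.first()
print(first_group)
print(first_group.index)

print("---------")

# as_index=False: the group labels stay in a regular column.
grouped_df = df.groupby("In_Stock", as_index=False)

first_group = grouped_df.first()
print(first_group)
print(first_group.index)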

Output:

            Name  Price
In_Stock               
No         Mango     24
Yes       Orange     34
Index(['No', 'Yes'], dtype='object', name='In_Stock')
---------
  In_Stock    Name  Price
0       No   Mango     24
1      Yes  Orange     34
Int64Index([0, 1], dtype='int64')

As you can see, the index of the generated DataFrame defaults to the group labels, i.e. as_index=True.

When we set as_index=False, the index becomes an automatically generated numeric index.
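For this example, as_index=False appears to give the same table as grouping with the default setting and then calling reset_index() on the result. The sketch below compares the two on the fruit data; it is a check on this one dataset, not a general guarantee.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# Group labels kept as a regular column from the start.
with_flag = df.groupby("In_Stock", as_index=False).first()

# Group labels moved out of the index afterwards.
with_reset = df.groupby("In_Stock").first().reset_index()

print(list(with_flag.columns))         # ['In_Stock', 'Name', 'Price']
print(with_flag["In_Stock"].tolist())  # ['No', 'Yes']

# Row-by-row comparison of the two results.
same_rows = with_flag.values.tolist() == with_reset.values.tolist()
print(same_rows)  # True
```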
