JIYIK CN >

Current Location:Home > Learning > PROGRAM > Python >

Pandas DataFrame DataFrame.groupby() function

Author:JIYIK Last Updated:2025/04/30 Views:

pandas.DataFrame.groupby() takes a DataFrame as input and divides the DataFrame into groups based on a given criterion. We can use groupby()the method to easily process large datasets.


pandas.DataFrame.groupby()grammar

DataFrame.groupby(
    by=None,
    axis=0,
    level=None,
    as_index=True,
    sort=True,
    group_keys=True,
    squeeze: bool=False,
    observed: bool=False)

parameter

by A mapping, function, string, label, or iterable of elements
axis Group by row ( axis=0) or column ( )axis=1
level Integer, the value to group by a specific level
as_index Boolean. It returns an object indexed by the group label.
sort Boolean. It sorts the group keys
group_keys Boolean. It adds a group key to the index to identify the group
squeeze Boolean. When possible, it reduces the returned dimensions.
observed Boolean. Only applies to any categorical grouping. If set to true True, only observations for the categorical grouping will be displayed.

Return Value

It returns an DataFrameGroupByobject that contains information about the group.


Example Code: pandas.DataFrame.groupby()Grouping Two DataFrames Based on the Values ​​of a Single Column Using

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
             ('Mango', 24, 'No' ) ,
             ('banana', 14, 'No' ) ,
             ('Apple', 44, 'Yes' ) ,
             ('Pineapple', 64, 'No') ,
             ('Kiwi', 84, 'Yes')  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock']) 
grouped_df = df.groupby('In_Stock')
print(grouped_df)
print(type(grouped_df))

Output:

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f73cc992d30>
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>

It groups the based on In_Stockthe values ​​in the column DataFrameand returns a DataFrameGroupByobject.

To get detailed information about the object groupby()returned by , we can use the method of the object to get the first element of each group.DataFrameGroupByDataFrameGroupByfirst()

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
             ('Mango', 24, 'No' ) ,
             ('banana', 14, 'No' ) ,
             ('Apple', 44, 'Yes' ) ,
             ('Pineapple', 64, 'No') ,
             ('Kiwi', 84, 'Yes')  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock']) 
grouped_df = df.groupby('In_Stock')
print(grouped_df.first())

Output:

            Name  Price
In_Stock               
No         Mango     24
Yes       Orange     34

It prints dfthe _consisting of the first elements of the two groups separated from DataFrame_.

We can also get_group()print the entire group using the method.

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
             ('Mango', 24, 'No' ) ,
             ('banana', 14, 'No' ) ,
             ('Apple', 44, 'Yes' ) ,
             ('Pineapple', 64, 'No') ,
             ('Kiwi', 84, 'Yes')  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock']) 
grouped_df = df.groupby('In_Stock')
print(grouped_df.get_group('Yes'))

Output:

     Name  Price In_Stock
0  Orange     34      Yes
3   Apple     44      Yes
5    Kiwi     84      Yes

It prints dfall the elements in In_Stockthat have a value of in the column Yes. We first use groubpy()the method to In_Stockdivide the elements with different values ​​in the column into different groups, and then use get_group()the method to access a specific group.


Example Code: pandas.DataFrame.groupby()Grouping Two DataFrames Based on Multiple Conditions Using

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','ABC' ) ,
             ('banana', 14, 'No','ABC' ) ,
             ('Apple', 44, 'Yes',"XYZ" ) ,
             ('Pineapple', 64, 'No',"XYZ") ,
             ('Kiwi', 84, 'Yes',"XYZ")  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock',"Supplier"]) 
grouped_df = df.groupby(['In_Stock', 'Supplier']) 
  
print(grouped_df.first())

Output:

                        Name  Price
In_Stock Supplier                  
No       ABC           Mango     24
         XYZ       Pineapple     64
Yes      ABC          Orange     34
         XYZ           Apple     44

It groups the according to the values ​​in In_Stockthe and columns and returns a object.SupplierdfDataFrameGroupBy

We use first()the method to get the first element of each group. It returns a DataFrame consisting of the first elements of the following four groups.

  • In_StockA group of columns Noand Suppliercolumn ABCvalues.
  • In_StockA group of columns Noand Suppliercolumn XYZvalues.
  • In_StockA group of columns Yesand Suppliercolumn ABCvalues.
  • In_StockA group of columns Yesand Suppliercolumn XYZvalues.

When we pass multiple tags to groupby()the function, GroupBythe returned by the method of the object DataFramehas one MultiIndex.

print(grouped_df.first().index)

Output:

MultiIndex(levels=[['No', 'Yes'], ['ABC', 'XYZ']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['In_Stock', 'Supplier'])

Example code: pandas.DataFrame.groupby()Set inas_index=False

DataFrame.groupby()The parameter in the method as_indexdefaults to True. When methods first()such as are applied GroupBy, the group label is the index of the returned DataFrame.

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

grouped_df = df.groupby("In_Stock", as_index=True)

firtGroup = grouped_df.first()
print(firtGroup)
print(firtGroup.index)

print("---------")

grouped_df = df.groupby("In_Stock", as_index=False)

firtGroup = grouped_df.first()
print(firtGroup)
print(firtGroup.index)

Output:

            Name  Price
In_Stock               
No         Mango     24
Yes       Orange     34
Index(['No', 'Yes'], dtype='object', name='In_Stock')  In_Stock    Name  Price
0       No   Mango     24
1      Yes  Orange     34
Int64Index([0, 1], dtype='int64')

As you can see, DataFramethe index of the generated defaults to the group label, ie as_index=True.

When we set as_index=False, the index becomes an automatically generated numeric index.

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.

Article URL:

Related Articles

Pandas DataFrame DataFrame.query() function

Publish Date:2025/04/30 Views:107 Category:Python

The pandas.DataFrame.query() method filters the rows of the caller DataFrame using the given query expression. pandas.DataFrame.query() grammar DataFrame . query(expr, inplace = False , ** kwargs) parameter expr Filter rows based on query e

Pandas DataFrame DataFrame.min() function

Publish Date:2025/04/30 Views:162 Category:Python

Python Pandas DataFrame.min() function gets the minimum value of the DataFrame object along the specified axis. pandas.DataFrame.min() grammar DataFrame . mean(axis = None , skipna = None , level = None , numeric_only = None , ** kwargs) pa

Pandas DataFrame DataFrame.mean() function

Publish Date:2025/04/30 Views:85 Category:Python

Python Pandas DataFrame.mean() function calculates the mean of the values ​​of the DataFrame object over the specified axis. pandas.DataFrame.mean() grammar DataFrame . mean(axis = None , skipna = None , level = None , numeric_only = No

Pandas DataFrame DataFrame.isin() function

Publish Date:2025/04/30 Views:133 Category:Python

The pandas.DataFrame.isin(values) function checks whether each element in the caller DataFrame contains values the value specified in the input . pandas.DataFrame.isin(values) grammar DataFrame . isin(values) parameter values iterable - lis

How to Apply a Function to a Column in a Pandas Dataframe

Publish Date:2025/04/30 Views:50 Category:Python

In Pandas, you can transform and manipulate columns and DataFrames using methods apply() such as transform() and . The desired transformation is passed to these methods as a function argument. Each method has its own subtle differences and

Finding the installed version of Pandas

Publish Date:2025/04/12 Views:190 Category:Python

Pandas is one of the commonly used Python libraries for data analysis, and Pandas versions need to be updated regularly. Therefore, other Pandas requirements are incompatible. Let's look at ways to determine the Pandas version and dependenc

KeyError in Pandas

Publish Date:2025/04/12 Views:81 Category:Python

This tutorial explores the concept of KeyError in Pandas. What is Pandas KeyError? While working with Pandas, analysts may encounter multiple errors thrown by the code interpreter. These errors are wide ranging and can help us better invest

Grouping and Sorting in Pandas

Publish Date:2025/04/12 Views:90 Category:Python

This tutorial explored the concept of grouping data in a DataFrame and sorting it in Pandas. Grouping and Sorting DataFrame in Pandas As we know, Pandas is an advanced data analysis tool or package extension in Python. Most of the companies

Plotting Line Graph with Data Points in Pandas

Publish Date:2025/04/12 Views:65 Category:Python

Pandas is an open source data analysis library in Python. It provides many built-in methods to perform operations on numerical data. Data visualization is very popular nowadays and is used to quickly analyze data visually. We can visualize

Scan to Read All Tech Tutorials

Social Media
  • https://www.github.com/onmpw
  • qq:1244347461

Recommended

Tags

Scan the Code
Easier Access Tutorial