Pandas groupby by two columns
This tutorial shows how to use the Pandas DataFrame.groupby() method to split a DataFrame into groups based on the values of two columns. We can also extract more information from the created groups, such as the number of rows in each group.
We will use the following DataFrame in this article.
import pandas as pd

# Sample data: six people with their gender, employment status, and age.
data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
Output:
       Name  Gender  Employed  Age
0  Jennifer  Female       Yes   30
1    Travis    Male        No   28
2       Bob    Male       Yes   27
3      Emma  Female        No   24
4      Luna  Female       Yes   28
5     Anish    Male        No   25
Pandas Groupby Multiple Columns
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")
print("Groups in DataFrame:")

# Group rows that share the same (Gender, Employed) combination.
groups = data.groupby(["Gender", "Employed"])

# Iterating over the GroupBy object yields (key, sub-DataFrame) pairs.
for group_key, group in groups:
    print(group)
    print("")
Output:
       Name  Gender  Employed  Age
0  Jennifer  Female       Yes   30
1    Travis    Male        No   28
2       Bob    Male       Yes   27
3      Emma  Female        No   24
4      Luna  Female       Yes   28
5     Anish    Male        No   25

Groups in DataFrame:
   Name  Gender  Employed  Age
3  Emma  Female        No   24

       Name  Gender  Employed  Age
0  Jennifer  Female       Yes   30
4      Luna  Female       Yes   28

     Name  Gender  Employed  Age
1  Travis    Male        No   28
5   Anish    Male        No   25

   Name  Gender  Employed  Age
2   Bob    Male       Yes   27
It creates 4 groups from the DataFrame. All rows with the same values in the Gender and Employed columns are placed in the same group.
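As a quick check on how rows are assigned to groups, the following minimal sketch inspects the GroupBy object without printing every sub-DataFrame. It assumes the same sample DataFrame used in this article; the loop variable names are illustrative only.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)

groups = data.groupby(["Gender", "Employed"])

# ngroups is the number of distinct (Gender, Employed) combinations.
print(groups.ngroups)

# groups.groups maps each group key to the row labels that fall in that group.
for key, row_labels in groups.groups.items():
    print(key, list(row_labels))
```

Each key is a tuple of the form (Gender, Employed), and the row labels are the index values of the original DataFrame.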
Count number of rows per group in Pandas
To count the number of rows in each group created using the DataFrame.groupby() method, we can use the size() method.
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")
print("Count of Each group:")

# size() counts the rows in each group; reset_index() turns the group keys
# back into ordinary columns and names the count column "Count".
grouped_df = data.groupby(["Gender", "Employed"]).size().reset_index(name="Count")
print(grouped_df)
Output:
       Name  Gender  Employed  Age
0  Jennifer  Female       Yes   30
1    Travis    Male        No   28
2       Bob    Male       Yes   27
3      Emma  Female        No   24
4      Luna  Female       Yes   28
5     Anish    Male        No   25

Count of Each group:
   Gender  Employed  Count
0  Female        No      1
1  Female       Yes      2
2    Male        No      2
3    Male       Yes      1
It displays the DataFrame, followed by each combination of Gender and Employed values and the number of rows in that group.
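To see what each step of the chained call contributes, the following sketch splits it into separate statements. It assumes the same sample DataFrame; the variable names sizes and counts are chosen here for illustration only.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)

# size() returns a Series whose index is a MultiIndex of (Gender, Employed).
sizes = data.groupby(["Gender", "Employed"]).size()
print(sizes)

# The row count of a single group can be looked up by its key tuple.
print(sizes.loc[("Female", "Yes")])

# reset_index(name="Count") moves both index levels back into columns
# and gives the column of counts the name "Count".
counts = sizes.reset_index(name="Count")
print(list(counts.columns))
```

Keeping the intermediate Series is convenient when individual counts need to be looked up by key, while the reset DataFrame is easier to print or export.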
If we want the maximum count for each value in the Employed column, we can group the counts computed above again, this time by Employed only, and then use the max() method to get the maximum count in each of these new groups.
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")

# Count rows per (Gender, Employed) group, then regroup those counts by the
# second index level (Employed) and take the largest count in each.
groups = data.groupby(["Gender", "Employed"]).size().groupby(level=1)
print(groups.max())
Output:
       Name  Gender  Employed  Age
0  Jennifer  Female       Yes   30
1    Travis    Male        No   28
2       Bob    Male       Yes   27
3      Emma  Female        No   24
4      Luna  Female       Yes   28
5     Anish    Male        No   25

Employed
No     2
Yes    2
dtype: int64
It shows, for each value in the Employed column, the maximum count among the groups created from the Gender and Employed columns.
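The level=1 argument refers to the second level of the MultiIndex produced by size(), which here is Employed. The following sketch, assuming the same sample DataFrame, checks the level names and shows that the level can also be selected by name; the variable names are illustrative only.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)

counts = data.groupby(["Gender", "Employed"]).size()

# The index levels are named after the grouping columns, in order:
# level 0 is Gender and level 1 is Employed.
print(list(counts.index.names))

# Selecting the level by position or by name gives the same grouping.
max_by_position = counts.groupby(level=1).max()
max_by_name = counts.groupby(level="Employed").max()
print(max_by_position.equals(max_by_name))
```

Referring to the level by name can make the intent clearer when the order of the grouping columns might change later.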