JIYIK CN >

Current Location:Home > Learning > PROGRAM > Python >

How to filter DataFrame rows based on column values in Pandas

Author:JIYIK Last Updated:2025/05/03 Views:

We can select rows based on single or multiple column values DataFrame. We can also get rows from a DataFrame that satisfy or do not satisfy one or more conditions. This can be achieved using Boolean indexing, positional indexing, label indexing, and the query() method.


Select Pandas rows based on specific column values

We can select Pandas rows from a DataFrame that contain or do not contain a specific value of a column. It is widely used to filter a DataFrame based on a column value.

Select Pandas rows containing specific column values

In boolean indexing, we first generate a mask, which is simply a series of boolean values ​​that represent whether the column contains a particular element.

df_mask = df["col_name"] == "specific_value"

We then apply this mask to the original DataFrame to filter the desired values.

filtered_df = df[df_mask]

This returns a filtered DataFrame that DataFramecontains only rows with values ​​for col_namethe column .specific_value

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

df_mask = df["Sales"] == 300
filtered_df = df[df_mask]
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
4  April-14    300      3
5  April-16    300      2

This will give dfall rows in which sales is 300.

It is similar to boolean indexing but does one more step. In this method, we first create a boolean mask and then find the positions where the boolean mask has true values. We then Truepass all the positions in the mask where the value is true to iloc()the method in order to select only all the required rows.

import pandas as pd
import numpy as np

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

df_mask = df["Sales"] == 300
positions = np.flatnonzero(df_mask)
filtered_df = df.iloc[positions]
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
4  April-14    300      3
5  April-16    300      2

This will also give dfall rows in which the sales value is 300.

We can also use Pandas Chaining to filter by column values pandas.DataFrame. In this method, we use pandas.DataFrame.eq() method, whose values ​​will be checked to compare element-wise equality in the DataFrame.

import pandas as pd
import numpy as np

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

filtered_df = df[df.Sales.eq(300)]
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
4  April-14    300      3
5  April-16    300      2

We can select Pandas rows by column values ​​in a column using pandas.DataFrame.query().

import pandas as pd
import numpy as np

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

filtered_df = df.query("Sales == 300")
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
4  April-14    300      3
5  April-16    300      2

If we wish to update an existing DataFrame, we can do so in the query method inplace=True.


Select Pandas rows that do not contain a specific column value

The method of selecting Pandas rows that do not contain a specific column value is similar to the method of selecting Pandas rows with a specific column value. The only thing we need to change is the condition, that is, when creating the mask or query, just ==replace with !=for the column not to contain a specific value.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

df_mask = df["Sales"] != 300
filtered_df = df[df_mask]
print(filtered_df)

Output:

       Date  Sales  Price
0  April-10    200      3
2  April-12    400      2
3  April-13    200      4

This will select 300all dfrows where the Sales value is not .


Select Pandas rows where column value is greater or less than a specific value

To select Pandas rows where column values ​​are greater than or less than a specific value, use operators such as >, <=, when creating a mask or query .>=

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

df_mask = df["Sales"] >= 300
filtered_df = df[df_mask]
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
2  April-12    400      2
4  April-14    300      3
5  April-16    300      2

This will result in a DataFrame Saleswith values ​​greater than or equal to 300.


Select Pandas rows based on multiple column values

We have introduced methods to select rows based on specific values ​​of columns in DataFrame. In this section, we will discuss methods to select rows in Pandas based on multiple column values.

Select Pandas rows containing one of multiple column values

To select Pandas rows that contain one of multiple column values, we use pandas.DataFrame.isin(values), which returns a DataFrame of Boolean values ​​representing whether each element in the DataFrame is contained in values. The DataFrame of Boolean values ​​obtained in this way can be used to select rows.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

values = [200, 400]
filtered_df = df[df.Sales.isin(values)]
print(filtered_df)

Output:

       Date  Sales  Price
0  April-10    200      3
2  April-12    400      2
3  April-13    200      4

It filters out all rows in the DataFrame Saleswhere the value of is 200or .400

Select Pandas rows that do not contain one of multiple specified column values

To select the rows of the DataFrame that do not contain any of multiple specified column values, we will negate ~the value returned from pandas.DataFrame.isin(values) by placing the - sign booleansin front.DataFrame

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

values = [200, 400]
filtered_df = df[~df.Sales.isin(values)]
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
4  April-14    300      3
5  April-16    300      2

It filters all rows from the DataFrame whose sales value is neither 200 nor 400.


Select DataFrame rows with multiple conditions

If we want to filter rows considering row values ​​of multiple columns, we create multiple conditions and &combine them with the operator. Now, the row is selected only if the conditions of all the columns are met.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

values_1 = [200, 400]
values_2 = [2, 3]
filtered_df = df[df.Sales.isin(values_1) & ~df.Price.isin(values_2)]
print(filtered_df)

Output:

       Date  Sales  Price
3  April-13    200      4

It filters all rows from the DataFrame where sales value is 200or 400and price is 2or 3. The rows in the output satisfy only those two conditions in the entire DataFrame.

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.

Article URL:

Related Articles

Convert Tensor to NumPy array in Python

Publish Date:2025/05/03 Views:85 Category:Python

This tutorial will show you how to convert a Tensor to a NumPy array in Python. Use the function in Python Tensor.numpy() to convert a tensor to a NumPy array Eager Execution of TensorFlow library can be used to convert tensor to NumPy arra

Saving NumPy arrays as images in Python

Publish Date:2025/05/03 Views:193 Category:Python

In Python, numpy module is used to manipulate arrays. There are many modules available in Python that allow us to read and store images. An image can be thought of as an array of different pixels stored at specific locations with correspond

Transposing a 1D array in NumPy

Publish Date:2025/05/03 Views:98 Category:Python

Arrays and matrices form the core of this Python library. The transpose of these arrays and matrices plays a vital role in certain topics such as machine learning. In NumPy, it is easy to calculate the transpose of an array or a matrix. Tra

Find the first index of an element in a NumPy array

Publish Date:2025/05/03 Views:58 Category:Python

In this tutorial, we will discuss how to find the first index of an element in a numpy array. Use where() the function to find the first index of an element in a NumPy array The function in the numpy module where() is used to return an arra

Remove Nan values from NumPy array

Publish Date:2025/05/03 Views:118 Category:Python

This article discusses some built-in NumPy functions that you can use to remove nan values. Remove Nan values ​​using logical_not() and methods in NumPy isnan() logical_not() is used to apply logical NOT to the elements of an array. isn

Normalizing a vector in Python

Publish Date:2025/05/03 Views:51 Category:Python

A common concept in the field of machine learning is to normalize a vector or dataset before passing it to the algorithm. When we talk about normalizing a vector, we say that its vector magnitude is 1, being a unit vector. In this tutorial,

Calculating Euclidean distance in Python

Publish Date:2025/05/03 Views:128 Category:Python

In the world of mathematics, the shortest distance between two points in any dimension is called the Euclidean distance. It is the square root of the sum of the squares of the differences between the two points. In Python, the numpy, scipy

Element-wise division in Python NumPy

Publish Date:2025/05/03 Views:199 Category:Python

This tutorial shows you how to perform element-wise division on NumPy arrays in Python. NumPy Element-Wise Division using numpy.divide() the function If we have two arrays and want to divide each element of the first array with each element

Convert 3D array to 2D array in Python

Publish Date:2025/05/03 Views:79 Category:Python

In this tutorial, we will discuss the methods to convert 3D array to 2D array in Python. numpy.reshape() Convert 3D array to 2D array using function in Python [ numpy.reshape() Function](numpy.reshape - NumPy v1.20 manual)Changes the shape

Scan to Read All Tech Tutorials

Social Media
  • https://www.github.com/onmpw
  • qq:1244347461

Recommended

Tags

Scan the Code
Easier Access Tutorial