How to create DataFrame columns based on a given condition in Pandas
We can use list comprehensions, NumPy methods, and the apply() and map() methods to create columns based on a given condition in a Pandas DataFrame.
List comprehension to create new columns based on given conditions in Pandas
We can use a list comprehension to create a new column based on a given condition in a Pandas DataFrame. A list comprehension builds a new list from an iterable object. It is simple to write, and for small DataFrames it is often fast enough, although vectorized methods such as np.where() tend to scale better on large data.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df["Status"] = ["Senior" if s >= 400 else "Junior" for s in df["Salary"]]
print(df)
Output:
Name Joined date Salary Status
0 Hisila 2019-11-20 200 Junior
1 Shristi 2020-01-02 400 Senior
2 Zeppy 2020-02-05 300 Junior
3 Alina 2020-03-10 500 Senior
4 Jerry 2020-04-16 600 Senior
5 Kevin 2020-05-01 300 Junior
If Salary is greater than or equal to 400, the Status column in df gets the value Senior for that row; otherwise it gets Junior.
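A list comprehension can also combine several columns by iterating over them together with zip(). The following is a minimal sketch on a shortened version of the same data; the rule itself (Senior when the salary is at least 400 or the join date is before 2020) and the name cutoff are assumptions made up for illustration, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Hisila", "Shristi", "Zeppy"],
        "Joined date": pd.to_datetime(["2019-11-20", "2020-01-02", "2020-02-05"]),
        "Salary": [200, 400, 300],
    }
)

# Hypothetical rule: "Senior" if paid at least 400 OR joined before 2020.
cutoff = pd.Timestamp("2020-01-01")
df["Status"] = [
    "Senior" if salary >= 400 or joined < cutoff else "Junior"
    for salary, joined in zip(df["Salary"], df["Joined date"])
]
print(df)
```

Iterating over columns with zip() keeps the comprehension readable and avoids looking up each row by index.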
NumPy method to create new DataFrame columns based on given conditions in Pandas
We can also use NumPy methods to create a column based on a given condition in a Pandas DataFrame. For this, we can use the np.where() method and the np.select() method.
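np.where() method
np.where(condition, x, y) takes a condition and two sets of values, and returns an array with elements from x where the condition is True and elements from y elsewhere. (Called with only a condition, it returns the indices of the elements that satisfy it.) This method can be used to create a DataFrame column based on a given condition in Pandas when we have only one condition.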
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df["Status"] = np.where(df["Salary"] >= 400, "Senior", "Junior")
print(df)
Output:
Name Joined date Salary Status
0 Hisila 2019-11-20 200 Junior
1 Shristi 2020-01-02 400 Senior
2 Zeppy 2020-02-05 300 Junior
3 Alina 2020-03-10 500 Senior
4 Jerry 2020-04-16 600 Senior
5 Kevin 2020-05-01 300 Junior
np.where(condition, x, y) returns x where the condition is met and y otherwise.
The above code creates a new column Status in df with the value Senior if the given condition is met. Otherwise, it sets the value to Junior.
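To make the difference between the two call forms of np.where() concrete, here is a small sketch on the same salary values. The variable names positions and labels are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

salary = pd.Series([200, 400, 300, 500, 600, 300])
condition = (salary >= 400).to_numpy()

# One-argument form: returns a tuple of index arrays where the condition is True.
positions = np.where(condition)[0]
print(positions)

# Three-argument form: "Senior" where the condition is True, "Junior" elsewhere.
labels = np.where(condition, "Senior", "Junior")
print(labels)
```

Only the three-argument form produces one value per row, which is what a new column needs.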
np.select() method
np.select() takes a condition list and a choice list as input and returns an array built from the elements in the choice list, depending on which condition is satisfied. When we have two or more conditions, we can use this method to create a DataFrame column based on the given conditions in Pandas.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
conditionlist = [
(df["Salary"] >= 500),
(df["Salary"] > 300) & (df["Salary"] < 500),
(df["Salary"] <= 300),
]
choicelist = ["High", "Mid", "Low"]
df["Salary_Range"] = np.select(conditionlist, choicelist, default="Not Specified")
print(df)
Output:
Name Joined date Salary Salary_Range
0 Hisila 2019-11-20 200 Low
1 Shristi 2020-01-02 400 Mid
2 Zeppy 2020-02-05 300 Low
3 Alina 2020-03-10 500 High
4 Jerry 2020-04-16 600 High
5 Kevin 2020-05-01 300 Low
Here, if the first condition in the condition list is satisfied for a row, the value of the Salary_Range column for that row is set to the first element in the choice list. The same holds for the other conditions in the condition list; when several conditions are satisfied, the first one in the list wins. If none of the conditions is satisfied, the value of the Salary_Range column for that row is set to the value of the default parameter of np.select(), for example, Not Specified.
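Because np.select() uses the first condition that matches, the order of the condition list matters. The sketch below illustrates this with two simplified, hypothetical condition lists on the same salary values; the thresholds and the names ordered and misordered are assumptions made for the example.

```python
import numpy as np
import pandas as pd

salary = pd.Series([200, 400, 300, 500, 600, 300])

# Most specific condition first: salaries of 500 or more are labelled "High".
ordered = np.select([salary >= 500, salary >= 400], ["High", "Mid"], default="Low")
print(ordered)

# Broad condition first: it also matches 500 and 600, so "High" is never used.
misordered = np.select([salary >= 400, salary >= 500], ["Mid", "High"], default="Low")
print(misordered)
```

Listing the narrowest condition first, or making the conditions mutually exclusive as in the example above, avoids this kind of silent mislabelling.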
pandas.DataFrame.apply to create new DataFrame columns based on given conditions in Pandas
pandas.DataFrame.apply returns a Series or DataFrame with the results of applying a given function along a given axis of the DataFrame.
Syntax:
DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)
func represents the function to be applied.
axis represents the axis along which the function is applied. We can apply the function to each row using axis=1 or axis='columns'.
We can use this method to check the condition and set the value for each row of the new column.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
def set_values(row, value):
    # row is a single Salary value; value is the lookup dictionary
    return value[row]
map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}
df["Salary_Range"] = df["Salary"].apply(set_values, args=(map_dictionary,))
print(df)
Output:
Name Joined date Salary Salary_Range
0 Hisila 2019-11-20 200 Low
1 Shristi 2020-01-02 400 MID
2 Zeppy 2020-02-05 300 LOW
3 Alina 2020-03-10 500 HIGH
4 Jerry 2020-04-16 600 HIGH
5 Kevin 2020-05-01 300 LOW
Here, we define a function set_values() that is applied to each value of the Salary column by calling apply() on that column (pandas.Series.apply). The function sets the value of the Salary_Range column for each row based on the value of the Salary column for that row. We set up a map_dictionary to determine the value of the Salary_Range column based on the data in the Salary column. This approach gives us more flexibility when there are many options for the new column.
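When the condition depends on more than one column, DataFrame.apply() with axis=1 passes each row to the function as a Series. The following is a minimal sketch on a shortened version of the data; the function describe(), its threshold parameter, and the Summary column are hypothetical names introduced for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Hisila", "Shristi", "Zeppy"],
        "Joined date": pd.to_datetime(["2019-11-20", "2020-01-02", "2020-02-05"]),
        "Salary": [200, 400, 300],
    }
)


def describe(row, threshold):
    # row is a Series holding one row; its values are accessed by column name.
    level = "Senior" if row["Salary"] >= threshold else "Junior"
    return f"{level} since {row['Joined date'].year}"


# axis=1 calls describe() once per row; args supplies the extra argument.
df["Summary"] = df.apply(describe, axis=1, args=(400,))
print(df)
```

Row-wise apply() runs a Python function call per row, so it is usually slower than np.where() or np.select() on large DataFrames.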
pandas.Series.map() to create new DataFrame columns based on given conditions in Pandas
We can also create new DataFrame columns based on a given condition in Pandas using pandas.Series.map(). This method works element-wise on a Series and maps each value to another value based on the input, which may be a dictionary, a function, or a Series.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}
df["Salary_Range"] = df["Salary"].map(map_dictionary)
print(df)
Output:
Name Joined date Salary Salary_Range
0 Hisila 2019-11-20 200 Low
1 Shristi 2020-01-02 400 MID
2 Zeppy 2020-02-05 300 LOW
3 Alina 2020-03-10 500 HIGH
4 Jerry 2020-04-16 600 HIGH
5 Kevin 2020-05-01 300 LOW
It creates a new column Salary_Range and sets the value of each row of that column based on the key-value pairs in map_dictionary.
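One thing to watch for with a dictionary is a value that has no matching key: map() then produces NaN for that row. The sketch below shows this on a small hypothetical Series (the value 700 is deliberately missing from the dictionary) and also shows map() with a function instead of a dictionary.

```python
import pandas as pd

salary = pd.Series([200, 400, 700])
map_dictionary = {200: "Low", 400: "MID"}

# 700 is not a key in the dictionary, so its mapped value is missing (NaN).
mapped = salary.map(map_dictionary)
print(mapped)

# Replace the missing values with an explicit fallback label.
filled = mapped.fillna("Not Specified")
print(filled)

# map() also accepts a function that is called on each value.
by_function = salary.map(lambda s: "HIGH" if s >= 500 else "LOW")
print(by_function)
```

Passing a function handles any value without listing it in advance, while a dictionary keeps the mapping explicit and easy to review.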