How to create DataFrame columns based on a given condition in Pandas

Author：JIYIK Last Updated：2025/05/01

We can use list comprehensions, NumPy methods, and the apply() and map() methods to create columns based on a given condition in a Pandas DataFrame.

List comprehension to create new `DataFrame` columns based on given conditions in Pandas

We can use a list comprehension to create a new column based on a given condition in a Pandas DataFrame. A list comprehension builds a new list from an iterable object. It is concise and easy to read, although vectorized NumPy methods are generally faster on large DataFrames.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df["Status"] = ["Senior" if s >= 400 else "Junior" for s in df["Salary"]]
print(df)

Output:

      Name Joined date  Salary  Status
0   Hisila  2019-11-20     200  Junior
1  Shristi  2020-01-02     400  Senior
2    Zeppy  2020-02-05     300  Junior
3    Alina  2020-03-10     500  Senior
4    Jerry  2020-04-16     600  Senior
5    Kevin  2020-05-01     300  Junior

This creates a new column Status in df. Its value is Senior if Salary is greater than or equal to 400, and Junior otherwise.
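A list comprehension can also test more than one column at a time. The following is a minimal sketch, assuming a hypothetical Years column that is not part of the data used in this article; zip() walks the two columns together so each row's values are available to the condition.

```python
import pandas as pd

# Small illustrative frame; the "Years" column is hypothetical
df = pd.DataFrame(
    {
        "Name": ["Hisila", "Shristi", "Zeppy"],
        "Salary": [200, 400, 300],
        "Years": [1, 5, 6],
    }
)

# zip() pairs the two columns row by row, so the condition can use both values
df["Status"] = [
    "Senior" if salary >= 400 or years >= 5 else "Junior"
    for salary, years in zip(df["Salary"], df["Years"])
]
print(df)
```

Here Zeppy is labelled Senior because of the Years value, even though the Salary is below 400.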

NumPy method to create new DataFrame columns based on given conditions in Pandas

We can also use NumPy methods to create a column based on a given condition in a Pandas DataFrame. For this, we can use the np.where() method and the np.select() method.

`np.where()` method

np.where() takes a condition and two sets of values as input and returns an array whose elements come from the first set where the condition is satisfied and from the second set otherwise. This method can be used to create a DataFrame column based on a given condition in Pandas when we have only one condition.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

df["Status"] = np.where(df["Salary"] >= 400, "Senior", "Junior")
print(df)

Output:

      Name Joined date  Salary  Status
0   Hisila  2019-11-20     200  Junior
1  Shristi  2020-01-02     400  Senior
2    Zeppy  2020-02-05     300  Junior
3    Alina  2020-03-10     500  Senior
4    Jerry  2020-04-16     600  Senior
5    Kevin  2020-05-01     300  Junior

np.where(condition, x, y) returns x where the condition is met and y otherwise.

The above code creates a new column Status in df with the value Senior if the given condition is met. Otherwise, it sets the value to Junior.
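np.where() behaves differently depending on how many arguments it receives, which is easy to confuse. The sketch below uses a standalone Series with the same salaries as the example; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

salary = pd.Series([200, 400, 300, 500, 600, 300])

# One argument: a tuple of index arrays for the positions where the condition holds
positions = np.where(salary >= 400)[0]
print(positions)  # [1 3 4]

# Three arguments: an array of the same length as the input,
# holding "Senior" or "Junior" for each element
labels = np.where(salary >= 400, "Senior", "Junior")
print(labels)
```

Only the three-argument form produces one value per row, so it is the form to use when assigning a new column.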

`np.select()` method

np.select() takes a condition list and a selection list as input and returns an array built from the elements in the selection list based on the conditions. When we have two or more conditions, we can use this method to create a DataFrame column based on the given conditions in Pandas.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

# Conditions are checked in order; the first one that holds picks the label
conditionlist = [
    (df["Salary"] >= 500),
    (df["Salary"] > 300) & (df["Salary"] < 500),
    (df["Salary"] <= 300),
]
choicelist = ["High", "Mid", "Low"]
df["Salary_Range"] = np.select(conditionlist, choicelist, default="Not Specified")

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          Mid
2    Zeppy  2020-02-05     300          Low
3    Alina  2020-03-10     500         High
4    Jerry  2020-04-16     600         High
5    Kevin  2020-05-01     300          Low

Here, if the first condition in the condition list is satisfied for a row, the value of the Salary_Range column for that row is set to the first element in the selection list. The same holds for the other conditions in the condition list. If none of the conditions in the condition list is satisfied, the value of the Salary_Range column for that row is set to the value of the default parameter of np.select(), in this example Not Specified.
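When several conditions are true for the same row, np.select() uses the first matching one, so the conditions may overlap as long as they are listed from most to least specific. A minimal sketch with illustrative thresholds (not the ones used above):

```python
import numpy as np
import pandas as pd

salary = pd.Series([200, 400, 300, 500, 600, 300])

# A salary of 600 satisfies both conditions but receives the first matching label
conditions = [salary >= 500, salary >= 400]
choices = ["High", "Mid"]

# Rows that satisfy no condition fall back to the default
labels = np.select(conditions, choices, default="Not Specified")
print(labels)
```

Swapping the two conditions would label every salary of 400 or more as Mid, because the broader condition would then be checked first.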

`pandas.DataFrame.apply()` to create new DataFrame columns based on given conditions in Pandas

pandas.DataFrame.apply() returns a Series or DataFrame with the results of applying a given function along a given axis of the DataFrame.

Syntax:

DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)

func represents the function to be applied.

axis represents the axis along which the function is applied. We can apply the function to each row using axis=1 or axis='columns'.

We can use this method to check the condition and set the value for each row of the new column.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)


def set_values(row, value):
    return value[row]


map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}

df["Salary_Range"] = df["Salary"].apply(set_values, args=(map_dictionary,))

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          MID
2    Zeppy  2020-02-05     300          LOW
3    Alina  2020-03-10     500         HIGH
4    Jerry  2020-04-16     600         HIGH
5    Kevin  2020-05-01     300          LOW

Here, we define a function set_values() and apply it to every value of the Salary column using apply(). Because apply() is called on a single column, the function receives each salary rather than a whole row, and the extra argument map_dictionary is passed through args. The function looks up the salary in map_dictionary and returns the matching label, which becomes the Salary_Range value for that row. This approach gives us more flexibility when there are many options for the new column.
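The example above calls apply() on one column. When the condition depends on several columns, apply() can instead be called on the DataFrame with axis=1, so the function receives each row. The following is a rough sketch; label_row and its rule are hypothetical and only illustrate the row-wise form.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Hisila", "Shristi", "Zeppy"],
        "Joined date": pd.to_datetime(["2019-11-20", "2020-01-02", "2020-02-05"]),
        "Salary": [200, 400, 300],
    }
)


def label_row(row):
    # row is a Series holding one row of df, indexed by column name
    if row["Salary"] >= 400 or row["Joined date"].year < 2020:
        return "Senior"
    return "Junior"


# axis=1 passes each row to label_row; the result is a Series aligned with df's index
df["Status"] = df.apply(label_row, axis=1)
print(df)
```

Row-wise apply() runs a Python function once per row, so for simple conditions np.where() or np.select() is usually the faster choice.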

`pandas.Series.map()` to create new DataFrame columns based on given conditions in Pandas

We can also create new DataFrame columns based on a given condition in Pandas using pandas.Series.map(). This method works element-wise on the Series and maps the values from one column to another based on the input, which may be a dictionary, a function, or a Series.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}

df["Salary_Range"] = df["Salary"].map(map_dictionary)

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          MID
2    Zeppy  2020-02-05     300          LOW
3    Alina  2020-03-10     500         HIGH
4    Jerry  2020-04-16     600         HIGH
5    Kevin  2020-05-01     300          LOW

It creates a new column Salary_Range and sets the value of each row of that column based on the key-value pairs in map_dictionary.
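One detail worth knowing when mapping with a dictionary: a value that has no matching key becomes NaN in the result. The sketch below uses a shortened, illustrative dictionary that deliberately omits one salary, and fills the gap with fillna().

```python
import pandas as pd

salary = pd.Series([200, 400, 700])

# 700 is deliberately missing from the dictionary
map_dictionary = {200: "Low", 400: "MID"}

mapped = salary.map(map_dictionary)
print(mapped)  # the last entry is NaN because 700 has no key

# fillna() supplies a fallback label for unmapped values
filled = mapped.fillna("Not Specified")
print(filled)
```

If every possible value must receive a label, checking the result with isna() after mapping is a simple safeguard.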
