How to create DataFrame columns based on a given condition in Pandas

Author: JIYIK Last Updated: 2025/05/01

We can use list comprehensions, NumPy methods, and the apply() and map() methods to create columns based on a given condition in a Pandas DataFrame.


List comprehension to create new DataFrame columns based on given conditions in Pandas

We can use a list comprehension to create new columns based on given conditions in a Pandas DataFrame. A list comprehension builds a new list from an iterable object. For a single simple condition it is concise, and it is usually faster than calling apply() with a Python function, although the vectorized NumPy methods described later tend to be faster on large DataFrames.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df["Status"] = ["Senior" if s >= 400 else "Junior" for s in df["Salary"]]
print(df)

Output:

      Name Joined date  Salary  Status
0   Hisila  2019-11-20     200  Junior
1  Shristi  2020-01-02     400  Senior
2    Zeppy  2020-02-05     300  Junior
3    Alina  2020-03-10     500  Senior
4    Jerry  2020-04-16     600  Senior
5    Kevin  2020-05-01     300  Junior

If Salary is greater than or equal to 400, the new Status column in df is set to Senior for that row; otherwise it is set to Junior.
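A list comprehension can also look at more than one column by iterating over them together with zip(). The following is a minimal sketch of that idea; the small DataFrame, the Years column, and the Lead label are assumptions made for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical data: "Years" is an illustrative column, not from the article's DataFrame.
df = pd.DataFrame(
    {"Name": ["Hisila", "Shristi", "Zeppy"], "Salary": [200, 400, 600], "Years": [1, 3, 5]}
)

# zip() walks both columns in parallel, so each condition can use both values.
df["Status"] = [
    "Lead" if s >= 400 and y >= 5 else "Senior" if s >= 400 else "Junior"
    for s, y in zip(df["Salary"], df["Years"])
]
print(df)
```

The chained conditional expression is evaluated left to right, so the most specific condition should come first.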


NumPy methods to create new DataFrame columns based on given conditions in Pandas

We can also use NumPy methods to create a column based on a given condition in a Pandas DataFrame. For this, we can use the np.where() method and the np.select() method.

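np.where() method

np.where(condition, x, y) takes a condition and two sets of values as input and returns an array whose elements come from x where the condition is true and from y where it is false. (Called with only a condition, it returns the indices of the elements that satisfy it instead.) This method can be used to create a DataFrame column based on a given condition in Pandas when we have only one condition.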

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

df["Status"] = np.where(df["Salary"] >= 400, "Senior", "Junior")
print(df)

Output:

      Name Joined date  Salary  Status
0   Hisila  2019-11-20     200  Junior
1  Shristi  2020-01-02     400  Senior
2    Zeppy  2020-02-05     300  Junior
3    Alina  2020-03-10     500  Senior
4    Jerry  2020-04-16     600  Senior
5    Kevin  2020-05-01     300  Junior

For each element, np.where(condition, x, y) returns x if the condition is met and y otherwise.

The above code creates a new column Status in df. The value is set to Senior if the given condition is met; otherwise, it is set to Junior.
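The single condition passed to np.where() may itself combine several comparisons with & (and) or | (or). The sketch below illustrates this under assumed names: the Band column and the Mid and Other labels are hypothetical and chosen only for the example. Each comparison needs its own parentheses because & binds more tightly than >= and <.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"],
        "Salary": [200, 400, 300, 500, 600, 300],
    }
)

# One boolean Series built from two comparisons; parentheses are required around each.
in_band = (df["Salary"] >= 300) & (df["Salary"] < 500)

# "Band", "Mid" and "Other" are illustrative names, not part of the original example.
df["Band"] = np.where(in_band, "Mid", "Other")
print(df)
```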

np.select() method

np.select() takes a condition list and a selection list as input and returns an array built from the elements in the selection list based on the conditions. When we have two or more conditions, we can use this method to create a DataFrame column based on the given conditions in Pandas.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

conditionlist = [
    (df["Salary"] >= 500),
    (df["Salary"] > 300) & (df["Salary"] < 500),  # strictly between 300 and 500
    (df["Salary"] <= 300),
]
choicelist = ["High", "Mid", "Low"]
df["Salary_Range"] = np.select(conditionlist, choicelist, default="Not Specified")

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          Mid
2    Zeppy  2020-02-05     300          Low
3    Alina  2020-03-10     500         High
4    Jerry  2020-04-16     600         High
5    Kevin  2020-05-01     300          Low

Here, the conditions are checked in order for each row. If the first condition in the condition list is satisfied, the Salary_Range value of that row is set to the first element in the selection list, and similarly for the other conditions in the condition list. If none of the conditions is satisfied, the Salary_Range value of that row is set to the value of the default parameter of np.select(), here Not Specified.
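Because np.select() uses the first condition that matches, overlapping conditions can be written from most to least specific, with default catching everything else. The following is a small sketch of that ordering; the thresholds and the variable names (conditions, choices, labels) are assumptions for illustration and differ slightly from the example above.

```python
import numpy as np
import pandas as pd

salary = pd.Series([200, 400, 300, 500, 600, 300])

# The second condition also holds for 500 and 600, but the first match wins,
# so those rows are labelled "High".
conditions = [salary >= 500, salary >= 300]
choices = ["High", "Mid"]

# Rows matching no condition fall back to the default value.
labels = np.select(conditions, choices, default="Low")
print(labels)
```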


pandas.DataFrame.apply() to create new DataFrame columns based on given conditions in Pandas

pandas.DataFrame.apply() returns a Series or DataFrame with the results of applying a given function along a given axis of the DataFrame.

Syntax:

DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)

func represents the function to be applied.

axis represents the axis along which the function is applied. We can apply the function to each row using axis=1 or axis='columns'.

We can use this method to check the condition and set the value for each row of the new column.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)


def set_values(row, value):
    return value[row]


map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}

df["Salary_Range"] = df["Salary"].apply(set_values, args=(map_dictionary,))

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          MID
2    Zeppy  2020-02-05     300          LOW
3    Alina  2020-03-10     500         HIGH
4    Jerry  2020-04-16     600         HIGH
5    Kevin  2020-05-01     300          LOW

Here, we define a function set_values() and apply it to each value of the Salary column using apply(). The function looks up each Salary value in map_dictionary and returns the matching label, which becomes the Salary_Range value for that row. Note that the code calls apply() on the Salary column, which is a Series, so the function receives one value at a time rather than a whole row. This approach gives us more flexibility when there are many options for the new column.
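To apply a function to whole rows, as the axis=1 option describes, apply() can be called on the DataFrame itself. The following is a minimal sketch under assumed names: the function salary_range() and its thresholds are hypothetical and only illustrate the row-wise form.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Hisila", "Shristi", "Alina"], "Salary": [200, 400, 500]})


def salary_range(row):
    # With axis=1, `row` is a Series holding one row, indexed by column name.
    if row["Salary"] >= 500:
        return "High"
    if row["Salary"] >= 300:
        return "Mid"
    return "Low"


# axis=1 passes each row to the function; the returned values form the new column.
df["Salary_Range"] = df.apply(salary_range, axis=1)
print(df)
```

The row-wise form is useful when the condition depends on several columns, but it runs a Python function once per row, so it is generally slower than np.where() or np.select() on large DataFrames.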


pandas.Series.map() to create new DataFrame columns based on given conditions in Pandas

We can also create new DataFrame columns based on a given condition in Pandas using pandas.Series.map(). This method works element-wise on the Series and maps the values from one column to another based on the input, which may be a dictionary, a function, or a Series.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}

df["Salary_Range"] = df["Salary"].map(map_dictionary)

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          MID
2    Zeppy  2020-02-05     300          LOW
3    Alina  2020-03-10     500         HIGH
4    Jerry  2020-04-16     600         HIGH
5    Kevin  2020-05-01     300          LOW

It creates a new column Salary_Range and sets the value of each row of that column based on the key-value pairs in map_dictionary.
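When a value has no matching key in the dictionary, map() produces NaN for that row, which can then be replaced with fillna(). map() also accepts a function instead of a dictionary. The sketch below shows both cases; the sample values, the shortened dictionary, and the Not Specified fallback label are assumptions for illustration.

```python
import pandas as pd

salary = pd.Series([200, 400, 700])
map_dictionary = {200: "Low", 400: "MID"}

# 700 is not a key in the dictionary, so its mapped value is NaN.
mapped = salary.map(map_dictionary)

# Replace the missing values with an illustrative fallback label.
filled = mapped.fillna("Not Specified")

# A function is called once per element and may encode any condition.
by_function = salary.map(lambda s: "HIGH" if s >= 500 else "LOW")

print(filled)
print(by_function)
```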
