Combine values from multiple columns into one column in Pandas DataFrame
This tutorial will demonstrate how to merge or return the first non-null values from multiple columns into another column in a Python Pandas DataFrame.
For example, if it is not empty, the value of column 1 is used for the new column 3; otherwise, if column 1 is empty, the value of column 2 is used for the new column 3.
We can accomplish this task in a number of ways in a Pandas DataFrame.
Writing code from scratch to merge values from multiple columns into one column in a Pandas DataFrame
We can write the logic from scratch to merge the values. In the following code, we create a Pandas DataFrame with three columns named , Age_in_Years
, Age_in_Months
and Age_in_Days
.
The DataFrame also has some missing values. If we want to display the age, first, we will output the age in years.
If the value in that column is Null, we will display the age in months. Similarly, if the value in months is Null, we will display the age in days.
To do this, we write the code from scratch to get the value of the first non-null column. The function is iterating over all DataFrame columns and returns the value where a non-null value is found; otherwise, it checks the values in other columns.
Sample code:
# Python 3.x
import pandas as pd
df_age = pd.DataFrame(
{
"Age_in_Years": ["4 y", None, None, None],
"Age_in_Months": ["48 m", "24 m", None, None],
"Age_in_Days": ["1440 d", None, "2520 d", None],
}
)
def get_first_non_null(dfrow, cols):
for c in cols:
if pd.notnull(dfrow[c]):
return dfrow[c]
return None
cols = ["Age_in_Years", "Age_in_Months", "Age_in_Days"]
df_age["Age"] = df_age.apply(lambda x: get_first_non_null(x, cols), axis=1)
display(df_age)
Output:
Run a SQL query in a Pandas DataFrame using DuckDB to combine values from multiple columns into one column
Sample code:
DuckDB is a Python API and a database management system for interacting with databases using SQL queries. This package has a built-in merge method that selects the first non-null value from a column.
We will pass the column names in the SQL query to the coalesce method.
# Python 3.x
import pandas as pd
import duckdb
df_age = pd.DataFrame(
{
"Age_in_Years": ["4 y", None, None, None],
"Age_in_Months": ["48 m", "24 m", None, None],
"Age_in_Days": ["1440 d", None, "2520 d", None],
}
)
df_age = duckdb.query(
"""SELECT Age_in_Years, Age_in_Months, Age_in_Days, coalesce(Age_in_Years, Age_in_Months, Age_in_days) as Age from df_age"""
).to_df()
display(df_age)
Output:
In Pandas DataFrame, use combine_first()
the method to combine values from multiple columns into one column
combine_first()
Method fills the empty values in one DataFrame with the non-empty data from the second DataFrame to combine two DataFrame objects.
In the following code, we will return the column values. We will combine Age_in_Years
with and with .Age_in_Months
Age_in_Months
Age_in_Days
It will return Age_in_years
the value from . If that is Null, it will return Age_in_Months
the value from . Likewise, if that is also Null, it will Age_in_Days
return a value from .
The data in the actual DataFrame will not change and we will Age
get the values we want in the columns.
Sample code:
# Python 3.x
import pandas as pd
df_age = pd.DataFrame(
{
"Age_in_Years": ["4 y", None, None, None],
"Age_in_Months": ["48 m", "24 m", None, None],
"Age_in_Days": ["1440 d", None, "2520 d", None],
}
)
df_age["Age"] = (
df_age["Age_in_Years"]
.combine_first(df_age["Age_in_Months"])
.combine_first(df_age["Age_in_Days"])
)
df_age
Output:
In Pandas DataFrame, use bfill()
the method to combine values from multiple columns into one column
bfill
Stands for Backward Fill. This method replaces NaNs with the next row or column value.
Here, we specify to axis=1
return the value from the next column if the value in the current column is Null.
Sample code:
# Python 3.x
import pandas as pd
df_age = pd.DataFrame(
{
"Age_in_Years": ["4 y", None, None, None],
"Age_in_Months": ["48 m", "24 m", None, None],
"Age_in_Days": ["1440 d", None, "2520 d", None],
}
)
df_age["Age"] = df_age.bfill(axis=1).iloc[:, 0]
df_age
Output:
In Pandas DataFrame, use mask()
the method to combine values from multiple columns into one column
mask()
Method works if-then
similarly to .
If the null condition for a column is false, then its value will be used. Otherwise, it will get the value from the other specified column.
Sample code:
# Python 3.x
import pandas as pd
df_age = pd.DataFrame(
{
"Age_in_Years": ["4 y", None, None, None],
"Age_in_Months": ["48 m", "24 m", None, None],
"Age_in_Days": ["1440 d", None, "2520 d", None],
}
)
df_age["Age"] = (
df_age["Age_in_Years"]
.mask(pd.isnull, df_age["Age_in_Months"])
.mask(pd.isnull, df_age["Age_in_Days"])
)
df_age
Output:
For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.
Related Articles
Convert Pandas to CSV without index
Publish Date:2025/05/01 Views:159 Category:Python
-
As you know, an index can be thought of as a reference point used to store and access records in a DataFrame. They are unique for each row and usually range from 0 to the last row of the DataFrame, but we can also have serial numbers, dates
Convert Pandas DataFrame to Dictionary
Publish Date:2025/05/01 Views:197 Category:Python
-
This tutorial will show you how to convert a Pandas DataFrame into a dictionary with the index column elements as keys and the corresponding elements of other columns as values. We will use the following DataFrame in the article. import pan
Convert Pandas DataFrame columns to lists
Publish Date:2025/05/01 Views:191 Category:Python
-
When working with Pandas DataFrames in Python, you often need to convert the columns of the DataFrame into Python lists. This process is very important for various data manipulation and analysis tasks. Fortunately, Pandas provides several m
Subtracting Two Columns in Pandas DataFrame
Publish Date:2025/05/01 Views:120 Category:Python
-
Pandas can handle very large data sets and has a variety of functions and operations that can be applied to the data. One of the simple operations is to subtract two columns and store the result in a new column, which we will discuss in thi
Dropping columns by index in Pandas DataFrame
Publish Date:2025/05/01 Views:99 Category:Python
-
DataFrames can be very large and can contain hundreds of rows and columns. It is necessary to master the basic maintenance operations of DataFrames, such as deleting multiple columns. We can use dataframe.drop() the method to delete columns
Pandas Copy DataFrame
Publish Date:2025/05/01 Views:53 Category:Python
-
This tutorial will show you how to DataFrame.copy() copy a DataFrame object using the copy method. import pandas as pd items_df = pd . DataFrame( { "Id" : [ 302 , 504 , 708 ], "Cost" : [ "300" , "400" , "350" ], } ) print (items_df) Output:
Pandas DataFrame.ix[] Function
Publish Date:2025/05/01 Views:168 Category:Python
-
Python Pandas DataFrame.ix[] function slices rows or columns based on the value of the argument. pandas.DataFrame.ix[] grammar DataFrame . ix[index = None , label = None ] parameter index Integer or list of integers used to slice row indice
Pandas DataFrame.describe() Function
Publish Date:2025/05/01 Views:120 Category:Python
-
Python Pandas DataFrame.describe() function returns the statistics of a DataFrame. pandas.DataFrame.describe() grammar DataFrame . describe( percentiles = None , include = None , exclude = None , datetime_is_numeric = False ) parameter perc
Pandas DataFrame.astype() Function
Publish Date:2025/05/01 Views:160 Category:Python
-
Python Pandas DataFrame.astype() function changes the data type of an object to the specified data type. pandas.DataFrame.astype() grammar DataFrame . astype(dtype, copy = True , errors = "raise" ) parameter dtype The data type we want to a