Combine values from multiple columns into one column in Pandas DataFrame

Current Location：Home > Learning > PROGRAM > Python >

Python PHP Java Go TypeScript C++ Vba Node.js C语言 MATLAB

Combine values from multiple columns into one column in Pandas DataFrame

Author：JIYIK Last Updated：2025/05/01 Views：

This tutorial will demonstrate how to merge or return the first non-null values from multiple columns into another column in a Python Pandas DataFrame.

For example, if it is not empty, the value of column 1 is used for the new column 3; otherwise, if column 1 is empty, the value of column 2 is used for the new column 3.

We can accomplish this task in a number of ways in a Pandas DataFrame.

Writing code from scratch to merge values from multiple columns into one column in a Pandas DataFrame

We can write the logic from scratch to merge the values. In the following code, we create a Pandas DataFrame with three columns named , Age_in_Years, Age_in_Monthsand Age_in_Days.

The DataFrame also has some missing values. If we want to display the age, first, we will output the age in years.

If the value in that column is Null, we will display the age in months. Similarly, if the value in months is Null, we will display the age in days.

To do this, we write the code from scratch to get the value of the first non-null column. The function is iterating over all DataFrame columns and returns the value where a non-null value is found; otherwise, it checks the values in other columns.

Sample code:

# Python 3.x
import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)


def get_first_non_null(dfrow, cols):
    for c in cols:
        if pd.notnull(dfrow[c]):
            return dfrow[c]
    return None


cols = ["Age_in_Years", "Age_in_Months", "Age_in_Days"]
df_age["Age"] = df_age.apply(lambda x: get_first_non_null(x, cols), axis=1)
display(df_age)

Output:

Run a SQL query in a Pandas DataFrame using DuckDB to combine values from multiple columns into one column

Sample code:

DuckDB is a Python API and a database management system for interacting with databases using SQL queries. This package has a built-in merge method that selects the first non-null value from a column.

We will pass the column names in the SQL query to the coalesce method.

# Python 3.x
import pandas as pd
import duckdb

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)
df_age = duckdb.query(
    """SELECT Age_in_Years, Age_in_Months, Age_in_Days, coalesce(Age_in_Years, Age_in_Months, Age_in_days) as Age from df_age"""
).to_df()
display(df_age)

Output:

In Pandas DataFrame, use `combine_first()`the method to combine values from multiple columns into one column

combine_first()Method fills the empty values in one DataFrame with the non-empty data from the second DataFrame to combine two DataFrame objects.

In the following code, we will return the column values. We will combine Age_in_Yearswith and with .Age_in_MonthsAge_in_MonthsAge_in_Days

It will return Age_in_yearsthe value from . If that is Null, it will return Age_in_Monthsthe value from . Likewise, if that is also Null, it will Age_in_Daysreturn a value from .

The data in the actual DataFrame will not change and we will Ageget the values we want in the columns.

Sample code:

# Python 3.x
import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)
df_age["Age"] = (
    df_age["Age_in_Years"]
    .combine_first(df_age["Age_in_Months"])
    .combine_first(df_age["Age_in_Days"])
)
df_age

Output:

In Pandas DataFrame, use `bfill()`the method to combine values from multiple columns into one column

bfillStands for Backward Fill. This method replaces NaNs with the next row or column value.

Here, we specify to axis=1return the value from the next column if the value in the current column is Null.

Sample code:

# Python 3.x
import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)
df_age["Age"] = df_age.bfill(axis=1).iloc[:, 0]
df_age

Output:

In Pandas DataFrame, use `mask()`the method to combine values from multiple columns into one column

mask()Method works if-thensimilarly to .

If the null condition for a column is false, then its value will be used. Otherwise, it will get the value from the other specified column.

Sample code:

# Python 3.x
import pandas as pd

df_age = pd.DataFrame(
    {
        "Age_in_Years": ["4 y", None, None, None],
        "Age_in_Months": ["48 m", "24 m", None, None],
        "Age_in_Days": ["1440 d", None, "2520 d", None],
    }
)
df_age["Age"] = (
    df_age["Age_in_Years"]
    .mask(pd.isnull, df_age["Age_in_Months"])
    .mask(pd.isnull, df_age["Age_in_Days"])
)
df_age

Output:

Previous：Filling Missing Values in Pandas DataFrame

Next：Performing T-Test in Pandas

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.

Article URL：

JIYIK CN >