Pandas DataFrame DataFrame.set_index() function

Author: JIYIK Last Updated: 2025/05/01

The `pandas.DataFrame.set_index()` method sets one or more columns, or an array of matching length, as the index of a DataFrame after the DataFrame has been created. The new index can either replace the existing index or extend it.

`pandas.DataFrame.set_index()` Method Syntax

DataFrame.set_index(
    keys, drop=True, append=False, inplace=False, verify_integrity=False
)

Parameters

`keys`	Column label, array, or list of column labels/arrays to set as the index.
`drop`	Boolean. The default value is `True`, which removes the columns used as the new index from the DataFrame's columns.
`append`	Boolean. The default value is `False`. If `True`, the new index is appended to the existing index instead of replacing it.
`inplace`	Boolean. The default value is `False`. If `True`, the caller's DataFrame is modified in place.
`verify_integrity`	Boolean. The default value is `False`. If `True`, raises `ValueError` when the new index contains duplicates.
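
The `keys` parameter also accepts an array-like object instead of a column label. The following is a minimal sketch, assuming a small sample table and a hypothetical `codes` index that are not part of the original examples. A plain Python list is interpreted as a list of column labels, so the array is wrapped in `pd.Index` here.

```python
import pandas as pd

# Hypothetical sample data, modeled on the fruit table used in this article
df = pd.DataFrame(
    {"Name": ["Orange", "Mango", "banana"], "Price": [34, 24, 14]}
)

# An array-like key must have the same length as the DataFrame.
# A plain list would be read as a list of column labels, so use pd.Index.
codes = pd.Index(["F1", "F2", "F3"], name="Code")

# No column is consumed here, so Name and Price both remain as columns
df_by_code = df.set_index(codes)
print(df_by_code)
```

Because the key is not one of the DataFrame's columns, the `drop` parameter has nothing to remove in this case.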

Return Value

Returns a `DataFrame` with the new index if `inplace` is `False`; returns `None` if `inplace` is `True`.
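
To make the return behavior concrete, here is a short sketch on an assumed two-row table. The variable names `returned` and `result` are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

# Default (inplace=False): a new DataFrame is returned and df is unchanged
returned = df.set_index("Name")
print(type(returned).__name__)
print(df.index.name)

# inplace=True: df itself is modified and the call returns None
result = df.set_index("Name", inplace=True)
print(result)
print(df.index.name)
```

A common mistake is to write `df = df.set_index("Name", inplace=True)`, which rebinds `df` to `None`.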

Example Code: Setting the Pandas DataFrame Index Using the `DataFrame.set_index()` Method

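import pandas as pd

# Sample data: (Name, Price, In_Stock, Supplier)
fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
print(df)

# Use the Name column as the index; a new DataFrame is returned
df_modified = df.set_index("Name")
print(df_modified)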

Output:

        Name  Price In_Stock Supplier
0     Orange     34      Yes      ABC
1      Mango     24       No      ABC
2     banana     14       No      ABC
3      Apple     44      Yes      XYZ
4  Pineapple     64       No      XYZ
5       Kiwi     84      Yes      XYZ
           Price In_Stock Supplier
Name                              
Orange        34      Yes      ABC
Mango         24       No      ABC
banana        14       No      ABC
Apple         44      Yes      XYZ
Pineapple     64       No      XYZ
Kiwi          84      Yes      XYZ

The original DataFrame uses a numeric range as its default index, while in `df_modified` we use the `set_index()` method to set the `Name` column as the index.
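
One practical effect of indexing by `Name` is that rows can be selected by label. The sketch below assumes a shortened version of the fruit table and uses `.loc`, which is not covered in the examples above, purely as an illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Orange", "Mango", "Apple"], "Price": [34, 24, 44]}
)
df_modified = df.set_index("Name")

# With Name as the index, a row can be looked up by its label
mango_price = df_modified.loc["Mango", "Price"]
print(mango_price)
```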

Example Code: Setting `drop=False` in the `DataFrame.set_index()` Method

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','ABC' ) ,
             ('banana', 14, 'No','ABC' ) ,
             ('Apple', 44, 'Yes',"XYZ" )  ]

df = pd.DataFrame(fruit_list, 
                  columns = ['Name',
                             'Price',
                             'In_Stock',
                             'Supplier']) 
print(df)

# Keep Name as a regular column while also using it as the index
df_modified = df.set_index("Name", drop=False)

print(df_modified)

Output:

     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
          Name  Price In_Stock Supplier
Name                                   
Orange  Orange     34      Yes      ABC
Mango    Mango     24       No      ABC
banana  banana     14       No      ABC
Apple    Apple     44      Yes      XYZ

If we set `drop=False` in the `set_index()` method, the `Name` column remains a column of the DataFrame even after it has been set as the index.
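
The difference between the two `drop` settings can be checked by comparing the remaining columns. This is a small sketch on an assumed two-row table; `dropped` and `kept` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Orange", "Mango"], "Price": [34, 24]})

dropped = df.set_index("Name")            # drop=True is the default
kept = df.set_index("Name", drop=False)   # Name stays as a column too

print(list(dropped.columns))
print(list(kept.columns))
```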

Example Code: Setting `inplace=True` in the `DataFrame.set_index()` Method

import pandas as pd

fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','ABC' ) ,
             ('banana', 14, 'No','ABC' ) ,
             ('Apple', 44, 'Yes',"XYZ" )  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock',"Supplier"]) 
print("Before Setting Index:")
print(df)
# Modify df itself instead of returning a new DataFrame
df.set_index("Name", inplace=True)
print("After Setting Index:")
print(df)

Output:

Before Setting Index:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
After Setting Index:
        Price In_Stock Supplier
Name                           
Orange     34      Yes      ABC
Mango      24       No      ABC
banana     14       No      ABC
Apple      44      Yes      XYZ

If we set `inplace=True` in the `set_index()` method, the caller DataFrame is modified in place.

Example Code: Setting a Multi-Index Using the `DataFrame.set_index()` Method

import pandas as pd

fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','ABC' ) ,
             ('banana', 14, 'No','ABC' ) ,
             ('Apple', 44, 'Yes',"XYZ" )  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock',"Supplier"]) 
print("Before Setting Index:")
print(df)
# Append Name to the existing integer index, forming a two-level index
df.set_index("Name", append=True, inplace=True, drop=False)
print("After Setting Index:")
print(df)

Output:

Before Setting Index:
     Name  Price In_Stock Supplier
0  Orange     34      Yes      ABC
1   Mango     24       No      ABC
2  banana     14       No      ABC
3   Apple     44      Yes      XYZ
After Setting Index:
            Name  Price In_Stock Supplier
  Name                                   
0 Orange  Orange     34      Yes      ABC
1 Mango    Mango     24       No      ABC
2 banana  banana     14       No      ABC
3 Apple    Apple     44      Yes      XYZ

If we set `append=True` in the `set_index()` method, the newly set index column is appended to the existing index, so a single DataFrame has multiple index levels.
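
A multi-level index can also be built by passing a list of column labels as `keys`. The sketch below uses an assumed three-row table and contrasts that approach with `append=True`; the names `df_multi` and `df_appended` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "Apple"],
        "Price": [34, 24, 44],
        "Supplier": ["ABC", "ABC", "XYZ"],
    }
)

# A list of column labels builds a MultiIndex from those columns
df_multi = df.set_index(["Supplier", "Name"])
print(df_multi.index.nlevels)
print(list(df_multi.index.names))

# append=True keeps the existing (unnamed) index as the first level
df_appended = df.set_index("Name", append=True)
print(list(df_appended.index.names))
```

In the second case the first level has no name because the default integer index is unnamed.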

Example Code: Behavior of `DataFrame.set_index()` When `verify_integrity` Is `True`

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("Apple", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

# Name contains "Apple" twice, so verify_integrity=True raises ValueError
df_modified = df.set_index("Name", verify_integrity=True)
print(df_modified)

Output:

Traceback (most recent call last):
  .....line 3920, in set_index
    dup=duplicates))
ValueError: Index has duplicate keys: Index(['Apple'], dtype='object', name='Name')

A `ValueError` is raised because the new index has a duplicate key: the `Name` column contains `Apple` twice. The `set_index()` method therefore raises an error when `verify_integrity` is set to `True`.
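
In a script, the error can be handled rather than left to terminate the program. The following is one possible sketch, assuming a shortened table with a duplicate name. Falling back to the original DataFrame is only an illustrative choice, not a recommended policy.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Orange", "Apple", "Apple"], "Price": [34, 14, 44]}
)

# Check for duplicate keys before attempting to set the index
has_duplicates = bool(df["Name"].duplicated().any())
print(has_duplicates)

try:
    df_modified = df.set_index("Name", verify_integrity=True)
except ValueError as err:
    # Keep the default integer index when the keys are not unique
    df_modified = df
    print(f"Could not set index: {err}")
```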
