Pandas Index Explained with Examples

  • Post last modified:February 5, 2024

Pandas Index is an immutable sequence used for indexing DataFrame and Series. pandas.Index is a basic object that stores axis labels for all pandas objects.

DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure with labeled axes (rows and columns). A pandas DataFrame consists of three principal components: data, rows, and columns. In a DataFrame, the row labels are called the index.

Series is a one-dimensional labeled array capable of storing various data types (integer, string, float, Python objects, etc.). You can convert a list, tuple, dictionary, or NumPy array into a Series using the pd.Series() constructor. In a Series, the row labels are called the index. A Series holds a single column of values; it cannot contain multiple columns.

Key Points –

  • The Pandas Index is a fundamental data structure that provides an immutable, labeled axis for Series and DataFrame objects, enabling efficient data manipulation and alignment.
  • Because an Index is immutable, its labels cannot be changed in place. This gives rows and columns stable identifiers and makes it safe to share one Index between several objects; to change labels, you assign a new Index.
  • The Index plays a crucial role in alignment during arithmetic and join operations, ensuring that data is correctly matched and combined across different DataFrames.
  • Pandas supports hierarchical indexing with MultiIndex, allowing users to create multi-level row or column labels, providing a powerful way to represent and analyze complex, multi-dimensional data.
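
The alignment and MultiIndex points above can be illustrated with a minimal sketch. The labels and values here are made up for illustration and are not part of the article's dataset.

```python
import pandas as pd

# Two Series whose labels overlap only partially (illustrative data).
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition aligns on index labels, not on position.
# Labels present in only one operand produce NaN.
total = s1 + s2
print(total)
# a     NaN
# b    12.0
# c    23.0
# d     NaN
# dtype: float64

# A two-level MultiIndex built from tuples (hypothetical labels).
mi = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1")],
    names=["year", "quarter"],
)
sales = pd.Series([100, 120, 130], index=mi)

# Selecting on the outer level returns the rows under that label,
# indexed by the remaining inner level.
print(sales.loc["2023"].tolist())  # [100, 120]
```

Because alignment is label-based, the order of the rows in either Series does not affect the result.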

1. What is the Pandas Index?

pandas has several classes for defining an Index, and an Index instance can only contain hashable objects.

pandas Index       Description
RangeIndex         Index implementing a monotonic integer range.
CategoricalIndex   Index based on an underlying Categorical.
MultiIndex         A multi-level, or hierarchical, Index.
IntervalIndex      Immutable index of intervals that are closed on the same side.
DatetimeIndex      ndarray-like of datetime64 data.
TimedeltaIndex     ndarray of timedelta64 data, represented internally as int64.
PeriodIndex        ndarray holding ordinal values indicating regular periods in time.
NumericIndex       Index of NumPy int/uint/float data (removed in pandas 2.0, where a plain Index holds numeric dtypes directly).
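
As a rough sketch of how several of these classes are typically obtained, the snippet below builds one instance of each using either the class constructor or a pandas helper function. The sample values are arbitrary, and NumericIndex is left out because it is not available in pandas 2.0 and later.

```python
import pandas as pd

range_idx = pd.RangeIndex(start=0, stop=5, step=1)
cat_idx = pd.CategoricalIndex(["low", "high", "low"])
multi_idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]])
interval_idx = pd.interval_range(start=0, end=3)  # three unit-width intervals
datetime_idx = pd.date_range("2024-01-01", periods=3, freq="D")
timedelta_idx = pd.timedelta_range(start="1 day", periods=3)
period_idx = pd.period_range("2024-01", periods=3, freq="M")

# Print the concrete class and length of each index.
for idx in (range_idx, cat_idx, multi_idx, interval_idx,
            datetime_idx, timedelta_idx, period_idx):
    print(type(idx).__name__, len(idx))
```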

2. Create Index

You can create a pandas Index through its constructor. You can use any class from the above table to create an Index.


# Syntax of Index() constructor
class pandas.Index(data=None, dtype=None, copy=False, name=None, tupleize_cols=True, **kwargs)
  • data – One-dimensional, array-like data to store in the Index.
  • dtype – NumPy-supported data type. When None, pandas infers the best type from the data.
  • copy – bool. When True, make a copy of the input ndarray.
  • name – Name to be stored in the Index.
  • tupleize_cols – bool. When True, attempt to create a MultiIndex if possible.
  • **kwargs – Additional keyword arguments to be passed to the specific Index class being used.
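
A minimal sketch of the constructor parameters described above. The label values and the name row_id are made up for illustration.

```python
import pandas as pd

# name is stored on the Index and shows up when it labels an axis.
idx = pd.Index(["idx1", "idx2", "idx3"], name="row_id")
print(idx.name)  # row_id
print(len(idx))  # 3

# With dtype=None, the type is inferred from the data.
num_idx = pd.Index([1, 2, 3])
print(num_idx.dtype)  # int64

# tupleize_cols=True (the default) turns a list of tuples into a MultiIndex.
tup_idx = pd.Index([("a", 1), ("b", 2)])
print(isinstance(tup_idx, pd.MultiIndex))  # True

# tupleize_cols=False keeps each tuple as a single label in a flat Index.
flat_idx = pd.Index([("a", 1), ("b", 2)], tupleize_cols=False)
print(isinstance(flat_idx, pd.MultiIndex))  # False
```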

3. Create a Series with an Index

By default, the Series is created with a default Index starting from zero and incrementing by 1. Series can be created through its constructor and takes the values as an argument.


import pandas as pd

# Create a Series with the default index
s=pd.Series(['A','B','C','D','E'])
print(s)

# Output:
# 0    A
# 1    B
# 2    C
# 3    D
# 4    E
# dtype: object

This creates a Series with a default numerical index starting from zero. You can also set the Index with the custom values while creating a Series object.


idx= ['idx1','idx2','idx3','idx4','idx5']
s=pd.Series(['A','B','C','D','E'],index=idx)
print(s)

# Output:
# idx1    A
# idx2    B
# idx3    C
# idx4    D
# idx5    E
# dtype: object

Now let’s create an Index with the RangeIndex() class. The example below creates an Index that starts at 5 and stops before 10.


idx=pd.RangeIndex(5,10)
s=pd.Series(['A','B','C','D','E'],index=idx)
print(s)

# Output:
# 5    A
# 6    B
# 7    C
# 8    D
# 9    E
# dtype: object

4. Create DataFrame with an Index

One of the easiest ways to create a pandas DataFrame is by using its constructor. Like Series, DataFrame is also created with a default index when not specified.


# Create pandas DataFrame from List
import pandas as pd
technologies = [ ["Spark",20000, "30days"], 
                 ["pandas",20000, "40days"], 
               ]
df=pd.DataFrame(technologies)
print(df)

Since we have not given labels to the columns or rows (index), the DataFrame assigns incremental integers starting from zero as labels for both rows and columns.


# Output:
        0      1       2
0   Spark  20000  30days
1  pandas  20000  40days

Numeric column names make it hard to tell what data each column holds, so it is best practice to provide descriptive column names. Use the columns and index parameters to provide column and row labels, respectively, to the DataFrame.


# Add Column & Row Labels to the DataFrame
column_names=["Courses","Fee","Duration"]
row_label=["a","b"]
df=pd.DataFrame(technologies,columns=column_names,index=row_label)
print(df)

This yields the output below.


# Output:
  Courses    Fee Duration
a   Spark  20000   30days
b  pandas  20000   40days
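
Both the row labels and the column labels of a DataFrame are stored as Index objects. The short sketch below rebuilds the DataFrame from the example above and inspects both axes.

```python
import pandas as pd

technologies = [["Spark", 20000, "30days"], ["pandas", 20000, "40days"]]
df = pd.DataFrame(
    technologies,
    columns=["Courses", "Fee", "Duration"],
    index=["a", "b"],
)

# Row labels and column labels are both Index objects.
print(isinstance(df.index, pd.Index))    # True
print(isinstance(df.columns, pd.Index))  # True
print(df.index.tolist())                 # ['a', 'b']
print(df.columns.tolist())               # ['Courses', 'Fee', 'Duration']

# A row label and a column label together identify a single cell.
print(df.loc["b", "Courses"])            # pandas
```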

5. Get DataFrame Index as a List

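Sometimes you may need the pandas DataFrame index as a list of values. df.index returns an Index object, df.index.values returns its labels as a NumPy array, and df.index.tolist() returns a plain Python list. The remaining examples use a three-row DataFrame with a default index.


# DataFrame reconstructed from the output shown in section 8
df = pd.DataFrame({'Courses':["Spark","PySpark","Hadoop"],
                   'Fee':[20000,25000,26000],
                   'Duration':['30day','40days','35days'],
                   'Discount':[1000,2300,1500]})

# Get the Index object
print(df.index)
# Output:
# RangeIndex(start=0, stop=3, step=1)

# Get Index labels as a NumPy array
print(df.index.values)
# Output:
# [0 1 2]

# Get Index as a list
print(df.index.tolist())
# Output:
# [0, 1, 2]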
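
The different ways of extracting index values return different container types. The following sketch, which uses a small stand-in DataFrame, compares them.

```python
import numpy as np
import pandas as pd

# Small stand-in DataFrame with a default RangeIndex.
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]})

# df.index is an Index object (here a RangeIndex), not a list or a Series.
print(type(df.index).__name__)  # RangeIndex

# to_numpy() (like .values) returns a NumPy array.
arr = df.index.to_numpy()
print(isinstance(arr, np.ndarray))  # True

# tolist() returns a plain Python list.
lst = df.index.tolist()
print(lst)  # [0, 1, 2]

# list() over the Index also works, since an Index is iterable.
print(list(df.index) == lst)  # True
```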

6. Get Rows by Index

Use the DataFrame.iloc[] property to get a row by its integer position in the Index. To get a row by its label, use DataFrame.loc[] instead.


# Get Row by Index.
print(df.iloc[2])

# Output:
# Courses     Hadoop
# Fee          26000
# Duration    35days
# Discount      1500
# Name: 2, dtype: object
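
The difference between position-based and label-based row access becomes visible once the index holds custom labels. This sketch uses a two-column stand-in DataFrame with the idx1 to idx3 labels that appear later in the article.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["idx1", "idx2", "idx3"],
)

# iloc selects by integer position: the third row.
by_position = df.iloc[2]

# loc selects by index label: the row labeled 'idx3'.
by_label = df.loc["idx3"]

# Both return the same row; its name is the row's index label.
print(by_position.equals(by_label))  # True
print(by_label.name)                 # idx3

# loc also accepts a list of labels and a column label.
print(df.loc[["idx1", "idx3"], "Fee"].tolist())  # [20000, 26000]
```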

7. Set Labels to Index

The labels of the Index can be changed as shown below.


# Set new Index
df.index = pd.Index(['idx1','idx2','idx3'])
print(df.index)

# Output:
# Index(['idx1', 'idx2', 'idx3'], dtype='object')
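
Changing labels always means replacing the Index rather than editing it, because an Index does not support item assignment. The sketch below, with a one-column stand-in DataFrame, shows the failing in-place attempt and rename() as one alternative for changing selected labels.

```python
import pandas as pd

df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=["idx1", "idx2", "idx3"])

# Assigning to a single position of an Index raises TypeError.
try:
    df.index[0] = "first"
    in_place_allowed = True
except TypeError:
    in_place_allowed = False
print(in_place_allowed)  # False

# rename() returns a new DataFrame with the selected labels replaced.
renamed = df.rename(index={"idx1": "first"})
print(renamed.index.tolist())  # ['first', 'idx2', 'idx3']
print(df.index.tolist())       # ['idx1', 'idx2', 'idx3']

# Setting the name of the Index is permitted.
df.index.name = "row_id"
print(df.index.name)  # row_id
```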

8. Set Index to Column & Column to Index

DataFrame.reset_index() moves the current Index into a column and resets the Index to the default range starting from zero. The example below adds a column named index to the DataFrame.


# Set Index to Column
df2=df.reset_index()
print(df2)

# Output:
#   index  Courses    Fee Duration  Discount
# 0  idx1    Spark  20000    30day      1000
# 1  idx2  PySpark  25000   40days      2300
# 2  idx3   Hadoop  26000   35days      1500

DataFrame.set_index() sets a DataFrame column as the Index. The example below sets the Courses column as the index.


# Set Column as Index
df2=df.set_index('Courses')
print(df2)
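
As a minimal sketch of how set_index() and reset_index() relate, the snippet below moves a column into the index and back again. It uses a two-column stand-in DataFrame rather than the full dataset.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["idx1", "idx2", "idx3"],
)

# set_index() moves the column into the index; the old labels are discarded.
by_course = df.set_index("Courses")
print(by_course.index.name)             # Courses
print(by_course.columns.tolist())       # ['Fee']
print(by_course.loc["PySpark", "Fee"])  # 25000

# reset_index() moves the index back into a regular column.
restored = by_course.reset_index()
print(restored.columns.tolist())        # ['Courses', 'Fee']
print(restored.index.tolist())          # [0, 1, 2]

# drop=True resets to the default index without keeping the old labels.
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())         # ['Courses', 'Fee']
```

Both methods return a new DataFrame by default, so df itself keeps its idx1 to idx3 labels.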

Frequently Asked Questions

What is the Pandas Index?

The Pandas Index is a fundamental data structure that provides a labeled axis for Series and DataFrame objects. It facilitates efficient data manipulation, retrieval, and alignment.

How can I create a custom index in Pandas?

You can create a custom index using the Index() constructor or by using the set_index() method on a DataFrame, specifying the desired column as the index.

How does the Pandas Index support label-based indexing?

The Index allows for label-based indexing through the use of the loc[] indexer, enabling users to retrieve and manipulate data based on user-defined labels assigned to rows or columns.

What is the significance of immutability in the Pandas Index?

Immutability means the labels of an Index cannot be modified in place, so the row or column labels stay stable during operations and one Index can be shared safely between objects. To change labels, assign a new Index or use a method such as rename() or set_index().

How do I set a new index for a Pandas DataFrame?

To set a new index for a DataFrame, you can use the set_index() method, specifying the desired column(s) as the new index. Alternatively, you can directly assign a new Index object to the DataFrame’s index attribute.

How can I retrieve rows based on index labels in Pandas?

Rows can be retrieved based on index labels using the loc[] indexer. For example, df.loc['label'] would retrieve the row with the specified label.

Conclusion

In this article, I have explained what the pandas Index is and why it is crucial for effective data manipulation and analysis. The Index serves as a labeled axis for Series and DataFrame objects, providing a foundation for label-based indexing, alignment operations, and efficient data retrieval. The examples covered creating an Index, attaching one to a Series or DataFrame, retrieving index values and rows, and moving data between the index and columns.

Happy Learning!!

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
