A pandas Index is an immutable sequence used for indexing DataFrame and Series objects. pandas.Index is the basic object that stores axis labels for all pandas objects.
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). A pandas DataFrame consists of three principal components: data, rows, and columns. In a DataFrame, the row labels are called the index.
A Series is a one-dimensional labeled array capable of storing various data types (integer, string, float, Python objects, etc.). You can convert a list, tuple, NumPy array, or dictionary into a Series with the pd.Series() constructor. In a Series, the row labels are called the index. A Series holds exactly one column of values; it cannot contain multiple columns.
1. What is pandas Index?
pandas has several classes that define an Index, and an instance of Index can only contain hashable objects.
pandas Index | Description |
---|---|
RangeIndex | Index implementing a monotonic integer range. |
CategoricalIndex | Index based on an underlying Categorical. |
MultiIndex | A multi-level, or hierarchical Index. |
IntervalIndex | Immutable index of intervals that are closed on the same side. |
DatetimeIndex | ndarray-like of datetime64 data. |
TimedeltaIndex | ndarray of timedelta64 data, represented internally as int64. |
PeriodIndex | ndarray holding ordinal values indicating regular periods in time. |
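NumericIndex | Index of numpy int/uint/float data. Available in pandas 1.4 and 1.5 only; from pandas 2.0 the plain Index holds numeric dtypes directly. |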
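As a rough illustration of the table above, the sketch below builds a few of these Index types and checks that each one is still a pandas Index and that an Index cannot be modified in place. The variable names and sample values are made up for this example.

```python
import pandas as pd

# A few of the Index classes from the table (sample values are illustrative)
range_idx = pd.RangeIndex(start=0, stop=5)                 # 0, 1, 2, 3, 4
cat_idx = pd.CategoricalIndex(["low", "high", "low"])      # backed by a Categorical
multi_idx = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)])
date_idx = pd.DatetimeIndex(["2024-01-01", "2024-01-02"])  # datetime64 data

for idx in (range_idx, cat_idx, multi_idx, date_idx):
    # Every specialised class is still a pandas Index
    print(type(idx).__name__, isinstance(idx, pd.Index))

# An Index is immutable: item assignment raises TypeError
try:
    range_idx[0] = 10
    mutated = True
except TypeError:
    mutated = False
print("mutated:", mutated)
```

Because the labels cannot change in place, operations that appear to modify an Index return a new Index object instead.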
2. Create Index
You can create a pandas Index through its constructor. You can use any class from the above table to create an Index.
# Syntax of Index() constructor.
class pandas.Index(data=None, dtype=None, copy=False, name=None, tupleize_cols=True, **kwargs)
data – List (or other array-like) of the data you prefer to have in the Index.
dtype – NumPy-supported data type. When it is None, the best type is inferred from the data.
copy – bool. Make a copy of the input ndarray.
name – Name of the Index.
tupleize_cols – When True, attempt to create a MultiIndex if possible.
**kwargs – Additional keyword arguments.
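The parameters listed above can be tried out with a minimal sketch like the following. The sample values and variable names are assumptions chosen for illustration; only the data, dtype, and name parameters and the default tupleize_cols behavior are exercised.

```python
import pandas as pd

# Plain Index with a name; the dtype is inferred from the data
idx = pd.Index([10, 20, 30], name="id")
print(idx.name, idx.dtype)

# Explicit dtype: the integers are stored as floats
float_idx = pd.Index([1, 2, 3], dtype="float64")
print(float_idx.dtype)

# With the default tupleize_cols=True, a list of tuples becomes a MultiIndex
tuple_idx = pd.Index([("a", 1), ("b", 2)])
print(type(tuple_idx).__name__)
```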
3. Create Series with Index
By default, the Series is created with a default Index starting from zero and incrementing by 1. Series can be created through its constructor and takes the values as an argument.
import pandas as pd
s=pd.Series(['A','B','C','D','E'])
print(s)
# Outputs
#0 A
#1 B
#2 C
#3 D
#4 E
#dtype: object
This creates a Series with a default numerical index starting from zero. You can also set the Index to custom values while creating a Series object.
idx= ['idx1','idx2','idx3','idx4','idx5']
s=pd.Series(['A','B','C','D','E'],index=idx)
print(s)
# Outputs
#idx1 A
#idx2 B
#idx3 C
#idx4 D
#idx5 E
#dtype: object
Now let's create an Index with the RangeIndex class. The example below creates an Index that starts at the integer 5 and stops before 10.
idx=pd.RangeIndex(5,10)
s=pd.Series(['A','B','C','D','E'],index=idx)
print(s)
#Outputs
#5 A
#6 B
#7 C
#8 D
#9 E
#dtype: object
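To make the role of the index in a Series more concrete, here is a small sketch that reuses the RangeIndex(5, 10) example. It compares label-based and position-based access and shows what happens when the index length does not match the data. The names values and length_ok are hypothetical helpers for this example.

```python
import pandas as pd

values = ['A', 'B', 'C', 'D', 'E']

# RangeIndex(5, 10) holds the labels 5, 6, 7, 8, 9 (the stop value is excluded)
idx = pd.RangeIndex(5, 10)
s = pd.Series(values, index=idx)

# Label-based lookup uses the index, not the position
print(s.loc[5])    # first element, labeled 5
print(s.iloc[0])   # same element, selected by position

# The index must have exactly one label per value
try:
    pd.Series(values, index=pd.RangeIndex(5, 8))
    length_ok = True
except ValueError:
    length_ok = False
print("length_ok:", length_ok)
```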
4. Create DataFrame with Index
One of the easiest ways to create a pandas DataFrame is by using its constructor. Like Series, DataFrame is also created with a default index when not specified.
# Create pandas DataFrame from List
import pandas as pd
technologies = [
    ["Spark", 20000, "30days"],
    ["pandas", 20000, "40days"],
]
df=pd.DataFrame(technologies)
print(df)
Since we have not given labels to the columns or the rows (index), the DataFrame assigns incremental sequence numbers as labels to both. Both sets of labels are stored as Index objects.
0 1 2
0 Spark 20000 30days
1 pandas 20000 40days
Sequence numbers make poor column names because it is hard to tell what data each column holds, so it is best practice to provide column names that describe the data. Use the columns param and the index param to provide column and row labels respectively to the DataFrame.
# Add Column & Row Labels to the DataFrame
column_names=["Courses","Fee","Duration"]
row_label=["a","b"]
df=pd.DataFrame(technologies,columns=column_names,index=row_label)
print(df)
Yields below output.
Courses Fee Duration
a Spark 20000 30days
b pandas 20000 40days
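The labeled DataFrame above can be inspected with a short sketch like this one, which shows that both the row labels and the column labels are Index objects and that a row label can be used for lookup. It rebuilds the same two-row data so that it runs on its own.

```python
import pandas as pd

technologies = [["Spark", 20000, "30days"],
                ["pandas", 20000, "40days"]]
df = pd.DataFrame(technologies,
                  columns=["Courses", "Fee", "Duration"],
                  index=["a", "b"])

# Row labels and column labels are both stored as Index objects
print(df.index.tolist())    # ['a', 'b']
print(df.columns.tolist())  # ['Courses', 'Fee', 'Duration']
print(isinstance(df.index, pd.Index), isinstance(df.columns, pd.Index))

# A row label can then be used to look up a value
print(df.loc["b", "Courses"])  # pandas
```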
5. Get DataFrame Index as a List
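Sometimes you may need the pandas DataFrame index as a list of values. df.index returns an Index object (not a Series), df.index.values returns its labels as a NumPy array, and df.index.tolist() returns a plain Python list. The outputs from here on come from a three-row DataFrame (Spark, PySpark, Hadoop, with an added Discount column) that has the default index, not the two-row one built above.
# Get the Index object
print(df.index)
# Outputs
# RangeIndex(start=0, stop=3, step=1)
# Get Index values as a NumPy array
print(df.index.values)
# Outputs
# [0 1 2]
# Get Index as a Python list
print(df.index.tolist())
# Outputs
# [0, 1, 2]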
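The three-row DataFrame behind these outputs is not defined in this section. The sketch below rebuilds it from the values printed in section 8, so treat the construction itself as an assumption rather than the original code. It then contrasts the Index object, the NumPy array, and the Python list.

```python
import pandas as pd

# Assumed reconstruction; values copied from the output shown in section 8
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, 26000],
    "Duration": ["30day", "40days", "35days"],
    "Discount": [1000, 2300, 1500],
})

print(df.index)            # RangeIndex(start=0, stop=3, step=1)
print(df.index.values)     # NumPy array: [0 1 2]
print(df.index.tolist())   # Python list: [0, 1, 2]
print(type(df.index.values).__name__, type(df.index.tolist()).__name__)
```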
6. Get Rows by Index
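The DataFrame.iloc[] property selects a row by its integer position, while DataFrame.loc[] selects it by its index label. In the result, Name is the label of the selected row; it reads idx3 below because this output was produced with the custom labels idx1 to idx3 on the index.
# Get the row at position 2 (the third row).
print(df.iloc[2])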
# Outputs
Courses Hadoop
Fee 26000
Duration 35days
Discount 1500
Name: idx3, dtype: object
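The difference between position-based and label-based row access can be sketched as follows. The sample data is an assumption based on the outputs shown in this article, trimmed to two columns for brevity.

```python
import pandas as pd

# Assumed sample data, taken from the outputs shown in this article
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["idx1", "idx2", "idx3"],
)

by_position = df.iloc[2]     # third row, whatever its label is
by_label = df.loc["idx3"]    # row whose index label is 'idx3'

print(by_position.name)               # idx3
print(by_position.equals(by_label))   # True
```

With the default RangeIndex the two happen to coincide, which is why iloc is often described loosely as "getting a row by index".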
7. Set Labels to Index
The labels of the Index can be changed as shown below.
# Set new Index
df.index = pd.Index(['idx1','idx2','idx3'])
print(df.index)
# Outputs
# Index(['idx1', 'idx2', 'idx3'], dtype='object')
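A slightly fuller sketch of relabeling is given here, assuming a small one-column DataFrame made up for the example. It shows that the new Index needs one label per row, and that rename() is an alternative when only some labels should change.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]})

# Replace the default RangeIndex with custom labels
df.index = pd.Index(["idx1", "idx2", "idx3"])
print(df.index.tolist())   # ['idx1', 'idx2', 'idx3']

# The new Index needs one label per row, otherwise pandas raises ValueError
try:
    df.index = pd.Index(["idx1", "idx2"])
    assigned = True
except ValueError:
    assigned = False
print("assigned:", assigned)

# rename() relabels selected rows and returns a new DataFrame
renamed = df.rename(index={"idx1": "first"})
print(renamed.index.tolist())  # ['first', 'idx2', 'idx3']
```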
8. Set Index to Column & Column to Index
DataFrame.reset_index() moves the current Index into a column and replaces it with the default Index starting from zero. It returns a new DataFrame. The example below adds a column named index to the DataFrame.
# Set Index to Column
df2=df.reset_index()
print(df2)
# Outputs
index Courses Fee Duration Discount
0 idx1 Spark 20000 30day 1000
1 idx2 PySpark 25000 40days 2300
2 idx3 Hadoop 26000 35days 1500
DataFrame.set_index() sets a DataFrame column as the Index and returns a new DataFrame. The example below sets the Courses column as the index.
# Set Column as Index
df2=df.set_index('Courses')
print(df2)
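The two methods can be compared side by side with a minimal sketch like the one below. The sample data is an assumption based on the outputs shown earlier, trimmed to two columns, and the names df3 and df4 are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["idx1", "idx2", "idx3"],
)

# reset_index() moves the labels into a column called 'index'
df2 = df.reset_index()
print(df2.columns.tolist())   # ['index', 'Courses', 'Fee']

# drop=True discards the old labels instead of keeping them as a column
df3 = df.reset_index(drop=True)
print(df3.columns.tolist())   # ['Courses', 'Fee']

# set_index() makes the 'Courses' column the index
df4 = df.set_index("Courses")
print(df4.index.tolist())     # ['Spark', 'PySpark', 'Hadoop']

# Each call returned a new DataFrame; the original is unchanged
print(df.index.tolist())      # ['idx1', 'idx2', 'idx3']
```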
Happy Learning !!