We often need to remap the values of a pandas DataFrame column using a dictionary (dict). You can do this with the DataFrame.replace() method. The method accepts several kinds of arguments; here we use the form that takes a dictionary. Each key in the dictionary is an existing value in the column, and the corresponding value is what you want to replace it with.
When working with data in a pandas DataFrame, we perform a range of clean-up and standardization operations to get the data into the desired form. One of these is remapping the values of a specific column; a common example is converting two-letter state codes to full names or vice versa. Let’s look at several ways, with examples, to remap values in a DataFrame column with a dictionary.
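As a quick illustration of the state-code use case mentioned above, here is a minimal sketch. The State column, the sample rows, and the code_to_name dictionary are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sample data: a column of two-letter state codes
states = pd.DataFrame({"State": ["CA", "NY", "TX", "CA"]})

# Keys are the existing values, values are the replacements
code_to_name = {"CA": "California", "NY": "New York", "TX": "Texas"}
full = states.replace({"State": code_to_name})

# Invert the dictionary to go back from full names to codes
name_to_code = {name: code for code, name in code_to_name.items()}
back = full.replace({"State": name_to_code})

print(full)
print(back)
```

Inverting the dictionary only works cleanly when every replacement value is unique, as it is here.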
In the examples below, I have a DataFrame with a column Courses, and I will remap the values of this column to new values.
1. Remap Column Values with a Dict Using Pandas DataFrame.replace()
You can use df.replace({"Courses": dict}) to remap values in a pandas DataFrame column with dictionary values. The method also accepts regular expressions, so you can do regex substitutions.
First, let’s create a Pandas DataFrame.
# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan],
'Discount':[1000,2300,1000,1200,2500]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
This creates a DataFrame with 4 columns: Courses, Fee, Duration and Discount.
Now we will remap the values of the 'Courses' column to their respective codes using the df.replace() function.
# Define a dict with the key-value pairs to remap.
# Note: naming the variable dict shadows Python's built-in dict type.
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
df2 = df.replace({"Courses": dict})
print("After replacing a column values with a dictionary values:\n", df2)
Yields below output.
If you want to remap column values on the existing DataFrame, use inplace=True.
# Remap column values in place
df.replace({"Courses": dict},inplace=True)
print("After replacing a column values with a dictionary values:\n", df)
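The regex substitution mentioned at the start of this section can be sketched as follows. This is an illustrative example only: the patterns and the replacement labels ("Python family", "Big Data") are made up for the demonstration, and it uses Series.replace() with regex=True so that the dictionary keys are treated as regular expressions.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"]})

# Hypothetical patterns: with regex=True each key is a regular expression.
# Anchoring with ^ and $ replaces the whole value, not just a fragment of it.
pattern_map = {r"^Py.*$": "Python family", r"^Hadoop$": "Big Data"}

df["Courses"] = df["Courses"].replace(pattern_map, regex=True)
print(df)
```

Values that match none of the patterns, such as Spark and Pandas here, are left unchanged.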
2. Remap None or NaN Column Values
You can also use df.replace({"Duration": dict_duration}, inplace=True) to remap None or NaN values in a pandas DataFrame with dictionary values. Adding np.nan as a key in the dictionary lets you remap the missing values of the 'Duration' column along with its other values.
# Remap values for None & nan
df = pd.DataFrame(technologies)
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
df.replace({"Duration": dict_duration},inplace=True)
print("After replacing a column values with a dictionary values:\n", df)
Yields below output.
# Output:
# After replacing a column values with a dictionary values:
Courses Fee Duration Discount
0 Spark 22000 30 1000
1 PySpark 25000 50 2300
2 Hadoop 23000 30 1000
3 Python 24000 50 1200
4 Pandas 26000 50 2500
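An alternative sketch, not taken from the example above, keeps the two concerns separate: replace() handles the real values and fillna() handles the missing ones. The default of '50' is the same value used for np.nan in dict_duration; the smaller DataFrame is an assumption made to keep the example self-contained.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Duration": ["30days", "50days", "30days", None, np.nan]})

# Only real values are listed here; missing values are handled by fillna()
dict_duration = {"30days": "30", "50days": "50", "55days": "55"}

df["Duration"] = df["Duration"].replace(dict_duration).fillna("50")
print(df)
```

This form makes it explicit which value is a default for missing data and which values are genuine remappings.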
3. Remap Multiple Column Values
To remap values in multiple columns, such as Courses and Duration, pass one dictionary per column: df.replace({"Courses": dict, "Duration": dict_duration}, inplace=True). Both the Courses and Duration columns are remapped to their respective codes in a single call.
# Remap multiple column values
df = pd.DataFrame(technologies)
# Remap multiple columns at same time
df.replace({"Courses": dict,"Duration": dict_duration},inplace=True)
print("After replacing multiple column values with a dictionary:\n", df)
Yields below output.
# Output:
# After replacing multiple column values with a dictionary:
Courses Fee Duration Discount
0 S 22000 30 1000
1 P 25000 50 2300
2 H 23000 30 1000
3 P 24000 50 1200
4 P 26000 50 2500
4. Remap Multiple Columns with Same Value
You can use df.replace(remap_values, value='--', inplace=True) to replace specific values in multiple columns with the same value. With remap_values = {"Courses": 'Spark', "Duration": '30days'}, the value 'Spark' in the Courses column and '30days' in the Duration column are both replaced with '--'.
# Remap Different columns for specific values
df = pd.DataFrame(technologies)
remap_values = {"Courses" : 'Spark', "Duration" : '30days'}
df.replace(remap_values,value='--',inplace=True)
print("After replacing multiple column values with dictionary:\n", df)
Yields below output.
# Output:
# After replacing multiple column values with dictionary:
Courses Fee Duration Discount
0 -- 22000 -- 1000
1 PySpark 25000 50days 2300
2 Hadoop 23000 -- 1000
3 Python 24000 None 1200
4 Pandas 26000 NaN 2500
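To make the meaning of the value parameter concrete, the sketch below compares this form with the nested-dictionary form from the earlier sections. The small DataFrame without missing values is an assumption made to keep the comparison simple.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Duration": ["30days", "50days", "30days"],
})

# One value to look for per column, one shared replacement
flat = df.replace({"Courses": "Spark", "Duration": "30days"}, value="--")

# The same result written as one dictionary per column
nested = df.replace({"Courses": {"Spark": "--"}, "Duration": {"30days": "--"}})

print(flat.equals(nested))
```

The nested form is more verbose but lets each column use a different replacement if you need that later.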
5. Remap Values Directly on the Series
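Now we remap values directly on the Series of the Courses column using df["Courses"].replace(dict) and assign the result back to the column. Calling replace() with inplace=True on a selected column is a chained in-place operation, which recent pandas versions warn about and which may not update the original DataFrame.
# Remap Values Directly on the Series
df = pd.DataFrame(technologies)
df["Courses"] = df["Courses"].replace(dict)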
print("After replacing a column values with dictionary:\n", df)
Yields below output.
# Output:
# After replacing a column values with dictionary:
Courses Fee Duration Discount
0 S 22000 30days 1000
1 P 25000 50days 2300
2 H 23000 30days 1000
3 P 24000 None 1200
4 P 26000 NaN 2500
6. Using map() to Remap Column Values in Pandas
pandas also provides a Series.map() method that can be used to remap the values of a column.
When the dictionary has more than a couple of keys, using map() can be much faster than replace(). Use this syntax: df["Courses"] = df["Courses"].map(dict)
There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values. Values that are missing from the dictionary become NaN.
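The difference between map() and replace() on values that are not in the dictionary can be seen in a small sketch. The Series and the partial dictionary below are made up for illustration.

```python
import pandas as pd

s = pd.Series(["Spark", "PySpark", "Scala"])

# "Scala" is deliberately missing from the dictionary
partial = {"Spark": "S", "PySpark": "P"}

mapped = s.map(partial)        # unmatched values become NaN
replaced = s.replace(partial)  # unmatched values are kept as they are

print(mapped)
print(replaced)
```

This is why the choice between the two versions below depends on whether the dictionary covers every value in the column.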
7. Exhaustive Mapping
# Using exhaustive map() dict
df = pd.DataFrame(technologies)
df["Courses"]=df['Courses'].map(dict)
print("After replacing column values with dictionary:\n", df)
Yields below output.
# Output:
# After replacing column values with dictionary:
Courses Fee Duration Discount
0 S 22000 30days 1000
1 P 25000 50days 2300
2 H 23000 30days 1000
3 P 24000 None 1200
4 P 26000 NaN 2500
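Before relying on an exhaustive mapping, it can help to confirm that the dictionary really covers every value in the column. The check below is a minimal sketch; the names course_codes and missing are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"]})
course_codes = {"Spark": "S", "PySpark": "P", "Hadoop": "H", "Python": "P", "Pandas": "P"}

# Values present in the column but absent from the dictionary keys
missing = set(df["Courses"].dropna().unique()) - set(course_codes)
if missing:
    raise ValueError(f"No mapping defined for: {sorted(missing)}")

df["Courses"] = df["Courses"].map(course_codes)
print(df)
```

If the set is not empty, map() would silently turn those values into NaN, so failing early makes the gap visible.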
8. Non-Exhaustive Mapping
If you have a non-exhaustive mapping and wish to retain the existing values for non-matches, you can add fillna().
# Using Non-Exhaustive Mapping add fillna
# Rebuild the DataFrame so that Courses holds the original names again
df = pd.DataFrame(technologies)
df1 = df['Courses'].map(dict).fillna(df['Courses'])
print("After replacing column values with dictionary:\n", df1)
Yields below output.
# Output:
# After replacing column values with dictionary:
0 S
1 P
2 H
3 P
4 P
Name: Courses, dtype: object
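The dictionary used above happens to cover every course, so the fallback never triggers. The sketch below uses a deliberately partial dictionary (partial_codes, a name chosen for this example) and also shows an alternative that looks up each value with dict.get() and falls back to the value itself.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "Pandas"], name="Courses")

# Only two of the five courses have a code
partial_codes = {"Spark": "S", "Hadoop": "H"}

# Option 1: map, then fill the NaN gaps with the original values
with_fillna = courses.map(partial_codes).fillna(courses)

# Option 2: fall back to the original value inside the lookup itself
with_get = courses.map(lambda v: partial_codes.get(v, v))

print(with_fillna)
print(with_get)
```

Both options give the same result here. The fillna() version would also overwrite a NaN that the dictionary produced on purpose, so the get() version is the safer choice if the dictionary can map a value to NaN.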
9. Complete Example of Dictionary to Remap Columns Values in Pandas DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan],
'Discount':[1000,2300,1000,1200,2500]
}
df = pd.DataFrame(technologies)
print(df)
# Define a dict with the key-value pairs to remap.
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
df2=df.replace({"Courses": dict})
print(df2)
# Remap values for None & nan
df = pd.DataFrame(technologies)
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
df.replace({"Duration": dict_duration},inplace=True)
print(df)
df = pd.DataFrame(technologies)
# Define a dict with the key-value pairs to remap.
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
# Remap multiple columns at same time
df.replace({"Courses": dict,"Duration": dict_duration},inplace=True)
print(df)
# Remap Different columns for specific values
df = pd.DataFrame(technologies)
remap_values = {"Courses" : 'Spark', "Duration" : '30days'}
df.replace(remap_values,value='--',inplace=True)
print(df)
# Remap Values Directly on the Series
df = pd.DataFrame(technologies)
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
# Assign the result back instead of using a chained inplace=True call
df["Courses"] = df["Courses"].replace(dict)
print(df)
#Using exhaustive map() dict
df = pd.DataFrame(technologies)
df["Courses"]=df['Courses'].map(dict)
print(df)
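# Using Non-Exhaustive Mapping add fillna
# Rebuild the DataFrame so that Courses holds the original names again
df = pd.DataFrame(technologies)
df1 = df['Courses'].map(dict).fillna(df['Courses'])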
print(df1)
Conclusion
In this article, you have learned how to remap column values with a dict in a pandas DataFrame using DataFrame.replace() and Series.map(). With DataFrame.replace(), you can remap None or NaN column values, remap multiple columns at once, and replace values in several columns with the same value. With Series.map(), you have seen two approaches to remapping a column with a dictionary: exhaustive mapping and non-exhaustive mapping.
Happy Learning !!