  • Post category:Pandas
  • Post last modified:March 27, 2024
Pandas Remap Values in Column with a Dictionary (Dict)

We often need to remap the values of a pandas DataFrame column using a dictionary (dict), and you can do this with the DataFrame.replace() method. This method accepts several parameter forms; here we use the one that takes a dictionary to remap the column values. A dictionary is a collection of key-value pairs: the key is the existing value in the column, and the value is the literal value you want to replace it with.

While working with data in a pandas DataFrame, we perform many operations as part of cleanup or standardization to get the data into the desired form. One of these operations is remapping the values of a specific column. A common example is converting two-letter state codes to full state names, or vice versa. Let's look at several ways to remap the values in a DataFrame column with a dictionary.
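As a quick sketch of the state-code case, assuming a hypothetical State column holding two-letter US state codes, the forward mapping is a plain dictionary and the reverse mapping is the same dictionary inverted. Inverting only works cleanly when the dictionary values are unique.

```python
import pandas as pd

# Hypothetical sample data: two-letter US state codes
df = pd.DataFrame({"State": ["CA", "NY", "TX", "CA"]})

# Key = existing value in the column, value = replacement
code_to_name = {"CA": "California", "NY": "New York", "TX": "Texas"}
df["StateName"] = df["State"].replace(code_to_name)

# Vice versa: invert the dictionary (assumes the full names are unique)
name_to_code = {name: code for code, name in code_to_name.items()}
df["StateCode"] = df["StateName"].replace(name_to_code)

print(df)
```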

In the examples below, I use a DataFrame with a Courses column and remap the values of this column to new values.

1. Remap Column Values with a Dict Using Pandas DataFrame.replace()

You can use df.replace({"Courses": dict}) to remap (replace) the values of a pandas DataFrame column with the values from a dictionary. The outer key selects the column, and the inner dictionary maps old values to new ones. replace() also supports regular-expression substitution when you pass regex=True.
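A minimal sketch of the regex form, using a small hypothetical DataFrame: with regex=True, the keys of the inner dictionary are treated as regular expressions, and only the matched part of each string is substituted.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]})

# regex=True: the key r"^Py" is a pattern; the matched prefix is substituted
# and the rest of the string is kept. Non-matching values are untouched.
df2 = df.replace({"Courses": {r"^Py": "Python-"}}, regex=True)
print(df2["Courses"].tolist())  # ['Spark', 'Python-Spark', 'Hadoop']
```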

First, let’s create a Pandas DataFrame.


# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

This yields the result below. As you can see, the DataFrame has four columns: Courses, Fee, Duration, and Discount.

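# Output:
# Create DataFrame:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500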

Now remap the values of the Courses column to their respective codes using df.replace().


# Define a dict with the key-value pairs to remap
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
df2 = df.replace({"Courses": dict})
print("After replacing a column values with a dictionary values:\n", df2)

Yields below output.

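# Output:
# After replacing a column values with a dictionary values:
  Courses    Fee Duration  Discount
0       S  22000   30days      1000
1       P  25000   50days      2300
2       H  23000   30days      1000
3       P  24000     None      1200
4       P  26000      NaN      2500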
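replace() only touches values that have a key in the dictionary; everything else is left as it is, and columns not named in the outer dictionary are not searched at all. A short sketch with a hypothetical partial dictionary:

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"],
                   "Fee": [22000, 25000, 23000]})

# Only "Spark" has an entry; values without a key are kept unchanged
partial = {"Spark": "S"}
df2 = df.replace({"Courses": partial})
print(df2)
```

Because replace() returns a new DataFrame by default, df itself still holds the original values.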

If you want to remap the column values in the existing DataFrame instead of creating a new one, use inplace=True.


# Remap column values in place
df.replace({"Courses": dict},inplace=True)
print("After replacing a column values with a dictionary values:\n", df)
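To make the difference concrete, here is a small sketch (hypothetical two-row DataFrame) showing that replace() without inplace returns a new DataFrame and leaves the original alone, while inplace=True modifies the DataFrame and returns None.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "Hadoop"]})
mapping = {"Spark": "S", "Hadoop": "H"}

# Without inplace: a new DataFrame is returned, df is unchanged
new_df = df.replace({"Courses": mapping})
print(df["Courses"].tolist())      # ['Spark', 'Hadoop']

# With inplace=True: df itself is modified and the call returns None
result = df.replace({"Courses": mapping}, inplace=True)
print(result)                      # None
print(df["Courses"].tolist())      # ['S', 'H']
```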

2. Remap None or NaN Column Values

You can also remap None or NaN values by adding np.nan as a key to the dictionary, as in df.replace({"Duration": dict_duration}, inplace=True). The example below remaps the Duration values to their respective codes and replaces the missing values (both None and NaN) with '50'. See also how to replace None/NaN values with an empty string in pandas.


# Remap values for None & nan
df = pd.DataFrame(technologies)
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
df.replace({"Duration": dict_duration},inplace=True)
print("After replacing a column values with a dictionary values:\n", df)

Yields below output.


# Output:
# After replacing a column values with a dictionary values:
   Courses    Fee Duration  Discount
0    Spark  22000       30      1000
1  PySpark  25000       50      2300
2   Hadoop  23000       30      1000
3   Python  24000       50      1200
4   Pandas  26000       50      2500
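A minimal sketch on a plain Series (hypothetical values) that isolates the missing-value behavior: the np.nan key matches every missing entry, so a None and a NaN in an object column are both remapped.

```python
import numpy as np
import pandas as pd

s = pd.Series(["30days", None, np.nan], dtype=object)

# np.nan as a key matches missing values; None and NaN are both treated as missing
out = s.replace({"30days": "30", np.nan: "50"})
print(out.tolist())  # ['30', '50', '50']
```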

3. Remap Multiple Column Values

To remap values in multiple columns, such as Courses and Duration, pass one inner dictionary per column: df.replace({"Courses": dict, "Duration": dict_duration}, inplace=True). Both the Courses and Duration columns are remapped in a single call.


# Remap multiple column values
df = pd.DataFrame(technologies)
# Remap multiple columns at same time
df.replace({"Courses": dict,"Duration": dict_duration},inplace=True)
print("After replacing multiple column values with a dictionary:\n", df)

Yields below output.


# Output:
# After replacing multiple column values with a dictionary:
  Courses    Fee Duration  Discount
0       S  22000       30      1000
1       P  25000       50      2300
2       H  23000       30      1000
3       P  24000       50      1200
4       P  26000       50      2500
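The nesting matters. A sketch with hypothetical columns A and B, which both contain "x": a nested dictionary restricts the search to the named column, while a flat dictionary replaces the value in every column.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y"], "B": ["x", "z"]})

# Nested dict: only column "A" is searched for "x"
nested = df.replace({"A": {"x": "X"}})

# Flat dict: "x" is replaced wherever it appears
flat = df.replace({"x": "X"})

print(nested)
print(flat)
```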

4. Remap Multiple Columns with Same Value

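To replace specific values in different columns with the same new value, pass a dictionary that maps each column to the value to look for, and pass the replacement through the value parameter: df.replace(remap_values, value='--', inplace=True). With remap_values = {"Courses": 'Spark', "Duration": '30days'}, 'Spark' in Courses and '30days' in Duration are both replaced with '--'.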


# Remap Different columns for specific values
df = pd.DataFrame(technologies)
remap_values = {"Courses" : 'Spark', "Duration" : '30days'}
df.replace(remap_values,value='--',inplace=True)
print("After replacing multiple column values with dictionary:\n", df)

Yields below output.


# Output:
# After replacing multiple column values with dictionary:
   Courses    Fee Duration  Discount
0       --  22000       --      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000       --      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500
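If each column needs its own replacement rather than a shared '--', replace() also accepts a dictionary for value keyed by column. A sketch with a hypothetical two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark"],
                   "Duration": ["30days", "50days"]})

# to_replace dict: column -> value to find
# value dict:      column -> replacement to use in that column
out = df.replace({"Courses": "Spark", "Duration": "30days"},
                 {"Courses": "S", "Duration": "30"})
print(out)
```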

5. Remap Values Directly on the Series

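Now remap the values directly on the Courses Series using Series.replace() and assign the result back to the column. In recent pandas versions, df["Courses"].replace(dict, inplace=True) is a chained assignment that may not update df (pandas 2.2 warns about it, and under Copy-on-Write it has no effect on df), so assigning the result back is the safer form.


# Remap Values Directly on the Series
df = pd.DataFrame(technologies)
# Series.replace() returns a new Series; assign it back to the column
df["Courses"] = df["Courses"].replace(dict)
print("After replacing a column values with dictionary:\n", df)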

Yields below output.


# Output:
# After replacing a column values with dictionary:
  Courses    Fee Duration  Discount
0       S  22000   30days      1000
1       P  25000   50days      2300
2       H  23000   30days      1000
3       P  24000     None      1200
4       P  26000      NaN      2500
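As a quick check, this sketch (hypothetical three-row DataFrame) shows that replacing on the Series and assigning it back gives the same result as the nested-dictionary DataFrame.replace() form from section 1.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]})
mapping = {"Spark": "S", "PySpark": "P", "Hadoop": "H"}

# Series-level replace, assigned back to the column of a copy
via_series = df.copy()
via_series["Courses"] = via_series["Courses"].replace(mapping)

# DataFrame-level replace with a nested dictionary
via_frame = df.replace({"Courses": mapping})

print(via_series.equals(via_frame))  # True
```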

6. Using map() to Remap Column Values in Pandas

Pandas also provides the Series.map() method, which can remap the values of a column using a dictionary.

Unlike replace(), map() looks up every value in the dictionary and returns NaN for values that have no key. When the dictionary has more than a few keys, map() can also be much faster than replace(). The syntax is df["Courses"] = df["Courses"].map(dict). There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values.
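The key behavioral difference shows up with a non-exhaustive dictionary. In this sketch (hypothetical Series, with no entry for "Hadoop"), map() turns the unmatched value into NaN, while replace() leaves it unchanged.

```python
import pandas as pd

s = pd.Series(["Spark", "PySpark", "Hadoop"])
partial = {"Spark": "S", "PySpark": "P"}  # no key for "Hadoop"

mapped = s.map(partial)        # "Hadoop" becomes NaN
replaced = s.replace(partial)  # "Hadoop" is kept

print(mapped.tolist())
print(replaced.tolist())  # ['S', 'P', 'Hadoop']
```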

7. Exhaustive Mapping


# Using exhaustive map() dict
df = pd.DataFrame(technologies)
df["Courses"]=df['Courses'].map(dict)
print("After replacing column values with dictionary:\n", df)

Yields below output.


# Output:
# After replacing column values with dictionary:
  Courses    Fee Duration  Discount
0       S  22000   30days      1000
1       P  25000   50days      2300
2       H  23000   30days      1000
3       P  24000     None      1200
4       P  26000      NaN      2500
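Before relying on an exhaustive map(), it can help to check which column values have no key. A minimal sketch, assuming a hypothetical "Java" row that the dictionary does not cover:

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop", "Java"]})
mapping = {"Spark": "S", "PySpark": "P", "Hadoop": "H"}

# Distinct non-missing values that have no key in the mapping
unmapped = set(df["Courses"].dropna()) - set(mapping)
print(unmapped)  # {'Java'}

# Row-level view of the same check with isin()
has_key = df["Courses"].isin(list(mapping))
print(df.loc[~has_key, "Courses"].tolist())  # ['Java']
```

If unmapped is empty, the mapping is exhaustive for this column and map() will not introduce NaN values.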

8. Non-Exhaustive Mapping

If your mapping is non-exhaustive and you want to keep the existing values for non-matches, chain fillna() with the original column.


# Using non-exhaustive mapping: add fillna()
df = pd.DataFrame(technologies)
df1 = df['Courses'].map(dict).fillna(df['Courses'])
print("After replacing column values with dictionary:\n", df1)

Yields below output.


# Output:
# After replacing column values with dictionary:
0    S
1    P
2    H
3    P
4    P
Name: Courses, dtype: object
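Here is a sketch of the non-exhaustive case with a dictionary that actually misses a value (hypothetical Series), next to an alternative that passes a function to map() and uses dict.get() with the value itself as the default.

```python
import pandas as pd

s = pd.Series(["Spark", "PySpark", "Hadoop"])
partial = {"Spark": "S", "PySpark": "P"}  # no entry for "Hadoop"

# Option 1: map, then fill the resulting NaN values with the originals
filled = s.map(partial).fillna(s)

# Option 2: look each value up, defaulting to the value itself
via_get = s.map(lambda v: partial.get(v, v))

print(filled.tolist())   # ['S', 'P', 'Hadoop']
print(via_get.tolist())  # ['S', 'P', 'Hadoop']
```

The two give the same result here. They differ only if a dictionary value is itself missing (None or NaN): fillna() would overwrite it with the original value, while dict.get() keeps it.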

9. Complete Example of Dictionary to Remap Columns Values in Pandas DataFrame


import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print(df)

# Define a dict with the key-value pairs to remap
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
df2=df.replace({"Courses": dict})
print(df2)

# Remap values for None & nan
df = pd.DataFrame(technologies)
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
df.replace({"Duration": dict_duration},inplace=True)
print(df)

df = pd.DataFrame(technologies)
# Define a dict with the key-value pairs to remap
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
# Remap multiple columns at same time
df.replace({"Courses": dict,"Duration": dict_duration},inplace=True)
print(df)

# Remap Different columns for specific values
df = pd.DataFrame(technologies)
remap_values = {"Courses" : 'Spark', "Duration" : '30days'}
df.replace(remap_values,value='--',inplace=True)
print(df)

# Remap Values Directly on the Series
df = pd.DataFrame(technologies)
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
df["Courses"] = df["Courses"].replace(dict)
print(df)

# Using exhaustive map() dict
df = pd.DataFrame(technologies)
df["Courses"]=df['Courses'].map(dict)
print(df)

# Using non-exhaustive mapping: add fillna()
df = pd.DataFrame(technologies)
df1 = df['Courses'].map(dict).fillna(df['Courses'])
print(df1)

Conclusion

In this article, you have learned how to remap column values with a dictionary in a pandas DataFrame using DataFrame.replace() and Series.map(). With DataFrame.replace(), you remapped a single column, None/NaN values, multiple columns at once, and specific values in several columns to the same value. With Series.map(), you learned two approaches: exhaustive mapping and non-exhaustive mapping.

Happy Learning !!



Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen on LinkedIn and Medium.
