Pandas Remap Values in Column with a Dictionary (Dict)

We are often required to remap the values of a Pandas DataFrame column using a dictionary (dict). You can achieve this with the DataFrame.replace() method. DataFrame.replace() accepts several kinds of arguments; here we use the form that takes a dictionary to remap the column values. In that dictionary, each key is an existing value in the column and each value is the literal you want to replace it with.

While working with data in a Pandas DataFrame, we perform an array of operations as part of clean-up or standardization to get the data into the desired form. One of these operations is remapping the values of a specific column. A commonly used example is converting a two-letter state code to its full name, or vice versa. Let's discuss several ways, with examples, to remap values in a DataFrame column with a dictionary.

In the example below, I have a DataFrame with a column Courses, and I will remap the values of this column to new values.

1. Remap Column Values with a Dict Using Pandas DataFrame.replace()

You can use df.replace({"Courses": dict}) to remap values in the Courses column of a pandas DataFrame, where dict maps each existing value to its new value. replace() can also treat the keys as regular expressions (with regex=True) for pattern-based substitutions.

First, let’s create a Pandas DataFrame.


import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print(df)

This yields the result below. As you can see, the DataFrame has four columns: Courses, Fee, Duration and Discount.


   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500

Now we will remap the values of the 'Courses' column to their respective codes using the df.replace() function.


# Define a dict with the key-value pairs to remap.
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
df2=df.replace({"Courses": dict})
print(df2)

Yields below output.


  Courses    Fee Duration  Discount
0       S  22000   30days      1000
1       P  25000   50days      2300
2       H  23000   30days      1000
3       P  24000     None      1200
4       P  26000      NaN      2500

If you want to remap the column values on the existing DataFrame instead of creating a new one, use inplace=True.


df.replace({"Courses": dict},inplace=True)
print(df)
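
The regex substitution mentioned at the start of this section can be sketched as follows. This is a minimal illustration rather than part of the original example: the small DataFrame and the pattern r"days$" are assumptions chosen for the demo, and the column contains no missing values to keep the behaviour easy to follow.

```python
import pandas as pd

# Small illustrative frame (assumed for this sketch, no missing values).
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Duration": ["30days", "50days", "30days"],
})

# With regex=True, the inner dict keys are treated as regular expressions.
# Here the trailing "days" is stripped from every value in Duration only.
df2 = df.replace({"Duration": {r"days$": ""}}, regex=True)
print(df2)
```

Only the column named in the outer dictionary is searched, so Courses is left unchanged.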

2. Remap None or NaN Column Values

Use df.replace({"Duration": dict_duration}, inplace=True) to remap None or NaN values in a pandas DataFrame with dictionary values. The example below remaps the values of the 'Duration' column, including the missing ones, to their respective codes; as the output shows, the np.nan key in the dictionary matches both None and NaN. See also how to replace None/NaN values with an empty string in pandas.


#Remap values for None & nan
df = pd.DataFrame(technologies)
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
df.replace({"Duration": dict_duration},inplace=True)
print(df)

Yields below output.


   Courses    Fee Duration  Discount
0    Spark  22000       30      1000
1  PySpark  25000       50      2300
2   Hadoop  23000       30      1000
3   Python  24000       50      1200
4   Pandas  26000       50      2500
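
As a side note, pandas treats both None and np.nan as missing values. The sketch below checks this with isna() and shows fillna() as an alternative way to fill them; fillna() is not used in the original example, and the short Series here is assumed purely for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative object Series holding a string, None and NaN.
durations = pd.Series(["30days", None, np.nan], dtype="object")

# Both None and np.nan are reported as missing.
missing = durations.isna().tolist()

# Alternative to a dict key of np.nan: fill every missing value directly.
filled = durations.fillna("50")
print(missing)
print(filled.tolist())
```

Using fillna() keeps the missing-value handling separate from the remapping dictionary, which can be easier to read when the dictionary is long.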

3. Remap Multiple Column Values

To remap values in multiple columns, such as Courses and Duration, pass one dictionary per column: df.replace({"Courses": dict, "Duration": dict_duration}, inplace=True). Both columns are remapped to their respective codes in a single call.


df = pd.DataFrame(technologies)
#Remap multiple columns at same time
df.replace({"Courses": dict,"Duration": dict_duration},inplace=True)
print(df)

Yields below output.


  Courses    Fee Duration  Discount
0       S  22000       30      1000
1       P  25000       50      2300
2       H  23000       30      1000
3       P  24000       50      1200
4       P  26000       50      2500

4. Remap Multiple Columns with Same Value

You can use df.replace(remap_values, value='--', inplace=True) to replace specific values in multiple columns with the same value. Here remap_values = {"Courses": 'Spark', "Duration": '30days'} names the value to look for in each column, and every match is replaced with '--'.


# Remap Different columns for specific values
df = pd.DataFrame(technologies)
remap_values = {"Courses" : 'Spark', "Duration" : '30days'}
df.replace(remap_values,value='--',inplace=True)
print(df)

Yields below output.


   Courses    Fee Duration  Discount
0       --  22000       --      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000       --      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500
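
For comparison, the same result can be written with one nested dictionary per column instead of the value argument. The sketch below uses a reduced DataFrame assumed for illustration and checks that the two forms agree.

```python
import pandas as pd

# Reduced illustrative frame (assumed for this sketch).
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Duration": ["30days", "50days", "30days"],
})

# Form used in this section: column -> value to find, plus a shared replacement.
with_value = df.replace({"Courses": "Spark", "Duration": "30days"}, value="--")

# Equivalent nested form: column -> {value to find: replacement}.
nested = df.replace({"Courses": {"Spark": "--"}, "Duration": {"30days": "--"}})

print(with_value)
print(with_value.values.tolist() == nested.values.tolist())
```

The value form is shorter when every column shares one replacement; the nested form is needed as soon as the replacements differ by column.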

5. Remap Values Directly on the Series

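Now we remap values directly on the Series of the Courses column using df["Courses"].replace(dict). Calling replace(dict, inplace=True) on df["Courses"] is a chained operation that recent pandas versions warn about and that may not update df, so the result is assigned back to the column instead.


# Remap Values Directly on the Series
df = pd.DataFrame(technologies)
# Assign the result back rather than using inplace=True on the selected column
df["Courses"] = df["Courses"].replace(dict)
print(df)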

Yields below output.


  Courses    Fee Duration  Discount
0       S  22000   30days      1000
1       P  25000   50days      2300
2       H  23000   30days      1000
3       P  24000     None      1200
4       P  26000      NaN      2500

6. Using map() to Remap Column Values in Pandas

Pandas also provides the Series.map() method, which can be used to remap the values of a column; apply it to each column in turn to remap several.

When the dictionary has more than a couple of keys, map() can be much faster than replace(). Use this syntax: df["Courses"] = df["Courses"].map(dict). There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values. Values with no matching key become NaN.
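
The speed difference is easy to measure yourself. The following is a rough sketch using the standard timeit module; the row count and repeat count are arbitrary choices for illustration, and the timings depend on your machine and pandas version, so no particular numbers are claimed here.

```python
import timeit

import pandas as pd

codes = {"Spark": "S", "PySpark": "P", "Hadoop": "H", "Python": "P", "Pandas": "P"}

# Repeat the five course names to build a 100,000-row Series.
courses = pd.Series(list(codes) * 20_000)

# Both approaches give the same values when the dictionary is exhaustive.
same = courses.replace(codes).tolist() == courses.map(codes).tolist()

# Time three runs of each approach.
t_replace = timeit.timeit(lambda: courses.replace(codes), number=3)
t_map = timeit.timeit(lambda: courses.map(codes), number=3)
print(f"replace: {t_replace:.3f}s  map: {t_map:.3f}s  identical: {same}")
```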

7. Exhaustive Mapping


#Using exhaustive map() dict
df = pd.DataFrame(technologies)
df["Courses"]=df['Courses'].map(dict)
print(df)

Yields below output.


  Courses    Fee Duration  Discount
0       S  22000   30days      1000
1       P  25000   50days      2300
2       H  23000   30days      1000
3       P  24000     None      1200
4       P  26000      NaN      2500

8. Non-Exhaustive Mapping

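If you have a non-exhaustive mapping and wish to retain the existing values for non-matches, you can add fillna() with the original column.


#Using Non-Exhaustive Mapping add fillna 
# Start again from the original values so map() has keys to match
df = pd.DataFrame(technologies)
df1 = df['Courses'].map(dict).fillna(df['Courses'])
print(df1)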

Yields below output.


0    S
1    P
2    H
3    P
4    P
Name: Courses, dtype: object
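
Because the dictionary used above covers every course, fillna() has nothing to fill in that run. The sketch below uses a hypothetical partial dictionary (named partial here, not part of the original example) to show what happens to unmatched values with and without fillna().

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "Pandas"], name="Courses")

# Hypothetical partial mapping: only two of the five courses have a code.
partial = {"Spark": "S", "Hadoop": "H"}

# Without fillna, values missing from the dict become NaN.
mapped = courses.map(partial)

# With fillna, unmatched rows fall back to their original value.
kept = courses.map(partial).fillna(courses)

print(mapped.isna().tolist())
print(kept.tolist())
```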

9. Complete Example of Dictionary to Remap Columns Values in Pandas DataFrame


import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print(df)

# Define a dict with the key-value pairs to remap.
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
df2=df.replace({"Courses": dict})
print(df2)

#Remap values for None & nan
df = pd.DataFrame(technologies)
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
df.replace({"Duration": dict_duration},inplace=True)
print(df)

df = pd.DataFrame(technologies)
# Define a dict with the key-value pairs to remap.
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
#Remap multiple columns at same time
df.replace({"Courses": dict,"Duration": dict_duration},inplace=True)
print(df)

# Remap Different columns for specific values
df = pd.DataFrame(technologies)
remap_values = {"Courses" : 'Spark', "Duration" : '30days'}
df.replace(remap_values,value='--',inplace=True)
print(df)

# Remap Values Directly on the Series
df = pd.DataFrame(technologies)
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
# Assign back; inplace=True on a selected column may not update df in newer pandas
df["Courses"] = df["Courses"].replace(dict)
print(df)

#Using exhaustive map() dict
df = pd.DataFrame(technologies)
df["Courses"]=df['Courses'].map(dict)
print(df)

#Using Non-Exhaustive Mapping add fillna 
df1 = df['Courses'].map(dict).fillna(df['Courses'])
print(df1)

Conclusion

In this article, you have learned how to remap column values with a dict in a Pandas DataFrame using DataFrame.replace() and Series.map(). With DataFrame.replace(), you remapped None or NaN values, multiple columns at once, and multiple columns to the same value. With Series.map(), you saw two approaches to remapping a column with a dictionary: exhaustive mapping and non-exhaustive mapping.

Happy Learning !!

NNK
