Pandas Remap Values in Column with a Dictionary (Dict)

  • Post category: Pandas
  • Post last modified: December 10, 2024

You often need to remap the values of a Pandas DataFrame column using a dictionary (dict). You can do this with the DataFrame.replace() method. This method has several signatures; this article uses the one that takes a dictionary, in which each key is an existing value in the column and each value is the replacement for it.

When working with data in a Pandas DataFrame, you perform a range of operations to clean up or standardize the data. One of these is remapping the values of a specific column; a common example is converting a two-letter state code to the full state name, or vice versa. Let's discuss several ways, with examples, to remap values in a DataFrame column with a dictionary.
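As a quick illustration of the state-code case, here is a minimal sketch. The names state_names, df_states, and name_to_code are hypothetical and are not part of this article's dataset; the inverse lookup assumes every full name is unique.

```python
import pandas as pd

# Hypothetical lookup table: two-letter code -> full state name
state_names = {"CA": "California", "NY": "New York", "TX": "Texas"}

df_states = pd.DataFrame({"State": ["CA", "TX", "NY", "CA"]})

# Code -> full name
df_states["StateName"] = df_states["State"].map(state_names)

# Invert the dictionary to go the other way (full name -> code)
name_to_code = {name: code for code, name in state_names.items()}
df_states["CodeAgain"] = df_states["StateName"].map(name_to_code)

print(df_states)
```

Inverting the dictionary only round-trips cleanly when no two codes share the same full name.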

In the examples below, I have a DataFrame with a Courses column, and I will remap the values of this column to new values.

Key Points

  • Remapping values allows you to replace specific values in a column based on a predefined dictionary.
  • The replace() function in Pandas can be used to map values in a column using a dictionary.
  • The dictionary should be in {old_value: new_value} format, where keys represent values to be replaced and values represent the new values.
  • Use the inplace=True parameter to apply changes directly to the DataFrame without creating a new one.
  • You can apply a dictionary mapping across multiple columns by passing a dictionary of dictionaries.
  • Use map() for straightforward one-to-one mapping and replace() for broader or complex mappings.

Remap Column Values with a Dict Using Pandas DataFrame.replace()

You can use df.replace({"Courses": dict}) to remap values in a pandas DataFrame column with dictionary values. replace() also accepts regex=True, which treats the dictionary keys as regular expressions for pattern-based substitutions.
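The regex option can be sketched as follows. This is only an illustration on a small hypothetical DataFrame named df_demo (not the article's data), stripping the "days" suffix from a Duration column.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df_demo = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Duration": ["30days", "50days"],
})

# With regex=True the inner keys are treated as regular expressions.
# Here the trailing "days" in the Duration column is replaced by an empty string.
out = df_demo.replace({"Duration": {r"days$": ""}}, regex=True)
print(out)
```

Because the pattern is nested under the "Duration" key, the Courses column is left untouched.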

First, let’s create a Pandas DataFrame.


# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

The resulting DataFrame has 4 columns: Courses, Fee, Duration, and Discount.

Now remap the values of the 'Courses' column to their respective codes using the df.replace() function.


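# Define a dict with the key-value pairs to remap.
# (Naming it dict shadows Python's built-in dict; a name such as course_codes is safer in real code.)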
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
df2 = df.replace({"Courses": dict})
print("After replacing a column values with a dictionary values:\n", df2)

This returns a new DataFrame, df2, in which each name in the Courses column is replaced by its code (S, P, H, P, P); the original df is unchanged.

To remap column values on the existing DataFrame instead of creating a new one, use inplace=True.


# Remap column values in place
df.replace({"Courses": dict},inplace=True)
print("After replacing a column values with a dictionary values:\n", df)

Remap None or NaN Column Values

You can also use df.replace({"Duration": dict_duration}, inplace=True) to remap None or NaN values with dictionary values. The example below remaps the values of the 'Duration' column to their respective codes; the np.nan key in the dictionary replaces the missing values with '50'.


# Remap values for None & nan
df = pd.DataFrame(technologies)
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
df.replace({"Duration": dict_duration},inplace=True)
print("After replacing a column values with a dictionary values:\n", df)

Yields below output.


# Output:
# After replacing a column values with a dictionary values:
   Courses    Fee Duration  Discount
0    Spark  22000       30      1000
1  PySpark  25000       50      2300
2   Hadoop  23000       30      1000
3   Python  24000       50      1200
4   Pandas  26000       50      2500
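An alternative way to handle the missing values is to remap only the real values and then fill the gaps with fillna(). The following is a minimal sketch on a standalone Series; the names duration and codes are hypothetical, and the default of '50' simply mirrors the example above.

```python
import numpy as np
import pandas as pd

# Standalone Series with the same values as the Duration column
duration = pd.Series(["30days", "50days", "30days", None, np.nan], name="Duration")

# Hypothetical mapping that covers only the non-missing values
codes = {"30days": "30", "50days": "50"}

# Remap the known values first, then fill None/NaN with a default
result = duration.replace(codes).fillna("50")
print(result)
```

Keeping the default out of the dictionary makes it easier to see which replacements come from the mapping and which come from the missing-value rule.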

Remap Multiple Column Values

To remap values in multiple columns, such as Courses and Duration, at the same time, pass one dictionary per column: df.replace({"Courses": dict, "Duration": dict_duration}, inplace=True). Both columns are then remapped to their respective codes.


# Remap multiple column values
df = pd.DataFrame(technologies)
# Remap multiple columns at same time
df.replace({"Courses": dict,"Duration": dict_duration},inplace=True)
print("After replacing multiple column values with a dictionary:\n", df)

Yields below output.


# Output:
# After replacing multiple column values with a dictionary:
  Courses    Fee Duration  Discount
0       S  22000       30      1000
1       P  25000       50      2300
2       H  23000       30      1000
3       P  24000       50      1200
4       P  26000       50      2500
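The nested form scopes each mapping to its own column, whereas a flat dictionary is applied to every column. Here is a small sketch of that difference, using a hypothetical DataFrame df_demo in which two columns happen to share the same values.

```python
import pandas as pd

# Hypothetical data: both columns contain the same strings
df_demo = pd.DataFrame({
    "Courses": ["Spark", "Pandas"],
    "Tool": ["Spark", "Pandas"],
})

# Nested dict: the mapping applies only to the "Courses" column
scoped = df_demo.replace({"Courses": {"Spark": "S", "Pandas": "P"}})

# Flat dict: the mapping applies to every column in the DataFrame
everywhere = df_demo.replace({"Spark": "S"})

print(scoped)
print(everywhere)
```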

Remap Multiple Columns with Same Value

You can use df.replace(remap_values, value='--', inplace=True) to replace specific values in multiple columns with the same value. Here remap_values = {"Courses": 'Spark', "Duration": '30days'} names the value to look for in each column, and every match is replaced with '--'.


# Remap Different columns for specific values
df = pd.DataFrame(technologies)
remap_values = {"Courses" : 'Spark', "Duration" : '30days'}
df.replace(remap_values,value='--',inplace=True)
print("After replacing multiple column values with dictionary:\n", df)

Yields below output.


# Output:
# After replacing multiple column values with dictionary:
   Courses    Fee Duration  Discount
0       --  22000       --      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000       --      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500

Remap Values Directly on the Series

You can also remap values directly on the Series for the Courses column. Assign the result back with df["Courses"] = df["Courses"].replace(dict); in recent pandas versions, calling replace(dict, inplace=True) on a selected column may not update the parent DataFrame.


# Remap Values Directly on the Series
df = pd.DataFrame(technologies)
df["Courses"] = df["Courses"].replace(dict)
print("After replacing a column values with dictionary:\n", df)

Yields below output.


# Output:
# After replacing a column values with dictionary:
  Courses    Fee Duration  Discount
0       S  22000   30days      1000
1       P  25000   50days      2300
2       H  23000   30days      1000
3       P  24000     None      1200
4       P  26000      NaN      2500

Using map() to Remap Column Values in Pandas

Pandas also provides the Series.map() method, which can be used to remap the values of a single column.

map() looks up each value of the Series in the dictionary and returns the mapped value. When the dictionary has more than a couple of keys, map() can be much faster than replace(). Use this syntax: df["Courses"] = df["Courses"].map(dict). There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values.
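To check the speed difference on your own data, a rough timing sketch like the one below can help. The synthetic Series s and the dictionary mapping are made up for illustration, and the timings will vary with the pandas version, the data size, and the number of keys.

```python
import timeit
import pandas as pd

# Synthetic data: 50 distinct keys repeated over 20,000 rows (illustrative only)
mapping = {f"k{i}": f"v{i}" for i in range(50)}
s = pd.Series([f"k{i % 50}" for i in range(20_000)])

# Time each approach a few times
t_map = timeit.timeit(lambda: s.map(mapping), number=5)
t_replace = timeit.timeit(lambda: s.replace(mapping), number=5)
print(f"map: {t_map:.4f}s, replace: {t_replace:.4f}s")

# Both approaches give the same result when the mapping is exhaustive
same = s.map(mapping).equals(s.replace(mapping))
print("Same result:", same)
```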

Exhaustive Mapping


# Using exhaustive map() dict
df = pd.DataFrame(technologies)
df["Courses"]=df['Courses'].map(dict)
print("After replacing column values with dictionary:\n", df)

Yields below output.


# Output:
# After replacing column values with dictionary:
  Courses    Fee Duration  Discount
0       S  22000   30days      1000
1       P  25000   50days      2300
2       H  23000   30days      1000
3       P  24000     None      1200
4       P  26000      NaN      2500

Non-Exhaustive Mapping

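If your mapping is non-exhaustive, map() returns NaN for values that have no key in the dictionary. To retain the existing values for non-matches, add fillna() with the original column.


# Using Non-Exhaustive Mapping add fillna
# Recreate df so that Courses holds the original names again
df = pd.DataFrame(technologies)
df1 = df['Courses'].map(dict).fillna(df['Courses'])
print("After replacing column values with dictionary:\n", df1)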

Yields below output.


# Output:
# After replacing column values with dictionary:
0    S
1    P
2    H
3    P
4    P
Name: Courses, dtype: object
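The dictionary used above happens to cover every course, so fillna() has nothing to fill. A minimal sketch with a genuinely partial mapping shows the difference more clearly; the names courses and partial are hypothetical.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"], name="Courses")

# Hypothetical partial mapping: "PySpark" is deliberately left out
partial = {"Spark": "S", "Hadoop": "H"}

mapped = courses.map(partial)                # unmatched value becomes NaN
kept = courses.map(partial).fillna(courses)  # unmatched value keeps its original text
replaced = courses.replace(partial)          # replace() leaves unmatched values as they are

print(mapped)
print(kept)
print(replaced)
```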

Complete Example of Dictionary to Remap Columns Values


import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print(df)

# Define a dict with the key-value pairs to remap.
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
df2=df.replace({"Courses": dict})
print(df2)

# Remap values for None & nan
df = pd.DataFrame(technologies)
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
df.replace({"Duration": dict_duration},inplace=True)
print(df)

df = pd.DataFrame(technologies)
# Define a dict with the key-value pairs to remap.
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
dict_duration = {"30days" : '30', "50days" : '50', "55days": '55',np.nan:'50'}
# Remap multiple columns at same time
df.replace({"Courses": dict,"Duration": dict_duration},inplace=True)
print(df)

# Remap Different columns for specific values
df = pd.DataFrame(technologies)
remap_values = {"Courses" : 'Spark', "Duration" : '30days'}
df.replace(remap_values,value='--',inplace=True)
print(df)

# Remap values directly on the Series
df = pd.DataFrame(technologies)
dict = {"Spark" : 'S', "PySpark" : 'P', "Hadoop": 'H', "Python" : 'P', "Pandas": 'P'}
df["Courses"] = df["Courses"].replace(dict)
print(df)

# Using exhaustive map() dict
df = pd.DataFrame(technologies)
df["Courses"]=df['Courses'].map(dict)
print(df)

# Using non-exhaustive mapping: add fillna to keep unmatched values
df = pd.DataFrame(technologies)
df1 = df['Courses'].map(dict).fillna(df['Courses'])
print(df1)

FAQ on Pandas Remap Values in Column with a Dictionary

What is remapping in Pandas?

Remapping refers to replacing the existing values in a column with new values based on a dictionary mapping. This is typically done using the replace() or map() functions in Pandas.

How do I remap values in a single column using a dictionary?

To remap values in a single column of a Pandas DataFrame using a dictionary, you can use either the replace() or map() function.

What is the difference between map() and replace() for remapping?

replace(): Works on a Series or on one or more DataFrame columns, accepts dictionaries and lists, and leaves values without a match unchanged.
map(): Works only on a Series, applies a mapping or a function to each element, and returns NaN for values missing from a dictionary.

Can I remap values in multiple columns at once?

You can remap values in multiple columns at once in a Pandas DataFrame by using the replace() function with a dictionary of dictionaries. Each inner dictionary will map the old values to new values for each respective column.

Can I use a function to remap instead of a dictionary?

You can use a function to remap values in a Pandas DataFrame column instead of a dictionary. You can achieve this using the apply() method for row-wise transformations, or by using map() for element-wise transformations on a Series.
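As a minimal sketch of the function-based approach, the example below passes a function to map(). The helper to_code, the codes dictionary, and the "Other" fallback are all hypothetical choices made for illustration.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop", "Scala"])

# Hypothetical lookup table used inside the function
codes = {"Spark": "S", "PySpark": "P", "Hadoop": "H"}

def to_code(name):
    # Return the code for a known course, or "Other" as a fallback
    return codes.get(name, "Other")

result = courses.map(to_code)
print(result)
```

A function is handy when the fallback or the rule itself cannot be expressed as a plain key-value lookup.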

Conclusion

In this article, you have learned how to remap column values with a dict in a Pandas DataFrame using DataFrame.replace() and Series.map(). With DataFrame.replace(), you can remap None or NaN column values, remap values in multiple columns, and replace values in multiple columns with the same value. With Series.map(), you have learned two approaches to remapping a column with a dictionary: exhaustive mapping and non-exhaustive mapping.

Happy Learning !!
