In the Pandas library, there are several ways to replace or update column values in a DataFrame. Changing column values is often required to curate or clean the data in a DataFrame: when working with data, we frequently have to edit or remove certain pieces of it. We can also create new columns from existing ones or modify existing columns, and Pandas provides a wide range of methods for working with columns of all data types in your DataFrames.
Now, we will look specifically at replacing column values and changing part of the string (sub-strings) within columns in a DataFrame.
Below are some approaches to replace column values in Pandas DataFrame.
1. Quick Examples of Replacing Column Values in a Pandas DataFrame
If you are in a hurry, below are some quick examples of how to replace, edit, or update column values in a Pandas DataFrame.
# Quick examples of replacing column values in a pandas DataFrame
# Example 1: Replace a single value with a new value
# For an individual DataFrame column
df['Courses'] = df['Courses'].replace(['Spark'],'Pyspark')
# Example 2: Replace multiple values with a new value
# For an individual DataFrame column
df['Courses'] = df['Courses'].replace(['Pyspark','Python',...],'Spark')
# Example 3: Replace multiple values with multiple new values
# For an individual DataFrame column (the two lists are matched by position)
df['Courses'] = df['Courses'].replace(['Pyspark','Python',...],['Spark','22000',...])
# Example 4: Replace a single value with a new value
# For an entire DataFrame
df = df.replace(['Pyspark'],'Spark')
Now, we will run these examples with a sample DataFrame and explore the output.
Let’s create a Pandas DataFrame with a few rows and columns, execute these examples, and validate the results.
# Create a Pandas DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","Pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n", df)
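Yields below output.
# Output:
Create DataFrame:
 Courses Fee Duration Discount
r1 Spark 20000 30days 1000
r2 PySpark 25000 40days 2300
r3 Python 22000 35days 1200
r4 Pandas 30000 50days 2000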
2. Replace Single Value with a New Value in Pandas DataFrame
If you want to replace a single value with a new value in a Pandas DataFrame column, use the replace() method: call it on the column with the value to find (from) and the value to substitute (to), then assign the result back to the column. For instance, the code below replaces the value ‘Spark’ in the ‘Courses’ column with ‘Pyspark’, and the resulting DataFrame (df) has the updated value in that column.
# Replace values in pandas DataFrame
df = pd.DataFrame(technologies, columns= ['Courses','Fee'])
df['Courses'] = df['Courses'].replace(['Spark'],'Pyspark')
print("DataFrame after replacement:\n",df)
Notice that the Spark value under the first column is replaced with Pyspark.
# Output:
DataFrame after replacement:
 Courses Fee
0 Pyspark 20000
1 PySpark 25000
2 Python 22000
3 Pandas 30000
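By default, replace() compares whole cell values exactly, and the comparison is case-sensitive, so a small spelling difference silently leaves the column unchanged. The minimal sketch below uses the course names from the sample data (only the 'Courses' column) to show which calls actually change a value.

```python
import pandas as pd

# Same course names as the sample DataFrame, 'Courses' column only
df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"]})

# 'Pyspark' (lowercase s) is not equal to 'PySpark', so nothing is replaced
unchanged = df['Courses'].replace('Pyspark', 'Spark')

# The exact spelling matches, so that cell is replaced
changed = df['Courses'].replace('PySpark', 'Spark')

# 'Spark' matches only the whole value 'Spark', not the substring inside 'PySpark'
partial = df['Courses'].replace('Spark', 'Pyspark')

print(unchanged.tolist())
print(changed.tolist())
print(partial.tolist())
```

To change part of a string rather than the whole value, use the regex parameter of replace().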
3. Replace Multiple Values with a New Value in DataFrame
Let’s see how to replace multiple values with a single new value in a DataFrame column. The example below replaces every occurrence of ‘PySpark’ and ‘Python’ with ‘Spark’ in the ‘Courses’ column. The resulting DataFrame (df) has the updated values in that column.
# Replace multiple values with a new value in DataFrame
df = pd.DataFrame(technologies, columns= ['Courses','Fee'])
df['Courses'] = df['Courses'].replace(['PySpark','Python'],'Spark')
print("DataFrame after replacement:\n",df)
Notice that both the PySpark and Python courses are replaced with Spark.
# Output:
DataFrame after replacement:
Courses Fee
0 Spark 20000
1 Spark 25000
2 Spark 22000
3 Pandas 30000
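4. Replace Multiple Values With Multiple New Values For a DataFrame Column
If you want to replace multiple values with multiple new values in a single DataFrame column, pass two lists of the same length to replace(); each value in the first list is replaced by the value at the same position in the second list. For example:
- Replace PySpark with Spark
- Replace Python with 22000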
# Replace multiple values with multiple new values
df = pd.DataFrame(technologies, columns= ['Courses','Fee'])
df['Courses'] = df['Courses'].replace(['PySpark','Python'],['Spark','22000'])
print("DataFrame after replacement:\n",df)
We can see that ‘PySpark’ became ‘Spark’ and ‘Python’ became ‘22000’ under the first column.
# Output:
DataFrame after replacement:
Courses Fee
0 Spark 20000
1 Spark 25000
2 22000 22000
3 Pandas 30000
5. Replace Single Value With New Value on All Columns of DataFrame
By now, you have seen how to replace values in a single DataFrame column. Next, let’s look at how to replace a value across the entire DataFrame.
For example, the code below replaces the PySpark course with Spark throughout the DataFrame, in all columns.
# Replace single value with new value in entire DataFrame
df = pd.DataFrame(technologies, columns= ['Courses','Fee'])
df = df.replace(['PySpark'],'Spark')
print("DataFrame after replacement:\n",df)
When we run the code, we can see that PySpark became Spark across all the columns in the DataFrame.
# Output:
DataFrame after replacement:
Courses Fee
0 Spark 20000
1 Spark 25000
2 Python 22000
3 Pandas 30000
6. Replace Values on Multiple Columns of DataFrame
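If you want to replace values in several columns at once, select those columns with a list of column names, call replace() on the selection, and assign the result back to the same columns. The example below replaces 25000 with ‘Spark’ in the Fee and Duration columns.
# Replace values on multiple columns
df = pd.DataFrame(technologies, columns= ['Courses','Fee','Duration'])
df[['Fee', 'Duration']] = df[['Fee', 'Duration']].replace(25000, 'Spark')
print("DataFrame after replacement:\n",df)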
Yields below output.
# Output:
DataFrame after replacement:
Courses Fee Duration
0 Spark 20000 30days
1 PySpark Spark 40days
2 Python 22000 35days
3 Pandas 30000 50days
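The example above replaces the same value in every selected column. If each column needs its own replacements, DataFrame.replace() also accepts a nested dictionary of the form {column: {old_value: new_value}}. The minimal sketch below assumes the same Courses, Fee, and Duration columns; the new values (26000, 45days, 55days) are illustrative choices, not part of the original example.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Python", "Pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
}
df = pd.DataFrame(technologies)

# Nested dict: {column: {old_value: new_value}}
# Each column gets its own mapping; columns not listed are left untouched
df = df.replace({
    'Fee': {25000: 26000},                                # illustrative value
    'Duration': {'40days': '45days', '50days': '55days'}, # illustrative values
})
print(df)
```

Because each mapping is scoped to one column, a value such as 25000 would not be touched if it also appeared in another column.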
Frequently Asked Questions on Replace Column Value in DataFrame
You can replace a specific value in a column with a new value using the replace() method in Pandas. For example, calling replace('A', 'X') on the ‘Column_Name’ column replaces every ‘A’ with ‘X’; assign the result back to the column so that the DataFrame (df) holds the updated values. You can modify the old and new values based on your specific requirements.
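A minimal sketch of a single-value replacement; the small DataFrame is hypothetical and 'Column_Name' is the placeholder column name used in the answer.

```python
import pandas as pd

# Hypothetical data; 'Column_Name' is a placeholder column name
df = pd.DataFrame({'Column_Name': ['A', 'B', 'A', 'C']})

# Replace every exact 'A' with 'X' and assign the result back to the column
df['Column_Name'] = df['Column_Name'].replace('A', 'X')
print(df)
```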
To replace multiple values in a column, you can use the replace() method with a dictionary specifying the mapping of old values to new values.
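A minimal sketch of the dictionary form, using the course names from the sample data; the mapping itself is chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"]})

# Dictionary of {old_value: new_value}; values not in the dict stay as they are
mapping = {'PySpark': 'Spark', 'Python': '22000'}
df['Courses'] = df['Courses'].replace(mapping)
print(df)
```

A dictionary keeps each old value next to its replacement, which is easier to read than two parallel lists once there are more than a few pairs.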
You can replace values in multiple columns simultaneously using the replace() method in Pandas. For example, you can pass a dictionary of old-to-new values (replace_dict) to replace() on the selection df[['Column1', 'Column3']], for instance to replace ‘A’ with ‘X’ in Column1 and ‘Y’ with ‘Z’ in Column3, and assign the result back to those columns. You can extend the replacement to more columns by including them in the list passed to df[['Column1', 'Column3']].
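A minimal sketch, assuming replace_dict is a flat {old: new} mapping applied to every selected column; the DataFrame and the dictionary contents are hypothetical, using the placeholder column names from the answer.

```python
import pandas as pd

# Hypothetical DataFrame with the placeholder column names
df = pd.DataFrame({
    'Column1': ['A', 'B', 'A'],
    'Column2': ['A', 'Y', 'C'],
    'Column3': ['Y', 'C', 'Y'],
})

# Assumed flat mapping; it applies to every column in the selection
replace_dict = {'A': 'X', 'Y': 'Z'}

# Only Column1 and Column3 are selected, so Column2 keeps its original values
df[['Column1', 'Column3']] = df[['Column1', 'Column3']].replace(replace_dict)
print(df)
```

With a flat mapping, an ‘A’ in Column3 would also become ‘X’. If each column needs its own rules, a nested dictionary of the form {column: {old: new}} passed to df.replace() scopes each mapping to one column.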
To replace values in a column based on a condition, you can use boolean indexing along with the loc accessor in Pandas. For example, you can build a boolean mask that selects the rows where ‘Column_Name’ is greater than 20 and set those values to ‘High’. Adjust the condition in the loc statement based on your specific criteria.
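A minimal sketch of conditional replacement with loc; the numeric data is hypothetical and 'Column_Name' is the placeholder from the answer. The column is converted to object dtype first because recent pandas versions warn when a string is written into an integer column.

```python
import pandas as pd

# Hypothetical numeric data in the placeholder column
df = pd.DataFrame({'Column_Name': [10, 25, 15, 30]})

# Allow the column to hold both numbers and the string 'High'
df['Column_Name'] = df['Column_Name'].astype(object)

# Boolean mask selects rows where the value is greater than 20
df.loc[df['Column_Name'] > 20, 'Column_Name'] = 'High'
print(df)
```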
By default, the replace() method in Pandas does not modify the DataFrame in place. Instead, it returns a new DataFrame with the specified replacements. If you want to modify the original DataFrame in place, you can use the inplace=True parameter.
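A minimal sketch contrasting the default behavior with inplace=True, using the course names from the sample data.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"]})

# Default: replace() returns a new DataFrame and leaves df unchanged
result = df.replace('PySpark', 'Spark')
before = df['Courses'].tolist()

# inplace=True modifies df itself and returns None
returned = df.replace('PySpark', 'Spark', inplace=True)

print(before)
print(result['Courses'].tolist())
print(df['Courses'].tolist())
```

Call inplace=True on the DataFrame itself. Calling it on a column selection such as df['Courses'].replace(..., inplace=True) is chained assignment and may not update df, particularly when Copy-on-Write is enabled; assigning the result back to the column is the safer pattern.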
You can use regular expressions for replacement in Pandas using the replace() method. The regex parameter allows you to specify whether the search for the old values should be treated as regular expressions.
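A minimal sketch of a regex replacement on the Duration values from the sample data; the pattern and the replacement text are chosen for illustration. With regex=True, only the matched part of each string is substituted, so this is also a way to change substrings.

```python
import pandas as pd

df = pd.DataFrame({'Duration': ['30days', '40days', '35days', '50days']})

# The pattern matches the trailing 'days' inside each string; the number is kept
df['Duration'] = df['Duration'].replace(r'days$', ' Days', regex=True)
print(df)
```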
Conclusion
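In this article, you have learned how to use the replace() method to replace a single value, multiple values with one new value, and multiple values with multiple new values, in a single column, in multiple columns, and across the entire DataFrame.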