In the pandas library, there are several ways to replace or update column values in a DataFrame. Changing column values is often required to curate or clean the data in a DataFrame. When working with data, we frequently have to edit or remove certain pieces of it, create new columns from existing ones, or modify existing columns. pandas provides a wide range of methods for working with columns of all data types in your DataFrames.
Now, we will look specifically at replacing column values and changing part of a string (substrings) within columns of a DataFrame.
Below are some approaches to replace column values in a pandas DataFrame.
1. Quick Examples of Replacing Column Values in a Pandas DataFrame
Below are some quick examples that replace, edit, or update column values in a pandas DataFrame.
# Below are some quick examples.
# Replace a single value with a new value in one DataFrame column.
df['Courses'] = df['Courses'].replace(['Spark'], 'PySpark')
# Replace multiple values with a single new value in one DataFrame column.
df['Courses'] = df['Courses'].replace(['PySpark', 'Python'], 'Spark')
# Replace multiple values with multiple new values in one DataFrame column.
df['Courses'] = df['Courses'].replace(['PySpark', 'Python'], ['Spark', '22000'])
# Replace a single value with a new value across the entire DataFrame.
df = df.replace(['PySpark'], 'Spark')
Now, we will run these examples on a sample DataFrame and explore the output.
Let’s create a pandas DataFrame with a few rows and columns, then execute these examples and validate the results.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","Pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
This yields the output below.
    Courses    Fee  Duration  Discount
r1    Spark  20000    30days      1000
r2  PySpark  25000    40days      2300
r3   Python  22000    35days      1200
r4   Pandas  30000    50days      2000
2. Replace Single Value with a New Value in Pandas DataFrame
To replace a value in a pandas DataFrame column, call the replace() method on that column and pass the value to find followed by the new value. The example below replaces Spark with PySpark in the Courses column.
# Replace a value in a pandas DataFrame column.
df = pd.DataFrame(technologies, columns=['Courses', 'Fee'])
df['Courses'] = df['Courses'].replace(['Spark'], 'PySpark')
print(df)
Notice that every Spark value in the Courses column is replaced with PySpark.
   Courses    Fee
0  PySpark  20000
1  PySpark  25000
2   Python  22000
3   Pandas  30000
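By default, replace() compares whole cell values, so a cell that merely contains the search string is left alone. To change part of a string (a substring), one option is to pass regex=True, which applies the pattern inside each string. The sketch below assumes the Courses values from the sample data; the replacement value 'Flink' is hypothetical and only makes the difference visible.

```python
import pandas as pd

# Courses values from the sample data above.
courses = pd.Series(["Spark", "PySpark", "Python", "Pandas"])

# Default (regex=False): only cells exactly equal to 'Spark' are replaced.
# 'Flink' is a hypothetical replacement value used for illustration.
whole_value = courses.replace('Spark', 'Flink')

# regex=True: 'Spark' is treated as a pattern and substituted inside each
# string, so 'PySpark' becomes 'PyFlink' as well.
substring = courses.replace('Spark', 'Flink', regex=True)

print(whole_value.tolist())  # ['Flink', 'PySpark', 'Python', 'Pandas']
print(substring.tolist())    # ['Flink', 'PyFlink', 'Python', 'Pandas']
```

Series.str.replace() is another way to edit substrings in a string column.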
3. Replace Multiple Values with a New Value in DataFrame
Let’s see how to replace multiple values with a single new value in a DataFrame column. In the example below, I replace the PySpark and Python courses with Spark in the Courses column.
df = pd.DataFrame(technologies, columns=['Courses', 'Fee'])
df['Courses'] = df['Courses'].replace(['PySpark', 'Python'], 'Spark')
print(df)
Notice that both the PySpark and Python courses were replaced with Spark.
   Courses    Fee
0    Spark  20000
1    Spark  25000
2    Spark  22000
3   Pandas  30000
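Because replace() matches values exactly, the spelling and case in the list must match what is stored in the column. The sketch below, assuming the same Courses values, shows a mismatched case silently skipping a value, and one possible case-insensitive alternative using an anchored regular expression with Python's (?i) flag.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Python", "Pandas"])

# Matching is exact and case-sensitive: 'Pyspark' != 'PySpark',
# so only 'Python' is replaced here.
wrong_case = courses.replace(['Pyspark', 'Python'], 'Spark')

# Spelling each value exactly as stored replaces both courses.
exact_case = courses.replace(['PySpark', 'Python'], 'Spark')

# Case-insensitive alternative: (?i) ignores case, ^ and $ force a whole-value match.
any_case = courses.replace(r'(?i)^(pyspark|python)$', 'Spark', regex=True)

print(wrong_case.tolist())  # ['Spark', 'PySpark', 'Spark', 'Pandas']
print(exact_case.tolist())  # ['Spark', 'Spark', 'Spark', 'Pandas']
print(any_case.tolist())    # ['Spark', 'Spark', 'Spark', 'Pandas']
```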
4. Replace Multiple Values With Multiple New Values in a DataFrame Column
To replace multiple values with multiple new values in a single DataFrame column, pass two lists to replace(): each value in the first list is replaced by the value at the same position in the second list. For example, the code below replaces:
- PySpark with Spark
- Python with 22000
# Replace multiple values with multiple new values.
df = pd.DataFrame(technologies, columns=['Courses', 'Fee'])
df['Courses'] = df['Courses'].replace(['PySpark', 'Python'], ['Spark', '22000'])
print(df)
We can see that PySpark became Spark and Python became 22000 in the Courses column.
   Courses    Fee
0    Spark  20000
1    Spark  25000
2    22000  22000
3   Pandas  30000
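The two-list form pairs values by position, which can also be written as a dict of old-to-new pairs. The sketch below, assuming the same Courses values, checks that both forms agree and shows that lists of different lengths are rejected with a ValueError.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Python", "Pandas"])

# Lists are paired by position: 'PySpark' -> 'Spark', 'Python' -> '22000'.
from_lists = courses.replace(['PySpark', 'Python'], ['Spark', '22000'])

# A dict states the same old -> new pairs explicitly.
from_dict = courses.replace({'PySpark': 'Spark', 'Python': '22000'})

print(from_dict.tolist())  # ['Spark', 'Spark', '22000', 'Pandas']

# Replacement lists of different lengths raise a ValueError.
try:
    courses.replace(['PySpark', 'Python'], ['Spark'])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
```

Note that '22000' is a string here, so the Courses column keeps holding strings rather than becoming numeric.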
5. Replace Single Value With New Value on All Columns of DataFrame
By now, you have seen how to replace values in a single DataFrame column. Next, we will look at how to replace a value across the entire DataFrame.
For example, the code below replaces the PySpark course with Spark in every column of the DataFrame.
# Replace a single value with a new value in the entire DataFrame.
df = pd.DataFrame(technologies, columns=['Courses', 'Fee'])
df = df.replace(['PySpark'], 'Spark')
print(df)
After running the code, we can see that PySpark became Spark across all the columns of the DataFrame.
   Courses    Fee
0    Spark  20000
1    Spark  25000
2   Python  22000
3   Pandas  30000
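The sample DataFrame holds PySpark in only one column, so the effect of a frame-wide replace is hard to see. The sketch below uses a hypothetical Prerequisite column so the same value appears in two columns, and contrasts a frame-wide replace with one limited by a {column: value_to_find} dict, a form DataFrame.replace() accepts.

```python
import pandas as pd

# Hypothetical frame in which 'PySpark' appears in two columns.
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python"],
    'Prerequisite': ["Python", "Python", "PySpark"],
})

# DataFrame.replace() searches every column.
everywhere = df.replace('PySpark', 'Spark')

# A {column: value_to_find} dict limits the search to the listed columns.
courses_only = df.replace({'Courses': 'PySpark'}, 'Spark')

print(everywhere['Prerequisite'].tolist())    # ['Python', 'Python', 'Spark']
print(courses_only['Prerequisite'].tolist())  # ['Python', 'Python', 'PySpark']
```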
6. Replace Values on Multiple Columns of DataFrame
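To replace a value in only some of the columns, select those columns with df.loc, call replace() on the selection, and assign the result back to the same columns. The example below replaces 25000 with Spark in the Fee and Duration columns only.
# Replace values on multiple columns.
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration'])
cols = ['Fee', 'Duration']
df[cols] = df.loc[:, cols].replace(25000, 'Spark')
print(df)
This yields the output below. Because Spark is a string, the Fee column now holds mixed values with the object dtype.
   Courses    Fee  Duration
0    Spark  20000    30days
1  PySpark  Spark    40days
2   Python  22000    35days
3   Pandas  30000    50days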
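To replace different values in different columns in one call, DataFrame.replace() also accepts a nested dict of the form {column: {old_value: new_value}}. The sketch below assumes the sample technologies data; the new values 26000 and '45days' are hypothetical and chosen only for illustration.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Python", "Pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
}
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration'])

# Nested dict: each column gets its own old -> new mapping.
# 26000 and '45days' are hypothetical replacement values.
df = df.replace({'Fee': {25000: 26000}, 'Duration': {'40days': '45days'}})
print(df)
```

Because 26000 is an integer, the Fee column stays numeric, unlike the string replacement above.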
Conclusion
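In this article, you have learned how to use replace() to change a single value, replace multiple values with one new value, replace multiple values with multiple new values, replace a value across the entire DataFrame, and replace a value in selected columns of a pandas DataFrame.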