  • Post category:Pandas
  • Post last modified:March 27, 2024

You can convert a Pandas DataFrame to a JSON string by using the DataFrame.to_json() method. Its most important parameter is orient, which accepts the values 'columns', 'records', 'index', 'split', 'table', and 'values'. JSON stands for JavaScript Object Notation. It is a text format for representing structured data and is widely used for exchanging data between servers and web applications.

In this article, I will cover how to convert a Pandas DataFrame to a JSON string. DataFrame.to_json() either returns the JSON as a string or, when you pass a file path, writes it to an external JSON file. The JSON layout depends on the value you use for the orient parameter.
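To make the string-versus-file behaviour concrete, here is a minimal sketch. The two-column DataFrame, the temporary directory, and the file name courses.json are assumptions made for illustration; they are not part of the article's running example.

```python
import json
import os
import tempfile

import pandas as pd

# Small illustrative DataFrame (hypothetical data, not the article's df).
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000]})

# With no path argument, to_json() returns the JSON as a Python string.
json_str = df.to_json(orient="records")
print(type(json_str).__name__)

# With a path argument, to_json() writes the file and returns None.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "courses.json")  # hypothetical file name
    result = df.to_json(out_path, orient="records")
    with open(out_path) as fh:
        file_records = json.load(fh)

print(result)
print(file_records)
```

Both calls produce the same JSON content; only the destination differs.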

1. Quick Examples of Converting a DataFrame to a JSON String

If you are in a hurry, below are some quick examples of how to convert a DataFrame to a JSON string.


# Below are quick examples

# Example 1: Convert Pandas DataFrame To JSON Using orient = 'columns' (the default)
df2 = df.to_json(orient = 'columns')  

# Example 2: Convert Pandas DataFrame To JSON Using orient = 'records' 
df2 = df.to_json(orient = 'records')

# Example 3: Convert Pandas DataFrame To JSON Using orient = 'index'
df2 = df.to_json(orient ='index')

# Example 4: Convert Pandas DataFrame To JSON Using orient = 'split'
df2 = df.to_json(orient = 'split')

# Example 5: Convert Pandas DataFrame To JSON Using orient = 'table'
df2 = df.to_json(orient = 'table')

# Example 6: Convert Pandas DataFrame To JSON Using orient ='values'
df2 = df.to_json(orient ='values')

Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


import pandas as pd
technologies = [
            ("Spark", 22000,'30days',1000.0),
            ("PySpark",25000,'50days',2300.0),
            ("Hadoop",23000,'55days',1500.0)
            ]
df = pd.DataFrame(technologies,columns = ['Courses','Fee','Duration','Discount'])
print("Create DataFrame:\n", df)

Yields below output.

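
# Output:
# Create DataFrame:
#    Courses    Fee  Duration  Discount
# 0    Spark  22000    30days    1000.0
# 1  PySpark  25000    50days    2300.0
# 2   Hadoop  23000    55days    1500.0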

2. Use DataFrame.to_json() with orient = ‘columns’

orient='columns' is the default value. When you don't specify orient, DataFrame.to_json() uses 'columns' and returns a JSON string in the dict-like format {column -> {index -> value}}.


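# Convert Pandas DataFrame To JSON Using orient = 'columns'
df2 = df.to_json(orient = 'columns')
print("After converting DataFrame to JSON string:\n", df2)

Yields below output.


# Output:
# After converting DataFrame to JSON string:
{"Courses":{"0":"Spark","1":"PySpark","2":"Hadoop"},"Fee":{"0":22000,"1":25000,"2":23000},"Duration":{"0":"30days","1":"50days","2":"55days"},"Discount":{"0":1000.0,"1":2300.0,"2":1500.0}}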
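As a quick check that 'columns' really is the default, the sketch below compares the two calls and parses the result with the standard json module. The two-column DataFrame is a trimmed-down, illustrative stand-in for the article's data.

```python
import json

import pandas as pd

# Trimmed-down illustrative DataFrame.
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000]})

# Omitting orient and passing orient='columns' give the same string.
default_json = df.to_json()
columns_json = df.to_json(orient="columns")
same = default_json == columns_json
print(same)

# Parsed back, the outer keys are column names and the inner keys are
# index labels. JSON object keys are always strings, so 0 becomes "0".
parsed = json.loads(columns_json)
print(parsed["Fee"])
```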

3. Convert DataFrame to JSON Using orient = ‘records’

Use orient='records' to convert the DataFrame to a JSON array with one object per row, in the format [{column -> value}, … , {column -> value}]. The index is not included.


# Convert Pandas DataFrame To JSON Using orient = 'records' 
df2 = df.to_json(orient = 'records')
print("After converting DataFrame to JSON string:\n", df2)

Yields below output.


# Output:
# After converting DataFrame to JSON string:
[{"Courses":"Spark","Fee":22000,"Duration":"30days","Discount":1000.0},{"Courses":"PySpark","Fee":25000,"Duration":"50days","Discount":2300.0},{"Courses":"Hadoop","Fee":23000,"Duration":"55days","Discount":1500.0}]
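The records layout maps directly to a list of Python dicts. The sketch below parses it with json.loads and also shows the lines=True option, which pandas accepts only together with orient='records' and which writes one JSON object per line. The small DataFrame is an illustrative assumption.

```python
import json

import pandas as pd

# Trimmed-down illustrative DataFrame.
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000]})

# A records-oriented string parses into a list with one dict per row.
records = json.loads(df.to_json(orient="records"))
print(records[0])

# lines=True emits newline-delimited JSON: one object per line, no outer array.
json_lines = df.to_json(orient="records", lines=True)
line_records = [json.loads(line) for line in json_lines.splitlines() if line]
print(line_records == records)
```

Newline-delimited output is convenient when a consumer reads the data one row at a time.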

4. Using orient = ‘index’

Use orient='index' to get a JSON string in the dict-like format {index -> {column -> value}}.


# Convert Pandas DataFrame To JSON Using orient = 'index'
df2 = df.to_json(orient ='index')
print("After converting DataFrame to JSON string:\n", df2)

Yields below output.


# Output:
# After converting DataFrame to JSON string:
{"0":{"Courses":"Spark","Fee":22000,"Duration":"30days","Discount":1000.0},"1":{"Courses":"PySpark","Fee":25000,"Duration":"50days","Discount":2300.0},"2":{"Courses":"Hadoop","Fee":23000,"Duration":"55days","Discount":1500.0}}
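Because the index labels become the outer keys, this layout is most useful when the index carries meaning. The sketch below is an illustration on assumed data: it first uses the default integer index, then sets Courses as the index so that each course name becomes a key.

```python
import json

import pandas as pd

# Trimmed-down illustrative DataFrame.
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000]})

# Default integer index: the outer keys are the strings "0" and "1".
by_position = json.loads(df.to_json(orient="index"))
print(list(by_position))

# With Courses as the index, each course name keys its own row.
by_course = json.loads(df.set_index("Courses").to_json(orient="index"))
print(by_course["Spark"])
```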

5. Using orient = ‘split’

You can use orient='split' to convert the DataFrame to JSON in the dict-like format {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}.


# Convert Pandas DataFrame To JSON Using orient = 'split'
df2 = df.to_json(orient = 'split')
print("After converting DataFrame to JSON string:\n", df2)

Yields below output.


# Output:
# After converting DataFrame to JSON string:
{"columns":["Courses","Fee","Duration","Discount"],"index":[0,1,2],"data":[["Spark",22000,"30days",1000.0],["PySpark",25000,"50days",2300.0],["Hadoop",23000,"55days",1500.0]]}
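The three keys of the split layout happen to match the data, index, and columns arguments of the pd.DataFrame constructor, so a parsed split payload can be unpacked straight back into a DataFrame. This is a sketch of one way to round-trip, not the only one; the small DataFrame is an assumption for illustration.

```python
import json

import pandas as pd

# Trimmed-down illustrative DataFrame.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000], "Discount": [1000.0, 2300.0]}
)

# Parse the split payload into a dict with keys columns, index, and data.
payload = json.loads(df.to_json(orient="split"))
print(sorted(payload))

# Unpack the dict as keyword arguments to rebuild the DataFrame.
rebuilt = pd.DataFrame(**payload)
print(rebuilt)
```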

6. Using orient = ‘table’

You can use orient='table' to convert the DataFrame to JSON in the dict-like format {'schema': {schema}, 'data': {data}}.


# Convert Pandas DataFrame To JSON Using orient = 'table'
df2 = df.to_json(orient = 'table')
print("After converting DataFrame to JSON string:\n", df2)

Yields below output.


# Output:
# After converting DataFrame to JSON string:
{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"Courses","type":"string"},{"name":"Fee","type":"integer"},{"name":"Duration","type":"string"},{"name":"Discount","type":"number"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"Courses":"Spark","Fee":22000,"Duration":"30days","Discount":1000.0},{"index":1,"Courses":"PySpark","Fee":25000,"Duration":"50days","Discount":2300.0},{"index":2,"Courses":"Hadoop","Fee":23000,"Duration":"55days","Discount":1500.0}]}
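The schema part is what sets this layout apart: it records a type for each field alongside the rows. As a minimal sketch on assumed data, the code below parses the payload and collects the declared field types into a dict.

```python
import json

import pandas as pd

# Trimmed-down illustrative DataFrame.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000], "Discount": [1000.0, 2300.0]}
)

payload = json.loads(df.to_json(orient="table"))

# Map each field name in the schema to its declared type.
field_types = {field["name"]: field["type"] for field in payload["schema"]["fields"]}
print(field_types)

# The index is written as an ordinary field and listed as the primary key.
print(payload["schema"]["primaryKey"])

# The rows themselves look like orient='records' plus the index field.
print(payload["data"][0])
```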

7. Using orient = ‘values’

You can also use orient='values' to get the DataFrame as a JSON array of row arrays, without the index or column labels.


# Convert Pandas DataFrame To JSON Using orient ='values'
df2 = df.to_json(orient ='values')
print("After converting DataFrame to JSON string:\n", df2)

Yields below output.


# Output:
# After converting DataFrame to JSON string:
[["Spark",22000,"30days",1000.0],["PySpark",25000,"50days",2300.0],["Hadoop",23000,"55days",1500.0]]
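Since this layout drops the labels, whoever reads the JSON has to supply the column names again. The sketch below shows that on assumed data, reusing the original column list when rebuilding.

```python
import json

import pandas as pd

# Trimmed-down illustrative DataFrame.
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000]})

# Only the cell values survive: a list of rows, each a plain list.
rows = json.loads(df.to_json(orient="values"))
print(rows)

# Column names are not in the JSON, so they must be provided separately.
rebuilt = pd.DataFrame(rows, columns=list(df.columns))
print(rebuilt)
```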

8. Complete Example of Converting a DataFrame to JSON


import pandas as pd
technologies = [
            ("Spark", 22000,'30days',1000.0),
            ("PySpark",25000,'50days',2300.0),
            ("Hadoop",23000,'55days',1500.0)
            ]
df = pd.DataFrame(technologies,columns = ['Courses','Fee','Duration','Discount'])
print(df)
 
# Convert Pandas DataFrame To JSON Using orient = 'columns'
df2 = df.to_json(orient = 'columns')
print(df2)   

# Convert Pandas DataFrame To JSON Using orient = 'records' 
df2 = df.to_json(orient = 'records')
print(df2)

# Convert Pandas DataFrame To JSON Using orient = 'index'
df2 = df.to_json(orient ='index')
print(df2)

# Convert Pandas DataFrame To JSON Using orient = 'split'
df2 = df.to_json(orient = 'split')
print(df2)

# Convert Pandas DataFrame To JSON Using orient = 'table'
df2 = df.to_json(orient = 'table')
print(df2)

# Convert Pandas DataFrame To JSON Using orient ='values'
df2 = df.to_json(orient ='values')
print(df2)

Frequently Asked Questions on Converting a DataFrame to JSON

How do I convert a DataFrame to JSON in Python using pandas?

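You can use the to_json() method in Pandas to convert the DataFrame to JSON. Called without a path, it returns the JSON string, for example df.to_json(orient='records'). Called with a path, it writes the JSON to that file, for example df.to_json('output.json', orient='records').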

What is the ‘orient’ parameter in the to_json() method?

The 'orient' parameter specifies the layout of the JSON output. It accepts 'split', 'records', 'index', 'columns', 'values', and 'table'; for a DataFrame the default is 'columns'. The most commonly used for interoperability is 'records'.

How can I pretty print the JSON output when converting a DataFrame?

You can use the indent parameter of the to_json() method to pretty-print the JSON output while converting a DataFrame. For example: df.to_json('output.json', orient='records', indent=4).
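Here is a small sketch of the difference, assuming pandas 1.0 or later, where the indent parameter is available. The DataFrame is illustrative; both strings carry the same data and differ only in whitespace.

```python
import json

import pandas as pd

# Trimmed-down illustrative DataFrame.
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000]})

compact = df.to_json(orient="records")           # single line
pretty = df.to_json(orient="records", indent=4)  # spread over several lines

print(pretty)

# Parsing either string gives the same Python objects.
print(json.loads(pretty) == json.loads(compact))
```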

How can I convert only a specific subset of columns to JSON?

Select the columns you want first, then call to_json() on the result. For example: df[['column1', 'column2']].to_json('output.json', orient='records').
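A minimal sketch of the subset conversion, returning a string instead of writing a file. The DataFrame and the choice of columns are assumptions made for illustration.

```python
import json

import pandas as pd

# Trimmed-down illustrative DataFrame.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000], "Discount": [1000.0, 2300.0]}
)

# Double brackets select a sub-DataFrame; to_json() then sees only those columns.
subset_json = df[["Courses", "Fee"]].to_json(orient="records")
subset_records = json.loads(subset_json)
print(subset_records)
```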

What’s the difference between ‘split’ and ‘records’ orient in to_json()?

In 'split' orient, the JSON object is split into separate parts for index, columns, and data. In 'records' orient, each row in the DataFrame becomes a separate JSON object in the output.

Conclusion

In this article, you have learned how to convert a pandas DataFrame to JSON by using the DataFrame.to_json() method with each of the orient values, along with examples. For the remaining parameters, see the to_json() entry in the pandas reference.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.