Pandas Read JSON File with Examples

The pandas read_json() function reads a JSON file or string into a DataFrame. It supports several JSON layouts through the orient param.

JSON is shorthand for JavaScript Object Notation, one of the most widely used formats for exchanging data between systems or web applications. When working with files in big data or machine learning, we are often required to process JSON files.

In this article, I will explain how to read JSON from a string and from a file into a pandas DataFrame, and how to use several optional params, with examples.

1. pandas read_json() Syntax

Following is the syntax of the read_json() function. It returns either a DataFrame or a Series. Use the typ param to specify the return type; by default, it returns a DataFrame.


# Syntax of read_json() 
pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=None, convert_axes=None, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, encoding_errors='strict', lines=False, chunksize=None, compression='infer', nrows=None, storage_options=None)

As you can see above, it takes several optional parameters to support reading JSON files with different options. When you are dealing with huge files, some of these params help you load JSON files faster. In this article, I will explain the usage of some of these options with examples.
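As a quick illustration of the typ param described above, the following minimal sketch reads a flat JSON object into a Series instead of a DataFrame. The sample data and the variable names are made up for this example.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: a flat JSON object mapping labels to values
json_str = '{"Spark": 25000, "Pandas": 20000, "Java": 15000}'

# typ='series' asks read_json() for a Series instead of the default DataFrame
fees = pd.read_json(StringIO(json_str), typ='series')

print(type(fees).__name__)
print(fees)
```

The keys of the JSON object become the index labels of the Series.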

2. Read JSON String Example

If you have JSON in a string, you can load it into a pandas DataFrame using the read_json() function. By default, the JSON string should be in a dict-like format, {column -> {index -> value}}. This is also called column orientation.

Note that the orient param specifies the format of the JSON string. The possible orients are index, columns, records, split, values, and table. For a DataFrame, the default is columns.


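# Import pandas and StringIO
from io import StringIO
import pandas as pd

# Read JSON from a string. The string is wrapped in StringIO because
# passing a literal JSON string to read_json() is deprecated since pandas 2.1.
json_str = '{"Courses":{"r1":"Spark"},"Fee":{"r1":"25000"},"Duration":{"r1":"50 Days"}}'
df = pd.read_json(StringIO(json_str))
print(df)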

# Outputs
#   Courses    Fee Duration
#r1   Spark  25000  50 Days

Now let’s use a JSON string in another format, [{column -> value}, ... , {column -> value}]. This uses the records orientation.


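# Read JSON from a string in records orientation
# (wrapped in StringIO; literal JSON strings are deprecated since pandas 2.1)
from io import StringIO
json_str = '[{"Courses":"Spark","Fee":"25000","Duration":"50 Days","Discount":"2000"}]'
df = pd.read_json(StringIO(json_str), orient='records')
print(df)

# Outputs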
#  Courses    Fee Duration  Discount
#0   Spark  25000  50 Days      2000
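To compare the orientations side by side, here is a small sketch that serializes one DataFrame with DataFrame.to_json() in several layouts and reads each back with the matching orient. The sample frame and the frames dictionary are assumptions made for this illustration only.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample frame, used only to generate JSON in each layout
df = pd.DataFrame(
    {"Courses": ["Spark", "Pandas"], "Fee": [25000, 20000]},
    index=["r1", "r2"],
)

frames = {}
for orient in ["columns", "index", "records", "split", "values"]:
    # Serialize in the given layout, then read it back with the same orient
    json_str = df.to_json(orient=orient)
    frames[orient] = pd.read_json(StringIO(json_str), orient=orient)
    print(orient, "->", json_str)
    print(frames[orient], end="\n\n")
```

Note that records and values do not store the row labels, and values does not store the column names either, so those come back as default integer labels.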

3. Pandas Read JSON File Example

Let’s use the pandas read_json() function to read a JSON file into a DataFrame. By default, it supports a JSON document written on a single line or spread over multiple lines.

The example file contains JSON in a dict-like format. Let’s load this JSON file into a DataFrame. Find this JSON file at GitHub.


# pandas read JSON File
df = pd.read_json('courses_data.json')
print(df)

# Outputs
#  Courses    Fee Duration
#0   Spark  25000  50 Days
#1  Pandas  20000  35 Days
#2    Java  15000 

If your file holds JSON records in a list, read it with orient='records'. Use the JSON file below from GitHub.

# Read JSON file with records orient
df = pd.read_json('/Users/admin/apps/courses.json', orient='records')
print(df)        
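The files referenced above are not reproduced here, so the following self-contained sketch writes two small stand-in files to a temporary directory and reads them back. The file contents are hypothetical and only mimic the dict-like and records layouts described in this section.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for the two files used in this section
columns_data = {
    "Courses": {"0": "Spark", "1": "Pandas"},
    "Fee": {"0": 25000, "1": 20000},
}
records_data = [
    {"Courses": "Spark", "Fee": 25000},
    {"Courses": "Pandas", "Fee": 20000},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    columns_path = Path(tmp_dir) / "courses_data.json"
    records_path = Path(tmp_dir) / "courses.json"

    # indent=2 spreads the JSON over several lines; read_json() parses it the same way
    columns_path.write_text(json.dumps(columns_data, indent=2), encoding="utf-8")
    records_path.write_text(json.dumps(records_data), encoding="utf-8")

    # Dict-like (columns) layout is the default for a DataFrame
    df_columns = pd.read_json(columns_path)
    # A list of records is read with orient='records'
    df_records = pd.read_json(records_path, orient="records")

print(df_columns)
print(df_records)
```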

4. Read N Records from JSON File

When your file has one JSON record per line, you can use the nrows param to specify how many records you want to load. This param can be used only together with lines=True.


# Read the first 2 records from a line-delimited JSON file
df = pd.read_json('courses.json', orient='records', nrows=2, lines=True)
print(df)
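Here is a runnable sketch of the same idea that uses an in-memory buffer instead of a file, so it does not depend on courses.json existing. The sample records are made up. It also shows the chunksize param from the syntax above, which returns an iterator of DataFrames rather than a single one.

```python
from io import StringIO

import pandas as pd

# Hypothetical line-delimited JSON: one record per line
json_lines = (
    '{"Courses":"Spark","Fee":25000}\n'
    '{"Courses":"Pandas","Fee":20000}\n'
    '{"Courses":"Java","Fee":15000}\n'
)

# nrows=2 stops after the first two lines; it requires lines=True
first_two = pd.read_json(StringIO(json_lines), lines=True, nrows=2)
print(first_two)

# chunksize=2 yields DataFrames of up to 2 records each
with pd.read_json(StringIO(json_lines), lines=True, chunksize=2) as reader:
    chunk_sizes = [len(chunk) for chunk in reader]
print(chunk_sizes)
```

Reading in chunks can help keep memory usage down when the whole file does not need to be in a single DataFrame at once.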

5. Compression & Encoding

Use the compression param to decompress and load JSON files; supported values include {'zip', 'gzip', 'bz2', 'zstd'}.

When using ‘zip’, make sure the ZIP file contains only one data file. Use None to specify no decompression.

Use the encoding param to read files in a custom encoding. By default, it uses UTF-8.
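As a minimal sketch of these two params, the example below writes a gzip-compressed JSON file to a temporary directory and reads it back, first relying on the default compression='infer' and then passing the values explicitly. The file name and the sample records are assumptions for illustration.

```python
import gzip
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample records
records = '[{"Courses":"Spark","Fee":25000},{"Courses":"Pandas","Fee":20000}]'

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = Path(tmp_dir) / "courses.json.gz"

    # Write the JSON text as a gzip-compressed, UTF-8 encoded file
    with gzip.open(gz_path, "wt", encoding="utf-8") as handle:
        handle.write(records)

    # Default compression='infer' detects gzip from the .gz extension
    inferred = pd.read_json(gz_path, orient="records")

    # The same read with compression and encoding spelled out
    explicit = pd.read_json(
        gz_path, orient="records", compression="gzip", encoding="utf-8"
    )

print(inferred)
print(inferred.equals(explicit))
```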

6. Other Params to Read JSON

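  • dtype – Pass a dict mapping column names to dtypes to set them explicitly. When True, the dtypes are inferred from the data. When False, no dtype inference is done.
  • convert_axes – Try to convert the axes (row and column labels) to the proper dtypes.
  • convert_dates – If True, date-like columns are converted to dates. If False, no date conversion is done. A list of column names to convert can also be passed.
  • keep_default_dates – If True, columns whose labels look date-like (for example, labels ending in _at or _time, starting with timestamp, or named modified or date) are parsed as dates.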
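The following sketch shows how dtype and convert_dates can change the result for the same input. The sample record, including the Start column, is made up for this example, and the comments describe the behavior I would expect from the parameter descriptions above.

```python
from io import StringIO

import pandas as pd

# Hypothetical record: Fee is stored as a string, Start holds an ISO date string
json_str = '[{"Courses":"Spark","Fee":"25000","Start":"2021-11-01"}]'

# Default dtype inference turns the numeric-looking Fee strings into integers
inferred = pd.read_json(StringIO(json_str))

# dtype=False keeps the values as they appear in the JSON
raw = pd.read_json(StringIO(json_str), dtype=False)

# A dict sets the dtype per column; convert_dates lists columns to parse as dates
typed = pd.read_json(
    StringIO(json_str),
    dtype={"Fee": "float64"},
    convert_dates=["Start"],
)

print(inferred.dtypes, raw.dtypes, typed.dtypes, sep="\n\n")
```

Turning inference off with dtype=False is worth considering for columns such as zip codes or IDs, where numeric conversion would drop leading zeros.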

Conclusion

In this article, I have explained how to read or load a JSON string or file into a pandas DataFrame. One of the most important params to be aware of is orient, which specifies the format of the JSON you are trying to load.

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
