  • Post last modified:March 27, 2024
Pandas Read JSON File with Examples

The pandas read_json() function reads a JSON file or JSON string into a DataFrame. It supports several JSON layouts, which you select with the orient param.

JSON (JavaScript Object Notation) is one of the most widely used formats for exchanging data between systems and web applications. When working with files in big data or machine learning, you are often required to process JSON files.

In this article, I will explain how to read JSON from a string and from a file into a pandas DataFrame, and how to use several of the optional params, with examples.

1. pandas read_json() Syntax

Following is the syntax of the read_json() function. It returns either a DataFrame or a Series. Use the typ param to choose the return type ('frame' or 'series'); by default, it returns a DataFrame.


# Syntax of read_json() (pandas 2.x)
pandas.read_json(path_or_buf, *, orient=None, typ='frame', dtype=None, convert_axes=None, convert_dates=True, keep_default_dates=True, precise_float=False, date_unit=None, encoding=None, encoding_errors='strict', lines=False, chunksize=None, compression='infer', nrows=None, storage_options=None, dtype_backend=<no_default>, engine='ujson')

As you can see above, it takes several optional parameters to support reading JSON files with different options. When you are dealing with huge files, some of these params help: with lines=True, nrows loads only the first N records and chunksize reads the file in pieces instead of all at once. In this article, I will explain the usage of some of these options with examples.
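To make the typ param concrete, here is a minimal sketch that reads a flat JSON object into a Series. The sample data is made up for illustration, and the string is wrapped in StringIO because recent pandas versions deprecate passing a literal JSON string directly.

```python
from io import StringIO

import pandas as pd

# Made-up sample data: a flat JSON object of label -> value pairs.
json_str = '{"Spark": 25000, "Pandas": 20000, "Java": 15000}'

# typ='series' asks read_json() for a Series instead of a DataFrame.
fees = pd.read_json(StringIO(json_str), typ="series")

print(type(fees).__name__)
print(fees)
```

For a Series, the default orient is index, so each key of the JSON object becomes an index label.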

2. Read JSON String Example

If you have JSON in a string, you can load it into a pandas DataFrame using the read_json() function. By default, the JSON string should be in a dict-like format {column -> {index -> value}}. This is called the columns orientation.

Note that the orient param specifies the layout of the JSON. The possible values are index, columns, records, split, values, and table. When reading into a DataFrame, the default is columns.


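# Import pandas and StringIO
from io import StringIO
import pandas as pd

# Read JSON from a string.
# Wrap the string in StringIO: passing a literal JSON string
# directly to read_json() is deprecated since pandas 2.1.
json_str = '{"Courses":{"r1":"Spark"},"Fee":{"r1":"25000"},"Duration":{"r1":"50 Days"}}'
df = pd.read_json(StringIO(json_str))
print("Reading JSON from a string:\n", df)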

This yields a one-row DataFrame with the index label r1 and the columns Courses, Fee, and Duration.

Now let’s use a JSON string in another format, [{column -> value}, ... , {column -> value}]. This is the records orientation.


# Read JSON from a string in records orientation
from io import StringIO
import pandas as pd

json_str = '[{"Courses":"Spark","Fee":"25000","Duration":"50 Days","Discount":"2000"}]'
df = pd.read_json(StringIO(json_str), orient='records')
print(df)

#Outputs
#  Courses    Fee Duration  Discount
#0   Spark  25000  50 Days      2000
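The other orient values follow the same pattern. As a rough sketch with made-up sample data, the snippet below loads the same two rows from a split layout ({"columns": [...], "index": [...], "data": [...]}) and from an index layout ({index -> {column -> value}}).

```python
from io import StringIO

import pandas as pd

# Made-up sample data in 'split' orientation.
split_json = (
    '{"columns": ["Courses", "Fee"], '
    '"index": ["r1", "r2"], '
    '"data": [["Spark", 25000], ["Pandas", 20000]]}'
)

# The same rows in 'index' orientation.
index_json = (
    '{"r1": {"Courses": "Spark", "Fee": 25000}, '
    '"r2": {"Courses": "Pandas", "Fee": 20000}}'
)

df_split = pd.read_json(StringIO(split_json), orient="split")
df_index = pd.read_json(StringIO(index_json), orient="index")

print(df_split)
print(df_index)
```

Both calls should produce a DataFrame with rows r1 and r2 and columns Courses and Fee; only the shape of the input JSON differs.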

3. Pandas Read JSON File Example

Let’s use the pandas read_json() function to read a JSON file into a DataFrame. By default, it expects the file to hold a single JSON document, which may be written on one line or spread across multiple lines.

The sample file courses_data.json contains JSON in a dict-like format. Let’s load this JSON file into a DataFrame. Find this JSON file at GitHub.


# pandas read JSON File
df = pd.read_json('courses_data.json')
print(df)

# Outputs
#  Courses    Fee Duration
#0   Spark  25000  50 Days
#1  Pandas  20000  35 Days
#2    Java  15000 
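If you don't have the sample file at hand, the following sketch writes a small multi-line JSON file to a temporary directory and reads it back. The file name courses_sample.json and its contents are made up for illustration; they are not the file from the article.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data in the default 'columns' layout: {column -> {index -> value}}.
data = {
    "Courses": {"0": "Spark", "1": "Pandas"},
    "Fee": {"0": 25000, "1": 20000},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "courses_sample.json"
    # indent=2 pretty-prints the document across several lines.
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    # read_json() accepts a path and parses the whole document.
    df = pd.read_json(path)

print(df)
```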

If your JSON file holds a list of records instead, read it with orient='records'. A sample file in this format is available on GitHub.

# Read JSON file with records orient
df = pd.read_json('/Users/admin/apps/courses.json', orient='records')
print(df)        

4. Read N Records from JSON File

When the file has one JSON record per line (the JSON Lines format), you can use the nrows param to specify how many records you want to load. nrows can be used only together with lines=True.


# Read only the first 2 records from a line-delimited JSON file
df = pd.read_json('courses.json', orient='records', nrows=2, lines=True)
print(df)
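Here is a self-contained sketch of the same idea, assuming a made-up three-line file named courses_sample.jsonl. It also shows chunksize, which (again with lines=True) returns an iterator of DataFrames rather than one DataFrame.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample data: one JSON record per line.
lines = [
    '{"Courses": "Spark", "Fee": 25000}',
    '{"Courses": "Pandas", "Fee": 20000}',
    '{"Courses": "Java", "Fee": 15000}',
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "courses_sample.jsonl"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Load only the first two records.
    first_two = pd.read_json(path, lines=True, nrows=2)

    # Read the file in chunks of two records at a time.
    with pd.read_json(path, lines=True, chunksize=2) as reader:
        chunk_sizes = [len(chunk) for chunk in reader]

print(first_two)
print(chunk_sizes)
```

Reading in chunks keeps memory use bounded, because only one chunk is held as a DataFrame at a time.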

5. Compression & Encoding

Use the compression param to decompress and load compressed JSON files. Supported values include 'zip', 'gzip', 'bz2', and 'zstd'. The default, 'infer', detects the compression from the file extension.

When using 'zip', make sure the ZIP file contains only one data file. Use None to specify no decompression.

Use the encoding param to read files in a custom encoding. By default, it uses UTF-8.
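As an illustration, the sketch below writes a gzip-compressed JSON file with Python's gzip module and reads it back twice: once with compression and encoding given explicitly, and once relying on the defaults. The file name and data are made up.

```python
import gzip
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample data in records orientation.
payload = '[{"Courses": "Spark", "Fee": 25000}, {"Courses": "Pandas", "Fee": 20000}]'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "courses_sample.json.gz"
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write(payload)

    # Explicit compression and encoding.
    df_explicit = pd.read_json(path, compression="gzip", encoding="utf-8")

    # Default compression='infer' picks gzip from the .gz extension.
    df_inferred = pd.read_json(path)

print(df_explicit)
```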

6. Other Params to Read JSON

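  • dtype – If True, infer the dtypes from the data. If a dict of column to dtype, use those dtypes. If False, don’t infer dtypes at all.
  • convert_axes – Try to convert the axes (index and column labels) to the proper dtypes.
  • convert_dates – If True, the default date-like columns may be converted to datetime, depending on keep_default_dates. If False, no dates are converted. You can also pass a list of column names to convert.
  • keep_default_dates – If True and dates are being parsed, columns whose labels look date-like are converted: labels ending in _at or _time, beginning with timestamp, or equal to modified or date.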
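The sketch below shows dtype and convert_dates together on made-up data. The column names id and start are hypothetical. Without extra params, read_json() typically turns digit-only strings such as "001" into numbers and leaves start as text, because that label is not one of the default date-like names.

```python
from io import StringIO

import pandas as pd

# Made-up sample data: ids with leading zeros and ISO-formatted dates.
json_str = (
    '[{"id": "001", "start": "2024-03-27"}, '
    '{"id": "002", "start": "2024-04-15"}]'
)

# Default behaviour: dtypes are inferred, "start" is not parsed as a date.
df_default = pd.read_json(StringIO(json_str))

# Keep "id" as a string and parse "start" as a datetime column.
df_typed = pd.read_json(
    StringIO(json_str),
    dtype={"id": str},
    convert_dates=["start"],
)

print(df_default.dtypes)
print(df_typed.dtypes)
```

Passing a dict to dtype is a handy way to protect identifiers such as zip codes or account numbers from being converted to integers.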

Frequently Asked Questions on Pandas Read JSON File 

How can I read a JSON file using Pandas?

To read a JSON file using Pandas, pass the file path to the pd.read_json() function, for example pd.read_json('your_file.json'), replacing 'your_file.json' with the actual path to your JSON file. This assumes that your JSON file has a simple structure without nested objects or arrays.

What if my JSON file is not a simple flat structure?

The orient parameter describes the top-level layout of the JSON ('split', 'records', 'index', 'columns', 'values', or 'table'); it does not flatten nested objects. If your JSON file has nested structures, load it with Python's json module and flatten it with pd.json_normalize().
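For nested objects, pandas also provides pd.json_normalize(). Here is a minimal sketch with made-up data, where each record carries a nested details object.

```python
import json

import pandas as pd

# Made-up sample data with one level of nesting.
nested_json = (
    '[{"name": "Spark", "details": {"fee": 25000, "duration": "50 Days"}}, '
    '{"name": "Pandas", "details": {"fee": 20000, "duration": "35 Days"}}]'
)

# Parse to Python objects first, then flatten nested keys into columns.
records = json.loads(nested_json)
df = pd.json_normalize(records)

print(df.columns.tolist())
print(df)
```

Nested keys are joined with a dot by default, so the fee ends up in a column named details.fee.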

Can I read JSON data from an API using Pandas?

Yes. pd.read_json() accepts a URL, so you can pass the API endpoint directly. Alternatively, send the request yourself with requests.get(url) and pass the response body to pandas, either by wrapping response.text in StringIO for pd.read_json() or by building a DataFrame from response.json(). The response object itself cannot be passed to pd.read_json(). Finally, you can print or manipulate the DataFrame as needed.
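The following sketch shows only the parsing step and makes no network call. The variable response_text is a made-up stand-in for the body an API would return (for example, the text attribute of a requests response).

```python
from io import StringIO

import pandas as pd

# Stand-in for the text body of an API response; no request is sent here.
response_text = '[{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]'

# Wrap the text in StringIO so read_json() treats it as a file-like object.
df = pd.read_json(StringIO(response_text))

print(df)
```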

How can I handle datetime columns when reading JSON?

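If your JSON file contains datetime information, pass a list of column names to the convert_dates parameter to specify which columns should be converted to datetime format. Columns with default date-like labels (such as date or labels ending in _at) are converted automatically unless you set convert_dates=False.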

What if my JSON file is compressed (e.g., gzip)?

If your JSON file is compressed, such as in gzip format, you can still use Pandas to read it by specifying the compression type using the compression parameter in the pd.read_json() function.

How can I handle missing data while reading JSON?

When reading a JSON file using Pandas, JSON null values are loaded as missing values (NaN or None) in the DataFrame. read_json() has no dedicated param for this, so handle missing data after loading, for example with fillna() or dropna().
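One common pattern is to load the data first and deal with missing values afterwards. A minimal sketch with made-up data:

```python
from io import StringIO

import pandas as pd

# Made-up sample data: the second record has a null Fee.
json_str = '[{"Courses": "Spark", "Fee": 25000}, {"Courses": "Java", "Fee": null}]'

df = pd.read_json(StringIO(json_str))

# Count the missing values that came from JSON null.
missing = int(df["Fee"].isna().sum())

# Either replace missing values or drop the affected rows.
filled = df.fillna({"Fee": 0})
dropped = df.dropna(subset=["Fee"])

print(missing)
print(filled)
print(dropped)
```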

Conclusion

In this article, I have explained how to read or load a JSON string or file into a pandas DataFrame. One of the most important params to be aware of is orient, which specifies the layout of the JSON you are trying to load.


Naveen Nelamali
