The pandas read_json() function reads a JSON file or a JSON string into a DataFrame. It supports several JSON layouts, which you select with the orient param.
JSON (JavaScript Object Notation) is one of the most widely used formats for exchanging data between systems and web applications. When working with files in big data or machine learning, we are often required to process JSON files.
In this article, I will explain how to read JSON from a string and from a file into a pandas DataFrame, and how to use several optional params, with examples.
1. pandas read_json() Syntax
Following is the syntax of the read_json() function. It returns either a DataFrame or a Series. Use the typ param to specify the return type; by default, it returns a DataFrame.
# Syntax of read_json()
pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=None, convert_axes=None, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, encoding_errors='strict', lines=False, chunksize=None, compression='infer', nrows=None, storage_options=None)
As you see above, it takes several optional parameters to support reading JSON files with different options. When you are dealing with huge files, some of these params (for example nrows and chunksize) let you load only part of the data or load it in pieces. In this article, I will explain the usage of some of these options with examples.
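As a small illustration of the typ param mentioned above, here is a minimal sketch that reads a flat JSON object into a Series. The sample data and the variable names are made up for this example.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: a flat JSON object of label -> value
json_str = '{"Spark": 25000, "Pandas": 20000, "Java": 15000}'

# typ='series' asks read_json() for a Series instead of the default DataFrame
fees = pd.read_json(StringIO(json_str), typ='series')

print(type(fees).__name__)
print(fees)
```

The keys of the JSON object become the index labels of the Series.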
2. Read JSON String Example
If you have JSON in a string, you can load it into a pandas DataFrame using the read_json() function. By default, the JSON string should be in a dict-like format {column -> {index -> value}}. This is called the columns orientation.
Note that the orient param specifies the format of the JSON string. The possible values are index, columns, records, split, values, and table. When reading into a DataFrame, the default is columns.
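# Import pandas and StringIO
from io import StringIO
import pandas as pd
# Read JSON from a string.
# pandas 2.1+ deprecates passing a raw JSON string directly,
# so wrap the string in StringIO.
json_str = '{"Courses":{"r1":"Spark"},"Fee":{"r1":"25000"},"Duration":{"r1":"50 Days"}}'
df = pd.read_json(StringIO(json_str))
print("Reading JSON string into a DataFrame:\n", df)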
This yields the following output.
# Output:
#    Courses    Fee Duration
# r1   Spark  25000  50 Days
Now let’s use a JSON string in another format, [{column -> value}, ... , {column -> value}]. This uses the records orientation.
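# Read JSON from a string in records orientation
# (wrapped in StringIO, since pandas 2.1+ deprecates raw JSON strings)
from io import StringIO
json_str = '[{"Courses":"Spark","Fee":"25000","Duration":"50 Days","Discount":"2000"}]'
df = pd.read_json(StringIO(json_str), orient='records')
print(df)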
#Outputs
# Courses Fee Duration Discount
#0 Spark 25000 50 Days 2000
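The other orientations follow the same pattern. Below is a minimal sketch of the split, index, and values layouts, using made-up course data; the variable names are only for illustration.

```python
from io import StringIO

import pandas as pd

# 'split' orient: {"columns": [...], "index": [...], "data": [[...], ...]}
split_json = (
    '{"columns":["Courses","Fee"],'
    '"index":["r1","r2"],'
    '"data":[["Spark",25000],["Pandas",20000]]}'
)
df_split = pd.read_json(StringIO(split_json), orient='split')

# 'index' orient: {index -> {column -> value}}
index_json = (
    '{"r1":{"Courses":"Spark","Fee":25000},'
    '"r2":{"Courses":"Pandas","Fee":20000}}'
)
df_index = pd.read_json(StringIO(index_json), orient='index')

# 'values' orient: a bare array of rows, so row and column labels
# fall back to the default integer positions
values_json = '[["Spark",25000],["Pandas",20000]]'
df_values = pd.read_json(StringIO(values_json), orient='values')

print(df_split)
print(df_index)
print(df_values)
```

If the orient does not match the actual layout of the JSON, the result is usually either an error or a DataFrame with rows and columns in unexpected places, so it is worth checking the first few rows after loading.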
3. Pandas Read JSON File Example
Let’s use the pandas read_json() function to read a JSON file into a DataFrame. By default, it expects the file to hold a single JSON document, which may be written on one line or spread over multiple lines.
The example file, courses_data.json, contains JSON in a dict-like format.
Let’s load this JSON file into a DataFrame. Find this JSON file at GitHub.
# pandas read JSON File
df = pd.read_json('courses_data.json')
print(df)
# Outputs
# Courses Fee Duration
#0 Spark 25000 50 Days
#1 Pandas 20000 35 Days
#2 Java 15000
In case you have JSON records in a list, use the JSON file below from GitHub.
# Read JSON file with records orient
df = pd.read_json('/Users/admin/apps/courses.json', orient='records')
print(df)
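If you don't have the example files at hand, the following sketch writes a couple of made-up records to a temporary file and reads them back, so it can be run as-is. The records and the file name are assumptions for this example only.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical records standing in for the contents of the courses file
records = [
    {"Courses": "Spark", "Fee": 25000, "Duration": "50 Days"},
    {"Courses": "Pandas", "Fee": 20000, "Duration": "35 Days"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "courses.json")

    # Write the records as a single JSON array
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)

    # Read the file back using the records orientation
    df_file = pd.read_json(path, orient="records")

print(df_file)
```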
4. Read N Records from JSON File
When the file has one JSON record per line, you can use the nrows param to specify how many records you want to load. It can be used only together with lines=True.
# Read the first 2 records from a line-delimited JSON file
df = pd.read_json('courses.json', orient='records', nrows=2, lines=True)
print(df)
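Here is a self-contained sketch of the same idea, assuming a small line-delimited file created on the fly. It also shows the chunksize param from the syntax above, which returns an iterator of DataFrames instead of loading everything at once. The file name and data are made up.

```python
import os
import tempfile

import pandas as pd

# Hypothetical line-delimited JSON: one record per line
lines = [
    '{"Courses":"Spark","Fee":25000}',
    '{"Courses":"Pandas","Fee":20000}',
    '{"Courses":"Java","Fee":15000}',
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "courses.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

    # nrows is only valid together with lines=True
    df_head = pd.read_json(path, lines=True, nrows=2)

    # chunksize (also lines=True only) yields DataFrames of up to 2 rows each
    with pd.read_json(path, lines=True, chunksize=2) as reader:
        chunk_sizes = [len(chunk) for chunk in reader]

print(df_head)
print(chunk_sizes)
```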
5. Compression & Encoding
Use the compression param to decompress JSON files while loading them. Supported values include 'zip', 'gzip', 'bz2', 'zstd', and 'xz'. The default, 'infer', picks the method from the file extension.
When using 'zip', make sure the ZIP file contains only one data file. Use None to specify no decompression.
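A minimal sketch of reading a gzip-compressed file, assuming a small made-up file written with Python's gzip module. It reads the file once with an explicit compression value and once relying on the default 'infer'.

```python
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical JSON content
json_text = '[{"Courses":"Spark","Fee":25000},{"Courses":"Pandas","Fee":20000}]'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "courses.json.gz")

    # Write the JSON text as a gzip-compressed file
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(json_text)

    # Explicitly name the compression method
    df_explicit = pd.read_json(path, compression="gzip")

    # Rely on the default compression='infer', which uses the .gz extension
    df_inferred = pd.read_json(path)

print(df_explicit)
```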
Use the encoding param to read a file that uses a different encoding; by default, it uses UTF-8.
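As a sketch of the encoding param, the example below writes a made-up record containing a non-ASCII character as Latin-1 and reads it back. The file name and the data are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical record with a non-ASCII character
json_text = '[{"Courses":"Café Basics","Fee":1000}]'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "courses_latin1.json")

    # Save the file as Latin-1 instead of UTF-8
    with open(path, "w", encoding="latin-1") as f:
        f.write(json_text)

    # Tell read_json() which encoding to decode the file with
    df_latin = pd.read_json(path, encoding="latin-1")

print(df_latin)
```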
6. Other Params to Read JSON
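- dtype – If True, infer the dtypes from the data. If a dict of column to dtype, use those dtypes. If False, don’t infer dtypes at all.
- convert_axes – Try to convert the axes to the proper dtypes.
- convert_dates – If True, date-like columns are converted to datetime. If False, no dates are converted. You can also pass a list of column names to convert.
- keep_default_dates – If True and dates are being parsed, columns whose labels look date-like (for example, labels ending in _at or _time, starting with timestamp, or named modified or date) are converted.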
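To make the dtype param more concrete, here is a minimal sketch with a made-up record whose Fee value is stored as a string in the JSON.

```python
from io import StringIO

import pandas as pd

# Hypothetical record: Fee is a string in the JSON
json_str = '[{"Courses":"Spark","Fee":"25000"}]'

# dtype=False: no dtype inference, so Fee stays the string "25000"
df_raw = pd.read_json(StringIO(json_str), dtype=False)

# dtype as a dict: cast the named column to the requested dtype
df_typed = pd.read_json(StringIO(json_str), dtype={"Fee": "float64"})

print(df_raw.dtypes)
print(df_typed.dtypes)
```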
Frequently Asked Questions on Pandas Read JSON File
To read a JSON file using pandas, pass the file path to the pd.read_json() function, for example pd.read_json('your_file.json'), replacing 'your_file.json' with the actual path to your JSON file. This assumes that your JSON file has a simple structure without nested objects or arrays.
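The orient parameter tells pandas how the top level of the JSON is laid out. Common values for orient include 'split', 'records', 'index', 'columns', and 'values'. Note that orient does not flatten nested structures: nested objects and arrays are loaded as dicts or lists inside a single column. To flatten them, pandas provides the pd.json_normalize() function.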
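For nested JSON, one option is pd.json_normalize(), which works on already-parsed Python objects rather than on a JSON string. The sketch below uses made-up nested records and parses them with the standard json module first.

```python
import json

import pandas as pd

# Hypothetical nested records: "details" is an object inside each record
nested_str = (
    '[{"Courses":"Spark","details":{"Fee":25000,"Duration":"50 Days"}},'
    '{"Courses":"Pandas","details":{"Fee":20000,"Duration":"35 Days"}}]'
)

# Parse the JSON text into Python lists and dicts
records = json.loads(nested_str)

# Flatten nested keys into dotted column names such as "details.Fee"
df_flat = pd.json_normalize(records)

print(df_flat)
```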
You can use pandas to read JSON data returned by an API. For example, requests.get(url) sends a GET request to the specified API endpoint (url). The text of the response is then passed to pd.read_json() (wrapped in StringIO on recent pandas versions), which reads the JSON data into a pandas DataFrame. pd.read_json() also accepts a URL directly. Finally, you can print or manipulate the DataFrame as needed.
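A rough sketch of that flow is shown below. To keep it runnable offline, the network call is left as a comment and a hard-coded string stands in for the response body; the payload and variable names are made up.

```python
from io import StringIO

import pandas as pd

# With the third-party requests package, the body would come from something like:
#   response = requests.get(url)   # url is a hypothetical API endpoint
#   payload = response.text
# Here a hard-coded string stands in for response.text.
payload = '[{"Courses":"Spark","Fee":25000},{"Courses":"Pandas","Fee":20000}]'

# Wrap the text in StringIO so read_json() treats it as a file-like object
df_api = pd.read_json(StringIO(payload))

print(df_api)
```

Real API responses often wrap the records in an outer object, in which case you would first pick out the list you need before building the DataFrame.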
If your JSON file contains datetime information, you can pass a list of column names to the convert_dates parameter to specify which columns should be converted to datetime format.
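A minimal sketch of convert_dates, assuming made-up records with ISO-formatted date strings. The column name start is not one of the labels pandas treats as date-like by default, while created_at is.

```python
from io import StringIO

import pandas as pd

# Hypothetical record with two date strings
json_str = '[{"Courses":"Spark","start":"2023-01-15","created_at":"2023-01-10"}]'

# Ask for "start" to be parsed as a date
df_dates = pd.read_json(StringIO(json_str), convert_dates=["start"])

# Turn date conversion off entirely
df_none = pd.read_json(StringIO(json_str), convert_dates=False)

print(df_dates.dtypes)
print(df_none.dtypes)
```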
If your JSON file is compressed, such as in gzip format, you can still use pandas to read it by specifying the compression type using the compression parameter of the pd.read_json() function.
The pd.read_json() function has no dedicated parameter for missing data. JSON null values, and keys that are missing from some records, are loaded as missing values (NaN or None). You can then handle them with DataFrame methods such as fillna() or dropna().
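As a sketch of how missing values typically show up after loading, the example below uses made-up records with a null value and a missing key, then handles them with regular DataFrame methods.

```python
from io import StringIO

import pandas as pd

# Hypothetical records: one Fee is null, and one record has no Fee key at all
json_str = (
    '[{"Courses":"Spark","Fee":25000},'
    '{"Courses":"Java","Fee":null},'
    '{"Courses":"Pandas"}]'
)

df_missing = pd.read_json(StringIO(json_str))

# Both the null and the absent key are reported as missing
n_missing = int(df_missing["Fee"].isna().sum())

# Handle missing values after loading
df_filled = df_missing.fillna({"Fee": 0})
df_dropped = df_missing.dropna(subset=["Fee"])

print(n_missing)
print(df_filled)
print(df_dropped)
```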
Conclusion
In this article, I have explained how to read or load a JSON string or file into a pandas DataFrame. One of the most important params to be aware of is orient, which specifies the format of the JSON you are trying to load.