Pandas read_sql() function is used to read data from SQL queries or database tables into a DataFrame. It lets you execute a SQL query and load the result directly into a Pandas DataFrame. read_sql() is a wrapper around the read_sql_query() and read_sql_table() functions; based on the input, it calls one of them internally and returns the SQL result as a two-dimensional data structure with labeled axes.
I will use the following steps to explain Pandas read_sql() usage.
- Syntax of read_sql()
- Create database table on sqlite3
- Insert data into the table
- Execute SQL query by using pandas read_sql()
- Use read_sql_query()
- Use read_sql_table()
- Filter rows from SQL table
Syntax of Pandas read_sql()
Following are the syntax of read_sql(), read_sql_query() and read_sql_table() functions. All these functions return either DataFrame or Iterator[DataFrame].
# Syntax of read_sql()
pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)
# Syntax of read_sql_query()
pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None)
# Syntax of read_sql_table()
pandas.read_sql_table(table_name, con, schema=None, index_col=None, coerce_float=True, parse_dates=None, columns=None, chunksize=None)
Parameters:
- sql (str or SQLAlchemy Selectable) – If it is a string, it can be a SQL query or a table name. If it is a SQLAlchemy Selectable (such as a table or a select statement), it is used as the query to execute.
- con (SQLAlchemy connectable or database connection) – A database connection or a SQLAlchemy engine. This is required to connect to the database.
- index_col (str, optional) – Column to set as the DataFrame index. If None, no index is set.
- coerce_float (bool, default True, optional) – Whether to coerce non-string, non-numeric objects (like decimal.Decimal) to float values.
- params (tuple, list, or dict, optional) – Parameters to pass into the query. Useful for parameterized queries to avoid SQL injection.
- parse_dates (list or dict, optional) – Columns to parse as dates. Can be a list of column names, or a dict mapping column names to a date format string or to keyword arguments for pandas.to_datetime().
- columns (list, optional) – If specified, limits the columns returned when reading a table.
- chunksize (int, optional) – If specified, returns an iterator where chunksize is the number of rows per chunk.
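As a quick illustration, here is a minimal sketch of how the index_col parameter is used; it assumes the COURSES table created in the next section already exists.
import sqlite3
import pandas as pd

# Connect to the SQLite database created in the next section
con = sqlite3.connect('courses_database')

# index_col turns the course_id column into the DataFrame index
df = pd.read_sql('SELECT * FROM COURSES', con, index_col='course_id')
print(df)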
Before we go into learning how to use pandas read_sql() and the other functions, let's create a database and table by using sqlite3.
Create a Database and Table
The example below creates a database and table in Python using the sqlite3 module. sqlite3 is part of the Python standard library, so there is nothing extra to install; you only need to import it.
import sqlite3
# Connect to Database
con = sqlite3.connect('courses_database')
cur = con.cursor()
# Create Table
cur.execute('''CREATE TABLE IF NOT EXISTS COURSES
([course_id] INTEGER PRIMARY KEY,
[course_name] TEXT,
[fee] INTEGER,
[duration] TEXT,
[discount] INTEGER)''')
Running the above script creates a new database called courses_database along with a table named COURSES.
Insert Rows into the Table
Now insert rows into the table by using the execute() function of the Cursor object. Don't forget to call commit(); this saves the inserted rows to the database permanently.
# Insert Few Records
cur.execute('''INSERT INTO COURSES (course_id, course_name, fee, duration, discount)
VALUES (1,'Spark',25000,'50 Days', 2000),
(2,'Pandas',20000,'35 Days', 1000),
(3,'Java',15000,'35 Days', 800),
(4,'Python',15000,'30 Days', 500),
(5,'PHP',28000,'30 Days', 800)''')
# Commit inserts
con.commit()
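If you prefer placeholders over a hard-coded VALUES list, the same rows could also be inserted with executemany(); the sketch below is equivalent to the statement above.
# Insert the same records using executemany() and ? placeholders
records = [
    (1, 'Spark', 25000, '50 Days', 2000),
    (2, 'Pandas', 20000, '35 Days', 1000),
    (3, 'Java', 15000, '35 Days', 800),
    (4, 'Python', 15000, '30 Days', 500),
    (5, 'PHP', 28000, '30 Days', 800)
]
cur.executemany('INSERT INTO COURSES VALUES (?, ?, ?, ?, ?)', records)
con.commit()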
Using Pandas read_sql()
Now load the table by using the Pandas read_sql() function. As mentioned above, it can take either a SQL query or a table name as its first parameter. Since we are passing a SQL query here, it internally calls the read_sql_query() function.
import sqlite3
import pandas as pd
# Connect to the database
con = sqlite3.connect('courses_database')
# Run SQL
sql_query = pd.read_sql('SELECT * FROM COURSES', con)
# Convert SQL to DataFrame
df = pd.DataFrame(sql_query, columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)
Yields below output.
# Output:
  course_id course_name   fee duration discount
0         1       Spark 25000  50 Days     2000
1         2      Pandas 20000  35 Days     1000
2         3        Java 15000  35 Days      800
3         4      Python 15000  30 Days      500
4         5         PHP 28000  30 Days      800
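If the table is large, you can also pass the chunksize parameter; read_sql() then returns an iterator of DataFrames instead of one DataFrame. A minimal sketch:
# Read the table in chunks of 2 rows at a time
for chunk in pd.read_sql('SELECT * FROM COURSES', con, chunksize=2):
    print(chunk)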
Using read_sql_query()
Similarly, you can also write the above statement directly by using the read_sql_query() function.
# Run SQL
sql_query = pd.read_sql_query('SELECT * FROM COURSES', con)
# Convert SQL to DataFrame
df = pd.DataFrame(sql_query, columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)
Yields the same output as above.
Using read_sql_table()
In the above examples, I have used SQL queries to read the table into a pandas DataFrame. Now let's load the entire table just by its name using the read_sql_table() function. Note that read_sql_table() requires a SQLAlchemy connectable rather than a plain sqlite3 connection, so we create an engine first. This loads all rows from the table into a DataFrame.
# Create a SQLAlchemy engine pointing at the same database file
from sqlalchemy import create_engine
engine = create_engine('sqlite:///courses_database')
# Read the entire SQL table by its name
df = pd.read_sql_table('COURSES', engine)
print(df)
Yields the same output.
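read_sql_table() also accepts the columns parameter when you only need a subset of columns; a minimal sketch, reusing the engine created above:
# Load only selected columns from the COURSES table
df = pd.read_sql_table('COURSES', engine, columns=['course_name', 'fee'])
print(df)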
Filter Rows from SQL
Most of the time you don't need to read all rows from the SQL table. To load only the rows that match a condition, use a SQL query with a WHERE clause.
# Run SQL
sql_query = pd.read_sql_query('''SELECT * FROM COURSES where fee = 15000''', con)
# Convert SQL to DataFrame
df = pd.DataFrame(sql_query, columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)
Yields below output.
# Output:
  course_id course_name   fee duration discount
0         3        Java 15000  35 Days      800
1         4      Python 15000  30 Days      500
Using Cursor fetchall()
Alternatively, you can also use the DataFrame constructor along with Cursor.fetchall() to load the SQL table into a DataFrame. The below example yields the same output as above.
# Run SQL
cur = con.cursor()
cur.execute('''SELECT * FROM COURSES where fee = 15000''')
# Convert SQL to DataFrame
df = pd.DataFrame(cur.fetchall(), columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)
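Instead of hard-coding the column names, you can also derive them from cursor.description, which carries the column metadata of the last executed query. A minimal sketch:
# Build the column list from the cursor metadata
cur.execute('SELECT * FROM COURSES')
columns = [desc[0] for desc in cur.description]
df = pd.DataFrame(cur.fetchall(), columns=columns)
print(df)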
FAQ on Pandas Read SQL Query or Table
The pandas.read_sql_query() function is used to execute a SQL query and return the result as a DataFrame. The pandas.read_sql_table() function fetches the entire contents of a database table as a DataFrame.
Both functions take two main arguments: sql, a query string (for read_sql_query()) or a table name (for read_sql_table()), and con, a SQLAlchemy connectable or database connection.
To connect to a database in Pandas, you typically use SQLAlchemy, which provides a way to interact with many types of databases.
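For example, a minimal sketch of creating a SQLAlchemy engine; the connection URL and table name below are placeholders, not a real server.
from sqlalchemy import create_engine

# Generic URL format: dialect+driver://user:password@host:port/dbname
engine = create_engine('postgresql://user:password@localhost:5432/mydb')
df = pd.read_sql('SELECT * FROM my_table', engine)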
You can use parameters in SQL queries when reading data into a Pandas DataFrame. Using parameters helps protect against SQL injection and allows for dynamic queries. Pandas supports parameterized queries through the params argument of the read_sql() and read_sql_query() functions.
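Here is a minimal sketch of a parameterized query against the COURSES table used in this article; the ? placeholder style is what sqlite3 expects, other drivers may use a different paramstyle.
# Pass the filter value through params instead of formatting it into the string
df = pd.read_sql_query('SELECT * FROM COURSES WHERE fee >= ?', con, params=(20000,))
print(df)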
Conclusion
In this article, you have learned how to run a SQL query with Pandas read_sql() and convert the result into a DataFrame. You also learned how to read an entire database table and how to load only selected rows.
Related Articles
- Pandas Read TSV with Examples
- Pandas Read Text with Examples
- Pandas DataFrame fillna() function
- Pandas Read JSON File with Examples
- Pandas read_csv() with Examples
- How to Read CSV from String in Pandas
- Pandas Filter Rows Using IN Like SQL
- Pretty Print Pandas DataFrame or Series?
- Pandas Window Functions Explained
- Pandas Read Multiple CSV Files into DataFrame