• Post author:
  • Post category:Pandas
  • Post last modified:December 1, 2024
  • Reading time:15 mins read
You are currently viewing Pandas Read SQL Query or Table with Examples

Pandas read_sql() function is used to read data from SQL queries or database tables into DataFrame. This function allows you to execute SQL queries and load the results directly into a Pandas DataFrame. This function is a wrapper for the read_sql_query() and read_sql_table() functions, based on the input, it calls these functions internally and returns the SQL table as a two-dimensional data structure with labeled axes.

Advertisements

I will use the following steps to explain Pandas read_sql() usage.

  1. Syntax of read_sql()
  2. Create database table on sqlite3
  3. Insert data into the table
  4. Execute SQL query by using pands red_sql().
  5. Use read_sql_query()
  6. Use read_sql_table()
  7. Filter rows from SQL table

Syntax of Pandas read_sql()

Following are the syntax of read_sql(), read_sql_query() and read_sql_table() functions. All these functions return either DataFrame or Iterator[DataFrame].


# Syntax of read_sql()
pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)

# Syntax of read_sql_query()
pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None)

# Syntax of read_sql_table()
pandas.read_sql_table(table_name, con, schema=None, index_col=None, coerce_float=True, parse_dates=None, columns=None, chunksize=None)

Parameters:

  • sql (str or SQLAlchemy Selectable) – If it’s a string, it can be a SQL query or a table name.If it’s a SQLAlchemy Selectable (like a table or a select statement), it will be used as the query to execute.
  • con (SQLAlchemy engine or database connection) – A database connection or a SQLAlchemy engine. This is required to connect to the database.
  • index_col (str, optional) – Column to set as the DataFrame index. If None, no index will be set.
  • coerce_float (bool, default True, optional) – Whether to coerce non-string, non-numeric objects (like datetime) to float values.
  • params (tuple, list, or dict, optional) – Parameters to pass into the query. Useful for parameterized queries to avoid SQL injection.
  • parse_dates (bool or list of str, optional) – Columns to parse as dates. Can be a list of column names to parse specific columns or True to parse all columns that look like dates.
  • columns(list, optional) – If specified, limits the columns to be returned from the SQL query.

Before we go into learning how to use pandas read_sql() and other functions, let’s create a database and table by using sqlite3.

Create a Database and Table

The below example can be used to create a database and table in Python by using the sqlite3 library. If you don’t have a sqlite3 library install it using the pip command. In order to use it first, you need to import it.


import sqlite3

# Connect to Database
con = sqlite3.connect('courses_database') 
cur = con.cursor()

# Create Table
cur.execute('''CREATE TABLE IF NOT EXISTS COURSES
          ([course_id] INTEGER PRIMARY KEY, 
           [course_name] TEXT, 
           [fee] INTEGER, 
           [duration] TEXT, 
           [discount] INTEGER)''')

Running the above script creates a new database called courses_database along with a table named courses.

Insert Rows into the Table

Now insert rows into the table by using execute() function of the Cursor object. Don’t forget to run the commit(), this saves the inserted rows into the database permanently.


# Insert Few Records
cur.execute('''INSERT INTO COURSES (course_id, course_name, fee, duration, discount)
                VALUES (1,'Spark',25000,'50 Days', 2000),
                (2,'Pandas',20000,'35 Days', 1000),
                (3,'Java',15000,'35 Days', 800),
                (4,'Python',15000,'30 Days', 500),
                (5,'PHP',28000,'30 Days', 800)''') 
                    
# Commit inserts
con.commit()

Using pandas read_sql() query

Now by using Pandas read_sql() function load the table, as I said above, this can take either SQL query or table name as a parameter. since we are passing SQL query as the first param, it internally calls read_sql_query() function.


import sqlite3
import pandas as pd

# Connect to the database
con = sqlite3.connect('courses_database') 

# Run SQL          
sql_query = pd.read_sql('SELECT * FROM COURSES', con)

# Convert SQL to DataFrame
df = pd.DataFrame(sql_query, columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)

Yields below output.


# Output:
  course_id course_name    fee duration  discount
0         1       Spark  25000  50 Days      2000
1         2      Pandas  20000  35 Days      1000
2         3        Java  15000      NaN       800
3         4      Python  15000  30 Days       500
4         5         PHP  18000  30 Days       800

Using read_sql_query()

Similarly, you can also write the above statement directly by using the read_sql_query() function.


# Run SQL          
sql_query = pd.read_sql_query('SELECT * FROM COURSES', con)

# Convert SQL to DataFrame
df = pd.DataFrame(sql_query, columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)

Yields the same output as above.

Using read_sql_table()

In the above examples, I have used SQL queries to read the table into pandas DataFrame. Now let’s just use the table name to load the entire table using the read_sql_table() function. This loads all rows from the table into DataFrame.


# Read SQL table       
sql_query = pd.read_sql('COURSES', con)

# Convert SQL to DataFrame
df = pd.DataFrame(sql_query, columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)

Yields the same output.

Filter Rows from SQL

Most of the time you may not be required to read all rows from the SQL table, to load only selected rows based on a condition that use SQL with Where Clause.


# Run SQL          
sql_query = pd.read_sql_query('''SELECT * FROM COURSES where fee = 1500''', con)

# Convert SQL to DataFrame
df = pd.DataFrame(sql_query, columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)

Yields below output.


# Output:
  course_id course_name    fee duration  discount
0         3        Java  15000      NaN       800
1         4      Python  15000  30 Days       500

Using Cursor fetchall()

Alternatively, you can also use the DataFrame constructor along with Cursor.fetchall() to load the SQL table into DataFrame. The below example yields the same output as above.


# Run SQL 
cur = con.cursor()         
cur.execute('''SELECT * FROM COURSES where fee = 1500''')

# Convert SQL to DataFrame
df = pd.DataFrame(cur.fetchall(), columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)

FAQ on Pandas Read SQL Query or Table

What is read_sql_query in Pandas?

The pandas.read_sql_query() function is used to execute a SQL query and return the result as a DataFrame.

What is read_sql_table in Pandas?

The pandas.read_sql_table() function fetches the entire contents of a table in a database as a DataFrame.

What are the required parameters for these functions?

sql: A query string (for read_sql_query) or table name (for read_sql_table).
con: A SQLAlchemy connection object or database connection string.

How do I connect to a database?

To connect to a database in Pandas, you typically use SQLAlchemy, which provides a way to interact with many types of databases.

Can I use parameters in SQL queries?

You can use parameters in SQL queries when reading data into a Pandas DataFrame. Using parameters helps protect against SQL injection and allows for dynamic queries. Pandas supports parameterized queries through the params argument in both read_sql_query() and read_sql_table() functions.

Conclusion

In this pandas read SQL into DataFrame you have learned how to run the SQL query and convert the result into DataFrame. Also learned how to read an entire database table, only selected rows etc.

References