
The Pandas read_sql() function is used to read data from a SQL query or a database table into a DataFrame. It allows you to execute a SQL query and load the results directly into a Pandas DataFrame. read_sql() is a wrapper around the read_sql_query() and read_sql_table() functions: based on the input, it calls one of them internally and returns the SQL result as a two-dimensional data structure with labeled axes.


I will use the following steps to explain Pandas read_sql() usage.

  1. Syntax of read_sql()
  2. Create a database and table using sqlite3
  3. Insert data into the table
  4. Execute a SQL query using pandas read_sql()
  5. Use read_sql_query()
  6. Use read_sql_table()
  7. Filter rows from a SQL table

1. Syntax of Pandas read_sql()

Following is the syntax of the read_sql(), read_sql_query(), and read_sql_table() functions. All of these return either a DataFrame or an Iterator[DataFrame].


# Syntax of read_sql()
pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)

# Syntax of read_sql_query()
pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None)

# Syntax of read_sql_table()
pandas.read_sql_table(table_name, con, schema=None, index_col=None, coerce_float=True, parse_dates=None, columns=None, chunksize=None)

Parameters:

  • sql – SQL query string or table name to read.
  • con – database connection; an SQLAlchemy connectable, or a DBAPI2 connection (such as sqlite3) for queries.
  • index_col – column(s) to set as the index of the returned DataFrame.
  • coerce_float – attempt to convert non-string values (such as decimals) to floating point; defaults to True.
  • params – parameters to bind to the query.
  • parse_dates – column(s) to parse as datetime.
  • columns – columns to select when reading a table.
  • chunksize – number of rows per chunk; when set, an iterator of DataFrames is returned.
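
For example, index_col sets one of the result columns as the DataFrame index, and chunksize turns the result into an iterator of smaller DataFrames. A minimal sketch, assuming the COURSES table created in the next section already exists:


import sqlite3
import pandas as pd

con = sqlite3.connect('courses_database')

# Read with course_id as the index, 2 rows at a time
for chunk in pd.read_sql('SELECT * FROM COURSES', con,
                         index_col='course_id', chunksize=2):
    print(chunk)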

Before we go into learning how to use read_sql() and the related functions, let's create a database and table using sqlite3.

2. Create a Database and Table

The below example creates a database and table in Python using the sqlite3 library. sqlite3 ships with the Python standard library, so there is nothing to install; you just need to import it.


import sqlite3

# Connect to Database
con = sqlite3.connect('courses_database') 
cur = con.cursor()

# Create Table
cur.execute('''CREATE TABLE IF NOT EXISTS COURSES
          ([course_id] INTEGER PRIMARY KEY, 
           [course_name] TEXT, 
           [fee] INTEGER, 
           [duration] TEXT, 
           [discount] INTEGER)''')

Running the above script creates a new database called courses_database along with a table named COURSES.
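
To verify that the table was created, you can query SQLite's built-in sqlite_master catalog:


# List the tables in the database
cur.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(cur.fetchall())
# [('COURSES',)]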

3. Insert Rows into the Table

Now insert rows into the table using the execute() function of the Cursor object. Don't forget to run commit(); this permanently saves the inserted rows to the database.


# Insert Few Records
cur.execute('''INSERT INTO COURSES (course_id, course_name, fee, duration, discount)
                VALUES (1,'Spark',25000,'50 Days', 2000),
                (2,'Pandas',20000,'35 Days', 1000),
                (3,'Java',15000,'35 Days', 800),
                (4,'Python',15000,'30 Days', 500),
                (5,'PHP',28000,'30 Days', 800)''') 
                    
# Commit Inserts
con.commit()
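
Equivalently, you can insert the same rows with executemany() and ? placeholders, which is the idiomatic way to bind Python values in sqlite3:


# Equivalent insert using executemany() with placeholders
records = [
    (1, 'Spark', 25000, '50 Days', 2000),
    (2, 'Pandas', 20000, '35 Days', 1000),
    (3, 'Java', 15000, '35 Days', 800),
    (4, 'Python', 15000, '30 Days', 500),
    (5, 'PHP', 28000, '30 Days', 800),
]
cur.executemany('INSERT INTO COURSES VALUES (?, ?, ?, ?, ?)', records)
con.commit()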

4. Using Pandas read_sql()

Now load the table using the Pandas read_sql() function. As mentioned above, it can take either a SQL query or a table name as the first parameter. Since we are passing a SQL query here, it internally calls the read_sql_query() function. Note that read_sql() already returns a DataFrame; the DataFrame constructor below simply selects the listed columns.


import sqlite3
import pandas as pd

# Connect to the database
con = sqlite3.connect('courses_database') 

# Run SQL          
sql_query = pd.read_sql('SELECT * FROM COURSES', con)

# Convert SQL to DataFrame
df = pd.DataFrame(sql_query, columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)

Yields below output.


# Output:
  course_id course_name    fee duration  discount
0         1       Spark  25000  50 Days      2000
1         2      Pandas  20000  35 Days      1000
2         3        Java  15000  35 Days       800
3         4      Python  15000  30 Days       500
4         5         PHP  28000  30 Days       800

5. Using read_sql_query()

Similarly, you can also write the above statement directly by using the read_sql_query() function.


# Run SQL          
sql_query = pd.read_sql_query('SELECT * FROM COURSES', con)

# Convert SQL to DataFrame
df = pd.DataFrame(sql_query, columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)

Yields the same output as above.
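
read_sql_query() also takes a dtype argument (available in pandas 1.3 and later) to control the types of the returned columns. A small sketch:


# Force specific column dtypes while reading (pandas >= 1.3)
sql_query = pd.read_sql_query('SELECT * FROM COURSES', con,
                              dtype={'fee': 'int64', 'discount': 'float64'})
print(sql_query.dtypes)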

6. Using read_sql_table()

In the above examples, I have used SQL queries to read the table into a pandas DataFrame. Now let's pass just the table name to load the entire table with the read_sql_table() function. Note that read_sql_table() requires an SQLAlchemy connectable; a plain sqlite3 connection only works for queries, so here I first create an engine with SQLAlchemy.


from sqlalchemy import create_engine

# read_sql_table() needs an SQLAlchemy engine, not a raw sqlite3 connection
engine = create_engine('sqlite:///courses_database')

# Read the entire SQL table into a DataFrame
df = pd.read_sql_table('COURSES', engine)
print(df)

Yields the same output.
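
read_sql_table() also accepts a columns argument to load only a subset of columns from the table:


# Load only selected columns from the table
df = pd.read_sql_table('COURSES', engine, columns=['course_name', 'fee'])
print(df)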

7. Filter Rows from SQL

Most of the time you will not need all rows from a SQL table. To load only the rows that match a condition, use a SQL query with a WHERE clause.


# Run SQL          
sql_query = pd.read_sql_query('''SELECT * FROM COURSES where fee = 15000''', con)

# Convert SQL to DataFrame
df = pd.DataFrame(sql_query, columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)

Yields below output.


# Output:
  course_id course_name    fee duration  discount
0         3        Java  15000  35 Days       800
1         4      Python  15000  30 Days       500
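
When the filter value comes from a Python variable, bind it with the params argument instead of formatting it into the SQL string; this avoids quoting issues and SQL injection. A minimal sketch (sqlite3 uses ? placeholders):


# Bind the filter value safely with params
max_fee = 20000
df = pd.read_sql_query('SELECT * FROM COURSES WHERE fee <= ?',
                       con, params=(max_fee,))
print(df)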

8. Using Cursor fetchall()

Alternatively, you can use the DataFrame constructor along with Cursor.fetchall() to load the SQL table into a DataFrame. The below example yields the same output as above.


# Run SQL 
cur = con.cursor()         
cur.execute('''SELECT * FROM COURSES where fee = 15000''')

# Convert SQL to DataFrame
df = pd.DataFrame(cur.fetchall(), columns = ['course_id', 'course_name', 'fee','duration','discount'])
print(df)
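
Instead of hardcoding the column names, you can derive them from cursor.description, which sqlite3 populates after every SELECT:


# Derive column names from the cursor metadata
cur.execute('SELECT * FROM COURSES')
cols = [desc[0] for desc in cur.description]
df = pd.DataFrame(cur.fetchall(), columns=cols)
print(df)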

Conclusion

In this article, you have learned how to run a SQL query with Pandas and convert the result into a DataFrame, how to read an entire database table, and how to load only selected rows.
