Pandas Read Text with Examples

Post category: Pandas
Post last modified: March 27, 2024

You can read a text (.txt) file with the pandas read_fwf() function, where fwf stands for fixed-width formatted lines. It is designed for fixed-length files, but you can also use it to read variable-length text files. Alternatively, you can read a .txt file with the pandas read_csv() function.

In this article, I will explain how to read a text file, where each line becomes a row, and convert it into a pandas DataFrame, with examples such as reading a variable-length file, a fixed-length file, etc.

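When reading fixed-length text files, you typically specify the width (or the start and end position) of each field so that pandas can split every line into separate columns. If you don't, read_fwf() tries to infer the positions from the first rows of the file.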

1. read_fwf() Syntax

Following is the syntax of the read_fwf() function.


# Syntax of read_fwf()
pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, **kwds)

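It also accepts most of the keyword arguments of read_csv(), such as header, names, and skiprows; refer to the IO Tools section of the pandas documentation for the full list.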
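To make colspecs and widths concrete, here is a small sketch on hypothetical in-memory data (io.StringIO stands in for a file). It assumes two fields, a 6-character course name followed by a 5-character fee, and shows that widths=[6, 5] and colspecs=[(0, 6), (6, 11)] describe the same layout.

```python
import io

import pandas as pd

# Hypothetical sample: three fixed-width records (name: 6 chars, fee: 5 chars)
data = "Spark 25000\nPandas20000\nJava  15000\n"

# Option 1: give the width of each field
df_widths = pd.read_fwf(io.StringIO(data), widths=[6, 5],
                        header=None, names=["Courses", "Fee"])

# Option 2: give half-open [start, end) character positions of each field
df_colspecs = pd.read_fwf(io.StringIO(data), colspecs=[(0, 6), (6, 11)],
                          header=None, names=["Courses", "Fee"])

print(df_widths.equals(df_colspecs))  # True
print(df_widths)
```

widths is the shorter form when fields are contiguous; colspecs is useful when you want to skip characters between fields.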

2. Read Text File

First, let's learn how to read unstructured plain text from a .txt file into a DataFrame using the read_fwf() function. Although this function is meant for fixed-length files, you can also use it to read free-form plain text files.

[Figure: Text File]

You can find the file used here, proverbs.txt, on GitHub.


import pandas as pd

# Read text file (header=0 by default, so the first line becomes the column name)
df = pd.read_fwf('/Users/admin/apps/proverbs.txt')
print(df)

# Output:
#  People who are similar spend time together
# 0        A picture is worth a thousand words
# 1        All good things must come to an end
# 2       Beauty is in the eye of the beholder
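In the output above, the first proverb was consumed as the column header, because read_fwf() uses header=0 by default. The sketch below keeps every line as data. It uses the four lines from the output as inline sample data in place of proverbs.txt, and assumes no line exceeds 100 characters, so widths=[100] keeps each line in a single column (the column name proverb is chosen for illustration).

```python
import io

import pandas as pd

# Sample lines taken from the output above (stand-in for proverbs.txt)
text = """People who are similar spend time together
A picture is worth a thousand words
All good things must come to an end
Beauty is in the eye of the beholder
"""

# header=None keeps the first line as data; one 100-character-wide field
# (assumption: no line is longer) keeps each whole line in a single column
df = pd.read_fwf(io.StringIO(text), header=None, names=["proverb"], widths=[100])
print(df)
```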

3. Read Fixed Length File

In a fixed-length file, every field occupies the same number of characters in every record, so all records (lines) have the same length. A parser therefore extracts each field from the same character positions in every record. pandas read_fwf() parses exactly this way, taking the start and end position of each field.

[Figure: Fixed Length Text File]

The text file above has 4 fields: Courses (length 6), Fee (length 5), Duration (length 7), and Discount (length 4).


# Read Fixed length text file
import pandas as pd
cols = ['Courses','Fee','Duration','Discount']
df = pd.read_fwf('/Users/admin/apps/fixed-length.txt',
                 header=None, widths=[6, 5, 7, 4],
                 names=cols)
print(df)

# Output:
#  Courses    Fee Duration  Discount
# 0   Spark  25000  50 Days      2000
# 1  Pandas  20000  35 Days      1000
# 2    Java  15000      NaN       800
# 3  Python  15000  30 Days       500
# 4     PHP  18000  30 Days       800
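Because fixed-length.txt itself isn't reproduced here, the sketch below rebuilds equivalent records from the printed output and reads them from memory. The padding is an assumption: Courses, Fee, and Duration are left-aligned and Discount is right-aligned within its 4 characters. It confirms that an all-blank Duration field comes back as NaN.

```python
import io

import pandas as pd

# Records rebuilt from the printed output above (not the original file)
rows = [("Spark", "25000", "50 Days", "2000"),
        ("Pandas", "20000", "35 Days", "1000"),
        ("Java", "15000", "", "800"),
        ("Python", "15000", "30 Days", "500"),
        ("PHP", "18000", "30 Days", "800")]
# Pad each field to its width: 6, 5, 7 (left-aligned) and 4 (right-aligned)
text = "\n".join(f"{c:<6}{f:<5}{d:<7}{disc:>4}" for c, f, d, disc in rows)

cols = ['Courses', 'Fee', 'Duration', 'Discount']
df = pd.read_fwf(io.StringIO(text), header=None, widths=[6, 5, 7, 4], names=cols)

# The blank Duration field for Java is parsed as NaN
print(df)
print(df['Duration'].isna().sum())  # 1
```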

4. Using read_csv()

If your text file is comma-delimited, use pandas.read_csv(); for a tab-delimited (\t) file, use read_table(), whose default separator is a tab. Both functions also accept a sep argument, so you can read pipe-delimited files or use any other custom delimiter/separator.
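As a sketch of the other separators, the snippet below reads the same hypothetical two-column data from tab-delimited and pipe-delimited text held in memory.

```python
import io

import pandas as pd

# Hypothetical sample records in two formats
tab_text = "Courses\tFee\nSpark\t25000\nPandas\t20000\n"
pipe_text = "Courses|Fee\nSpark|25000\nPandas|20000\n"

df_tab = pd.read_table(io.StringIO(tab_text))           # tab is the default sep
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")  # custom single-char sep

print(df_tab.equals(df_pipe))  # True
```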

[Figure: Comma delimited text file]

# Skip the file's own header row and assign new column names
import pandas as pd
columns = ['courses','course_fee','course_duration','course_discount']
df = pd.read_csv('/Users/admin/apps/courses.csv', header=None, names=columns, skiprows=1)
print(df)

# Output:
#  courses  course_fee course_duration  course_discount
# 0   Spark       25000         50 Days             2000
# 1  Pandas       20000         35 Days             1000
# 2     Java       15000             NaN              800
# 3  Python       15000         30 Days              500
# 4     PHP       18000         30 Days              800
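The example above drops the file's header with skiprows=1 and supplies new names. Passing header=0 together with names is an equivalent way to replace existing column names, as this sketch shows on a hypothetical in-memory stand-in for courses.csv built from two rows of the output above.

```python
import io

import pandas as pd

# Hypothetical stand-in for courses.csv (first line is the file's own header)
csv_text = ("Courses,Fee,Duration,Discount\n"
            "Spark,25000,50 Days,2000\n"
            "Java,15000,,800\n")
columns = ['courses', 'course_fee', 'course_duration', 'course_discount']

# Approach used above: no header row, skip the original header line
df1 = pd.read_csv(io.StringIO(csv_text), header=None, names=columns, skiprows=1)

# Equivalent: read row 0 as the header, then replace it with `names`
df2 = pd.read_csv(io.StringIO(csv_text), header=0, names=columns)

print(df1.equals(df2))  # True
```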

Frequently Asked Questions on Pandas Read Text with Examples

How do I read a text file into a Pandas DataFrame?

To read a text file into a Pandas DataFrame, you can use the pd.read_csv() function, even if your file is not a CSV. The read_csv() function is versatile and can handle various delimiter-separated values, including tabs, spaces, and other custom separators.
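For example, a text file whose columns are separated by a varying number of spaces can be read with read_csv() by passing the regular expression \s+ as the separator. The data below is hypothetical.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated text with uneven spacing
text = "Courses   Fee\nSpark  25000\nPandas     20000\n"

# r"\s+" treats any run of spaces or tabs as a single separator
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df)
```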

How can I read a CSV file with a custom delimiter?

To read a CSV file with a custom delimiter in Pandas, you can use the pd.read_csv() function and specify the delimiter using the delimiter or sep parameter. By default, pd.read_csv() assumes that the delimiter is a comma (,), but you can change it to your custom delimiter.
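A sketch with hypothetical data: sep and delimiter are aliases, and a separator longer than one character is treated as a regular expression, which requires engine='python'.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited text; delimiter= is an alias for sep=
semi_text = "Courses;Fee\nSpark;25000\nPandas;20000\n"
df_semi = pd.read_csv(io.StringIO(semi_text), delimiter=";")

# Multi-character separators are regular expressions, handled by the Python engine
colon_text = "Courses::Fee\nSpark::25000\nPandas::20000\n"
df_colon = pd.read_csv(io.StringIO(colon_text), sep="::", engine="python")

print(df_semi)
print(df_colon)
```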

How do I read a text file with a specific encoding?

To read a text file with a specific encoding in Pandas, you can use the encoding parameter of the pd.read_csv() function. The encoding parameter allows you to specify the character encoding of the text file.
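The sketch below writes a small hypothetical file encoded as Latin-1 (not UTF-8) to a temporary directory and reads it back with encoding="latin-1". The file name courses_latin1.txt is made up for illustration.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "courses_latin1.txt")  # hypothetical file name
    # Write non-ASCII names using the Latin-1 encoding
    with open(path, "w", encoding="latin-1") as f:
        f.write("Courses,Instructor\nPandas,José\nSpark,Zoë\n")

    # The bytes are not valid UTF-8, so the encoding must be given explicitly
    df = pd.read_csv(path, encoding="latin-1")

print(df)
```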

How can I skip a specific number of rows while reading a text file?

You can skip rows with the skiprows parameter of pd.read_csv(). Pass an integer to skip that many lines at the start of the file, a list of zero-based line numbers to skip specific lines, or a function that returns True for each line index that should be skipped.
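A sketch of the three forms of skiprows on hypothetical text that starts with two comment lines:

```python
import io

import pandas as pd

# Hypothetical file: lines 0-1 are comments, line 2 is the header
text = ("# exported report\n# generated nightly\n"
        "Courses,Fee\nSpark,25000\nPandas,20000\nJava,15000\n")

# int: skip the first 2 lines of the file
df_int = pd.read_csv(io.StringIO(text), skiprows=2)

# list: skip specific 0-based line numbers (both comments and the Pandas row)
df_list = pd.read_csv(io.StringIO(text), skiprows=[0, 1, 4])

# callable: skip a line whenever the function returns True for its index
df_call = pd.read_csv(io.StringIO(text), skiprows=lambda i: i < 2 or i == 4)

print(df_int)
print(df_list)
```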

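Can Pandas read other formats like JSON, Excel, or SQL?

Yes. Pandas provides a dedicated reader for each: pd.read_json() for JSON, pd.read_excel() for Excel, and pd.read_sql() for SQL queries. Each returns a DataFrame and accepts format-specific options, much like read_csv(). read_sql() additionally requires a database connection, and read_excel() needs an Excel engine such as openpyxl to be installed.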
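A minimal sketch of the JSON and SQL readers on hypothetical data. The JSON text is wrapped in io.StringIO because recent pandas versions deprecate passing a literal JSON string, and the SQL example uses an in-memory SQLite database from the standard library as the connection.

```python
import io
import sqlite3

import pandas as pd

# JSON: a list of records becomes one row per record
json_text = '[{"Courses": "Spark", "Fee": 25000}, {"Courses": "Pandas", "Fee": 20000}]'
df_json = pd.read_json(io.StringIO(json_text))

# SQL: read_sql() takes a query plus a connection (here in-memory SQLite)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (Courses TEXT, Fee INTEGER)")
conn.executemany("INSERT INTO courses VALUES (?, ?)",
                 [("Spark", 25000), ("Pandas", 20000)])
df_sql = pd.read_sql("SELECT * FROM courses", conn)
conn.close()

print(df_json)
print(df_sql)
```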

How can I read a text file with a header that is not in the first row?

If the header is not on the first line of your text file, pass its zero-based row index to the header parameter of pd.read_csv(). Pandas uses that row as the column names, discards the rows above it, and reads the data starting from the next line.
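A sketch on hypothetical text where two title lines precede the header, so the header sits at row index 2:

```python
import io

import pandas as pd

# Hypothetical file: two title lines, then the real header on line 2
text = "Course report\nexported nightly\nCourses,Fee\nSpark,25000\nPandas,20000\n"

# header=2 uses line 2 as column names and drops the lines above it
df = pd.read_csv(io.StringIO(text), header=2)
print(df)
```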

Conclusion

In this article, you have learned how to read fixed-length and variable-length text (.txt) files into a pandas DataFrame by using the read_fwf() and read_csv() functions.


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, and Hive, as well as Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium