Pandas Read Text with Examples

  • Post author:Naveen
  • Post category:Pandas
  • Post last modified:October 5, 2023

You can read a text (.txt) file into a pandas DataFrame with the read_fwf() function. fwf stands for fixed-width formatted lines, and the function can read both fixed-length and variable-length text files. Alternatively, you can read a delimited text file with the pandas read_csv() function.

In this article, I will explain how to read a text file line by line and convert it into a pandas DataFrame, with examples such as reading a variable-length file and a fixed-length file.

When reading fixed-length text files, you need to specify the fixed-width positions (through the colspecs or widths parameter) so that each line is split into separate columns.

1. read_fwf() Syntax

Following is the syntax of the read_fwf() function.


# Syntax of read_fwf()
pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, **kwds)

It also supports additional parameters; refer to the IO Tools section of the pandas documentation.
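As a minimal sketch of the default colspecs='infer' behavior, the snippet below reads a small space-aligned table from an in-memory buffer instead of a file. The sample rows and the column names (id, name, score) are made up for illustration and do not come from the files used in this article.

```python
import io
import pandas as pd

# Hypothetical sample data: the columns are aligned with spaces
lines = [
    "id  name   score",
    "1   alice  90",
    "2   bob    85",
]
buffer = io.StringIO("\n".join(lines))

# colspecs='infer' (the default) guesses the column boundaries
# from the first infer_nrows lines of the input
df_inferred = pd.read_fwf(buffer)
print(df_inferred)
```

Inference relies on every sampled line having whitespace at the same positions, so for files where that is not guaranteed it is probably safer to pass widths or colspecs explicitly.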

2. Read Text File

First, let’s learn how to read unstructured plain text from a .txt file into a DataFrame by using the read_fwf() function. Though this function is meant to read fixed-length files, you can also use it to read free plain text files.

Text File

The file used here, proverbs.txt, is available on GitHub.


import pandas as pd

# Read text file
df = pd.read_fwf('/Users/admin/apps/proverbs.txt')
print(df)

# Output:
#  People who are similar spend time together
# 0        A picture is worth a thousand words
# 1        All good things must come to an end
# 2       Beauty is in the eye of the beholder
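In the output above, the first line of the file became the column header. If every line should be kept as a data row, one alternative (plain Python rather than read_fwf(), shown here only as a sketch) is to read the file line by line and build the DataFrame from the resulting list. The io.StringIO object stands in for an opened file, and the column name text is a hypothetical choice.

```python
import io
import pandas as pd

# Stand-in for open('/path/to/proverbs.txt'); the lines are the ones
# shown in the output above
text_file = io.StringIO(
    "People who are similar spend time together\n"
    "A picture is worth a thousand words\n"
    "All good things must come to an end\n"
    "Beauty is in the eye of the beholder\n"
)

# Read line by line, dropping the trailing newline from each line
lines = [line.rstrip("\n") for line in text_file]

# One row per line; 'text' is a hypothetical column name
df_lines = pd.DataFrame({"text": lines})
print(df_lines)
```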

3. Read Fixed Length File

Fixed-length files store each field at a constant width, so every record in the file has the same length. The parser must extract each field from the same position in every record. pandas read_fwf() parses files in exactly this manner, taking either the start and end position or the width of each field.

Fixed Length Text File

The text file above has 4 fields: courses (length 6), fee (length 5), duration (length 7) and discount (length 4).


# Read Fixed length text file
import pandas as pd
cols = ['Courses','Fee','Duration','Discount']
df = pd.read_fwf('/Users/admin/apps/fixed-length.txt',
                 header=None, widths=[6, 5, 7, 4],
                 names=cols)
print(df)

# Output:
#   Courses    Fee Duration  Discount
# 0   Spark  25000  50 Days      2000
# 1  Pandas  20000  35 Days      1000
# 2    Java  15000      NaN       800
# 3  Python  15000  30 Days       500
# 4     PHP  18000  30 Days       800
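The widths used above can also be expressed as colspecs, a list of half-open (start, end) intervals. The sketch below builds a few fixed-width records in memory (padded with str.ljust, so the sample text is an assumption rather than the actual fixed-length.txt file), converts the widths into colspecs, and checks that both calls produce the same DataFrame.

```python
import io
from itertools import accumulate

import pandas as pd

cols = ['Courses', 'Fee', 'Duration', 'Discount']
widths = [6, 5, 7, 4]

# Hypothetical records; each value is padded to its field width
rows = [
    ("Spark", "25000", "50 Days", "2000"),
    ("Pandas", "20000", "35 Days", "1000"),
    ("Java", "15000", "", "800"),
]
data = "\n".join(
    "".join(value.ljust(width) for value, width in zip(row, widths))
    for row in rows
)

# Convert widths to half-open (start, end) intervals
ends = list(accumulate(widths))      # [6, 11, 18, 22]
starts = [0] + ends[:-1]             # [0, 6, 11, 18]
colspecs = list(zip(starts, ends))   # [(0, 6), (6, 11), (11, 18), (18, 22)]

df_widths = pd.read_fwf(io.StringIO(data), header=None,
                        widths=widths, names=cols)
df_colspecs = pd.read_fwf(io.StringIO(data), header=None,
                          colspecs=colspecs, names=cols)

# Both calls describe the same layout
print(df_widths.equals(df_colspecs))
```

colspecs is the more flexible of the two, since its intervals do not have to be contiguous, which allows unwanted character ranges to be skipped.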

4. Using read_csv()

If you have a comma-delimited text file, use pandas.read_csv(); to read a tab-delimited (\t) file, use read_table(). Besides these, you can read a pipe-delimited file or a file with any other custom delimiter/separator by passing it through the sep parameter.

Comma delimited text file

import pandas as pd

# Skip the file's header row and assign new column names
columns = ['courses','course_fee','course_duration','course_discount']
df = pd.read_csv('/Users/admin/apps/courses.csv', header=None, names=columns, skiprows=1)
print(df)

# Output:
#   courses  course_fee course_duration  course_discount
# 0   Spark       25000         50 Days             2000
# 1  Pandas       20000         35 Days             1000
# 2    Java       15000             NaN              800
# 3  Python       15000         30 Days              500
# 4     PHP       18000         30 Days              800
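For other delimiters, a short sketch along the same lines is shown below. It reads a pipe-delimited string with read_csv() and a tab-delimited string with read_table(); both strings are hypothetical in-memory samples rather than files from this article.

```python
import io
import pandas as pd

# Hypothetical pipe-delimited and tab-delimited sample data
pipe_data = "Courses|Fee|Duration\nSpark|25000|50 Days\nPandas|20000|35 Days\n"
tab_data = "Courses\tFee\tDuration\nSpark\t25000\t50 Days\nPandas\t20000\t35 Days\n"

# sep sets the field delimiter for read_csv()
df_pipe = pd.read_csv(io.StringIO(pipe_data), sep="|")

# read_table() uses a tab as its default delimiter
df_tab = pd.read_table(io.StringIO(tab_data))

# Both inputs hold the same records, so the DataFrames match
print(df_pipe.equals(df_tab))
```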

Conclusion

In this article, you have learned how to read fixed-length and variable-length text (.txt) files into a pandas DataFrame by using the read_fwf() and read_csv() functions.


Naveen

I am a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, I have honed my expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. My journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. I have started this SparkByExamples.com to share my experiences with the data as I come across. You can learn more about me at LinkedIn
