Pandas Read Text with Examples

You can read a text file (.txt) into a pandas DataFrame with the read_fwf() function; fwf stands for fixed-width formatted lines. The function is designed for fixed-length files, but it can also load variable-length text files. Alternatively, you can read a delimited .txt file with the pandas read_csv() function.

In this article, I will explain how to read a text file so that each line becomes a row of a pandas DataFrame, with examples such as reading a variable-length file and a fixed-length file.

When reading fixed-length text files, you need to specify the width, or the start and end position, of each field so that every line is split into separate columns.

1. read_fwf() Syntax

Following is the syntax of the read_fwf() function.


# Syntax of read_fwf()
pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, **kwds)

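It also accepts most of the keyword arguments of read_csv(), such as header, names, and skiprows; refer to the pandas IO Tools documentation for the full list.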
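The two ways of describing field positions, widths and colspecs, can be compared with a small sketch. The in-memory sample data and the column names below are assumptions made for illustration; they are not taken from the files used in this article.

```python
import io
import pandas as pd

# Hypothetical sample: three fields of 6, 5 and 4 characters per line.
data = (
    "Spark 250002000\n"
    "Pandas200001000\n"
)
names = ["Courses", "Fee", "Discount"]

# widths: the length of each consecutive field.
df_widths = pd.read_fwf(io.StringIO(data), widths=[6, 5, 4],
                        header=None, names=names)

# colspecs: half-open [start, end) character positions of each field.
df_colspecs = pd.read_fwf(io.StringIO(data),
                          colspecs=[(0, 6), (6, 11), (11, 15)],
                          header=None, names=names)

# Both descriptions should yield the same DataFrame.
print(df_widths.equals(df_colspecs))
```

Use widths when the fields are contiguous, and colspecs when you need explicit positions. Pass only one of the two in a single call.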

2. Read Text File

First, let’s learn how to read unstructured, free-form plain text from a .txt file into a DataFrame by using the read_fwf() function. Though this function is meant for fixed-length files, you can also use it to read free plain text files.

Figure: Text File

You can find the file I used, proverbs.txt, on GitHub.


import pandas as pd

# Read the text file; by default the first line is used as the column header
df = pd.read_fwf('/Users/admin/apps/proverbs.txt')
print(df)

# Outputs
#  People who are similar spend time together
#0        A picture is worth a thousand words
#1        All good things must come to an end
#2       Beauty is in the eye of the beholder
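In the output above, the first line of the file appears as the column header, because read_fwf() treats the first line as a header by default. The following is a minimal sketch, assuming an in-memory copy of a few of the proverbs; the variable names and the "proverb" column name are illustrative. It shows how header=None keeps every line as a data row.

```python
import io
import pandas as pd

# Assumed stand-in for the first lines of proverbs.txt.
text = (
    "People who are similar spend time together\n"
    "A picture is worth a thousand words\n"
    "All good things must come to an end\n"
)

# Default behaviour: the first line becomes the column header.
df_default = pd.read_fwf(io.StringIO(text))

# header=None keeps the first line as data; names supplies a column label.
df_all = pd.read_fwf(io.StringIO(text), header=None, names=["proverb"])

# Plain-Python alternative: one row per line, no column inference involved.
df_lines = pd.DataFrame({"proverb": text.splitlines()})

print(len(df_default), len(df_all), len(df_lines))
```

Because read_fwf() infers column boundaries from whitespace, free text in which a space lines up in every sampled row may be split into several columns. Building the DataFrame from the list of lines avoids that risk.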

3. Read Fixed Length File

In a fixed-length file, every field has a constant width, so every record has the same length. The parser must extract each field from the same position in every record. pandas read_fwf() parses files in exactly this manner, taking either the width or the start and end position of each field.

Figure: Fixed Length Text File

The text file above has 4 fields: courses (length 6), fee (length 5), duration (length 7), and discount (length 4).


# Read fixed-length text file
import pandas as pd

cols = ['Courses', 'Fee', 'Duration', 'Discount']
# widths gives the character length of each field, in order
df = pd.read_fwf('/Users/admin/apps/fixed-length.txt',
                 header=None, widths=[6, 5, 7, 4],
                 names=cols)
print(df)

# Outputs
#  Courses    Fee Duration  Discount
#0   Spark  25000  50 Days      2000
#1  Pandas  20000  35 Days      1000
#2    Java  15000      NaN       800
#3  Python  15000  30 Days       500
#4     PHP  18000  30 Days       800
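Two details of the example above can be sketched in isolation: a field that contains only spaces is read as NaN, and colspecs can pick out a subset of fields by position. The records below are a hypothetical reconstruction of part of the fixed-length file, padded to the widths 6, 5, 7 and 4; the actual file contents may differ.

```python
import io
import pandas as pd

# Hypothetical reconstruction of three records from the fixed-length file.
records = [
    ("Spark", 25000, "50 Days", 2000),
    ("Pandas", 20000, "35 Days", 1000),
    ("Java", 15000, "", 800),          # blank Duration field
]
# Left-justify and pad every field to its fixed width.
data = "\n".join(
    f"{course:<6}{fee:<5}{duration:<7}{discount:<4}"
    for course, fee, duration, discount in records
)

# Read all four fields; the blank Duration is parsed as NaN.
df_full = pd.read_fwf(io.StringIO(data), widths=[6, 5, 7, 4], header=None,
                      names=["Courses", "Fee", "Duration", "Discount"])

# Read only Courses (positions 0-5) and Discount (positions 18-21).
df_subset = pd.read_fwf(io.StringIO(data), colspecs=[(0, 6), (18, 22)],
                        header=None, names=["Courses", "Discount"])

print(df_full)
print(df_subset)
```

The colspecs intervals are half-open, so (18, 22) covers characters 18 through 21.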

4. Using read_csv()

If your text file is comma-delimited, use pandas.read_csv(); for a tab-delimited (\t) file, use read_table(). You can also read files that use a pipe or any other custom delimiter by passing it through the sep parameter.

Figure: Comma delimited text file

import pandas as pd

# Skip the file's header row and assign new column names
columns = ['courses', 'course_fee', 'course_duration', 'course_discount']
df = pd.read_csv('/Users/admin/apps/courses.csv', header=None, names=columns, skiprows=1)
print(df)

# Outputs
#  courses  course_fee course_duration  course_discount
#0   Spark       25000         50 Days             2000
#1  Pandas       20000         35 Days             1000
#2    Java       15000             NaN              800
#3  Python       15000         30 Days              500
#4     PHP       18000         30 Days              800
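The pipe and tab delimiters mentioned in this section can be sketched as follows. The sample strings are assumptions made for illustration, not files from this article.

```python
import io
import pandas as pd

# Hypothetical pipe-delimited and tab-delimited versions of the same data.
pipe_data = "Courses|Fee\nSpark|25000\nPandas|20000\n"
tab_data = "Courses\tFee\nSpark\t25000\nPandas\t20000\n"

# read_csv() with a custom separator.
df_pipe = pd.read_csv(io.StringIO(pipe_data), sep="|")

# read_table() uses the tab character as its default separator.
df_tab = pd.read_table(io.StringIO(tab_data))

# Both calls should produce the same DataFrame.
print(df_pipe.equals(df_tab))
```

Calling read_csv() with sep="\t" is equivalent to read_table() for tab-delimited input.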

Conclusion

In this article, you have learned how to read fixed-length and variable-length text (.txt) files into a pandas DataFrame by using the read_fwf() and read_csv() functions.

NNK
