
Pandas Read Text with Examples


You can read a text (.txt) file with the pandas read_fwf() function; fwf stands for fixed-width formatted lines, and the function can read both fixed-length and variable-length text files. Alternatively, you can read a .txt file with the pandas read_csv() function.

In this article, I will explain how to read a text file line by line and convert it into a pandas DataFrame, with examples such as reading a variable-length file and a fixed-length file.

When reading fixed-length text files, you specify the width or the start and end position of each field so that every record is split into separate columns.

1. read_fwf() Syntax

Following is the syntax of the read_fwf() function.


# Syntax of read_fwf()
pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, **kwds)

It also supports additional parameters; refer to the pandas IO Tools documentation.

2. Read Text File

First, let’s learn how to read unstructured plain text from a .txt file into a DataFrame by using the read_fwf() function. Though this function is meant for fixed-length files, you can also use it to read free-form plain text files. Note that, by default, the first line of the file is used as the column header.


You can find the file I used, proverbs.txt, on GitHub.


import pandas as pd

# Read text file
df = pd.read_fwf('/Users/admin/apps/proverbs.txt')
print(df)

# Output:
#  People who are similar spend time together
# 0        A picture is worth a thousand words
# 1        All good things must come to an end
# 2       Beauty is in the eye of the beholder
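In the output above, the first proverb ended up as the column header. The following is a minimal sketch of one way to keep every line as data. It assumes that no line is longer than 200 characters; the width of 200, the column name proverb, and the in-memory sample standing in for proverbs.txt are illustrative choices, not part of the original file.

```python
import io
import pandas as pd

# In-memory stand-in for proverbs.txt (lines taken from the output above)
text = (
    "People who are similar spend time together\n"
    "A picture is worth a thousand words\n"
    "All good things must come to an end\n"
    "Beauty is in the eye of the beholder\n"
)

# header=None keeps the first line as data; one wide field
# (assumed upper bound of 200 characters) keeps each line in a single column
df_lines = pd.read_fwf(io.StringIO(text), header=None,
                       names=["proverb"], widths=[200])
print(df_lines)
```

Passing an explicit width also avoids relying on column inference, which may split free text wherever every sampled line happens to have a space at the same position.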

3. Read Fixed Length File

Fixed-length files store each field at a constant length, so every record in the file has the same length. The parser must extract each field from the same position in every record. pandas read_fwf() parses files exactly this way, taking either the width or the start and end position of each field.


The text file has four fields: courses (length 6), fee (length 5), duration (length 7), and discount (length 4).


# Read Fixed length text file
import pandas as pd
cols = ['Courses','Fee','Duration','Discount']
df = pd.read_fwf('/Users/admin/apps/fixed-length.txt',
                 header=None, widths=[6, 5, 7, 4],
                 names=cols)
print(df)

# Output:
#  Courses    Fee Duration  Discount
# 0   Spark  25000  50 Days      2000
# 1  Pandas  20000  35 Days      1000
# 2    Java  15000      NaN       800
# 3  Python  15000  30 Days       500
# 4     PHP  18000  30 Days       800
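The same fields can also be described by their start and end positions through the colspecs parameter, which takes half-open (start, end) intervals. The sketch below assumes an in-memory sample laid out with the widths 6, 5, 7, and 4 described above; the sample rows stand in for fixed-length.txt and are not the actual file.

```python
import io
import pandas as pd

# In-memory stand-in for fixed-length.txt, laid out with widths 6, 5, 7, 4
fixed_data = (
    "Spark 2500050 Days2000\n"
    "Pandas2000035 Days1000\n"
    "Python1500030 Days 500\n"
)

# Half-open (start, end) intervals, equivalent to widths=[6, 5, 7, 4]
colspecs = [(0, 6), (6, 11), (11, 18), (18, 22)]

df_fixed = pd.read_fwf(io.StringIO(fixed_data), header=None,
                       colspecs=colspecs,
                       names=['Courses', 'Fee', 'Duration', 'Discount'])
print(df_fixed)
```

colspecs is useful when only some fields are needed, since positions that fall outside every interval are simply ignored.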

4. Using read_csv()

If you have a comma-delimited text file, use pandas.read_csv(); to read a tab-delimited (\t) file, use read_table(). Besides these, you can also use a pipe or any other custom delimiter/separator by passing it through the sep parameter.


# Skip the file's header row and assign new column names
columns = ['courses', 'course_fee', 'course_duration', 'course_discount']
df = pd.read_csv('/Users/admin/apps/courses.csv', header=None, names=columns, skiprows=1)
print(df)

# Output:
#  courses  course_fee course_duration  course_discount
# 0   Spark       25000         50 Days             2000
# 1  Pandas       20000         35 Days             1000
# 2     Java       15000             NaN              800
# 3  Python       15000         30 Days              500
# 4     PHP       18000         30 Days              800
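For a tab-delimited file, read_table() can be sketched as follows. The in-memory sample is an assumption made for illustration; it is not one of the files used above.

```python
import io
import pandas as pd

# Hypothetical tab-delimited sample held in memory
tab_data = (
    "Courses\tFee\tDuration\n"
    "Spark\t25000\t50 Days\n"
    "Pandas\t20000\t35 Days\n"
)

# read_table() uses the tab character as its default separator
df_tab = pd.read_table(io.StringIO(tab_data))
print(df_tab)
```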

Frequently Asked Questions on Pandas Read Text with Examples

How do I read a text file into a Pandas DataFrame?

To read a text file into a Pandas DataFrame, you can use the pd.read_csv() function, even if your file is not a CSV. The read_csv() function is versatile and can handle various delimiter-separated values, including tabs, spaces, and other custom separators.
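As a small sketch of the space-separated case, the regular expression \s+ can be passed as the separator so that runs of whitespace count as one delimiter. The sample text is hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample with columns separated by one or more spaces
ws_data = (
    "Courses  Fee   Discount\n"
    "Spark    25000 2000\n"
    "Pandas   20000 1000\n"
)

# sep=r"\s+" treats any run of whitespace as a single delimiter
df_ws = pd.read_csv(io.StringIO(ws_data), sep=r"\s+")
print(df_ws)
```

This only works when the values themselves contain no spaces; a value such as "50 Days" would be split into two fields.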

How can I read a CSV file with a custom delimiter?

To read a CSV file with a custom delimiter in Pandas, you can use the pd.read_csv() function and specify the delimiter using the delimiter or sep parameter. By default, pd.read_csv() assumes that the delimiter is a comma (,), but you can change it to your custom delimiter.
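A minimal sketch with a pipe as the custom delimiter, using hypothetical in-memory data, shows that sep and delimiter can be used interchangeably:

```python
import io
import pandas as pd

# Hypothetical pipe-delimited sample
pipe_data = (
    "Courses|Fee|Discount\n"
    "Spark|25000|2000\n"
    "Pandas|20000|1000\n"
)

# sep and delimiter are two names for the same option
df_sep = pd.read_csv(io.StringIO(pipe_data), sep="|")
df_delim = pd.read_csv(io.StringIO(pipe_data), delimiter="|")
print(df_sep)
```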

How do I read a text file with a specific encoding?

To read a text file with a specific encoding in Pandas, you can use the encoding parameter of the pd.read_csv() function. The encoding parameter allows you to specify the character encoding of the text file.
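The sketch below illustrates the encoding parameter with bytes encoded as Latin-1. The sample row, including the instructor name, is made up for the example.

```python
import io
import pandas as pd

# Hypothetical sample encoded as Latin-1 rather than UTF-8
raw_bytes = "Courses,Instructor\nSpark,José\n".encode("latin-1")

# encoding tells pandas how to decode the bytes into text
df_enc = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")
print(df_enc)
```

If the encoding does not match the file, pandas may raise a UnicodeDecodeError or return garbled characters, so it helps to know how the file was written.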

How can I skip a specific number of rows while reading a text file?

You can skip a specific number of rows while reading a text file in Pandas using the skiprows parameter in the pd.read_csv() function. The skiprows parameter allows you to specify the number of rows at the beginning of the file to be skipped.
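For example, assuming a hypothetical file that starts with two lines of notes before the actual data, skiprows=2 could be used like this:

```python
import io
import pandas as pd

# Hypothetical sample: two note lines come before the real header
notes_data = (
    "Course report\n"
    "Internal use only\n"
    "Courses,Fee\n"
    "Spark,25000\n"
    "Pandas,20000\n"
)

# Skip the first two lines; the next line becomes the header
df_skip = pd.read_csv(io.StringIO(notes_data), skiprows=2)
print(df_skip)
```

skiprows also accepts a list of zero-based line numbers, so skiprows=[0, 1] would give the same result here.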

Can Pandas read other formats like JSON, Excel, or SQL?

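Pandas supports reading various formats. For JSON, you can use pd.read_json(), for Excel pd.read_excel(), and for SQL pd.read_sql(). Like read_csv(), each function returns a DataFrame, though the arguments differ; read_sql(), for example, takes a query and a database connection rather than a file path.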
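As a rough sketch, read_json() and read_sql() can be tried without any external files by using an in-memory JSON string and an in-memory SQLite database. The table name courses and the sample rows are assumptions made for the example.

```python
import io
import sqlite3
import pandas as pd

# JSON: a hypothetical list of records held in a string
json_text = '[{"Courses": "Spark", "Fee": 25000}, {"Courses": "Pandas", "Fee": 20000}]'
df_json = pd.read_json(io.StringIO(json_text))
print(df_json)

# SQL: a hypothetical table in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (Courses TEXT, Fee INTEGER)")
conn.executemany("INSERT INTO courses VALUES (?, ?)",
                 [("Spark", 25000), ("Pandas", 20000)])
df_sql = pd.read_sql("SELECT Courses, Fee FROM courses", conn)
conn.close()
print(df_sql)
```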

How can I read a text file with a header that is not in the first row?

If your text file has a header that is not in the first row, you can use the header parameter of the pd.read_csv() function in Pandas to specify the zero-based row number to use as column names. Rows before that row are discarded.
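A short sketch, assuming a hypothetical file whose column names sit on the third line (row number 2 when counting from zero):

```python
import io
import pandas as pd

# Hypothetical sample: the header is on the third line
late_header_data = (
    "Course report\n"
    "Internal use only\n"
    "Courses,Fee\n"
    "Spark,25000\n"
    "Pandas,20000\n"
)

# header=2 uses the third line as column names; data starts on the line after it
df_hdr = pd.read_csv(io.StringIO(late_header_data), header=2)
print(df_hdr)
```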

Conclusion

In this article, you have learned how to read fixed-length and variable-length text (.txt) files line by line by using the read_fwf() and read_csv() functions.
