You can read a text file (.txt) into a pandas DataFrame with the read_fwf() function. fwf stands for fixed-width formatted lines, and although the function is designed for fixed-length records, it can also load free-form text with one line per row. Alternatively, you can read a delimited text file with the pandas read_csv() function.
In this article, I will explain how to read a text file line by line and convert it into a pandas DataFrame, with examples for both variable-length and fixed-length files.
When reading fixed-length text files, you need to specify the width, or the start and end position, of each field so that every record is split into separate columns.
1. read_fwf() Syntax
Following is the syntax of the read_fwf() function.
# Syntax of read_fwf()
pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, **kwds)
It also accepts additional keyword arguments such as header, names, and skiprows; refer to the pandas IO Tools documentation for the full list.
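To make the widths and colspecs parameters concrete, here is a minimal sketch that parses the same records both ways. The in-memory sample and its 6, 5, and 4 character layout are assumptions made up for illustration, not the file used in this article.

```python
import io
import pandas as pd

# Hypothetical sample: three fixed-width fields of 6, 5 and 4 characters.
data = (
    "Spark 250002000\n"
    "Pandas200001000\n"
    "Java  15000 800\n"
)

# widths lists the length of each field, in order.
by_widths = pd.read_fwf(io.StringIO(data), widths=[6, 5, 4], header=None)

# colspecs lists half-open (start, end) character positions for each field.
by_colspecs = pd.read_fwf(
    io.StringIO(data), colspecs=[(0, 6), (6, 11), (11, 15)], header=None
)

# Both calls describe the same layout, so the frames match.
print(by_widths.equals(by_colspecs))  # True
```

widths is usually shorter to write when the fields are contiguous, while colspecs lets you skip parts of a record that you do not need.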
2. Read Text File
First, let's learn how to read unstructured plain text from a .txt file into a DataFrame by using the read_fwf() function. Though this function is meant to read fixed-length files, you can also use it to read free plain text files. By default, the first line of the file is used as the column header.
The file I used, proverbs.txt, is available on GitHub.
import pandas as pd
# Read text file
df = pd.read_fwf('/Users/admin/apps/proverbs.txt')
print(df)
# Output:
# People who are similar spend time together
# 0 A picture is worth a thousand words
# 1 All good things must come to an end
# 2 Beauty is in the eye of the beholder
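In the output above, the first proverb became the column header. If every line should be kept as data, one option is to pass header=None and supply a column name. The sketch below uses an in-memory copy of three of the proverbs, and the column name proverb is an assumption chosen for illustration. Note that the default colspecs='infer' looks for whitespace positions shared by the sampled rows, so text that happens to line up could be split into several columns.

```python
import io
import pandas as pd

# In-memory stand-in for a few lines of the proverbs file.
text = (
    "People who are similar spend time together\n"
    "A picture is worth a thousand words\n"
    "All good things must come to an end\n"
)

# Default behaviour: the first line is consumed as the header.
default_df = pd.read_fwf(io.StringIO(text))

# header=None keeps the first line as data; names labels the single column.
df = pd.read_fwf(io.StringIO(text), header=None, names=["proverb"])

print(len(default_df), len(df))
print(df.loc[0, "proverb"])
```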
3. Read Fixed Length File
Fixed-length files store each field at a constant width, so every record has the same length and each field starts at the same position in every record. The parser has to extract the data from those positions, and pandas read_fwf() parses exactly in this manner by taking the width, or the start and end position, of each field.
The text file used here has 4 fields: Courses (length 6), Fee (length 5), Duration (length 7), and Discount (length 4).
# Read Fixed length text file
import pandas as pd
cols = ['Courses','Fee','Duration','Discount']
df = pd.read_fwf('/Users/admin/apps/fixed-length.txt',
                 header=None, widths=[6,5,7,4],
                 names=cols)
print(df)
# Output:
#   Courses    Fee Duration  Discount
# 0   Spark  25000  50 Days      2000
# 1  Pandas  20000  35 Days      1000
# 2    Java  15000      NaN       800
# 3  Python  15000  30 Days       500
# 4     PHP  18000  30 Days       800
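The NaN in the Java row comes from a Duration field that contains only spaces. The following sketch reproduces that behaviour with two in-memory records laid out as 6 + 5 + 7 + 4 characters. The records are assumptions that mimic the layout described above, not the contents of the actual file.

```python
import io
import pandas as pd

# Hypothetical records: Courses (6), Fee (5), Duration (7), Discount (4).
# The second record leaves the 7-character Duration field blank.
records = [
    "Spark 2500050 Days2000",
    "Java  15000" + " " * 7 + " 800",
]
data = "\n".join(records) + "\n"

df = pd.read_fwf(
    io.StringIO(data),
    header=None,
    widths=[6, 5, 7, 4],
    names=["Courses", "Fee", "Duration", "Discount"],
)

print(df)
# A field made up only of spaces is read as a missing value.
print(df["Duration"].isna().tolist())  # [False, True]
```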
4. Using read_csv()
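If you have a text file with a comma delimiter, use pandas.read_csv(); to read a tab-delimited (\t) file, use read_table(), which uses a tab as its default separator. Besides these, you can also read a pipe-delimited file, or one with any other custom delimiter, by passing the separator through the sep parameter.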
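As a small sketch of the tab-delimited case, the two calls below read the same in-memory data, once with read_table() and once with read_csv() and an explicit sep. The sample rows are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical tab-delimited content.
tab_data = "Courses\tFee\nSpark\t25000\nPandas\t20000\n"

# read_table() uses a tab as its default separator.
via_table = pd.read_table(io.StringIO(tab_data))

# read_csv() gives the same result when sep is set to a tab.
via_csv = pd.read_csv(io.StringIO(tab_data), sep="\t")

print(via_table.equals(via_csv))  # True
```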
# Ignore header and assign new columns
columns = ['courses','course_fee','course_duration','course_discount']
df = pd.read_csv('/Users/admin/apps/courses.csv', header=None,names=columns,skiprows=1)
print(df)
# Output:
#   courses  course_fee course_duration  course_discount
# 0   Spark       25000         50 Days             2000
# 1  Pandas       20000         35 Days             1000
# 2    Java       15000             NaN              800
# 3  Python       15000         30 Days              500
# 4     PHP       18000         30 Days              800
Frequently Asked Questions on Pandas Read Text with Examples
To read a text file into a pandas DataFrame, you can use the pd.read_csv() function, even if your file is not a CSV. The read_csv() function is versatile and can handle various delimiter-separated values, including tabs, spaces, and other custom separators.
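For space-separated text, one common approach is a regular-expression separator that matches any run of whitespace. The sketch below assumes a small in-memory sample where the columns are padded with a varying number of spaces.

```python
import io
import pandas as pd

# Hypothetical text where columns are separated by one or more spaces.
data = "Courses   Fee\nSpark   25000\nPandas  20000\n"

# sep=r"\s+" treats any run of whitespace as a single separator.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

print(df)
```

This only works when the values themselves contain no spaces; a value such as 50 Days would be split into two fields.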
To read a CSV file with a custom delimiter in pandas, you can use the pd.read_csv() function and specify the delimiter using the sep parameter or its alias, delimiter. By default, pd.read_csv() assumes that the delimiter is a comma (,), but you can change it to your custom delimiter.
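A brief sketch with a semicolon-delimited sample (the data is assumed for illustration) shows that sep and delimiter are interchangeable:

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited content.
data = "Courses;Fee\nSpark;25000\nPandas;20000\n"

with_sep = pd.read_csv(io.StringIO(data), sep=";")
with_delimiter = pd.read_csv(io.StringIO(data), delimiter=";")

print(with_sep.equals(with_delimiter))  # True
```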
To read a text file with a specific encoding in pandas, use the encoding parameter of the pd.read_csv() function, for example encoding='latin-1'. It tells pandas which character encoding to use when decoding the file.
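The sketch below illustrates the encoding parameter with an in-memory byte buffer. The sample text and the choice of Latin-1 are assumptions; in practice you would pass the encoding that the file was actually written in.

```python
import io
import pandas as pd

# Hypothetical content with accented characters, stored as Latin-1 bytes.
raw = "name,city\nJosé,Málaga\n".encode("latin-1")

# encoding tells pandas how to decode the bytes into text.
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")

print(df.loc[0, "name"], df.loc[0, "city"])
```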
You can skip rows while reading a text file in pandas using the skiprows parameter of the pd.read_csv() function. Pass an integer to skip that many rows at the beginning of the file, or a list of zero-based row numbers to skip specific rows.
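As an illustration, the sample below (assumed data with two note lines before the real header) is read once with an integer and once with a list of row numbers:

```python
import io
import pandas as pd

# Hypothetical file with two note lines ahead of the header row.
data = (
    "exported by a hypothetical tool\n"
    "internal use only\n"
    "Courses,Fee\n"
    "Spark,25000\n"
    "Pandas,20000\n"
)

# An integer skips that many rows from the top of the file.
by_count = pd.read_csv(io.StringIO(data), skiprows=2)

# A list skips the rows at those zero-based positions.
by_index = pd.read_csv(io.StringIO(data), skiprows=[0, 1])

print(by_count.equals(by_index))  # True
```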
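Pandas supports reading various other formats. For JSON, you can use pd.read_json(), for Excel pd.read_excel(), and for SQL pd.read_sql(). The usage is similar to read_csv() in that each function returns a DataFrame, although read_sql() takes a query and a database connection rather than a file path.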
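A minimal sketch of two of these readers follows, using an in-memory JSON string and an in-memory SQLite database from the Python standard library. The table name courses and the sample row are assumptions for illustration. read_excel() is left out because it needs an Excel engine package to be installed.

```python
import io
import sqlite3
import pandas as pd

# JSON: a list of records wrapped in a file-like object.
json_df = pd.read_json(io.StringIO('[{"Courses": "Spark", "Fee": 25000}]'))

# SQL: a query plus an open database connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (Courses TEXT, Fee INTEGER)")
conn.execute("INSERT INTO courses VALUES ('Spark', 25000)")
sql_df = pd.read_sql("SELECT * FROM courses", conn)
conn.close()

print(json_df)
print(sql_df)
```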
If your text file has a header that is not in the first row, you can use the header parameter of the pd.read_csv() function to specify the zero-based row number to use as column names. Rows before the header row are discarded.
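For example, in the assumed sample below the column names sit on the third line, so header=2 selects that line:

```python
import io
import pandas as pd

# Hypothetical file whose column names are on the third line (row number 2).
data = (
    "Course report\n"
    "Internal use\n"
    "Courses,Fee\n"
    "Spark,25000\n"
)

# header=2 uses the third line for column names and drops the lines before it.
df = pd.read_csv(io.StringIO(data), header=2)

print(list(df.columns))  # ['Courses', 'Fee']
```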
Conclusion
In this article, you have learned how to read fixed-length and variable-length text (.txt) files into a pandas DataFrame by using the read_fwf() and read_csv() functions.