You can read a text (.txt) file into a pandas DataFrame with the read_fwf() function. fwf stands for fixed-width formatted lines, and the function can read both fixed-length and variable-length text files. Alternatively, you can read a text file with the pandas read_csv() function.
In this article, I will explain how to read a text file line by line and convert it into a pandas DataFrame, with examples such as reading a variable-length file and a fixed-length file.
When reading fixed-length text files, you need to specify the fixed-width positions at which each record is split into separate columns.
1. read_fwf() Syntax
Following is the syntax of the read_fwf() function.
# Syntax of read_fwf()
pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, **kwds)
It also accepts additional parameters; refer to the pandas IO Tools documentation for the full list.
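As a minimal sketch of how the colspecs and widths parameters relate, the example below parses the same in-memory text both ways. The sample data and column names are hypothetical, and io.StringIO stands in for a file path so the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical sample data: each field occupies a fixed number of characters.
data = (
    "Spark 25000\n"
    "Pandas20000\n"
    "Java  15000\n"
)

# widths gives the length of each consecutive field.
df_widths = pd.read_fwf(
    io.StringIO(data), widths=[6, 5], header=None, names=["Courses", "Fee"]
)

# colspecs gives half-open [start, end) character positions for each field.
df_colspecs = pd.read_fwf(
    io.StringIO(data),
    colspecs=[(0, 6), (6, 11)],
    header=None,
    names=["Courses", "Fee"],
)

print(df_widths)
print(df_widths.equals(df_colspecs))
```

widths is the shorter option when the fields are contiguous, while colspecs lets you skip character ranges you do not need.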
2. Read Text File
First, let's learn how to read unstructured plain text from a .txt file into a DataFrame by using the read_fwf() function. Though this function is meant for fixed-length files, you can also use it to read free-form plain text files.
You can find the file I used, proverbs.txt, on GitHub.
import pandas as pd

# Read text file
df = pd.read_fwf('/Users/admin/apps/proverbs.txt')
print(df)

# Outputs
#  People who are similar spend time together
#0       A picture is worth a thousand words
#1       All good things must come to an end
#2      Beauty is in the eye of the beholder
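Note that read_fwf() infers column boundaries from whitespace and, by default, treats the first line as the header, as the output above shows. If you would rather keep every line as one value in a single column, one alternative is to read the lines with plain Python and build the DataFrame yourself. This is a minimal sketch of that approach, not part of read_fwf(); the sample text and the column name "proverb" are hypothetical, and io.StringIO stands in for an opened file.

```python
import io
import pandas as pd

# Hypothetical stand-in for an opened free-text file.
text = io.StringIO(
    "A picture is worth a thousand words\n"
    "All good things must come to an end\n"
    "\n"
    "Beauty is in the eye of the beholder\n"
)

# Read line by line, stripping newlines and skipping blank lines.
lines = [line.strip() for line in text if line.strip()]

# Build a single-column DataFrame; "proverb" is an illustrative column name.
df = pd.DataFrame({"proverb": lines})
print(df)
```

With a real file, you would replace the io.StringIO object with open(path) inside a with block.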
3. Read Fixed Length File
Fixed-length files store each field at a constant width, so every record in the file has the same length. The parser must extract each field from the same position in every record. pandas read_fwf() works exactly this way, taking the start and end positions (or the widths) of each field.
The text file used in this example has 4 fields:
courses (length 6),
fee (length 5),
duration (length 7) and
discount (length 4)
# Read fixed-length text file
import pandas as pd

cols = ['Courses', 'Fee', 'Duration', 'Discount']
df = pd.read_fwf('/Users/admin/apps/fixed-length.txt', header=None,
                 widths=[6, 5, 7, 4], names=cols)
print(df)

# Outputs
#  Courses    Fee Duration  Discount
#0   Spark  25000  50 Days      2000
#1  Pandas  20000  35 Days      1000
#2    Java  15000      NaN       800
#3  Python  15000  30 Days       500
#4     PHP  18000  30 Days       800
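The contents of fixed-length.txt are not shown here, so the following is a sketch that rebuilds an equivalent layout in memory and parses it the same way. The row values are taken from the output above; the exact padding of the original file is an assumption, and io.StringIO stands in for the file path.

```python
import io
import pandas as pd

widths = [6, 5, 7, 4]
cols = ["Courses", "Fee", "Duration", "Discount"]

# Values taken from the output above; an empty string marks the missing duration.
records = [
    ("Spark", "25000", "50 Days", "2000"),
    ("Pandas", "20000", "35 Days", "1000"),
    ("Java", "15000", "", "800"),
    ("Python", "15000", "30 Days", "500"),
    ("PHP", "18000", "30 Days", "800"),
]

# Pad every value with spaces to its field width (assumed layout of the file).
data = "\n".join(
    "".join(value.ljust(width) for value, width in zip(record, widths))
    for record in records
)

df = pd.read_fwf(io.StringIO(data), header=None, widths=widths, names=cols)
print(df)
```

A field that contains only spaces, such as the duration for Java, is read as a missing value (NaN).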
4. Using read_csv()
If your text file is comma-delimited, use pandas.read_csv(); for a tab-delimited (\t) file, use read_table(). You can also read pipe-delimited files, or files with any other custom separator, by passing the sep parameter to read_csv().
import pandas as pd

# Ignore the existing header row and assign new column names
columns = ['courses', 'course_fee', 'course_duration', 'course_discount']
df = pd.read_csv('/Users/admin/apps/courses.csv', header=None,
                 names=columns, skiprows=1)
print(df)

# Outputs
#  courses  course_fee course_duration  course_discount
#0   Spark       25000         50 Days             2000
#1  Pandas       20000         35 Days             1000
#2    Java       15000             NaN              800
#3  Python       15000         30 Days              500
#4     PHP       18000         30 Days              800
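To illustrate custom separators, here is a minimal sketch that reads a pipe-delimited sample with read_csv() and a tab-delimited sample with read_table(). Both samples are hypothetical and are held in memory with io.StringIO so that the snippet runs without any files.

```python
import io
import pandas as pd

# Hypothetical pipe-delimited and tab-delimited samples with the same content.
pipe_data = "Courses|Fee|Duration\nSpark|25000|50 Days\nPandas|20000|35 Days\n"
tab_data = "Courses\tFee\tDuration\nSpark\t25000\t50 Days\nPandas\t20000\t35 Days\n"

# sep sets the delimiter for read_csv(); the default is a comma.
df_pipe = pd.read_csv(io.StringIO(pipe_data), sep="|")

# read_table() uses a tab as its default separator.
df_tab = pd.read_table(io.StringIO(tab_data))

print(df_pipe)
print(df_pipe.equals(df_tab))
```

Both calls should produce the same DataFrame, since only the delimiter differs between the two samples.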
In this article, you have learned how to read fixed-length and variable-length text (.txt) files line by line using the read_fwf() and read_csv() functions.