Introduction
Fixed width files, which do not have any column delimiters, are common in the financial industry, especially in ETL jobs that extract data from mainframe systems. In this article I will explain how to read a fixed width file in Python using pandas.
What is a fixed width file?
A fixed width file, a relic of the mainframe era, is a file without any column delimiter. It does, however, use a newline character as its row delimiter. Without a column delimiter, you need to know the start and end position of each column to tell where one value ends and the next begins.
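To make the idea concrete, here is a minimal sketch that pulls the fields out of a single line by position using plain string slicing. The record is made up, and the positions assume the 2/50/8-character layout of the example file used in this article.

```python
# Made-up record: ID (2 chars), NAME (50 chars), CITY (8 chars)
line = "01" + "John Smith".ljust(50) + "London".ljust(8)

# (start, end) pairs: zero-based, start included, end excluded
positions = [(0, 2), (2, 52), (52, 60)]

# Slice each field out of the line and strip the padding spaces
fields = [line[start:end].strip() for start, end in positions]
print(fields)  # ['01', 'John Smith', 'London']
```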
Why use a fixed width file?
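In some situations they can be fast to parse and light on resources, because every field sits at a known offset and can be sliced out directly instead of being found by scanning for delimiters. The trade-off is that every value must be padded, usually with spaces or other filler characters, to its column's defined width, including when the value is missing or shorter than that width.
Any unused characters in a column are left blank, and the next column always starts at its predefined position.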
The example file used in this article has three columns: ID (2 characters), NAME (50 characters) and CITY (8 characters).
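To follow along without the original file, the following sketch writes a small stand-in named fwf_sample with the same three-column layout. The rows are made up for illustration, and the format_row helper is hypothetical: it pads each value with trailing spaces using str.ljust and refuses values that would overflow their column.

```python
# Column widths of the example layout: ID, NAME, CITY
WIDTHS = (2, 50, 8)

def format_row(values, widths=WIDTHS):
    """Hypothetical helper: pad each value with spaces to its column width."""
    parts = []
    for value, width in zip(values, widths):
        if len(value) > width:
            raise ValueError(f"{value!r} does not fit in {width} characters")
        parts.append(value.ljust(width))
    return "".join(parts)

# Made-up rows standing in for the article's sample data
rows = [("1", "Alice", "Paris"), ("2", "Bob", "Berlin"), ("3", "Carol", "Madrid")]

# One padded record per line; the newline is the only row delimiter
with open("fwf_sample", "w") as f:
    for row in rows:
        f.write(format_row(row) + "\n")
```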
Column positions
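You need to know the start and end position of each column beforehand in order to read a fixed width file correctly. Once you have them, you can store them in a Python list, with one (start, end) tuple per column. pandas counts positions from 0 and treats each pair as a half-open interval, so the start is included and the end is excluded: a column of width w that starts at position s is written as (s, s + w).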
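Rather than typing the offsets by hand, they can be derived from the column widths. A minimal sketch, assuming the columns are contiguous (each one starts where the previous one ends) and using the zero-based, end-exclusive convention described above.

```python
from itertools import accumulate

# Column widths from the example file: ID, NAME, CITY
widths = [2, 50, 8]

# Running totals give each column's end offset; each start is the previous end
ends = list(accumulate(widths))
starts = [0] + ends[:-1]
col_positions = list(zip(starts, ends))
print(col_positions)  # [(0, 2), (2, 52), (52, 60)]
```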
Alternatively, you can use a dictionary where the key is the column name and the value is the (start, end) tuple. In this example I am simply using a second list for the column names.
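The dictionary variant can be sketched as follows. It assumes Python 3.7 or later, where dicts preserve insertion order, so the keys and values come out in column order. The two records are made up and read from an in-memory io.StringIO buffer instead of a file, and names= sets the column names at read time.

```python
import io
import pandas as pd

# Column name -> (start, end) pair, in column order
columns = {
    "Id": (0, 2),
    "Name": (2, 52),
    "City": (52, 60),
}

# Made-up records in the 2/50/8 layout, held in memory instead of a file
lines = [
    "1 " + "Alice".ljust(50) + "Paris".ljust(8),
    "2 " + "Bob".ljust(50) + "Berlin".ljust(8),
]
sample = io.StringIO("\n".join(lines) + "\n")

# Split the dict into the positions and the names read_fwf expects
data = pd.read_fwf(
    sample,
    colspecs=list(columns.values()),
    names=list(columns.keys()),
    header=None,
)
print(data)
```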
read_fwf function
Pandas makes it very easy to read a fixed width file with its built-in function read_fwf. All you need to provide is the filename and the column positions. The documentation covers the function's other parameters:
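Two parameters from that documentation are worth a quick sketch. widths takes a list of field widths and can replace colspecs when the columns are contiguous, and dtype=str keeps every field as text, which preserves leading zeros in IDs such as 01. The records below are made up and read from an in-memory buffer.

```python
import io
import pandas as pd

# Made-up records in the article's layout: ID (2), NAME (50), CITY (8)
lines = [
    "01" + "Alice".ljust(50) + "Paris".ljust(8),
    "02" + "Bob".ljust(50) + "Berlin".ljust(8),
]
sample = io.StringIO("\n".join(lines) + "\n")

# widths=[2, 50, 8] describes the same columns as colspecs=[(0, 2), (2, 52), (52, 60)]
data = pd.read_fwf(
    sample,
    widths=[2, 50, 8],
    names=["Id", "Name", "City"],
    header=None,
    dtype=str,  # keep "01" as text instead of converting it to the number 1
)
print(data)
```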
https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html
Complete code
import pandas as pd

filepath = 'fwf_sample'
# Half-open (start, end) positions: Id is 2 chars, Name is 50 chars, City is 8 chars
col_positions = [(0, 2), (2, 52), (52, 60)]
col_names = ['Id', 'Name', 'City']

# header=None because the file has no header row
data = pd.read_fwf(filepath, colspecs=col_positions, header=None)
data.columns = col_names
print(data.head())
Conclusion
There are other ways to read fixed width files that do not use pandas and rely on other Python libraries instead, and some of them may be faster than this method. For data scientists, though, the usual goal is to get the data into a DataFrame, which is exactly what read_fwf returns, so this method is very useful for them.
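For comparison, one pandas-free approach uses the standard library's struct module, which unpacks a fixed layout from bytes. This is a sketch rather than a benchmark: whether it beats read_fwf depends on the data and has not been measured here. The format string "2s50s8s" encodes the assumed 2/50/8 layout, the parse_line helper is hypothetical, and widths are counted in bytes, so the decoding step assumes ASCII data.

```python
import struct

# Assumed layout: ID (2 bytes), NAME (50 bytes), CITY (8 bytes)
record = struct.Struct("2s50s8s")

def parse_line(line: bytes) -> tuple:
    """Hypothetical helper: unpack one record and strip the space padding."""
    return tuple(field.decode("ascii").strip() for field in record.unpack(line))

# Made-up record, already padded to exactly 60 bytes
line = b"01" + b"Alice".ljust(50) + b"Paris".ljust(8)
print(parse_line(line))  # ('01', 'Alice', 'Paris')
```

When reading a real file this way, open it in binary mode and remove the trailing newline from each line (for example with line.rstrip(b"\r\n")) before unpacking, since unpack requires exactly record.size bytes.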
Do let me know in the comments how you read fixed width files. See you in the next article.