First time poster, long-time lurker. Have searched high and low for an answer to this but it's got to that stage...!
I am having some trouble implementing the answer given by John Machin to this past question:
How to efficiently parse fixed width files?
At a very high level, I am using this code to split up fixed-format text files and import them into a PostgreSQL database. It worked for the first text file I used it on, but now that I am expanding the program to handle other files with different fixed formats, I keep running into the same error:
struct.error: unpack_from requires a buffer of at least [x] bytes
Of course, I get a different value for x depending on the format string I feed to the function. My problem is that it continues to work for one format and one format only, never the others. The only things I am changing are the variable used to build the format string and the variable names in the script that relate to that format.
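To illustrate the error itself (toy format string below, not one of my real specs), unpack_from raises exactly this whenever the buffer passed in is shorter than struct.calcsize of the format:

```python
import struct

fmt = "3s2x5s"               # toy format: 3 bytes, skip 2, 5 bytes
print(struct.calcsize(fmt))  # 10 bytes required by this format
try:
    struct.unpack_from(fmt, b"short")  # only 5 bytes supplied
except struct.error as e:
    print(e)                 # message reports the required buffer size
```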
So for example this works fine:
cnv_text = lambda s: str(s.strip())
cnv_int = lambda s: int(s) if not s.isspace() else s.strip()
cnv_date_ymd = lambda s: datetime.datetime.strptime(s, '%Y%m%d') if not s.isspace() else s.strip() # YYYYMMDD
unpack_len = 0
unpack_fmt = ""
splitData = []
conn = psycopg2.connect("[connection info]")
cur = conn.cursor()
Table1specs = [
('A', 6, 14, cnv_text),
('B', 20, 255, cnv_text),
('C', 275, 1, cnv_text),
('D', 276, 1, cnv_text),
('E', 277, 1, cnv_text),
('F', 278, 1, cnv_text),
('G', 279, 1, cnv_text),
('H', 280, 1, cnv_text),
('I', 281, 8, cnv_date_ymd),
('J', 289, 8, cnv_date_ymd),
('K', 297, 8, cnv_date_ymd),
('L', 305, 8, cnv_date_ymd),
('M', 313, 8, cnv_date_ymd),
('N', 321, 1, cnv_text),
('O', 335, 2, cnv_text),
('P', 337, 2, cnv_int),
('Q', 339, 5, cnv_int),
('R', 344, 255, cnv_text),
('S', 599, 1, cnv_int),
('T', 600, 1, cnv_int),
('U', 601, 5, cnv_int),
('V', 606, 10, cnv_text)
]
# for each column in the spec variable, extend the struct format string
for column in Table1specs:
    start = column[1] - 1
    end = start + column[2]
    if start > unpack_len:
        unpack_fmt += str(start - unpack_len) + "x"
    unpack_fmt += str(end - start) + "s"
    unpack_len = end
field_indices = range(len(Table1specs))
print unpack_len, unpack_fmt
#set unpacker
unpacker = struct.Struct(unpack_fmt).unpack_from
class Record(object):
    pass
filename = "Table1Data.txt"
f = open(filename, 'r')
for line in f:
    raw_fields = unpacker(line)
    r = Record()
    for x in field_indices:
        setattr(r, Table1specs[x][0], Table1specs[x][3](raw_fields[x]))
    splitData.append(r.__dict__)
All the data is appended to splitData, which I then loop through and work into SQL statements for insertion into the database via psycopg2. When I change the specs to something else (and update the other variables to match), I receive the error above; it is thrown from the 'raw_fields = unpacker(line)' line.
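In case it helps anyone spot the problem, here is a cut-down, self-contained sketch of the same approach (hypothetical two-column spec, not one of my real formats). The length check and the ljust padding are my assumption about the kind of guard that would avoid the error if the second file's lines turn out to be shorter than unpack_len:

```python
import struct

# hypothetical spec, same shape as Table1specs: (name, 1-based start, width, converter)
specs = [('A', 1, 3, lambda s: s.strip()), ('B', 6, 4, lambda s: s.strip())]

unpack_len, unpack_fmt = 0, ""
for name, start1, width, conv in specs:
    start = start1 - 1
    if start > unpack_len:
        unpack_fmt += str(start - unpack_len) + "x"  # skip the gap between columns
    unpack_fmt += str(width) + "s"
    unpack_len = start + width

unpacker = struct.Struct(unpack_fmt).unpack_from

for raw in [b"abc  defg\n", b"abc\n"]:  # second record is truncated
    line = raw.rstrip(b"\n")
    if len(line) < unpack_len:
        # this is exactly the condition that makes unpack_from raise
        line = line.ljust(unpack_len)   # pad with spaces to the full record width
    print(unpacker(line))
```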
I have exhausted all resources and am at a loss; any thoughts or ideas are welcome.
(Could it be to do with the text file I am importing from?)
Best regards.