Question

I'm trying to read a file. I want to parse the lines of the file as a dictionary, but I can't seem to get that part into my array.

My file looks like:

Records:

2014-05-14,12:16:26,subject,{MSGTYPE="Personal" NAME="Fred" ADDRESS="Flat1" AGE=92 GENDER="M"}

2014-05-15,14:36:26,subject,{MSGTYPE="Personal" NAME="George" ADDRESS="Flat2"       AGE=-20 GENDER="M"}

2014-05-13,16:49:26,subject,{MSGTYPE="Personal" NAME="Ringo" ADDRESS="Flat3"    AGE=-36 GENDER="M"}

2014-05-12,14:45:26,subject,{MSGTYPE="Personal" NAME="Brian" ADDRESS="Flat4" AGE=-85 GENDER="M"}

2014-05-11,12:43:26,subject,{MSGTYPE="Personal" NAME="Paul" ADDRESS="Flat5" AGE=-33 GENDER="M"}

So the plan is to split it by ','. Then take value 4 and place it into its own dictionary. BUT I'm doing something wrong with the split.

valuesArray = []
f = open(rvfile)
    for line in f:
        if not line.startswith('**Records**'):
            valuesArray = line.split(',')
            print '1: {0}'.format(valuesArray[0])
            print '2: {1}'.format(valuesArray[1])         

I am getting the error:

Traceback (most recent call last):
    File "FAST_RV_Tests.py", line 70, in <module>
IndexError: index out of range: 1

The first print is returning '1: 2014-05-14' as I'd expect. But there is nothing else in the array.


Solution

You probably have empty lines in your data file, and splitting such a line does not return a list with enough items.

In your loop, call continue whenever you meet an empty line.

Another hint is to call split with a second argument that limits how many splits are made. That keeps the final JSON part in one piece, and you can then use json.loads on it to get the content.
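
To make the effect of that second argument concrete, here is a small sketch. The sample line is made up for illustration and assumes the last field is JSON-like text that itself contains commas.

```python
# Illustrative line (not from the real file); the brace part contains a comma.
line = '2014-05-14,12:16:26,subject,{"NAME":"Fred", "AGE": 92}'

# Without a limit, the comma inside the braces is split as well.
print(line.split(','))

# With a limit of 3, only the first three commas are used, so the
# last item is the whole brace-delimited part.
parts = line.split(',', 3)
print(parts[3])
```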

If I modify your data to reflect your statement about having JSON data at the end,

2014-05-14,12:16:26,subject,{"MSGTYPE":"Personal", "NAME":"Fred", "ADDRESS":"Flat1", "AGE": 92, "GENDER":"M"}

2014-05-15,14:36:26,subject,{"MSGTYPE":"Personal", "NAME":"George", "ADDRESS":"Flat2", "AGE": -20, "GENDER":"M"}

2014-05-13,16:49:26,subject,{"MSGTYPE":"Personal", "NAME":"Ringo", "ADDRESS":"Flat3", "AGE": -36, "GENDER":"M"}

2014-05-12,14:45:26,subject,{"MSGTYPE":"Personal", "NAME":"Brian", "ADDRESS":"Flat4", "AGE": -85, "GENDER":"M"}

2014-05-11,12:43:26,subject,{"MSGTYPE":"Personal", "NAME":"Paul", "ADDRESS":"Flat5", "AGE": -33, "GENDER":"M"}

it would work like this:

import json
fname = "data.txt"
with open(fname) as f:
    for line in f:
        line = line.strip()
        if len(line) == 0:
            continue
        if line.startswith('**Records**'):
            continue
        valuesArray = line.split(',', 3)
        y, d = valuesArray[:2]
        print '1: {y}'.format(y=y)
        print '2: {d}'.format(d=d)    
        # bonus, read the json data
        print valuesArray[3]

        jsdata = json.loads(valuesArray[3])
        print "jsdata", jsdata
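
If the file cannot be changed to JSON, the original KEY="value" format from the question can probably be parsed with a regular expression instead. This is only a sketch: parse_fields and PAIR are hypothetical names, and it assumes quoted values never contain a double quote and unquoted values are always integers.

```python
import re

# Hypothetical helper, not part of the answer above.
# Matches KEY="text" or KEY=<integer>, e.g. NAME="Fred" or AGE=-20.
PAIR = re.compile(r'(\w+)=("([^"]*)"|-?\d+)')

def parse_fields(text):
    """Turn '{NAME="Fred" AGE=92}' into {'NAME': 'Fred', 'AGE': 92}."""
    result = {}
    for key, raw, quoted in PAIR.findall(text):
        if raw.startswith('"'):
            result[key] = quoted      # quoted string value
        else:
            result[key] = int(raw)    # assumed to be an integer
    return result

line = ('2014-05-15,14:36:26,subject,'
        '{MSGTYPE="Personal" NAME="George" ADDRESS="Flat2"       AGE=-20 GENDER="M"}')
date, time, subject, rest = line.split(',', 3)
record = parse_fields(rest)
print('{0} {1}'.format(record['NAME'], record['AGE']))
```

The extra spaces between fields do not matter here, because the pattern only looks for the pairs themselves.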

OTHER TIPS

Your error is on your second format line:

print '2: {1}'.format(valuesArray[1])  

You are only formatting one value, so the {1} should be {0}.

The proper use of {1} would be if you had something like this:

print "1: {0} {1}".format(valuesArray[0], valuesArray[1])

Shouldn't print '2: {1}'.format(valuesArray[1]) be print '2: {0}'.format(valuesArray[1])?

There is only one argument passed to format, so index 1 is out of range.
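
A short sketch of the indexing rule, using a made-up values list in place of valuesArray. The numbers in braces refer to the position of the argument passed to format, not to positions in the list.

```python
# Stand-in for valuesArray after a successful split.
values = ['2014-05-14', '12:16:26']

# One argument means only index 0 exists inside the format string.
print('2: {0}'.format(values[1]))

# Two arguments make both {0} and {1} valid.
print('1: {0} 2: {1}'.format(values[0], values[1]))

# Asking for {1} with a single argument raises IndexError.
try:
    '2: {1}'.format(values[1])
except IndexError as exc:
    print('IndexError: {0}'.format(exc))
```

The exact wording of the error message differs between Python implementations and versions, so it may not match the traceback in the question.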

If it is the case that your real input file has blank lines between the records like your example has, then that would probably explain why your split does not produce any values. Also, be aware that line will contain a trailing newline character, so you may want to call line.strip() inside your loop.
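
Both points can be checked quickly in isolation. The record string below is a shortened, made-up example rather than a line from the real file.

```python
# What the loop sees for an empty line in the file.
blank = '\n'
parts = blank.split(',')
print(len(parts))   # a single item, so parts[1] does not exist

# Shortened example record, including its trailing newline.
record = '2014-05-14,12:16:26,subject,{AGE=92}\n'
print(repr(record.split(',')[-1]))           # newline still attached
print(repr(record.strip().split(',')[-1]))   # newline removed
```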

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow