Question

I am reading data from an .xls spreadsheet with xlrd. First, I collect the cells that contain the data I need (it may not be in the same column in every file):

amr_list, pssr_list, inservice_list = [], [], []
for i in range(sh.ncols):
    for j in range(sh.nrows):
        if 'amrprojectnumber' in sh.cell_value(j,i).lower():
            amr_list.append(sh.cell_value(j,i))
        if 'pssrnumber' in sh.cell_value(j,i).lower():
            pssr_list.append(sh.cell_value(j,i))
        if 'inservicedate' in sh.cell_value(j,i).lower():
            inservice_list.append(sh.cell_value(j,i))

Now I have three lists, which I need to use for writing data to a new workbook. The values in a row are related. So the index of an item in one list corresponds to the same index of the items in the other lists.

The amr_list has repeating string values. For example:

['4006BA','4006BA','4007AC','4007AC','4007AC']

The pssr_list always shares the same value as the amr_list but with additional info:

['4006BA(1)','4006BA(2)','4007AC(1)','4007AC(2)','4007AC(3)']

Finally, the inservice_list may or may not contain a date in each position (a serial number as read from Excel, or an empty string):

[40780.0, '', 40749.0, 40764.0, '']

This is the result I want from the data:

amr = { '4006BA':[('4006BA(1)',40780.0),('4006BA(2)','')], '4007AC':[('4007AC(1)',40749.0),('4007AC(2)',40764.0),('4007AC(3)','')] }

But I am having a hard time figuring out an easy way to get there. Thanks in advance.

Solution

Look into itertools.groupby and

zip(amr_list, pssr_list, inservice_list)
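
As a quick illustration of what zip produces here, this sketch uses the first three entries of the sample lists from the question. The names rows and first_row are only for illustration.

```python
amr_list = ['4006BA', '4006BA', '4007AC']
pssr_list = ['4006BA(1)', '4006BA(2)', '4007AC(1)']
inservice_list = [40780.0, '', 40749.0]

# zip pairs up the items that share an index, giving one tuple per row.
# list() is needed on Python 3, where zip returns a lazy iterator.
rows = list(zip(amr_list, pssr_list, inservice_list))
first_row = rows[0]  # (amr, pssr, in-service date) for the first row
```

Each tuple starts with the amr value, which is what makes it usable as the grouping key.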

For your case:

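import itertools

# Group rows by their first element and keep the remaining fields of each row.
amr = dict((x, [a[1:] for a in y]) for x, y in
    itertools.groupby(zip(amr_list, pssr_list, inservice_list), lambda z: z[0]))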

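Note that this assumes your input is sorted by amr_list: groupby only groups consecutive items with the same key, so a key that reappears later would overwrite its earlier group in the dict.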
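
Here is a minimal sketch of what happens when the input is not sorted, and of one possible fix (sorting on the key first). The unsorted sample data and the group helper are made up for illustration and are not from the question.

```python
import itertools
from operator import itemgetter

# Hypothetical unsorted input: '4006BA' appears in two separate runs.
amr_list = ['4006BA', '4007AC', '4006BA']
pssr_list = ['4006BA(1)', '4007AC(1)', '4006BA(2)']
inservice_list = [40780.0, 40749.0, '']

rows = list(zip(amr_list, pssr_list, inservice_list))

def group(rows):
    # Build {key: [(pssr, date), ...]} from consecutive runs of equal keys.
    return dict((key, [row[1:] for row in run])
                for key, run in itertools.groupby(rows, key=itemgetter(0)))

# Without sorting, the second '4006BA' run replaces the first in the dict.
unsorted_result = group(rows)

# Sorting on the key only avoids comparing the mixed float/str date values,
# and sorted() is stable, so rows with the same key keep their original order.
sorted_result = group(sorted(rows, key=itemgetter(0)))
```

With this data, unsorted_result keeps only the last '4006BA' entry, while sorted_result keeps both.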

Another approach would be:

combined={}
for k, v in zip(amr_list, zip(pssr_list, inservice_list)):
    combined.setdefault(k, []).append(v)

This does not require your input to be sorted.
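
A variant of the same idea, in case it reads more clearly, uses collections.defaultdict instead of setdefault. This is only a sketch, run against the sample lists from the question.

```python
from collections import defaultdict

amr_list = ['4006BA', '4006BA', '4007AC', '4007AC', '4007AC']
pssr_list = ['4006BA(1)', '4006BA(2)', '4007AC(1)', '4007AC(2)', '4007AC(3)']
inservice_list = [40780.0, '', 40749.0, 40764.0, '']

# A missing key automatically starts out as an empty list.
combined = defaultdict(list)
for key, pssr, date in zip(amr_list, pssr_list, inservice_list):
    combined[key].append((pssr, date))

# Convert back to a plain dict so later lookups of unknown keys raise KeyError.
combined = dict(combined)
```

The order of the input does not matter here either, because each row is appended to its own key's list as it is seen.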

OTHER TIPS

Maybe this can help:

A = ['4006BA','4006BA','4007AC','4007AC','4007AC']
B = ['4006BA(1)','4006BA(2)','4007AC(1)','4007AC(2)','4007AC(3)']
C = [40780.0, '', 40749.0, 40764.0, '']

result = dict()
for item in range(len(A)):
    key = A[item]
    # Create the list for this key the first time it is seen, then append the pair.
    result.setdefault(key, [])
    result[key].append((B[item], C[item]))

print(result)

This prints the data in the format you are looking for.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow