Question

I've asked a similar question (Remove duplicate value from dictionary without removing key) but I think I'm getting deeper into the problem and can explain it better because, unfortunately, none of the answers did what I needed but they did answer my question.

I'm building an OrderedDict from two .csv files. The first one contains location codes, while the second one is a list of hardware relocations by time. All location codes are unique, so they are the keys of my dictionary. I've got a loop that builds the dictionary with empty values using that .csv file. Then I have another loop that adds the hardware data to the correct location code. Some of the hardware data is in list form, so it isn't hashable.

The problem I'm having is that as the hardware moves to a new location, I need it to be removed from its previous location, so it's only in one place at the end of the code.

My location codes are:

>1, 2, 3, 4, 5, 6

My hardware data is in order by time:

>7pm, 1, 'item1', 'item2', 'item3'
>8pm, 2, 'item4', 'item5', 'item6'  
>9pm, 3, 'item7', '', ''
>10pm, 4, 'item8', '', ''
>11pm, 5, 'item1', 'item2', 'item3'
>12am, 6, 'item7', '', ''
>1am, 3, 'item4', 'item5', 'item6'

If I run the code for the entire timeframe without any conditional statements my final dictionary looks like

>myDict = {'1': ['item1', 'item2', 'item3'], '2': ['item4', 'item5', 'item6'],  
>'3': 'item7', '4': 'item8', '5': ['item1', 'item2', 'item3'], '6': 'item7'}

But what I need it to look like is:

>myDict = {'1': '', '2': '', '3': ['item4', 'item5', 'item6'], '4':  
>'item8', '5': ['item1', 'item2', 'item3'], '6': 'item7'}

Because the items (values) are not added to the dictionary in the same order that the locations (keys) are added, it's important that I do this while building the dictionary (adding the values), because I can't go back through and just remove the duplicates after it's completed.

I've tried many things and have gotten different results but my latest is

locationCSV =  open('location.csv', "r")
hardwareCSV =  open('hardware.csv', "r")
locationLines = locationCSV.readlines()
hardwareLines = hardwareCSV.readlines()
finalLoc = OrderedDict() 

for line in locationLines:
    locationList = line.split(",")
    code = locationList[0]
    finalLoc[code] = ""

for line in hardwareLines:
    hardwareList = line.split(",")
    hardwareData = [hardwareList[2],hardwareList[3],hardwareList[4]]
    for k, v in finalLoc.iteritems():
        if hardwareData in finalLoc.itervalues():
            finalLoc[k] = ""
    finalLoc[hardwareList[1]] = hardwareData

print finalLoc

This returns all the locations empty. I've been stuck on this for a few days so any help would be appreciated.


Solution

There are a number of problems with your code that prevent you from even getting that far, so this can't possibly be your real code. But let's go through the errors:


csvList = line.split(",")

This is going to give you values like " 1" and " 'item1'", which I can't imagine is what you actually want.

Worse, the stray whitespace at the end of some of your lines means the values won't even match up. For example, the last string in the second line is " 'item6' ", but in the last line it's " 'item6'", which aren't the same string.

This would be much easier if you used the csv library instead of trying to do it yourself. If you just want a quick hack to solve the problem, you can strip each entry:

csvList = [item.strip() for item in line.split(",")]

hardwareData = [csvList[2],csvList[3],csvList[4]]

Since some of your lines only have 3 columns, this is going to raise an IndexError. If you just want to get fewer than 3 values for short rows instead of raising, you can do:

hardwareData = csvList[2:5]
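As a quick illustration of why the slice is safe where the indexing is not, here is a minimal sketch. The sample row is made up for the example, and the code uses Python 3 syntax rather than the Python 2 of the question:

```python
# A short row, as you would get from splitting "9pm, 3, 'item7'"
# on commas and stripping each entry.
csv_list = ["9pm", "3", "'item7'"]

# Slicing past the end of a list never raises; it returns what is there.
hardware_data = csv_list[2:5]
print(hardware_data)

# Indexing past the end is what raises IndexError.
try:
    csv_list[3]
    raised = False
except IndexError:
    raised = True
print("indexing raised IndexError:", raised)
```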

for k, v in finalLoc.iteritems():
    if hardwareData in finalLoc.itervalues():

For each line, you're going through the entire dictionary, and for each entry, searching the entire dictionary again to see if hardwareData is a value anywhere. The test doesn't depend on k or v at all, so whenever it's true, it's true on every pass of the outer loop. So, if there are already 10 items in the dict, you're doing 100 comparisons per line, and if you blank what you find, you're going to blank on all 10 passes.

You probably wanted if hardwareData == v here.


        finalLoc[key] = ""

You haven't defined key anywhere in the code you've shown us. If you defined it somewhere earlier, it's going to blank out the same value each of the 100 times for each line. Otherwise, this will just raise a NameError.

You probably wanted finalLoc[k] here.

This whole part would be a lot simpler (and more efficient) if you kept an inverse dictionary, mapping each value to its key.


Anyway, putting together all those fixes, your code works:

from collections import OrderedDict

hardwareLines = """7pm, 1, 'item1', 'item2', 'item3'
8pm, 2, 'item4', 'item5', 'item6'  
9pm, 3, 'item7'
10pm, 4, 'item8'
11pm, 5, 'item1', 'item2', 'item3'
12am, 6, 'item9'
1am, 3, 'item4', 'item5', 'item6'""".splitlines()

finalLoc = OrderedDict() 

for line in hardwareLines: ##hardware is the second .csv
    csvList = [item.strip() for item in line.split(",")]
    hardwareData = csvList[2:5]
    for k, v in finalLoc.iteritems():
        if hardwareData == v:
            finalLoc[k] = ""
    finalLoc[csvList[1]] = hardwareData

for k, v in finalLoc.iteritems():
    print('{}: {}'.format(k, v))

The output is:

1: 
2: 
3: ["'item4'", "'item5'", "'item6'"]
4: ["'item8'"]
5: ["'item1'", "'item2'", "'item3'"]
6: ["'item9'"]

Here's a version using the csv module and an inverse mapping:

from collections import OrderedDict
import csv

hardwareLines = """7pm, 1, 'item1', 'item2', 'item3'
8pm, 2, 'item4', 'item5', 'item6'  
9pm, 3, 'item7'
10pm, 4, 'item8'
11pm, 5, 'item1', 'item2', 'item3'
12am, 6, 'item9'
1am, 3, 'item4', 'item5', 'item6'""".splitlines()

finalLoc = OrderedDict()

invmap = {}

for row in csv.reader(map(str.rstrip, hardwareLines), 
                      skipinitialspace=True, quotechar="'"):
    hardwareData = tuple(row[2:5])
    if hardwareData in invmap:
        finalLoc[invmap[hardwareData]] = ""
    finalLoc[row[1]] = list(hardwareData)
    invmap[hardwareData] = row[1]

for k, v in finalLoc.iteritems():
    print('{}: {}'.format(k, v))

I still needed to explicitly strip the excess trailing whitespace on each line, but otherwise, csv took care of everything for me—and notice that it also removes the excess quotes around each value.

Meanwhile, instead of having to figure out how to walk through the items and find every key that matches the current value, the invmap lets me just look it up in one step. (Notice that the mapping has to be 1-to-1, because if you'd already encountered the value twice, the first one would already have been removed.)
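One edge case is worth guarding against, though it doesn't occur in the sample data: if a location is overwritten by a different group of items, the inverse map still points the displaced group at that location, and a later move of the displaced group would blank the wrong entry. The following is a sketch of one possible guard, not the only way to handle it. It assumes Python 3, and the `place` helper and the `moves` data are hypothetical names introduced just for this example:

```python
from collections import OrderedDict

def place(final_loc, invmap, location, hardware_data):
    """Move hardware_data (a tuple of items) to location.

    The old location is blanked only if it still holds this exact group;
    it may have been overwritten by a later arrival in the meantime.
    """
    old = invmap.get(hardware_data)
    if old is not None and final_loc.get(old) == list(hardware_data):
        final_loc[old] = ""
    final_loc[location] = list(hardware_data)
    invmap[hardware_data] = location

final_loc = OrderedDict()
invmap = {}
moves = [
    ("3", ("item7",)),
    ("3", ("item4", "item5", "item6")),  # overwrites item7 at location 3
    ("6", ("item7",)),                   # item7 turns up somewhere else
]
for location, data in moves:
    place(final_loc, invmap, location, data)

for k, v in final_loc.items():
    print('{}: {}'.format(k, v))
```

Without the equality check, the third move would blank location 3 even though it now holds item4, item5 and item6.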


Of course even fixing the stripping and quoting problems, the results still aren't exactly what you want. Your desired output apparently unwraps single-element lists into just the element. If you want that, you'll need to do that explicitly. But you probably don't want that. In fact, you probably want to use [] instead of '' as your "empty" value. That way, you know the value is always a list of 0 or more items, instead of having to treat an empty string as 0 values, any other string as 1 value, and a list as multiple values. So, when you process it, instead of writing code like this:

if value == '':
    return ''
elif isinstance(value, str):
    return process_one_value(value)
else:
    return map(process_one_value, value)

… you can just do this:

return map(process_one_value, value)
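To make the difference concrete, here is a small sketch, assuming Python 3 (where a list comprehension stands in for map, which returns an iterator there). `process_one_value` is only a placeholder body, and `unwrap` is a hypothetical helper for the case where you really do want the display format from the question:

```python
def process_one_value(value):
    # Placeholder for whatever per-item work you actually need.
    return value.upper()

def process(value):
    # Works for zero, one, or many items, because value is always a list.
    return [process_one_value(item) for item in value]

def unwrap(value):
    """Display only: '' for no items, a bare string for one, a list otherwise."""
    if not value:
        return ""
    if len(value) == 1:
        return value[0]
    return value

final_loc = {"1": [], "4": ["item8"], "5": ["item1", "item2", "item3"]}

processed = {k: process(v) for k, v in final_loc.items()}
displayed = {k: unwrap(v) for k, v in final_loc.items()}
print(processed)
print(displayed)
```

Keeping the lists internally and unwrapping only at the point of output means the special cases live in one place instead of in every function that touches the data.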

OTHER TIPS

My quick and dirty version, with a focus on maintaining the ordered dictionary. There is a csv module to take care of reading and parsing input data.

from collections import OrderedDict

def removeItem(d, item):
    # remove item from whichever list in d holds it, if present
    for k,v in d.items():
        if item in v:
            v.remove(item)
            d[k] = v

d=OrderedDict()
for c in loc_codes: #['1','2',....]
    d[c]=[]
for line in hardware.split('\n'): # or read line from file
    if line:
        items = line.split(', ')
        items = [l.strip("'") for l in items]
        k = items[1].strip()
        v = items[2:]
        for item in v:
            removeItem(d,item)
            d[k] += [item]
print d

result:

OrderedDict([('1', []), ('2', []), ('3', ['item7', 'item4', 'item5', 'item6']), ('4', ['item8']), ('5', ['item1', 'item2', 'item3']), ('6', ['item9'])])

I start with [] values so it is easy to add and remove items from the list. You could easily change the [] values to '' if that is important. For a large set of data, removeItem isn't as efficient as something using sets or other dictionaries, but it is a quick and obvious way of getting going. Plus it does not depend on any other data structures.

I'm not going to worry about the parsing aspect of this. So let's assume you've got the data loaded into a useable format such as:

locations = [1, 2, 3, 4, 5, 6]
hardware = [
    ('7pm',  1, ['item1', 'item2', 'item3']),
    ('8pm',  2, ['item4', 'item5', 'item6']),
    ('9pm',  3, ['item7']),
    ('10pm', 4, ['item8']),
    ('11pm', 5, ['item1', 'item2', 'item3']),
    ('12am', 6, ['item9']),
    ('1am',  3, ['item4', 'item5', 'item6'])
]

(I strongly encourage you to separate your parsing code from your data processing code. It's a lot easier to reason through your algorithm if it's not intermingled with CSV parsing code.)
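For completeness, the parsing step could be kept in its own function along these lines. This is only a sketch under a few assumptions: Python 3, rows shaped like the ones in the question, single quotes as the quote character, and integer location codes. The name `parse_hardware` is made up for the example:

```python
import csv

def parse_hardware(lines):
    """Turn raw CSV lines into (time, location, items) tuples."""
    reader = csv.reader(
        (line.rstrip() for line in lines),  # drop trailing whitespace
        skipinitialspace=True,              # ignore the space after each comma
        quotechar="'",                      # values are wrapped in single quotes
    )
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        time, location = row[0], int(row[1])
        items = [item for item in row[2:] if item]  # drop empty '' columns
        rows.append((time, location, items))
    return rows

sample = [
    "7pm, 1, 'item1', 'item2', 'item3'",
    "8pm, 2, 'item4', 'item5', 'item6'  ",
    "9pm, 3, 'item7', '', ''",
]
hardware = parse_hardware(sample)
for entry in hardware:
    print(entry)
```

The same function accepts an open file object in place of the list, since both yield one line at a time.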

The key to solving this is maintaining two dicts as you process the data, one mapping locations to lists of items at those locations and one mapping items to their locations. The second map is an inverse of the first.

Having these inverse maps will let us look up information in either direction. If we have a location we can see what items are there, and if we have an item we can get its location.

items_by_location = dict()   # The items in each location.
locations_by_item = dict()   # The location of each item.

# Start with an empty set for the list of items in each location.
for location in locations:
    items_by_location[location] = set()

# Iterate over each item in each hardware line one by one.
for time, location, items in hardware:
    for item in items:
        old_location = locations_by_item.get(item)
        new_location = location

        # Remove the item from its old location (None means it has not
        # been seen before; a plain truth test would skip a location 0).
        if old_location is not None:
            items_by_location[old_location].remove(item)

        # Add it to its new location.
        items_by_location[new_location].add(item)
        locations_by_item[item] = new_location

# Now we can iterate over the dict and see where each item ended up.
for location, items in items_by_location.items():
    print location, items

Output:

1 set([])
2 set([])
3 set(['item6', 'item7', 'item4', 'item5'])
4 set(['item8'])
5 set(['item2', 'item3', 'item1'])
6 set(['item9'])
Licensed under: CC-BY-SA with attribution