Question

I've been learning Python lately. I have a program that takes a list of dicts (the dicts share many of the same keys).

It then passes the list to a function whose job is to aggregate the values and return a single dict. However, when I access the original list of dicts again after the function call, it has been updated with values from the new dict. I can't see any part of my function that does this, and I've been stuck on it for a few hours.

Here's my code:

#!/usr/bin/env python
import ast

def process_visitor_stats_list(original_list):
    temp_original_list = original_list[:]  # attempt to copy the original list so it doesn't get changed

    new_dict = {}   # this will store each unique key along with the sum of its values
    for line in temp_original_list:
        for key in line:
            if key not in new_dict:  # if the key is not in new_dict yet, add it
                new_dict[key] = line[key]
                new_dict[key].append(1)    # append a counter for how many times the key occurs in the list of dicts

            else:
                new_dict[key][0] += float(line[key][0])  # if the key is already in new_dict, sum its values
                new_dict[key][1] += float(line[key][1])
                new_dict[key][2] += 1                    # and increment its occurrence count

    return new_dict


if __name__ == "__main__":

    original_list_of_dicts = []  # this will store my list of dicts

    line1 = "{'entry1': [4.0, 2.0], 'entry2': [592.0, 40.0], 'entry3': [5247044.0, 1093776.0], 'entry4': [1235.0, 82.0]}"
    line2 = "{'entry1': [26260.0, 8262.0], 'entry2': [2.0, 0.0], 'entry3': [1207.0, 142.0], 'entry4': [382992.0, 67362.0]}"
    line3 = "{'entry1': [57486.0, 16199.0], 'entry2': [6.0, 3.0], 'entry3': [280.0, 16.0]}"

    original_list_of_dicts.append(ast.literal_eval(line1))  # parse each string into a dict and add it to the list
    original_list_of_dicts.append(ast.literal_eval(line2))
    original_list_of_dicts.append(ast.literal_eval(line3))

    print("original list of dicts before method call")
    for line in original_list_of_dicts:    # prints out each entry in the list of dicts
        for key in line:
            print(key + str(line[key]))

    print('\n')
    new_dict = process_visitor_stats_list(original_list_of_dicts)    # calls the function to process the original list of dicts
    print('\n')                                                      # this should return a single dict with the aggregated data


    print("original list of dicts after method call")
    for line in original_list_of_dicts:   # however, when I access the original list of dicts again, its values have changed
        for key in line:
            print(key + str(line[key]))

Solution

When you copy the list:

temp_original_list = original_list[:]

you only do a shallow copy, i.e. the new list contains references to the same objects that were in the original list. As the objects within the list are mutable dictionaries, you will need to do a deep copy of the list:

import copy

temp_original_list = copy.deepcopy(original_list)

This will recursively copy the objects within containers and create new versions of those, too.
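Applied to the function from the question, the deep-copy fix might look like the sketch below. It assumes the same input shape as the question (each value is a two-element list of numbers) and keeps the original aggregation logic unchanged; only the copying line differs. The sample data is a trimmed-down version of the question's.

```python
import copy


def process_visitor_stats_list(original_list):
    # Deep copy: the dicts *and* the [x, y] lists inside them are duplicated,
    # so appending to or summing into them cannot touch original_list.
    temp_original_list = copy.deepcopy(original_list)

    new_dict = {}  # key -> [sum of first values, sum of second values, occurrence count]
    for line in temp_original_list:
        for key in line:
            if key not in new_dict:
                new_dict[key] = line[key]   # a list owned by the deep copy
                new_dict[key].append(1)     # start the occurrence count at 1
            else:
                new_dict[key][0] += float(line[key][0])
                new_dict[key][1] += float(line[key][1])
                new_dict[key][2] += 1
    return new_dict


if __name__ == "__main__":
    stats = [
        {'entry1': [4.0, 2.0], 'entry2': [592.0, 40.0]},
        {'entry1': [26260.0, 8262.0], 'entry2': [2.0, 0.0]},
    ]
    result = process_visitor_stats_list(stats)
    print(result['entry1'])    # [26264.0, 8264.0, 2]
    print(stats[0]['entry1'])  # [4.0, 2.0]  (original untouched)
```

The cost is that every nested object is copied, even ones the function only reads; for large inputs, copying just the lists you mutate is cheaper.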

From the documentation:

The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
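One way to see these two definitions in action is to compare object identities with `is`. The nested data below is a made-up example with the same shape as the question's input (a list of dicts whose values are lists).

```python
import copy

original = [{'entry1': [4.0, 2.0]}, {'entry1': [26260.0, 8262.0]}]

shallow = original[:]            # new outer list, same dict objects inside
deep = copy.deepcopy(original)   # new outer list, new dicts, new inner lists

print(shallow is original)                         # False: the outer list was copied
print(shallow[0] is original[0])                   # True: the dicts are shared
print(deep[0] is original[0])                      # False: the dicts were copied
print(deep[0]['entry1'] is original[0]['entry1'])  # False: so were the inner lists

# Mutating an inner list through the shallow copy is visible in the original...
shallow[0]['entry1'].append(1)
print(original[0]['entry1'])  # [4.0, 2.0, 1]

# ...while mutating the deep copy is not.
deep[1]['entry1'].append(1)
print(original[1]['entry1'])  # [26260.0, 8262.0]
```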

Strictly speaking, your problem has less to do with the dictionaries than with the lists they hold in turn (e.g. original_list[0]['entry1']). On this line:

new_dict[key] = line[key]

new_dict[key] becomes a reference to the very same list object that is stored in the dict inside original_list (the shallow copy shares that dict too). Therefore, when you mutate it, e.g.:

new_dict[key].append(1)

this change also appears in the original dictionary, and so do the sums added to that list later in the else branch. You could therefore also have solved this by copying the inner list (only a shallow copy is required here, as it contains immutable floats rather than mutable containers):

new_dict[key] = line[key][:]
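With this approach the copy of the outer list is no longer needed, because the function never mutates a list it received. A minimal sketch of the function rewritten this way, assuming the same input shape as in the question (the sample data is taken from the question's entries):

```python
def process_visitor_stats_list(original_list):
    new_dict = {}  # key -> [sum of first values, sum of second values, occurrence count]
    for line in original_list:  # no outer copy needed: the input is only read
        for key in line:
            if key not in new_dict:
                # Shallow-copy the [x, y] list so appending the count
                # (and later summing) only touches new_dict's own list.
                new_dict[key] = line[key][:]
                new_dict[key].append(1)
            else:
                new_dict[key][0] += float(line[key][0])
                new_dict[key][1] += float(line[key][1])
                new_dict[key][2] += 1
    return new_dict


stats = [
    {'entry1': [4.0, 2.0], 'entry3': [280.0, 16.0]},
    {'entry1': [57486.0, 16199.0]},
]
totals = process_visitor_stats_list(stats)
print(totals['entry1'])    # [57490.0, 16201.0, 2]
print(stats[0]['entry1'])  # [4.0, 2.0]  (original untouched)
```

This copies only the small lists that actually get modified, which is cheaper than deep-copying the whole input.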
Licensed under: CC-BY-SA with attribution