Hash table with iterators as the keys: is this poor design, and can I do this better?

https://softwareengineering.stackexchange.com/questions/358932

21-01-2021

Question

I'm developing a program where, twice now, I've found that the solution to a problem was a hash table with iterators as keys and some other arbitrary type as the value.

I initially found myself using this pattern to deal with two data structures that affect one another but are not coupled in their own right. In this first instance, I had a GUI created in Qt where I wanted user interaction to affect an underlying data structure: a linked list inside another object needed to be edited whenever the user removed or added items in the GUI representation of that object (which was a list).

I don't have much experience in GUI programming, so my solution was to send a signal to the encapsulating object when the user deletes the corresponding element in the GUI, carrying information about which GUI element was deleted. Internally, the encapsulating object takes this information (an iterator) from the signal and maps it to another iterator, which is then removed from the linked list (insertion works the same way). To avoid coupling between the GUI and the processing part of my program, I made a new class, a configuration object, which generates the actual core class used in processing once certain actions are taken.

This seemed like a rather disgusting solution to the problem, but I couldn't figure out how to make it better; GUI actions appear to need a direct correlation with non-GUI actions.

Recently I found myself using a similar pattern again; this time I was trying to implement the following:

I have two lists of items (old and new) which I can apply a distance metric to. I want to associate items from one list with items in the second. I do not want to use the Hungarian algorithm because of the code complexity and runtime of the algorithm.

I was planning to implement the association as follows: for each item in new, find the old item with the smallest distance to it, compare that distance against the smallest distance previously associated with that old item, and replace the associated new item if the current one is closer, until every item in new has been iterated through.

The issue is that I don't want to just tag new, unnecessary values onto these items (i.e., last_smallest_distance and current_closest_associated_item) that have no use past the association step. So I'm going to have to find some other way to persist state between the iterations used to find the association. The most obvious solution to me is, yet again, an iterator hash table where each iterator is associated with a pair: the closest distance so far and the currently associated closest item.

Here is an example of what I'm talking about:

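#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>

struct ItemDistancePair{
    const Item* item;  // pointer rather than reference, so the struct stays assignable
    double distance;
};

// std::hash has no specialization for list iterators, so hash the element's address
struct ListItrHash{
    std::size_t operator()(std::list<Item>::iterator itr) const {
        return std::hash<Item*>{}(&*itr);
    }
};

void updateOldItems(std::list<Item>& old_items, const std::list<Item>& new_items){
    std::unordered_map<std::list<Item>::iterator, ItemDistancePair, ListItrHash> old_new_value_map;
    for(const auto& new_item: new_items){
        std::list<Item>::iterator closest_old_itr = findClosestAssociation(new_item, old_items);
        double distance = new_item.distanceFrom(*closest_old_itr);
        auto map_location_itr = old_new_value_map.find(closest_old_itr);
        if(map_location_itr == old_new_value_map.end()){
            old_new_value_map.insert(std::make_pair(closest_old_itr, ItemDistancePair{&new_item, distance}));
        }
        else if(map_location_itr->second.distance > distance){
            map_location_itr->second = ItemDistancePair{&new_item, distance};
        }
    }
    // apply the updates only after every association has been decided
    for(auto& key_value_pair : old_new_value_map){
        key_value_pair.first->updateItem(*key_value_pair.second.item);
    }
}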
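For reference, here is a self-contained, compilable sketch of the same approach. It assumes a hypothetical Item that is just a position on a line (distanceFrom is the absolute difference, updateItem copies the position) and a hypothetical linear-search findClosestAssociation; neither comes from the question. The custom hasher OldItrHash is needed because the standard library provides no std::hash specialization for std::list iterators, so an iterator-keyed std::unordered_map does not compile without one.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>

// Hypothetical stand-in for the question's Item: a point on a line.
struct Item {
    double pos;
    double distanceFrom(const Item& other) const { return std::abs(pos - other.pos); }
    void updateItem(const Item& other) { pos = other.pos; }
};

using OldItr = std::list<Item>::iterator;

// std::hash is not specialized for list iterators; hash the element's address instead.
struct OldItrHash {
    std::size_t operator()(OldItr itr) const { return std::hash<const Item*>{}(&*itr); }
};

// Hypothetical helper: iterator to the element of `items` closest to `target`.
OldItr findClosestAssociation(const Item& target, std::list<Item>& items) {
    assert(!items.empty());
    OldItr best = items.begin();
    for (OldItr it = items.begin(); it != items.end(); ++it) {
        if (target.distanceFrom(*it) < target.distanceFrom(*best)) best = it;
    }
    return best;
}

struct ItemDistancePair {
    const Item* item;  // pointer, so the struct stays copy-assignable
    double distance;
};

void updateOldItems(std::list<Item>& old_items, const std::list<Item>& new_items) {
    std::unordered_map<OldItr, ItemDistancePair, OldItrHash> best_match;
    for (const Item& new_item : new_items) {
        OldItr closest_old = findClosestAssociation(new_item, old_items);
        double distance = new_item.distanceFrom(*closest_old);
        auto found = best_match.find(closest_old);
        if (found == best_match.end()) {
            best_match.emplace(closest_old, ItemDistancePair{&new_item, distance});
        } else if (found->second.distance > distance) {
            found->second = ItemDistancePair{&new_item, distance};  // closer new item wins
        }
    }
    // Apply updates only after all associations are decided.
    for (auto& [old_itr, match] : best_match) old_itr->updateItem(*match.item);
}
```

With old items at positions 3 and 6 and new items at 0 and 5, the old item at 3 is updated to 0 and the one at 6 is updated to 5.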

Note that if multiple new items are closest to the same old item, it's possible that another old item never gets updated; this is fine.

As you can see, this really isn't ideal (though maybe it wouldn't be a problem in a language other than C++?): it causes a lot of odd code (for example, it forces me to make a struct if I don't want to use std::pair).

Is this acceptable, or is there some way around it? Or is the real best solution just to include this distance information in the Item class itself, which is what I was trying to avoid in the first place?

Solution

Algorithm

So, AFAICT your algorithm boils down to:

For each new_item, find the closest old_item. (This creates a set of old_items.)
For each old_item in the set, update it with the closest new_item.

Bug alert

Imagine a line with 4 points: A_new, 3 units distance, B_old, 2 units distance, C_new, 1 unit distance, D_old.
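Putting A_new at coordinate 0 (an arbitrary choice), the other points sit at 3, 5 and 6, which gives these pairwise distances:

```latex
\begin{aligned}
d(A_{\text{new}}, B_{\text{old}}) &= |0 - 3| = 3, & d(A_{\text{new}}, D_{\text{old}}) &= |0 - 6| = 6,\\
d(C_{\text{new}}, B_{\text{old}}) &= |5 - 3| = 2, & d(C_{\text{new}}, D_{\text{old}}) &= |5 - 6| = 1.
\end{aligned}
```

So A_new's closest old item is B_old, C_new's closest old item is D_old, and the closest new item to both B_old and D_old is C_new.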

For this case, a decision is needed:

1. Update B_old with A_new (C_new has a better match, D_old, so take the next best match).
2. Update B_old with C_new (C_new is the closest new item to B_old, so take the best match).
3. Don't update B_old (A_new is invalid because C_new is closer to B_old, but C_new is invalid as well because D_old is closer to C_new than B_old is).
4. Update B_old with C_new (just update every old item with its closest new item: a different problem, but an easier solution).

Your example would choose option 1 in this case, but not necessarily in more complex ones. This might not be intended.

My pseudocode summary of your algorithm above would choose option 2. This might also not be intended.

I'll continue under the assumption option 3 is what's really intended. (If this is wrong, leave me a comment!)

Algorithm (fixed)

The algorithm is then:

Step 1: For each new_item, find the closest old_item. (This creates a set of old_items.)
Step 2: For each old_item in the set:
  a. find the closest new_item.
  b. find the closest old_item to the found new_item.
  c. if the found closest old_item is the same as the current old_item, update it with the found new_item.
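Put differently, the fixed algorithm only updates an old item when it and a new item are each other's nearest neighbor. A self-contained sketch follows, assuming a hypothetical one-dimensional Item (a position with distanceFrom and updateItem) and a hypothetical linear-search helper findClosest that returns a pointer. It collects the matches before applying any of them, so that updating one old item cannot change the distances used when checking the next; that ordering is a choice made here, not something the steps themselves prescribe.

```cpp
#include <cassert>
#include <cmath>
#include <list>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for Item: a point on a line.
struct Item {
    double pos;
    double distanceFrom(const Item& other) const { return std::abs(pos - other.pos); }
    void updateItem(const Item& other) { pos = other.pos; }
};

// Hypothetical helper: pointer to the element of `items` closest to `target`.
Item* findClosest(const Item& target, std::list<Item>& items) {
    assert(!items.empty());
    Item* best = &items.front();
    for (Item& candidate : items) {
        if (target.distanceFrom(candidate) < target.distanceFrom(*best)) best = &candidate;
    }
    return best;
}

void updateOldItems(std::list<Item>& old_items, std::list<Item>& new_items) {
    // Step 1: every old item that is the nearest neighbor of some new item.
    std::unordered_set<Item*> candidates;
    for (const Item& new_item : new_items) candidates.insert(findClosest(new_item, old_items));

    // Step 2: keep only mutual nearest neighbors; collect them before updating.
    std::vector<std::pair<Item*, const Item*>> matches;
    for (Item* old_item : candidates) {
        Item* new_item = findClosest(*old_item, new_items);
        if (findClosest(*new_item, old_items) == old_item) matches.emplace_back(old_item, new_item);
    }

    // Apply the updates once all checks are done.
    for (auto& [old_item, new_item] : matches) old_item->updateItem(*new_item);
}
```

On the four-point line (A_new at 0, B_old at 3, C_new at 5, D_old at 6), B_old keeps its value and D_old takes C_new's, which is option 3.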

Implementation

As you correctly deduced, a kind of reference to the old_item needs to be passed from step 1 to step 2. This reference needs to be stored and to be compared ("are these two items at the same address?").

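Plain C++ references don't work well for this kind of reference: a reference can't be reseated, so it can't be stored in a standard container, and comparing two references with == compares the referred-to values rather than their addresses. Pointers, however, fulfill both requirements and are more direct than iterators (though, technically speaking, iterators can work as well).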
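To illustrate the practical difference: the standard library provides std::hash for pointer types, so std::unordered_set<int*> works out of the box, whereas std::list iterators have no std::hash specialization and an iterator-keyed set needs a hasher of your own. The sketch uses int elements for brevity; ItrHash, pickedByPointer and pickedByIterator are illustrative names, not library API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_set>
#include <vector>

using Itr = std::list<int>::iterator;

// std::hash is not specialized for list iterators, so hash the address of the
// element the iterator refers to. Must not be called with end().
struct ItrHash {
    std::size_t operator()(Itr itr) const { return std::hash<int*>{}(&*itr); }
};

// Pointer keys work directly: std::hash<T*> is part of the standard library.
std::unordered_set<int*> pickedByPointer(std::list<int>& values, const std::vector<Itr>& picks) {
    std::unordered_set<int*> keys;
    for (Itr it : picks) {
        assert(it != values.end());  // dereferencing end() would be undefined
        keys.insert(&*it);           // duplicates collapse
    }
    return keys;
}

// Iterator keys only compile with the custom hasher supplied as a template argument.
std::unordered_set<Itr, ItrHash> pickedByIterator(std::list<int>& values, const std::vector<Itr>& picks) {
    std::unordered_set<Itr, ItrHash> keys;
    for (Itr it : picks) {
        assert(it != values.end());  // ItrHash dereferences, so end() is not allowed
        keys.insert(it);             // duplicates collapse (list iterators support ==)
    }
    return keys;
}
```

Both sets collapse a repeated pick of the same element. Since std::list never relocates its elements, the addresses used as pointer keys stay valid until those elements themselves are erased, just like the iterators.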

#include <list>
#include <unordered_set>

void updateOldItems(std::list<Item>& old_items, std::list<Item>& new_items) {
    std::unordered_set<Item*> old_items_to_update;

    // STEP 1
    for(auto& new_item : new_items) {
        auto old_item = &(*findClosestAssociation(new_item, old_items));
        old_items_to_update.insert(old_item); // takes care of duplicates
    }

    // STEP 2
    for(auto old_item : old_items_to_update) {
        auto new_item = findClosestAssociation(*old_item, new_items);
        auto closest_old_item = &(*findClosestAssociation(*new_item, old_items));

        if(closest_old_item == old_item) old_item->updateItem(*new_item);
    }
}

The conversion from std::list<Item>::iterator to Item* is a bit cumbersome (dereference the iterator to get a reference, then take the address of that reference), but that's more a problem of findClosestAssociation returning an iterator in the first place (instead of, say, a reference or a pointer). Of course, you could also simply make old_items_to_update a set of iterators (which would need a custom hash, since the standard library provides none for list iterators), but this might cause other problems in the future (e.g. exchanging std::list for std::forward_list or std::vector requires updating all iterator types).
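If findClosestAssociation returned a pointer instead, the &(*...) conversion disappears and the call sites no longer name the container type. A minimal sketch, assuming a hypothetical one-dimensional Item; the template signature below is an illustration, not the question's actual function.

```cpp
#include <cassert>
#include <cmath>
#include <list>
#include <vector>

// Hypothetical Item for illustration: a point on a line.
struct Item {
    double pos;
    double distanceFrom(const Item& other) const { return std::abs(pos - other.pos); }
};

// Hypothetical pointer-returning variant of findClosestAssociation. It accepts any
// non-empty container of Item with empty(), front() and range-for support, and its
// return type does not depend on the container.
template <typename Container>
Item* findClosestAssociation(const Item& target, Container& items) {
    assert(!items.empty());
    Item* best = &items.front();
    for (Item& candidate : items) {
        if (target.distanceFrom(candidate) < target.distanceFrom(*best)) best = &candidate;
    }
    return best;
}
```

With this version, step 1 would read `old_items_to_update.insert(findClosestAssociation(new_item, old_items));`, and the same call compiles whether old_items is a std::list<Item> or a std::vector<Item>. One caveat remains regardless of the return type: pointers into a std::vector are invalidated when it reallocates, while std::list keeps element addresses stable across insertions and erasures of other elements.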

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange