Hash Table with iterators as the keys, is this poor design and can I do this better?
https://softwareengineering.stackexchange.com/questions/358932
21-01-2021
Question
I'm developing a program where twice I've found the solution to a problem was to use hash tables with iterators as keys and some other arbitrary type as the value.
I first found myself using this pattern to deal with two data structures that affect one another but are not otherwise coupled. In this first instance I had a GUI created in Qt where I wanted user interaction to affect an underlying data structure. A linked list inside another object needed to be edited when the user removed or added items in a GUI representation of that object (which was a list).
I don't have much experience in GUI programming, so my solution was to send a signal to the encapsulating object whenever the user deleted an element in the GUI, carrying information about which GUI element was deleted. Internally, the encapsulating object takes this information from the signal (the information is an iterator) and maps it to another iterator, which is then removed from the linked list (or, similarly, a new item is inserted). To avoid coupling between the GUI and the processing part of my program, I made a new configuration class that generates the actual core class used in processing once certain actions are taken.
This seemed like a rather disgusting solution to the problem, but I couldn't figure out how to make it better: a GUI action appears to need a direct correlation with a non-GUI action.
Recently I found myself using a similar pattern again, this time I was trying to implement the following:
I have two lists of items (old and new) which I can apply a distance metric to. I want to associate items from one list with items in the second. I do not want to use the Hungarian algorithm because of the code complexity and runtime of the algorithm.
I was planning on implementing the association by finding the old item with the smallest distance to each new element, comparing that distance against the previous smallest distance associated with that old item, and replacing the associated value when the new one is closer, until every item in new has been iterated through.
The issue is that I don't want to tack new, otherwise unnecessary values onto these items (i.e. last_smallest_distance and current_closest_associated_item) which have no value past the association step, so I have to find some way to add persistence between the iterations used to find the association. The most obvious solution to me is, yet again, to make a hash table keyed by iterator, where each iterator is associated with a pair of closest distance and current closest associated item.
Here is an example of what I'm talking about:
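#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>

struct ItemDistancePair {
    const Item* item;  // pointer rather than reference, so the struct stays assignable
    double distance;
};

// std::hash has no specialization for list iterators, so hash the element's address
struct ListIteratorHash {
    std::size_t operator()(std::list<Item>::iterator itr) const {
        return std::hash<const Item*>()(&*itr);
    }
};

void updateOldItems(std::list<Item>& old_items, const std::list<Item>& new_items) {
    std::unordered_map<std::list<Item>::iterator, ItemDistancePair, ListIteratorHash> old_new_value_map;
    for (const auto& new_item : new_items) {
        std::list<Item>::iterator closest_old_itr = findClosestAssociation(new_item, old_items);
        double distance = new_item.distanceFrom(*closest_old_itr);
        auto map_location_itr = old_new_value_map.find(closest_old_itr);
        if (map_location_itr == old_new_value_map.end()) {
            old_new_value_map.insert(std::make_pair(closest_old_itr, ItemDistancePair{&new_item, distance}));
        }
        else if (map_location_itr->second.distance > distance) {
            map_location_itr->second = ItemDistancePair{&new_item, distance};
        }
    }
    for (auto& key_value_pair : old_new_value_map) {
        key_value_pair.first->updateItem(*key_value_pair.second.item);
    }
}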
Note that when multiple new items are closest to the same old item, it's possible that another old item never gets updated. This is fine.
As you can see, this really isn't ideal (though maybe this wouldn't be a problem in a language other than C++?) and it causes lots of odd code (it forces me to make a struct if I don't want to use std::pair).
Is this acceptable, or is there some way around it? Or is the real best solution just to include this distance information in the Item class itself, which is what I was trying to avoid in the first place?
Solution
Algorithm
So, AFAICT your algorithm boils down to:
1. For each new_item, find the closest old_item. (This creates a set of old_items.)
2. For each old_item in the set, update it with the closest new_item.
Bug alert
Imagine a line with 4 points: A_new, 3 units distance, B_old, 2 units distance, C_new, 1 unit distance, D_old.
For this case, a decision is needed:
1. update B_old with A_new (C_new has the better match D_old, so take the next best match)
2. update B_old with C_new (C_new is closest to B_old, so take the best match)
3. don't update B_old (A_new is invalid because C_new is closer to B_old, but C_new is invalid as well because D_old is closer to C_new than B_old)
4. update B_old with C_new (just update every old item with the closest new item; a different problem, but an easier solution)
Your example would choose option 1 in this case, but not necessarily in more complex ones. This might not be intended.
My current algorithm pseudocode above would choose option 2. This might also not be intended.
I'll continue under the assumption option 3 is what's really intended. (If this is wrong, leave me a comment!)
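To make the difference between options 1 to 3 concrete, here is a minimal sketch that replays the four-point line with plain numbers. Everything in it is an assumption made for illustration: items are reduced to positions on a line (A_new = 0, B_old = 3, C_new = 5, D_old = 6), and the helper names closestIndex, matchByNewItemClaims, matchClaimedToClosestNew and matchMutualOnly are hypothetical, not part of the question's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1-D stand-in for Item: each item is just a position on a line.
using Points = std::vector<double>;

constexpr int kNoMatch = -1;

// Index of the point in `candidates` nearest to `x`.
std::size_t closestIndex(double x, const Points& candidates) {
    assert(!candidates.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        if (std::abs(candidates[i] - x) < std::abs(candidates[best] - x)) best = i;
    }
    return best;
}

// Option 1 (behaviour of the question's code): every new item claims its
// closest old item, and an old item keeps the nearest of its claimants.
std::vector<int> matchByNewItemClaims(const Points& olds, const Points& news) {
    std::vector<int> match(olds.size(), kNoMatch);
    for (std::size_t n = 0; n < news.size(); ++n) {
        const std::size_t o = closestIndex(news[n], olds);
        const bool closer =
            match[o] == kNoMatch ||
            std::abs(news[n] - olds[o]) <
                std::abs(news[static_cast<std::size_t>(match[o])] - olds[o]);
        if (closer) match[o] = static_cast<int>(n);
    }
    return match;
}

// Option 2: every claimed old item takes its own closest new item.
std::vector<int> matchClaimedToClosestNew(const Points& olds, const Points& news) {
    std::vector<int> match(olds.size(), kNoMatch);
    for (double n : news) {
        const std::size_t o = closestIndex(n, olds);
        match[o] = static_cast<int>(closestIndex(olds[o], news));
    }
    return match;
}

// Option 3: like option 2, but a match is kept only when it is mutual.
std::vector<int> matchMutualOnly(const Points& olds, const Points& news) {
    std::vector<int> match = matchClaimedToClosestNew(olds, news);
    for (std::size_t o = 0; o < olds.size(); ++o) {
        if (match[o] == kNoMatch) continue;
        const double matched_new = news[static_cast<std::size_t>(match[o])];
        if (closestIndex(matched_new, olds) != o) match[o] = kNoMatch;
    }
    return match;
}
```

With olds = {3, 6} (B_old, D_old) and news = {0, 5} (A_new, C_new), the first function pairs B_old with A_new, the second pairs B_old with C_new, and the third leaves B_old unmatched. In all three, D_old is paired with C_new.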
Algorithm (fixed)
The algorithm is then:
1. For each new_item, find the closest old_item. (This creates a set of old_items.)
2. For each old_item in the set:
   - find the closest new_item.
   - find the closest old_item to the found new_item.
   - if the found closest old_item is the same as the current old_item, update it with the found new_item.
Implementation
As you correctly deduced, a kind of reference to the old_item needs to be passed from step 1 to step 2. This reference needs to be stored and to be compared ("are these two items at the same address?").
Pure C++ references don't work well for this kind of reference, as there's no easy way to store and compare them. Pointers however fulfill both requirements, and are more direct than iterators (though technically speaking, iterators can work as well).
#include <list>
#include <unordered_set>

void updateOldItems(std::list<Item>& old_items, std::list<Item>& new_items) {
    std::unordered_set<Item*> old_items_to_update;

    // STEP 1: collect every old item that is the closest match of some new item
    for (auto& new_item : new_items) {
        auto old_item = &(*findClosestAssociation(new_item, old_items));
        old_items_to_update.insert(old_item); // takes care of duplicates
    }

    // STEP 2: update an old item only if the match is mutual
    for (auto old_item : old_items_to_update) {
        auto new_item = findClosestAssociation(*old_item, new_items);
        auto closest_old_item = &(*findClosestAssociation(*new_item, old_items));
        if (closest_old_item == old_item) old_item->updateItem(*new_item);
    }
}
The conversion from std::list<Item>::iterator to Item* is a bit cumbersome (dereference the iterator to get a reference, then take the address of that reference), but that's more a problem of findClosestAssociation returning an iterator in the first place (instead of, say, a reference or a pointer). Of course, you could also simply make old_items_to_update a set of iterators, but this might cause other problems in the future (e.g. exchanging std::list for std::forward_list or std::vector requires updating all iterator types).
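For completeness, here is a rough sketch of what the set-of-iterators variant could look like. The standard library provides no std::hash specialization for container iterators, so std::unordered_set needs a hash functor supplied by hand. The Item stand-in and the names ItemIteratorHash, ItemIteratorSet and collectAll are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_set>

// Hypothetical stand-in for the question's Item type.
struct Item {
    double position;
};

using ItemIterator = std::list<Item>::iterator;

// Hashes a list iterator by the address of the element it refers to.
// Only valid for dereferenceable iterators, so never pass end().
struct ItemIteratorHash {
    std::size_t operator()(ItemIterator it) const {
        return std::hash<const Item*>()(&*it);
    }
};

// Equality uses the iterator's own operator==, so only the hash is custom.
using ItemIteratorSet = std::unordered_set<ItemIterator, ItemIteratorHash>;

// Inserts an iterator to every element twice; the set keeps one entry each.
ItemIteratorSet collectAll(std::list<Item>& items) {
    ItemIteratorSet seen;
    for (int pass = 0; pass < 2; ++pass) {
        for (auto it = items.begin(); it != items.end(); ++it) {
            seen.insert(it);
        }
    }
    assert(seen.size() == items.size());  // duplicates were collapsed
    return seen;
}
```

The hash functor ends up taking the element's address anyway, which is one more hint that a plain Item* key is the more direct choice. The iterator type is also spelled out in the alias, so it would have to change if the container changed.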