Java: Random element from the most popular category
-
04-03-2021 - |
Question
I need to find the most effecient way to find a random element from the most popular category
From
4 Cheese
1 Olive
2 Mushroom
4 Ham
2 Chicken
4 Salad
I want either Cheese
or Ham
or Salad
. If there are multiple top categories I don't care which category I will get my item from.
On the input I have Iterator<Foo>
where Foo
implements
Interface Foo {
int getCategory();
}
My current code looks like this:
Foo getItem(Iterator<Foo> it) {
Map<Integer, List<Foo>> categoryMap = new HashMap<Integer, List<Foo>>();
while(it.hasNext()) {
Foo foo = it.next();
int category = foo.getCategory();
List<Foo> l = categoryMap.get(category);
if(l == null) {
l = new ArrayList<Foo>();
categoryMap.put(category, l);
}
l.add(foo);
}
int longest_list_size = 0;
int longest_category_id = -1;
Set<Integer> categories = categoryMap.keySet()
for(Integer c: categories ) {
int list_size = categoryMap.get(c).size();
if(list_size > longest_list_size) {
longest_list_size = list_size;
longest_category_id = c;
}
}
if(longest_list_size == 0)
return null;
int r = new Random().nextInt(longest_list_size);
return categoryMap.get(c).get(r);
}
La solution
Probably faster to have 2 maps:
Foo getItem(Iterator<Foo> it) {
Map<Integer, Foo> categoryToFoo = new HashMap<Integer, Foo>();
Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
int maxCount = 0;
while(it.hasNext()) {
Foo foo = it.next();
int category = foo.getCategory();
int categoryCount = 1;
if ( ! categoryToFoo.contains( category ) ) {
categoryToFoo.put( category, foo );
}
else {
categoryCount = counts.get( category ) + 1;
}
counts.put( category, categoryCount );
if ( categoryCount > maxCount ) {
maxCount = categoryCount;
}
}
List<Foo> possible = new ArrayList<Foo>()
for ( Map.Entry entry : counts.entrySet() ) {
if ( entry.getValue() == maxCount ) {
possible.add( categoryToFoo.get( entry.getKey() ) );
}
}
return possible.get( new Random().nextInt( possible.size() ) );
}
You could do further optimization in many places, but you get the idea.
Autres conseils
Here is what I would do:
- create a
List<Foo>
fromit
- sort the list by category
- go through the list from the beginning and store the start and the end index of the longest interval with the same category
- pick a random element between the start and the end index
I think it is a bit faster with less code but your solution is fine too.
If you are really concerned about performance because it
could have million elements then you should not work with this Iterator
at the first place. In this case you should maybe store the popularity of each category in one Map
, and store the list of the same items in another Map
, but I don't know anything the rest of the code.
Well it's honestly hard (if not impossible) to improve on your method, at least complexity wise. Let's analyze it. You are doing
- Insertion into a map -> O(N)
- Computation of maximum -> O(N)
Total: O(N)
Other methods:
- Priority Queue -> O(N*log(N)) insertion of all elements + O(1) retrieval of head
- Sorting the initial map by key O(N*log(N)) + O(1) retrieval of first
- If you know the interval of the vote count, say [0..K] and it's less or not much higher than N you could do a counting sort in O(K) + O(1) to take the maximum.
If you only need to the maximum retrieval once, then your method is good enough, IMO.