Question

I need to find the most effecient way to find a random element from the most popular category

From

4 Cheese
1 Olive
2 Mushroom
4 Ham
2 Chicken
4 Salad

I want either Cheese or Ham or Salad. If there are multiple top categories I don't care which category I will get my item from.

On the input I have Iterator<Foo> where Foo implements

Interface Foo {
    int getCategory();
}

My current code looks like this:

Foo getItem(Iterator<Foo> it) {
    Map<Integer, List<Foo>> categoryMap = new HashMap<Integer, List<Foo>>();
    while(it.hasNext()) {
        Foo foo = it.next();
        int category = foo.getCategory();

        List<Foo> l = categoryMap.get(category);
        if(l == null) {
            l = new ArrayList<Foo>();
            categoryMap.put(category, l);
        }

        l.add(foo);
    }

    int longest_list_size = 0;
    int longest_category_id = -1;

    Set<Integer> categories = categoryMap.keySet()

    for(Integer c:  categories ) {
        int list_size = categoryMap.get(c).size();
        if(list_size  > longest_list_size) {
           longest_list_size = list_size;
           longest_category_id = c;
        }
    }

    if(longest_list_size == 0)
        return null;

    int r = new Random().nextInt(longest_list_size);
    return categoryMap.get(c).get(r);
}
Was it helpful?

Solution

Probably faster to have 2 maps:

Foo getItem(Iterator<Foo> it) {
    Map<Integer, Foo> categoryToFoo = new HashMap<Integer, Foo>();
    Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
    int maxCount = 0;
    while(it.hasNext()) {
        Foo foo = it.next();
        int category = foo.getCategory();
        int categoryCount = 1;
        if ( ! categoryToFoo.contains( category ) ) {
            categoryToFoo.put( category, foo );
        }
        else {
            categoryCount = counts.get( category ) + 1;
        }
        counts.put( category, categoryCount );
        if ( categoryCount > maxCount ) {
            maxCount = categoryCount;
        }
    }

    List<Foo> possible = new ArrayList<Foo>()
    for ( Map.Entry entry : counts.entrySet() ) {
        if ( entry.getValue() == maxCount ) {
            possible.add( categoryToFoo.get( entry.getKey() ) );
        }
    }

    return possible.get( new Random().nextInt( possible.size() ) );
}

You could do further optimization in many places, but you get the idea.

OTHER TIPS

Here is what I would do:

  1. create a List<Foo> from it
  2. sort the list by category
  3. go through the list from the beginning and store the start and the end index of the longest interval with the same category
  4. pick a random element between the start and the end index

I think it is a bit faster with less code but your solution is fine too.

If you are really concerned about performance because it could have million elements then you should not work with this Iterator at the first place. In this case you should maybe store the popularity of each category in one Map, and store the list of the same items in another Map, but I don't know anything the rest of the code.

Well it's honestly hard (if not impossible) to improve on your method, at least complexity wise. Let's analyze it. You are doing

  1. Insertion into a map -> O(N)
  2. Computation of maximum -> O(N)

Total: O(N)

Other methods:

  1. Priority Queue -> O(N*log(N)) insertion of all elements + O(1) retrieval of head
  2. Sorting the initial map by key O(N*log(N)) + O(1) retrieval of first
  3. If you know the interval of the vote count, say [0..K] and it's less or not much higher than N you could do a counting sort in O(K) + O(1) to take the maximum.

If you only need to the maximum retrieval once, then your method is good enough, IMO.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top