Question

I have

Multiset<String> keys;

I want a Multiset with the 200 most frequent keys. I've figured out how to get an ImmutableMultiset that is ordered by frequency, but I'm having trouble getting just a subset of it.

I got the sorting part from another Stack Overflow question:

ImmutableMultiset<String> orderedMultiset = Multisets.copyHighestCountFirst(keys);

For the subset, I tried using a MinMaxPriorityQueue so that I could use maximumSize(200).

MinMaxPriorityQueue<String> orderedSubset = MinMaxPriorityQueue.maximumSize(200).create(orderedMultiset);

But it seems to return an arbitrary selection of 200 keys, and the top value from orderedMultiset doesn't even appear in the returned MinMaxPriorityQueue. I'm also afraid that, even if it did keep them in order, I might end up with only a few distinct keys whose counts sum to 200. Ideally I'd like 200 distinct keys, each with its count.

I asked someone, and they mentioned something about a POJO and Comparable, but I didn't follow their suggestion, since I thought a POJO can't really implement Comparable by definition. I'm not really sure.

I was also experimenting with Guava's Ordering, but I don't think it will work, since the function takes Strings and is unaware of the counts in the multiset.

MinMaxPriorityQueue<String> strings = MinMaxPriorityQueue.orderedBy(topKCount).maximumSize(200).create(keys);


    private final Ordering<String> topKCount = Ordering.natural()
        .onResultOf(new Function<String, String>() {
            @Override
            public String apply(String keys) {
                //todo
            }
        });

Does anyone know what I'm doing wrong here, or can at least point me in the right direction? Thanks.


Solution

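If you just want the 200 most frequent keys, you can use Multisets.copyHighestCountFirst(multiset).elementSet().asList().subList(0, 200). Note that subList throws an IndexOutOfBoundsException if there are fewer than 200 distinct keys, so you may want to cap the upper bound at the element set's size. If you like, you can then use that list to populate another ImmutableMultiset with those elements and their corresponding counts from the original multiset.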
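A minimal sketch of that approach, assuming Guava is on the classpath. The class name TopKeys and the helper topN are hypothetical. It caps the subList bound and then rebuilds an ImmutableMultiset with the original counts using ImmutableMultiset.Builder.addCopies.

```java
import com.google.common.collect.HashMultiset;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMultiset;
import com.google.common.collect.Multiset;
import com.google.common.collect.Multisets;

import java.util.List;

class TopKeys {
    // Hypothetical helper: up to n distinct elements with the highest counts, keeping their counts.
    static <E> ImmutableMultiset<E> topN(Multiset<E> multiset, int n) {
        // copyHighestCountFirst iterates highest count first, and its elementSet follows that order
        ImmutableList<E> ordered = Multisets.copyHighestCountFirst(multiset).elementSet().asList();
        // Cap the bound so subList does not throw when there are fewer than n distinct elements
        List<E> top = ordered.subList(0, Math.min(n, ordered.size()));
        ImmutableMultiset.Builder<E> builder = ImmutableMultiset.builder();
        for (E element : top) {
            // Copy each element's count over from the original multiset
            builder.addCopies(element, multiset.count(element));
        }
        return builder.build();
    }

    public static void main(String[] args) {
        Multiset<String> keys = HashMultiset.create();
        keys.add("a", 5);
        keys.add("b", 2);
        keys.add("c", 9);
        keys.add("d", 1);
        ImmutableMultiset<String> top = topN(keys, 2);
        System.out.println(top.elementSet()); // [c, a]
        System.out.println(top.count("c"));   // 9
    }
}
```

With 200 instead of 2 you get up to 200 distinct keys, each carrying its full count, which addresses the concern about a few keys whose counts merely sum to 200.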

OTHER TIPS

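Your MinMaxPriorityQueue solution does not work because the Iterable view of a Multiset yields only the elements (each repeated once per occurrence), not their counts. The selection appears random because the queue sorts on the elements themselves (natural String order), not on their frequency, and when it is full it evicts its greatest element, so it keeps the 200 alphabetically smallest occurrences.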
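To see this concretely, here is a small sketch (hypothetical class WhyQueueFails, assuming Guava on the classpath) that repeats the question's call on a tiny multiset. The most frequent key, "z", never makes it into the queue, while "y" appears more than once.

```java
import com.google.common.collect.HashMultiset;
import com.google.common.collect.ImmutableMultiset;
import com.google.common.collect.MinMaxPriorityQueue;
import com.google.common.collect.Multiset;
import com.google.common.collect.Multisets;

import java.util.ArrayList;
import java.util.List;

class WhyQueueFails {
    // Mirrors the question's attempt: a size-bounded queue fed the multiset itself.
    static List<String> naturalOrderTop(Multiset<String> keys, int maxSize) {
        ImmutableMultiset<String> ordered = Multisets.copyHighestCountFirst(keys);
        // Iterating a Multiset yields one entry per occurrence, so duplicates reach the queue
        MinMaxPriorityQueue<String> queue = MinMaxPriorityQueue.maximumSize(maxSize).create(ordered);
        // The queue compares the strings themselves and evicts the greatest one when full,
        // so it ends up holding the alphabetically smallest occurrences
        List<String> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            result.add(queue.pollFirst());
        }
        return result;
    }

    public static void main(String[] args) {
        Multiset<String> keys = HashMultiset.create();
        keys.add("z", 5);
        keys.add("y", 4);
        keys.add("a");
        keys.add("b");
        System.out.println(naturalOrderTop(keys, 4)); // [a, b, y, y]
    }
}
```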

You almost have it with your Ordering solution; just use the original Multiset to do the comparison:

final Multiset<String> multiset = ...;
Ordering<String> ordering = Ordering.from(new Comparator<String>() {
  @Override public int compare(String s1, String s2) {
    // Arguments are swapped so that keys with higher counts sort first
    return Ints.compare(multiset.count(s2), multiset.count(s1));
  }
});
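A sketch of how that comparator might plug into the question's MinMaxPriorityQueue, assuming Guava on the classpath; CountOrdering, byCountDescending and topKeys are hypothetical names. The important change is feeding multiset.elementSet() instead of the multiset itself, so each distinct key is offered only once.

```java
import com.google.common.collect.HashMultiset;
import com.google.common.collect.MinMaxPriorityQueue;
import com.google.common.collect.Multiset;
import com.google.common.collect.Ordering;
import com.google.common.primitives.Ints;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CountOrdering {
    // The comparator from the answer: descending count in the given multiset
    static Ordering<String> byCountDescending(final Multiset<String> multiset) {
        return Ordering.from(new Comparator<String>() {
            @Override public int compare(String s1, String s2) {
                return Ints.compare(multiset.count(s2), multiset.count(s1));
            }
        });
    }

    static List<String> topKeys(Multiset<String> multiset, int n) {
        // "Least" under this ordering means "highest count"; a full queue evicts its
        // greatest element (the lowest count), so the n most frequent keys remain
        MinMaxPriorityQueue<String> queue = MinMaxPriorityQueue
            .orderedBy(byCountDescending(multiset))
            .maximumSize(n)
            .create(multiset.elementSet()); // distinct keys only
        List<String> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            result.add(queue.pollFirst()); // highest count first
        }
        return result;
    }

    public static void main(String[] args) {
        Multiset<String> keys = HashMultiset.create();
        keys.add("a", 5);
        keys.add("b", 2);
        keys.add("c", 9);
        keys.add("d", 1);
        System.out.println(topKeys(keys, 2)); // [c, a]
        System.out.println(byCountDescending(keys).leastOf(keys.elementSet(), 2)); // [c, a]
    }
}
```

The last line shows that Ordering.leastOf(Iterable, int) can do the same selection without an explicit queue; ties between equal counts may come out in a different order than with the queue.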

Try using:

new ImmutableSortedSet.Builder<String>(Ordering.explicit(list)) ...... ;

where list is the List that defines the order. The resulting ImmutableSortedSet is ordered by each object's index in that List.
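One way to read this tip, as a sketch assuming Guava on the classpath: use the highest-count-first element list as the explicit order, so that any collection of keys from the multiset can be sorted by frequency rank. ExplicitOrder and sortByFrequency are hypothetical names, and every candidate must already be in the multiset.

```java
import com.google.common.collect.HashMultiset;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableSortedSet;
import com.google.common.collect.Multiset;
import com.google.common.collect.Multisets;
import com.google.common.collect.Ordering;

import java.util.Arrays;

class ExplicitOrder {
    // Hypothetical helper: sorts candidates by their frequency rank in keys
    static ImmutableSortedSet<String> sortByFrequency(Multiset<String> keys, Iterable<String> candidates) {
        // Distinct keys, highest count first; this list defines the explicit order
        ImmutableList<String> rank = Multisets.copyHighestCountFirst(keys).elementSet().asList();
        // Ordering.explicit compares values by their index in rank; comparing a value
        // that is not in rank throws a ClassCastException
        return ImmutableSortedSet.orderedBy(Ordering.explicit(rank))
            .addAll(candidates)
            .build();
    }

    public static void main(String[] args) {
        Multiset<String> keys = HashMultiset.create();
        keys.add("a", 5);
        keys.add("b", 2);
        keys.add("c", 9);
        keys.add("d", 1);
        System.out.println(sortByFrequency(keys, Arrays.asList("d", "a", "b"))); // [a, b, d]
    }
}
```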

Licensed under: CC-BY-SA with attribution