Question

I am working on an application that has a large array containing lines of numbers:

transNum[20000][200] // the 2D array containing the numbers; the first index keeps track of the line number

I am using nested loops to look for the most frequent items:

for (int i = 0; i < lineCounter; i++)
{
    for (int j = 0; j < lineitem1[i]; j++)
    {
        int shows = 1;   // occurrences of transNum[i][j], counting this line's own copy

        // scan every later line for the same value
        for (int t = i + 1; t < lineCounter; t++)
        {
            for (int s = 0; s < lineitem1[t]; s++)
            {
                if (transNum[i][j] == transNum[t][s])
                    shows++;
            }
        }

        // keep the item if it appears in at least 20% of the lines
        if ((double) shows / lineCounter >= 0.2)
        {
            freItem[i][lineitem2[i]] = transNum[i][j];
            lineitem2[i]++;
        }
    }
}

When I was doing tests with small input arrays like test[200][200], this loop worked fine and the computing time was acceptable, but when I try to process an array containing 12,000 lines the computing time is far too long. So I am wondering whether there are other ways to compute the frequent items rather than using this loop. I just ran a test on 10,688 lines, and the time to get all the frequent items was 825,805 ms, which is way too expensive.

Solution

That depends on your input. If you are also inserting the data in the same code, you can count item frequencies as you insert them.


Here is a pseudo-C solution:

int counts[1000000];   // one counter per possible value

while(each number as n)
{
    counts[n]++;       // count the value as it arrives
    // then insert the number into the array as before
}

EDIT #2: To avoid unexpected results, make sure all the items in the counts array are initialized to zero.
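Applied to the asker's setup, the same count-as-you-insert idea could look roughly like this in Java. This is only a sketch: the random numbers stand in for however the real input actually arrives, and it assumes every value is a non-negative int smaller than the size of the counts array.

import java.util.Random;

public class CountWhileInserting {
    public static void main(String[] args) {
        int lineCounter = 10688;                  // number of lines, as in the question's test
        int[][] transNum = new int[lineCounter][200];
        int[] counts = new int[1000000];          // Java zero-initialises int arrays automatically

        Random random = new Random();             // placeholder for the real data source
        for (int i = 0; i < lineCounter; i++) {
            for (int j = 0; j < transNum[i].length; j++) {
                int n = random.nextInt(15);       // hypothetical input; small range so the demo prints something
                transNum[i][j] = n;               // insert the number into the array...
                counts[n]++;                      // ...and count it in the same pass
            }
        }

        // A value is "frequent" if its count reaches 20% of the number of lines.
        for (int n = 0; n < counts.length; n++) {
            if ((double) counts[n] / lineCounter >= 0.2) {
                System.out.println("Number: " + n + ", occurrences: " + counts[n]);
            }
        }
    }
}

This replaces the nested comparison loops with a single pass over the data, so the work grows linearly with the number of items instead of quadratically.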

OTHER TIPS

Bear in mind this is an O(n^2) algorithm at best and could be worse. That means the number of operations is proportional to the square of the number of items: with 10,688 lines of up to 200 numbers each, that is roughly two million items, and comparing every item against every item in every later line is on the order of 10^12 comparisons. After a certain number of lines, performance will degrade rapidly and there's nothing you can do about it except to improve the algorithm.

The Multiset implementation from the Google Guava project might be useful in such cases. You could store the items there and then retrieve the set of values along with the count of each occurrence.
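For example, a minimal sketch along those lines (assuming Guava is on the classpath; the transNum contents here are made-up example data):

import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import com.google.common.collect.Multisets;

public class MultisetCountTest {
    public static void main(String[] args) {
        // Example data standing in for the real transNum contents
        int[][] transNum = { {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {3, 5, 5} };

        // The multiset keeps one count per distinct value
        Multiset<Integer> counts = HashMultiset.create();
        for (int[] line : transNum) {
            for (int value : line) {
                counts.add(value);
            }
        }

        // copyHighestCountFirst orders the entries by descending occurrence count
        for (Multiset.Entry<Integer> entry : Multisets.copyHighestCountFirst(counts).entrySet()) {
            System.out.println("Number: " + entry.getElement() + ", occurrences: " + entry.getCount());
        }
    }
}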

I gave the algorithm for this one some thought. Here's the solution I came up with:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class NumberTotalizerTest {

    public static void main(String args[]) {

        HashMap<Integer,Integer> hashMap = new HashMap<Integer,Integer>();

        // Number input
        Random randomGenerator = new Random();
        for (int i = 1; i <= 50; ++i ) {
            int randomInt = randomGenerator.nextInt(15);
            System.out.println("Generated : " + randomInt);

            Integer tempInt = hashMap.get(randomInt);

            // Counting takes place here
            hashMap.put(randomInt, tempInt==null?1:(tempInt+1) );
        }

        // Sorting and display
        Iterator<Integer> itr = sortByValue(hashMap).iterator();

        System.out.println("Occurrences from lowest to highest:");

        while (itr.hasNext()) {
            int key = itr.next();

            System.out.println("Number: " + key + ", occurrences: " + hashMap.get(key));
        }
    }

    public static List<Integer> sortByValue(final Map<Integer, Integer> m) {
        // Sort the map's keys by their associated values (ascending occurrence count)
        List<Integer> keys = new ArrayList<Integer>(m.keySet());
        Collections.sort(keys, new Comparator<Integer>() {
            public int compare(Integer o1, Integer o2) {
                Integer v1 = m.get(o1);
                Integer v2 = m.get(o2);
                if (v1 == null) {
                    return (v2 == null) ? 0 : 1;   // keys with missing counts sort last
                }
                return v1.compareTo(v2);
            }
        });
        return keys;
    }
}
Licensed under: CC-BY-SA with attribution