I stand corrected about the time needed to compute a recommendation when using a Collection of precomputed values. It turns out I had placed long startTime = System.nanoTime(); at the top of my code, rather than immediately before List<RecommendedItem> recommendations = cachingRecommender.recommend(userID, howMany); so the measurement also included the time needed to load the dataset and the precomputed item-item similarities into main memory.
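A minimal sketch of the corrected measurement, with the timer started only after loading has finished. The class TimingSketch and the methods loadEverything and recommend are hypothetical stand-ins for the data loading and for the cachingRecommender.recommend call; they are not Mahout APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class TimingSketch {

    // Hypothetical stand-in for loading the dataset and the similarities.
    static List<Long> loadEverything() {
        List<Long> items = new ArrayList<>();
        for (long i = 0; i < 100_000; i++) {
            items.add(i);
        }
        return items;
    }

    // Hypothetical stand-in for cachingRecommender.recommend(userID, howMany).
    static List<Long> recommend(List<Long> items, int howMany) {
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }

    // Returns the elapsed time of the recommend call only, in nanoseconds.
    static long timeRecommendOnly(List<Long> items, int howMany) {
        long startTime = System.nanoTime(); // started right before the call
        recommend(items, howMany);
        return System.nanoTime() - startTime;
    }

    public static void main(String[] args) {
        List<Long> items = loadEverything(); // loading is deliberately not timed
        long nanos = timeRecommendOnly(items, 10);
        System.out.printf("recommend took %.3f ms%n", nanos / 1e6);
    }
}
```

A single measurement like this is still noisy on the JVM; repeating the call a few times before timing it would give a steadier number.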
However, I stand by the memory consumption figures. I did improve on them, though, by using a custom ItemSimilarity and loading the precomputed similarities into a structure equivalent to HashMap<Long, HashMap<Long, Double>>. I used the Trove library's primitive collections in order to reduce the space requirements.
Here is the detailed code. The custom ItemSimilarity:
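import java.util.Collection;

import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

// Trove 3.x package names (assumed); in Trove 2.x these classes are in gnu.trove.
import gnu.trove.map.hash.TLongDoubleHashMap;
import gnu.trove.map.hash.TLongObjectHashMap;

public class TextItemSimilarity implements ItemSimilarity {

    // itemID -> (similar itemID -> precomputed similarity)
    private final TLongObjectHashMap<TLongDoubleHashMap> correlationMatrix;

    public TextItemSimilarity(TLongObjectHashMap<TLongDoubleHashMap> correlationMatrix) {
        this.correlationMatrix = correlationMatrix;
    }

    @Override
    public void refresh(Collection<Refreshable> alreadyRefreshed) {
        // Nothing to refresh: the similarities are precomputed and static.
    }

    @Override
    public double itemSimilarity(long itemID1, long itemID2) throws TasteException {
        TLongDoubleHashMap similarToItemId1 = correlationMatrix.get(itemID1);
        if (similarToItemId1 != null && similarToItemId1.containsKey(itemID2)) {
            return similarToItemId1.get(itemID2);
        }
        // No precomputed value for this pair.
        return 0;
    }

    @Override
    public double[] itemSimilarities(long itemID1, long[] itemID2s) throws TasteException {
        double[] result = new double[itemID2s.length];
        for (int i = 0; i < itemID2s.length; i++) {
            result[i] = itemSimilarity(itemID1, itemID2s[i]);
        }
        return result;
    }

    @Override
    public long[] allSimilarItemIDs(long itemID) throws TasteException {
        TLongDoubleHashMap similar = correlationMatrix.get(itemID);
        // Guard against items that have no precomputed neighbours.
        return similar == null ? new long[0] : similar.keys();
    }
}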
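The class above expects an already populated matrix. As a rough sketch of how such a nested structure could be filled, the example below parses lines of the form itemID1,itemID2,similarity into a plain java.util map of maps. The line format, the class name SimilarityLoaderSketch, and its methods are assumptions for illustration only; the code above uses Trove's TLongObjectHashMap<TLongDoubleHashMap> instead, which has the same shape but avoids boxing every key and value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SimilarityLoaderSketch {

    // Builds itemID -> (similar itemID -> similarity) from "id1,id2,sim" lines.
    // Pairs are stored in the direction given; nothing is mirrored.
    static Map<Long, Map<Long, Double>> load(Iterable<String> lines) {
        Map<Long, Map<Long, Double>> matrix = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            long item1 = Long.parseLong(parts[0].trim());
            long item2 = Long.parseLong(parts[1].trim());
            double sim = Double.parseDouble(parts[2].trim());
            matrix.computeIfAbsent(item1, k -> new HashMap<>()).put(item2, sim);
        }
        return matrix;
    }

    // Mirrors the lookup in itemSimilarity: missing pairs yield 0.
    static double similarity(Map<Long, Map<Long, Double>> matrix, long item1, long item2) {
        Map<Long, Double> row = matrix.get(item1);
        if (row == null) {
            return 0.0;
        }
        Double value = row.get(item2);
        return value == null ? 0.0 : value;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> matrix =
                load(Arrays.asList("1,2,0.8", "1,3,0.25", "2,1,0.8"));
        System.out.println(similarity(matrix, 1L, 2L));
        System.out.println(similarity(matrix, 3L, 1L));
    }
}
```

Because the lookup only reads the row of the first item, a pair is found in both directions only if both directions were loaded.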
The total memory consumption, including my data set, is 30 GB when using Collection<GenericItemSimilarity.ItemItemSimilarity>, and 17 GB when using TLongObjectHashMap<TLongDoubleHashMap> with the custom TextItemSimilarity.
The time to compute a recommendation is 0.05 sec using Collection<GenericItemSimilarity.ItemItemSimilarity> and 0.07 sec using TLongObjectHashMap<TLongDoubleHashMap>. I also believe that using CandidateItemsStrategy and MostSimilarItemsCandidateItemsStrategy plays a big role in the performance.
So I guess that if you want to save some space, use the Trove hash maps, and if you just want slightly better performance, use Collection<GenericItemSimilarity.ItemItemSimilarity>.