Question

Please recommend a Java Bayesian inference framework that:

1. Is open source
2. Can be used programmatically from a Java application
3. Can process a 10 GB data set on a single host (node)
4. Is NOT Mahout or any other Hadoop-based or distributed framework (see 3)

Solution

The size of your data is unlikely to be the limiting factor; the complexity of the model you will be updating is. A simple naive Bayes model is easy to implement yourself, because training only accumulates counts and can be done in a single pass over the data. For something more sophisticated, e.g. a multiply connected network, the model complexity determines whether exact inference is feasible or whether you need to accept trade-offs, e.g. approximate algorithms.
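To illustrate the naive Bayes case, here is a minimal sketch of a categorical naive Bayes classifier that is updated one record at a time, so memory grows with the number of distinct (label, feature, value) combinations rather than with the size of the data set. The class name `StreamingNaiveBayes`, its methods, and the choice of add-one (Laplace) smoothing are assumptions made for this example; they are not taken from any of the frameworks mentioned here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: categorical naive Bayes trained one record at a time. */
class StreamingNaiveBayes {
    // Number of records seen per class label.
    private final Map<String, Long> classCounts = new HashMap<>();
    // Number of times a (label, feature index, value) combination was seen.
    private final Map<String, Long> featureCounts = new HashMap<>();
    // Distinct values seen per feature index, needed for smoothing.
    private final Map<Integer, Set<String>> featureValues = new HashMap<>();
    private long total = 0;

    // Assumption: labels and values never contain a tab character.
    private static String key(String label, int index, String value) {
        return label + "\t" + index + "\t" + value;
    }

    /** Folds one labelled record into the counts; the record can then be discarded. */
    void update(String label, String[] features) {
        total++;
        classCounts.merge(label, 1L, Long::sum);
        for (int i = 0; i < features.length; i++) {
            featureCounts.merge(key(label, i, features[i]), 1L, Long::sum);
            featureValues.computeIfAbsent(i, k -> new HashSet<>()).add(features[i]);
        }
    }

    /** Returns the label with the highest log-posterior, or null if no data was seen. */
    String predict(String[] features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Long> entry : classCounts.entrySet()) {
            String label = entry.getKey();
            long n = entry.getValue();
            // Log prior of the class.
            double score = Math.log((double) n / total);
            for (int i = 0; i < features.length; i++) {
                long count = featureCounts.getOrDefault(key(label, i, features[i]), 0L);
                int distinct = featureValues.getOrDefault(i, Collections.emptySet()).size();
                // Add-one smoothing keeps unseen values from zeroing out the posterior.
                score += Math.log((count + 1.0) / (n + distinct));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Each training record is passed to `update` once and is not retained, which is why the raw size of the input matters less than the number of distinct feature values the model has to track.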

Kevin Murphy maintains a comparison of software for Bayesian inference, recently updated at the time of writing: http://www.cs.ubc.ca/~murphyk/Software/bnsoft.html

One open-source package I'm exploring is libDAI (http://cs.ru.nl/~jorism/libDAI/). It is written in C++, but I assume it can be called from Java, e.g. through a JNI wrapper. It supports multiple inference methods, including loopy belief propagation, which seems to be a pretty good approximation algorithm.

Other answers

Maybe Weka fits the bill? http://www.cs.waikato.ac.nz/ml/weka/ It definitely fulfills requirements 1, 2 and 4. Requirement 3 should be doable with something like a custom implementation of weka.core.Instances, if the default one does not provide some sort of "streaming" of the data so that not all of it needs to reside in memory at once. I haven't used it in a while, so I don't know for sure.
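The "streaming" idea can be sketched in plain Java, independent of Weka. The helper below is hypothetical (the name `RecordStreamer` and the comma-separated input format are assumptions for this example, and it is not a weka.core.Instances implementation): it reads one line at a time and hands the fields to a callback, so only the current record is held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/** Hypothetical helper: streams comma-separated records one line at a time. */
class RecordStreamer {
    /**
     * Reads the source line by line, skips empty lines, splits each remaining
     * line on commas, and passes the fields to the consumer.
     * Returns the number of records processed.
     */
    static long forEachRecord(Reader source, Consumer<String[]> consumer) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                // Limit -1 keeps trailing empty fields instead of dropping them.
                consumer.accept(line.split(",", -1));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

For a file on disk, the reader could come from `java.nio.file.Files.newBufferedReader(path)`; the callback would then update whatever incremental model is being trained.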

Licensed under: CC-BY-SA with attribution