Question

Please advise on a Java Bayesian Inference framework that:

1. Is open-source
2. Can be used programmatically from a Java application
3. Can process a 10 GB data set on a single host (node)
4. Is NOT Mahout or any other Hadoop-based / distributed framework (see 3)

Solution

The size of your data isn't going to be the limiting factor; the complexity of the model you will be updating is. A simple naive Bayes model is pretty easy to implement. If you want something more sophisticated, e.g. a multiply connected network, then the model complexity will determine whether you can do exact inference or will need trade-offs such as approximate algorithms.
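To illustrate why data size matters less than model complexity, here is a minimal sketch of a categorical naive Bayes classifier that is updated one record at a time. It is not taken from any particular framework: the class name `StreamingNaiveBayes`, the "label,f0,f1,..." line format, and the add-one (Laplace) smoothing are all assumptions made for the example. Memory use grows with the number of distinct (class, feature, value) combinations, not with the number of rows read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical example class: categorical naive Bayes trained incrementally. */
class StreamingNaiveBayes {
    private final Map<String, Long> classCounts = new HashMap<>();
    // Key is label + NUL + feature index + NUL + feature value.
    private final Map<String, Long> featureCounts = new HashMap<>();
    // Distinct values seen per feature index, needed for add-one smoothing.
    private final Map<Integer, Set<String>> valuesPerFeature = new HashMap<>();
    private long total = 0;

    private static String key(String label, int index, String value) {
        return label + '\u0000' + index + '\u0000' + value;
    }

    /** Folds a single record into the counts; nothing else is retained. */
    void update(String label, String[] features) {
        classCounts.merge(label, 1L, Long::sum);
        total++;
        for (int i = 0; i < features.length; i++) {
            featureCounts.merge(key(label, i, features[i]), 1L, Long::sum);
            valuesPerFeature.computeIfAbsent(i, k -> new HashSet<>()).add(features[i]);
        }
    }

    /** Assumed input format: one record per line, "label,f0,f1,...". */
    void train(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split(",");
            update(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
        }
    }

    /** Returns the label with the highest smoothed log-posterior, or null if untrained. */
    String predict(String[] features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Long> entry : classCounts.entrySet()) {
            String label = entry.getKey();
            long n = entry.getValue();
            double score = Math.log((double) n / total);
            for (int i = 0; i < features.length; i++) {
                long c = featureCounts.getOrDefault(key(label, i, features[i]), 0L);
                int v = valuesPerFeature.getOrDefault(i, Collections.<String>emptySet()).size();
                score += Math.log((c + 1.0) / (n + Math.max(v, 1)));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory stand-in for a large file.
        String data = "spam,offer,yes\nspam,offer,no\nham,meeting,no\nham,meeting,yes\nham,offer,no\n";
        StreamingNaiveBayes nb = new StreamingNaiveBayes();
        nb.train(new BufferedReader(new StringReader(data)));
        System.out.println(nb.predict(new String[] {"offer", "yes"}));   // prints "spam"
        System.out.println(nb.predict(new String[] {"meeting", "no"}));  // prints "ham"
    }
}
```

For a real 10 GB file, the same `train` method could be handed a reader from `java.nio.file.Files.newBufferedReader`, so that only one line is held in memory at a time. Continuous features or a multiply connected network would need a different model, and that is where the complexity trade-offs mentioned above come in.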

Kevin Murphy has a recently updated comparison of software for Bayesian inference at http://www.cs.ubc.ca/~murphyk/Software/bnsoft.html. One open-source package I'm exploring is libDAI (http://cs.ru.nl/~jorism/libDAI/), which is written in C++, but I assume it's callable from Java (e.g. through JNI). It supports multiple inference methods, including loopy belief propagation, which seems to be a pretty good approximation algorithm.

OTHER TIPS

Maybe Weka fits the bill? http://www.cs.waikato.ac.nz/ml/weka/ It definitely fulfills requirements 1, 2 and 4. Requirement 3 should be doable with something like a custom implementation of weka.core.Instances, if the default one does not provide some sort of "streaming" of the data so that not all of it needs to reside in memory at once. I haven't used it in a while, so I don't know for sure.

Licensed under: CC-BY-SA with attribution