Question

I have an existing data model built on OpenJPA, and I am trying to integrate a collaborative filtering (CF) system using Mahout.

Forgive me if this is a boneheaded question, but I have only just started researching Mahout. Mahout in Action is in the mail, so I should be up to speed soon.

My question is how to integrate Mahout with an existing JPA model. Do I need to provide a CSV file to the DataModel class, or can I extend DataModel to read directly from my existing DataSource? I realize it wouldn't be very complicated to generate a CSV file from my data, but that seems like an unnecessary intermediate step.
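For reference, the CSV route I have in mind would look roughly like the sketch below. As far as I can tell, Mahout's `FileDataModel` reads one preference per line as `userID,itemID,preference`. The `Rating` class and its field names are hypothetical stand-ins for my JPA entities, not anything from Mahout, and the sample values are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PreferenceCsvExport {

    // Hypothetical stand-in for one row loaded from the existing JPA model.
    static final class Rating {
        final long userId;
        final long itemId;
        final float value;

        Rating(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    // Formats one rating as a single line: userID,itemID,preference
    static String toCsvLine(Rating r) {
        return r.userId + "," + r.itemId + "," + r.value;
    }

    // Joins all ratings into CSV text, one preference per line.
    static String toCsv(List<Rating> ratings) {
        return ratings.stream()
                .map(PreferenceCsvExport::toCsvLine)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Made-up sample data; in practice these rows would come from a JPA query.
        List<Rating> ratings = Arrays.asList(
                new Rating(1L, 101L, 5.0f),
                new Rating(1L, 102L, 3.0f),
                new Rating(2L, 101L, 2.5f));
        System.out.println(toCsv(ratings));
    }
}
```

The resulting text would be written to a file and handed to the file-based model, which is exactly the extra hop I would like to avoid if the model can read from the database directly.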

I am very new to the "large data set" world, so forgive my ignorance. Do most systems that use Mahout work from a CSV data set? That seems strange to me.

Thanks.

Edit:

I am now reading the preview Amazon provides of Mahout in Action. It seems that you can have Mahout interface directly with your database, but you do so at the cost of performance. I can't wait to get my hands on this book. Any comments or tips about this would still be very much appreciated.

No correct solution

Licensed under: CC-BY-SA with attribution