Question

I am working on a recommender algorithm for songs. I have a matrix of values whose cosine similarity I compute in Python (NumPy). The problem is that every time I run the program I need to recompute the similarity of every vector to every other vector. I want to store the results of these computations locally so I don't have to compute them every time.

The first thing that comes to mind is storing them in a text file, or in the database itself. Surely there's a better way, though?


Solution

numpy.save is what you need:

numpy.save(file, arr)
Save an array to a binary file in NumPy .npy format.
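A minimal sketch of the save-once, load-later workflow (the toy data and file name here are just for illustration):

```python
import numpy as np

# Toy data: 100 song vectors with 20 features each.
vectors = np.random.rand(100, 20)

# Compute the cosine similarity matrix once.
normalized = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
similarity = normalized @ normalized.T

# Persist it to disk in binary .npy format.
np.save("similarity.npy", similarity)

# On subsequent runs, load it instead of recomputing.
loaded = np.load("similarity.npy")
assert np.allclose(similarity, loaded)
```

The `.npy` format preserves the dtype and shape, so `np.load` gives you back the exact array without any parsing.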

OTHER TIPS

Take a look at https://pypi.python.org/pypi/joblib

It is made to do exactly what you want.
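For example, joblib's `Memory` class memoizes a function's results to disk, so the expensive computation runs only once per distinct input (a sketch, assuming joblib is installed; the cache directory name is arbitrary):

```python
import numpy as np
from joblib import Memory

# Results are cached on disk under ./cache; repeated calls with
# the same input are loaded from disk instead of recomputed.
memory = Memory("./cache", verbose=0)

@memory.cache
def cosine_similarity_matrix(vectors):
    normalized = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return normalized @ normalized.T

vectors = np.random.rand(100, 20)
sim = cosine_similarity_matrix(vectors)        # computed and cached
sim_again = cosine_similarity_matrix(vectors)  # loaded from the cache
assert np.allclose(sim, sim_again)
```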

It is hard to answer your question without knowing your data volume and type, but here is what I can say. If you store the results in a file, you may hit a scaling problem once you scale your Python server out to several boxes, because they would all need access to the same data. In that case you need shared storage, which means a shared file system like GlusterFS or Hadoop (GlusterFS is the easier of the two), but access times will be poor. Another option is Redis, a memory-based key-value store that also supports persistence to disk (which is one way it differs from memcached). A final option is a NoSQL database, which can offer both scalability and performance. It always depends on your requirements.
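If you go the key-value route, a NumPy array can be serialized to a bytes payload and stored under a key. A minimal sketch (the Redis calls are shown only as comments, since they require a running server and the redis-py package; the key name is arbitrary):

```python
import io
import numpy as np

def array_to_bytes(arr):
    """Serialize an array to bytes in .npy format (keeps dtype and shape)."""
    buf = io.BytesIO()
    np.save(buf, arr)
    return buf.getvalue()

def bytes_to_array(data):
    """Deserialize bytes produced by array_to_bytes back into an array."""
    return np.load(io.BytesIO(data))

similarity = np.random.rand(100, 100)
payload = array_to_bytes(similarity)

# With a running Redis server, the payload could be stored like this:
#   import redis
#   r = redis.Redis()
#   r.set("similarity-matrix", payload)
#   payload = r.get("similarity-matrix")

restored = bytes_to_array(payload)
assert np.array_equal(similarity, restored)
```

The same bytes payload works with any store that accepts binary values, so the serialization code does not tie you to one backend.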

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow