Question

I am working on a recommender algorithm for songs. I have a matrix of values, and I compute the cosine similarity between its vectors in Python (NumPy). The problem is that every time I run the program, I have to recompute the similarity of every vector to every other vector. I want to store the results locally so I don't have to compute them on every run.

The first thing that comes to mind is storing them in a text file, or in the database itself. Surely there's a better way, though?

Solution

numpy.save is what you need:

numpy.save(file, arr)
Save an array to a binary file in NumPy .npy format.

Read the array back with numpy.load(file).
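
As a minimal sketch of how this could be used for caching, the snippet below computes the similarity matrix once, saves it with numpy.save, and loads it with numpy.load on later runs. The function names, the file name similarity.npy, and the random feature matrix are assumptions made for illustration; they are not from the question. The cache is not invalidated when the input changes, so delete the file whenever the feature matrix is updated.

```python
import os
import tempfile

import numpy as np


def cosine_similarity_matrix(features):
    """Return the pairwise cosine similarity of the rows of `features`."""
    # Scale every row to unit length; dot products of unit rows are cosines.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / norms
    return unit @ unit.T


def load_or_compute(features, cache_path):
    """Load the cached matrix if it exists, otherwise compute and save it."""
    if os.path.exists(cache_path):
        return np.load(cache_path)
    similarity = cosine_similarity_matrix(features)
    np.save(cache_path, similarity)  # binary .npy file
    return similarity


# Hypothetical data: 5 songs described by 3 features each.
rng = np.random.default_rng(0)
features = rng.random((5, 3))

# Hypothetical cache location; a real program would use a stable path.
cache_path = os.path.join(tempfile.mkdtemp(), "similarity.npy")

first = load_or_compute(features, cache_path)   # computes and saves
second = load_or_compute(features, cache_path)  # loads from disk
```

The second call never touches the similarity computation; it only reads the .npy file.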

Other tips

Take a look at https://pypi.python.org/pypi/joblib

It is made to do exactly what you want.
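
A rough sketch of what that might look like with joblib.Memory, which caches a function's return value on disk, keyed by its arguments. The cache directory, the function name, and the sample data are assumptions for illustration, and the `calls` list exists only to show that the second call skips the computation.

```python
import tempfile

import numpy as np
from joblib import Memory

# Hypothetical cache directory; a real program would use a stable path
# so that the cache survives between runs.
memory = Memory(tempfile.mkdtemp(), verbose=0)

calls = []  # demonstration only: records each real computation


@memory.cache
def cosine_similarity_matrix(features):
    """Return the pairwise cosine similarity of the rows of `features`."""
    calls.append(1)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / norms
    return unit @ unit.T


# Hypothetical data: 5 songs described by 3 features each.
rng = np.random.default_rng(0)
features = rng.random((5, 3))

first = cosine_similarity_matrix(features)   # computed, then stored on disk
second = cosine_similarity_matrix(features)  # read back from the cache
```

Unlike a hand-written numpy.save cache, joblib recomputes automatically when the function is called with a different array, because the arguments are hashed as part of the cache key.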

It is hard to answer your question without knowing the volume and type of your data, but here are the options I would consider.

If you are thinking about a local file, it may cause problems once you scale out: when the Python server runs on several machines, they all need access to the same results, so you need shared storage. In that case, look at a shared or distributed file system such as GlusterFS or Hadoop (GlusterFS is the easier of the two), but expect access times to be very poor.

Another option is Redis. It is an in-memory key-value store that also supports persistence to disk, which makes its characteristics a little different from memcached.

The final option is a NoSQL database, which can give you both scalability and performance. Which one fits always depends on your requirements.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow