Question

I have an extremely big dictionary that I need to analyze.

How did the dictionary come into existence?

The dictionary is a pivot table of a log file. I take a snapshot of the inventory every day, and right now I have the snapshots for the past month.

Each snapshot looks like this:

2013-01-01 Apple 1000
2013-01-01 Banana 2000
2013-01-01 Orange 3000
....

Then I group all the records by product name, planning to do the time-series analysis later. The output I have looks like this:

{
 Apple: [(2013-01-01, 1000), (2013-01-02, 998), (2013-01-03, 950), ...],
 Banana: [(2013-01-01, 2000), (2013-01-02, 1852), (2013-01-03, 1232), ...],
 Orange: ...
}

As you can imagine, with years and years of inventory snapshots and a very wide inventory breadth, this dictionary turns out to be huge. The whole grouping process happens in memory, and the size of the dictionary exceeds the memory limit.
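Roughly speaking, the grouping is a simple loop like this (the file name and column order here are only illustrative):

    from collections import defaultdict

    inventory = defaultdict(list)

    # every snapshot line is "date product quantity"
    with open("snapshots.log") as f:  # illustrative file name
        for line in f:
            if not line.strip():
                continue
            day, product, quantity = line.split()
            inventory[product].append((day, int(quantity)))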

I am wondering how to restrict memory usage to a specific amount (say 5 GB, since I don't want to make the server unusable for everything else) and do the work on disk.

Here is a very similar question to mine, but following its best-voted answer, memory is still quickly eaten up once I change the loop count to a real "big data" size.

So any example that truly doesn't kill the memory would be appreciated; speed is not that important to me.

(Note: there are several ways to optimize the data structure so that the dictionary is smaller, but the inventory snapshots are not periodic and different products have different numbers of snapshots, so a "matrix" layout may not work.)


The solution

At this point, I would suggest you stop using a dictionary and import sqlite3, or you're going to be reinventing the wheel implementing optimizations that databases already have.
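A rough sketch of what that could look like (the table layout, file names, and the final query are placeholders, not a definitive design): stream the snapshot rows into an on-disk table, then let SQLite do the grouping and lookups instead of holding everything in a dict.

    import sqlite3

    conn = sqlite3.connect("inventory.db")  # data lives on disk, not in a Python dict
    conn.execute("""
        CREATE TABLE IF NOT EXISTS snapshot (
            day      TEXT,
            product  TEXT,
            quantity INTEGER
        )
    """)

    # Stream the log file in one row at a time, so memory use stays flat.
    with open("snapshots.log") as f:  # same illustrative log file as above
        rows = (line.split() for line in f if line.strip())
        conn.executemany("INSERT INTO snapshot VALUES (?, ?, ?)", rows)
    conn.commit()

    # Later, pull one product's time series at a time instead of the whole dict.
    cursor = conn.execute(
        "SELECT day, quantity FROM snapshot WHERE product = ? ORDER BY day",
        ("Apple",),
    )
    for day, quantity in cursor:
        pass  # feed into the time-series analysis here

Once the data is in a table, memory use is bounded by SQLite's page cache rather than by the size of a Python dict, and an index on the product column keeps the per-product query fast as the table grows.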

To get started quickly, Elixir is a very decent and practical ORM.
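From memory of its declarative API (treat the exact field types and setup calls here as an assumption rather than a checked recipe), mapping a snapshot row with Elixir looks roughly like this:

    from datetime import date

    from elixir import (Entity, Field, Unicode, Integer,
                        metadata, setup_all, create_all, session)
    from sqlalchemy import Date

    metadata.bind = "sqlite:///inventory.db"  # same illustrative on-disk database

    class Snapshot(Entity):
        # one inventory reading: which product, on which day, how many
        product = Field(Unicode(100))
        day = Field(Date)
        quantity = Field(Integer)

    setup_all()
    create_all()

    Snapshot(product=u"Apple", day=date(2013, 1, 1), quantity=1000)
    session.commit()

    # pull a single product's series instead of materializing everything
    apples = Snapshot.query.filter_by(product=u"Apple").all()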

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow