How best to hold 1000 different data series using TimeSeries module in Python?
Question
I want to create a massive TimeSeries object which will hold 1000 different financial markets data series, each storing 1500 daily-data points. I'm quite new to the TimeSeries module and am a little confused as to how I would best go about it. So a few basic questions:
1) Should I use a huge numpy array of 1000x1500 and simply feed that to the time series constructor function time_series()?
2) If I do this, how will I index each series by name (e.g. "S&P500" or "GOLD")? I know I will be able to access the array by date, but will I need a separate data structure to link series names with their column numbers in the large array?
3) Or should I use a structured data type as per the example given in the docs (http://pytseries.sourceforge.net/core.timeseries.html)? If so, how do I append series one by one to the timeseries, since I don't want to create a massive non-numpy structure to feed to the time_series() constructor in one shot?
Advice on where I can get some good examples for financial markets and timeseries module in general would also be appreciated.
Thanks.
Solution
For help on this, have a look at QuantLib, which is a useful library for financial work and has an active users mailing list.
In addition, read this review of the book Financial Modeling in Python.
OTHER TIPS
1) I once implemented a PageRank algorithm for a small set (~10K) of linked documents, so the calculation had to handle a 10Kx10K matrix. As I recall, the numpy array implementation was blazingly fast. A 1000x1500 array is far smaller than that.
2) IMHO, storing metadata such as the series names externally does not hurt that much.
3) I haven't worked with scikits.timeseries, but I would definitely look into it; as far as I can see, the project lives in the same SciPy orbit as numpy.
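
To make points 1) and 2) concrete, here is a minimal sketch using plain numpy only (not the scikits.timeseries API, which I haven't verified). It assumes one column per series, a calendar-day date axis, and random placeholder data. The names `column_of`, `get_series` and `get_value` are made up for illustration, and every series name other than "S&P500" and "GOLD" is hypothetical.

```python
import numpy as np

n_days, n_series = 1500, 1000

# Hypothetical series names; only the first two come from the question.
names = ["S&P500", "GOLD"] + ["SERIES_%04d" % i for i in range(2, n_series)]

# External metadata: map each series name to its column number.
column_of = {name: col for col, name in enumerate(names)}

# Assumed date axis: consecutive calendar days (real market data would
# normally skip weekends and holidays).
dates = np.datetime64("2005-01-03") + np.arange(n_days)

# Preallocate with NaN so that series can be filled in one at a time
# instead of building a large non-numpy structure first.
data = np.full((n_days, n_series), np.nan)

# Placeholder data: a random walk per series, NOT real prices.
rng = np.random.default_rng(0)
for name in names:
    data[:, column_of[name]] = 100.0 + rng.standard_normal(n_days).cumsum()


def get_series(name):
    """Return the whole column for one series (a view, not a copy)."""
    return data[:, column_of[name]]


def get_value(name, date):
    """Return the value of one series on one date."""
    target = np.datetime64(date)
    row = np.searchsorted(dates, target)  # dates are sorted
    if row >= len(dates) or dates[row] != target:
        raise KeyError("no data for %s" % date)
    return data[row, column_of[name]]


gold = get_series("GOLD")
spx_on_day = get_value("S&P500", "2005-01-10")
size_in_mb = data.nbytes / 1e6  # 1500 * 1000 float64 values
```

At 8 bytes per float64 the whole array is about 12 MB, so memory should not be a concern at this size. If the time series constructor accepts a 2-D array, the same name-to-column dict could presumably sit alongside it unchanged.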