How best to hold 1000 different data series using TimeSeries module in Python?

https://stackoverflow.com/questions/1894981

19-09-2019

Question

I want to create a massive TimeSeries object that will hold 1000 different financial-market data series, each storing 1500 daily data points. I'm quite new to the TimeSeries module and am a little confused about how best to go about it. So, a few basic questions:

1) Should I use a huge numpy array of 1000x1500 and simply feed that to the time series constructor function time_series()?

2) If I do this, how will I index each series by name (e.g. "S&P500" or "GOLD")? I know I will be able to access the array by date, but will I need a separate data structure to link series names to their column numbers in the large array?

3) Or should I use a structured data type, as in the example given in the docs (http://pytseries.sourceforge.net/core.timeseries.html)? If so, how do I append series one by one to the timeseries, since I don't want to build a massive non-numpy structure just to feed it to the time_series() constructor in one shot?

Advice on where I can find good examples for financial markets, and for the timeseries module in general, would also be appreciated.

Thanks.

Solution

For help on this, have a look at QuantLib, a useful library for financial work that has an active users' mailing list.

In addition, read this book review of Financial Modeling in Python.

OTHER TIPS

1) I once implemented a PageRank algorithm for a small set (~10K) of linked documents, so a 10K x 10K matrix had to be handled during the calculation. As I recall, the numpy array implementation was blazingly fast. A 1000x1500 array is far smaller than that, so a plain numpy array should cope with it easily.

2) IMHO, storing metadata such as the series names externally does not hurt that much.

3) I haven't worked with scikits.timeseries, but I would definitely look into it; as far as I can see, the project lives in the same SciPy orbit as numpy.
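
To make tips 1 and 2 concrete, here is a minimal sketch using plain numpy only (not the scikits.timeseries API, which I haven't used). It keeps all the data in one 2-D array with one column per series, and holds the series names in an ordinary dict outside the array. The start date, the placeholder names other than "S&P500" and "GOLD", the random data, and the helper functions get_series() and get_value() are all assumptions made up for illustration.

```python
import numpy as np

n_days, n_series = 1500, 1000

# "S&P500" and "GOLD" come from the question; the remaining names are placeholders.
names = ["S&P500", "GOLD"] + [f"SERIES_{i}" for i in range(2, n_series)]

# External metadata (tip 2): map each series name to its column number.
column_of = {name: col for col, name in enumerate(names)}

# One date per row. The start date is arbitrary, and consecutive calendar days
# are used for simplicity (a real market calendar would skip non-trading days).
start = np.datetime64("2005-01-03")
dates = start + np.arange(n_days).astype("timedelta64[D]")

# Random placeholder values standing in for real market data.
# Shape is (rows = days, columns = series); as float64 this is about 12 MB.
rng = np.random.default_rng(0)
data = rng.normal(size=(n_days, n_series))


def get_series(name):
    """Return the full history of one series (a view, not a copy)."""
    return data[:, column_of[name]]


def get_value(name, date):
    """Return one series' value on a given date, e.g. '2005-01-04'."""
    target = np.datetime64(date)
    row = np.searchsorted(dates, target)  # dates is sorted, so binary search works
    if row >= len(dates) or dates[row] != target:
        raise KeyError(f"no data for {date}")
    return data[row, column_of[name]]


gold = get_series("GOLD")
gold_on_day_two = get_value("GOLD", "2005-01-04")
```

Because get_series() returns a view, filling in one series at a time (for example `get_series("GOLD")[:] = new_values`) writes straight into the big array, so you can preallocate the array once and load the series one by one instead of building a large intermediate structure first.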

Licensed under: CC-BY-SA with attribution
