Question

I recently finished my Economics thesis using RStudio, but my script ran very slowly because of massive RAM consumption.


My Case

I had a massive dataset (stock prices at daily frequency over 10 years for ~700 stocks, i.e. a $3500\times700$ matrix). I picked each stock as a vector, decomposed it into wavelets and applied the Christiano-Fitzgerald (CF) filter (producing two $28000\times700$ datasets), and then applied Benford's law (two $9\times700$ datasets).
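For concreteness, the per-stock loop described above would look roughly like this. This is only a sketch: the package choices (`waveslim` for the wavelet decomposition, `mFilter` for the CF filter, `benford.analysis` for Benford's law) and the file name are my assumptions, not necessarily what the thesis used.

```r
# Hypothetical reconstruction of the workflow; package choices are assumptions.
library(waveslim)          # modwt(): maximal-overlap discrete wavelet transform
library(mFilter)           # cffilter(): Christiano-Fitzgerald band-pass filter
library(benford.analysis)  # benford(): first-digit distribution analysis

prices <- as.matrix(read.csv("prices.csv"))  # ~3500 x 700, hypothetical file

for (j in seq_len(ncol(prices))) {
  x   <- prices[, j]                    # one stock as a vector
  wav <- modwt(x, n.levels = 4)         # wavelet decomposition
  cf  <- cffilter(x)$cycle              # CF cyclical component
  bfd <- benford(x, number.of.digits = 1)
  # ... write wav / cf / bfd out, then let them be garbage-collected
}
```

Note that in this version the full `prices` matrix still sits in RAM for the whole run, which is exactly the problem described below.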


The Problem

R keeps every dataset entirely in memory, so my datasets consumed a significant proportion of my RAM the moment I touched them.


Question

I have started learning basic SQL and found out that I can select specific columns from a table. Would my script be more memory-efficient if I pulled my stocks one by one as vectors from a database instead of holding the whole dataset in R? In other words, does a query load the entire table and then filter out the requested values, or does it take a shortcut that avoids reading everything into memory? If not, what is the purpose of using a database for personal projects?
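The column-by-column retrieval I have in mind would look roughly like this, using `DBI` with `RSQLite`; the database file, table name (`prices`), and column name (`AAPL`) are made up for illustration:

```r
# Sketch: fetch a single stock's price series from an SQLite database,
# so R only ever holds one ~3500-element vector at a time.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), "prices.db")

# Only the selected column is materialized in R; the database engine
# scans its own storage and returns just the requested values.
one_stock <- dbGetQuery(con, "SELECT AAPL FROM prices")$AAPL

dbDisconnect(con)
```

My understanding is that the database engine reads its own file page by page and returns only the selected column, rather than shipping the whole table to R first, but that is exactly what I would like confirmed.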

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange