Question

I have decided to improve my well-being and, if I succeed, to write a note about it and share it with my peers for free, to help them improve their well-being as well. But on this road, I need foundational advice from database practitioners.

Background Description:

In my research, I need to collect many different types of time series (GDP, Real GDP, Nominal GDP, Consumption, Investment, Per Capita GDP, Number of Hours Worked, Unit Labor Cost and many more). Right now, I am following and collecting about 155 time series. Until now, I have been updating them manually as soon as an update was published on the statistical web pages. I have been doing this in an Excel spreadsheet (that is, downloading the new Excel file and then copying and pasting the necessary data points into my own workbook). However, this is a very tedious task, and it is prone to a lot of errors. Checking for and finding the errors is another nightmare. In total, maintaining this Excel database (updating, checking and documenting) takes about 10-12 hours per week.

My Task:

  1. I want to create a database that stores all of these variables and many more (I want each variable to have a name, a description, a source description (ideally with the URL included), the date of the last update, etc.).
  2. I also want to be able to update the database automatically from downloaded Excel or CSV files (these files come in different formats from various web pages; some are structured vertically, some horizontally).
  3. I also want to have some error checkers to make debugging the database easier.
  4. The series have different frequencies (some are daily, some weekly, some monthly, some quarterly and some annual). I want to be able to easily convert from daily to monthly, from monthly to quarterly, or vice versa, using my own predetermined formula (be it the average, the median, or whatever the current task requires).
  5. Later, I also want to be able to easily query one or several of the variables and create a dashboard in programs like MATLAB, Python or Julia, to visualize the dynamics and to use the data for regressions or model estimation in these programs.

Question

For all the tasks described above, which database management program would you recommend (ideally, one that is free and open source)?

P.S. I tried writing a MATLAB script for automatic updates, but it's very inconvenient.

P.P.S. This is a cross-post from Cross Validated, in the hope of receiving an answer somewhere.

Thanks, Giorgi.

As an example of what my data set looks like, please see the following screenshot:

[screenshot of the data set]


Solution

Your root question is subjective and it's a little hard to fully envision what your data looks like without some examples, but it sounds like a Relational Database Management System (RDBMS) could work for you based on the type of querying you'll be doing.

If you solely want something free, then PostgreSQL. If you want something with the most features built in out of the box and are OK with paying, then Microsoft SQL Server. Both can house your data, both have Views (and other types of objects) that can save your queries (your predetermined formulas) for conversion, and both support importing data from CSVs and other files if needed (though I recommend you do as much of your data gathering and importing into the database as possible via a non-database language like Python).
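As a rough illustration of doing the import in Python: the sketch below reshapes a vertical file (one row per date) and a horizontal file (one row per series, one column per date) into the same (series, date, value) rows, and runs a basic duplicate check before anything is written to the database. The function names, column names and sample values are all hypothetical; real downloads will need their own header handling.

```python
import csv
import io


def read_vertical(text, series_name, date_col="date", value_col="value"):
    """Parse a file with one row per date (assumed columns: date, value)."""
    reader = csv.DictReader(io.StringIO(text))
    return [(series_name, row[date_col], float(row[value_col])) for row in reader]


def read_horizontal(text, label_col="series"):
    """Parse a file with one row per series and one column per date."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        name = row.pop(label_col)
        for date, cell in row.items():
            if cell and cell.strip():  # skip cells that are empty (not yet published)
                rows.append((name, date, float(cell)))
    return rows


def find_duplicates(rows):
    """Return a message for every (series, date) pair that appears more than once."""
    seen, problems = set(), []
    for name, date, _value in rows:
        if (name, date) in seen:
            problems.append(f"duplicate: {name} {date}")
        seen.add((name, date))
    return problems


# Hypothetical sample downloads in the two layouts.
vertical_csv = "date,value\n2023-01-01,100.0\n2023-02-01,102.0\n"
horizontal_csv = (
    "series,2023-01-01,2023-02-01\n"
    "gdp,500.5,\n"
    "consumption,300.0,310.0\n"
)

rows = read_vertical(vertical_csv, "unit_labor_cost") + read_horizontal(horizontal_csv)
problems = find_duplicates(rows)
print(rows)
print(problems)
```

Once every source is reduced to the same three-column shape, loading it can be one parameterized INSERT, regardless of where the file came from.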

Your additional requirements for creating dashboards and other ad-hoc querying will be supported by either of these databases as well.
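To make the storage and querying side more concrete, here is a minimal sketch of one possible layout: a table of series metadata, a table of observations, and a view that stores a monthly-to-quarterly averaging rule. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; the table, column and series names are made up, and the SQL would need adjusting for PostgreSQL or SQL Server (different date functions and column types).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real database server
conn.executescript("""
CREATE TABLE series (
    series_id    INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,
    description  TEXT,
    source_url   TEXT,
    frequency    TEXT NOT NULL CHECK (frequency IN ('D', 'W', 'M', 'Q', 'A')),
    last_updated TEXT
);
CREATE TABLE observation (
    series_id INTEGER NOT NULL REFERENCES series(series_id),
    obs_date  TEXT NOT NULL,           -- ISO date, first day of the period
    value     REAL NOT NULL,
    PRIMARY KEY (series_id, obs_date)  -- rejects a data point loaded twice
);
-- Saved conversion rule: monthly observations averaged into quarters.
CREATE VIEW quarterly_avg AS
SELECT s.name,
       strftime('%Y', o.obs_date) || '-Q' ||
           ((CAST(strftime('%m', o.obs_date) AS INTEGER) + 2) / 3) AS quarter,
       AVG(o.value) AS value,
       COUNT(*)     AS n_months
FROM observation AS o
JOIN series AS s USING (series_id)
WHERE s.frequency = 'M'
GROUP BY s.name, quarter;
""")

# Hypothetical series and data points, for illustration only.
conn.execute(
    "INSERT INTO series (name, description, source_url, frequency) VALUES (?, ?, ?, ?)",
    ("unit_labor_cost", "Hypothetical monthly index", "https://example.org/ulc", "M"),
)
conn.executemany(
    "INSERT INTO observation (series_id, obs_date, value) VALUES (?, ?, ?)",
    [
        (1, "2023-01-01", 100.0),
        (1, "2023-02-01", 102.0),
        (1, "2023-03-01", 104.0),
        (1, "2023-04-01", 110.0),
    ],
)

# Ad-hoc query against the saved view, ready to hand to a plot or a regression.
quarterly = conn.execute(
    "SELECT name, quarter, value, n_months FROM quarterly_avg ORDER BY quarter"
).fetchall()
print(quarterly)
```

The n_months column doubles as a cheap error check: a quarter with fewer than three months is either still incomplete or missing data. The primary key on (series_id, obs_date) makes the database itself reject a data point that is loaded twice.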

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange