Question

I'm trying to write a .NET/MS SQL application that will download daily weather data from a web service and I'd like to store/cache that data in the local database.

The way it's going to be used is:

  • The user will access my web page and enter a start and end date.
  • The code will retrieve the data from the database and any missing data from a web service.
  • The weather data for the requested day range will be presented to the user.

The web service I'm using to download the weather data also accepts from/to dates as parameters.

Because I can't know which dates the users will enter, I may end up with segmented data cached in the database.

There are several problems I'm trying to solve:

  1. How can I properly determine the consecutive range of data (from/to) that I need to download based on the already downloaded (if any) data segments stored in the database?
  2. Ideally I'd like to make a single web service call rather than multiple ones.
  3. Once the data is received, how do I fill in the blanks in the database, discarding the information for the dates already present?

So far I've tried writing an algorithm for items 1 and 2, but the date range arithmetic got complex and I couldn't fully get it to work. Item 3 should be trivial.

Is there already an algorithm that solves a similar problem?


Solution

You can use an interval tree to store the time periods that you have cached - this will let you quickly retrieve the cached time periods that overlap with a user's query. You can then use this time period library to determine what queries need to be submitted to the web service in order to fill the user's query by taking the differences between the queried interval and the cached intervals.
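As a rough illustration of the "difference" step, here is a minimal Python sketch (the same logic carries over to .NET). It assumes the cached periods are available as a plain list of inclusive (start, end) date pairs, which stands in for the interval tree; `missing_ranges` and `single_call_span` are hypothetical names, not part of any library. Because the data is daily, gaps are whole days, so with [2, 6] and [12, 16] cached a query for [4, 14] leaves days 7 through 11 missing.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def missing_ranges(query, cached):
    """Return the inclusive date ranges inside `query` that `cached` does not cover.

    `query` is an inclusive (start, end) pair of dates; `cached` is a list of
    such pairs (assumed representation of the cache, in any order).
    """
    q_start, q_end = query
    gaps = []
    cursor = q_start  # first day not yet known to be covered
    for c_start, c_end in sorted(cached):
        if c_end < cursor:
            continue  # lies entirely before the uncovered part
        if c_start > q_end:
            break  # lies entirely after the query
        if c_start > cursor:
            gaps.append((cursor, c_start - ONE_DAY))
        cursor = max(cursor, c_end + ONE_DAY)
        if cursor > q_end:
            break
    if cursor <= q_end:
        gaps.append((cursor, q_end))
    return gaps


def single_call_span(gaps):
    """Collapse all gaps into one (start, end) range for a single service call."""
    if not gaps:
        return None  # everything is already cached
    return (gaps[0][0], gaps[-1][1])


cached = [(date(2024, 1, 2), date(2024, 1, 6)),
          (date(2024, 1, 12), date(2024, 1, 16))]
gaps = missing_ranges((date(2024, 1, 4), date(2024, 1, 14)), cached)
print(gaps)
```

If a single web service call is preferred over one call per gap, `single_call_span` requests everything from the first missing day to the last; this re-downloads whatever cached days sit between the gaps, and those can be discarded on insert.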

Once you have filled the user's query, reorganize the interval tree to merge any overlapping or abutting time intervals. For example, if you had previously cached the intervals [2, 6] and [12, 16] and the user queries [4, 14], then you should submit a [6, 12] query to the web service, remove the [2, 6] and [12, 16] intervals from the interval tree, and add a [2, 16] interval in their place.

You may also want to avoid caching small intervals, since a fragmented cache leads to many web service queries. For example, if the user wants [1, 2] and you've cached [1, 1.1], [1.3, 1.4], [1.5, 1.6], and [1.8, 1.9], then you'll be making 4 queries to fill the user's query. You can avoid this either by discarding small intervals or by always retrieving a minimum-size interval so that none of your cached intervals is "too small" (e.g. if the user queries [1.4, 1.5], you would submit a [1, 2] query to the web service).
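The merge and minimum-size ideas can be sketched in Python as follows. This assumes intervals are inclusive pairs of integer day numbers (for example, values from `date.toordinal()`) kept in a plain list instead of an interval tree; `merge_intervals`, `widen_query`, and the `min_days` parameter are hypothetical names chosen for illustration.

```python
def merge_intervals(intervals):
    """Merge inclusive integer intervals that overlap or abut.

    Two intervals abut when the next one starts the day after the
    previous one ends, e.g. (1, 3) and (4, 5).
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def widen_query(start, end, min_days):
    """Pad a short query so that at least `min_days` days are requested."""
    shortfall = min_days - (end - start + 1)
    if shortfall <= 0:
        return (start, end)  # already long enough
    pad_before = shortfall // 2
    return (start - pad_before, end + (shortfall - pad_before))


# Cached [2, 6] and [12, 16], then the fetched [6, 12] is added.
print(merge_intervals([(2, 6), (12, 16), (6, 12)]))
```

Padding evenly on both sides is an arbitrary choice here; for weather data you might prefer to pad only backwards so the request never reaches past the most recent day the service can provide.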

Licensed under: CC-BY-SA with attribution