Question

I have two datasets in CSV format. Both contain a Unix timestamp. One contains business-related data and the other contains weather data.

I want to merge the weather data into the business data by closest timestamp. Since none of the timestamps match exactly, each business record should get the data from the closest weather record.

In other words, for every business record I need to find the weather record with the minimum timestamp difference and insert its data.

Solution

In my opinion, the best approach is to:

  • Load both tables into a database.
  • Create date and time dimensions, and populate date_id and time_id in both tables.
  • Store the primary key of the closest weather record on each business record. An example is shown below; it assumes a weather_id column has been added to business_data, and timestamp_column stands for your Unix timestamp column.

    -- weather_id is an assumed new column on business_data
    UPDATE business_data
    SET weather_id = (SELECT w.id FROM weather_data w
                      ORDER BY ABS(w.timestamp_column - business_data.timestamp_column)
                      LIMIT 1);

Ordering by the absolute difference picks the closest weather record in either direction, earlier or later. This puts the weather data primary key into the business data, which makes the two tables easy to join.

Good luck with this one!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow