Question

I have two datasets in CSV format. Both contain a Unix timestamp. One contains business-related data and the other contains weather data.

What I want to do is import the weather data into the business-related data by closest timestamp. Since none of the timestamps match exactly, I want each business record to have the data from the closest weather record.

For every business record, I need to find the weather record with the minimum timestamp difference and insert its data.


Solution

In my opinion, the best approach is to:

  • Upload both tables to a database.
  • Create a date and time dimension, and add date_id and time_id columns to both tables.
  • Store the primary key of the closest weather record on each business record. An example is shown below; it assumes business_data has a weather_id column to hold that key:

    UPDATE business_data SET weather_id = (SELECT w.id FROM weather_data AS w ORDER BY ABS(w.timestamp_column - business_data.timestamp_column) LIMIT 1);

This puts the weather data primary key into the business data, which makes the two tables easy to join.

Good luck with this one!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow