This may be a strange question but here goes. I have a script that reads several sources (RSS) and then compiles a list of articles and sends an e-mail.

I use the pubDate tag

<pubDate>Thu, 27 Apr 2006</pubDate> 

and then select everything that was published yesterday, using -1 day in PHP.

I use UTC, and my question is: when should I run the script to make sure I get everything that was in fact published? Am I just confused, or is there a perfect time that won't miss anything?

For instance, if I run the script at 08:00 UTC, there may be locations where data has not been published yet. Anything published there an hour later will still carry the same date, but it won't be retrieved when I run the script the next day.

Thanks for any input on schedules etc.


Solution

In practice, time zone offsets range from UTC-12:00 to UTC+14:00. Since each time zone has its own concept of a day, if you want to cover the entire world you'll have to wait until after 12:00 PM (noon) UTC to run your script.

In other words, to cover any concept of May 1st, you'll have to wait until noon UTC on May 2nd, because that is when May 1st finally ends at UTC-12:00.

You might also want to allow a few minutes for clock discrepancies. 12:05 PM UTC would work well.
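To make the arithmetic concrete, here is a minimal sketch in Python (the same logic should carry over to PHP's DateTime classes). The function names and the 5-minute default margin are my own choices for illustration, and the two extreme offsets are just the ones quoted above.

```python
from datetime import date, datetime, time, timedelta, timezone

# Extreme UTC offsets in practical use, as quoted in the answer.
EASTERNMOST = timezone(timedelta(hours=14))   # UTC+14:00
WESTERNMOST = timezone(timedelta(hours=-12))  # UTC-12:00


def utc_window_for_local_date(day):
    """Return (start, end) in UTC between which `day` exists somewhere on Earth."""
    # The calendar day begins first where the offset is largest.
    start = datetime.combine(day, time.min, tzinfo=EASTERNMOST)
    # It ends last where the offset is smallest: midnight of the next day there.
    end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=WESTERNMOST)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


def earliest_safe_run(day, margin=timedelta(minutes=5)):
    """Earliest UTC time at which `day` is over everywhere, plus a clock margin."""
    return utc_window_for_local_date(day)[1] + margin


if __name__ == "__main__":
    start, end = utc_window_for_local_date(date(2006, 5, 1))
    print("May 1st exists somewhere from", start, "to", end)
    print("Earliest safe run:", earliest_safe_run(date(2006, 5, 1)))
```

Note that the window for a single calendar date is 50 hours wide in UTC, which is why no run time inside the day itself can catch everything.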

However, in many cases you don't want to process the entire world at once. If you can separate the data by its time zone, you may instead want to run a series of separate, smaller batches, each shortly after midnight in its own time zone.
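A rough sketch of the per-zone approach, again in Python. The feed names and their offsets are hypothetical, and fixed offsets ignore daylight saving time, so treat this only as an outline; with real sources you would more likely use named zones (for example via Python's `zoneinfo` module).

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical feeds mapped to fixed UTC offsets in hours (illustrative only).
FEED_OFFSETS = {
    "feed_utc_plus_12": 12,
    "feed_utc": 0,
    "feed_utc_minus_5": -5,
}


def batch_schedule(day, offsets=FEED_OFFSETS, margin=timedelta(minutes=5)):
    """Return sorted (run_time_utc, feed) pairs for processing `day` per feed."""
    schedule = []
    for feed, hours in offsets.items():
        tz = timezone(timedelta(hours=hours))
        # `day` is over for this feed at local midnight of the following day.
        local_midnight = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
        schedule.append((local_midnight.astimezone(timezone.utc) + margin, feed))
    return sorted(schedule)


if __name__ == "__main__":
    for run_at, feed in batch_schedule(date(2006, 5, 1)):
        print(run_at.isoformat(), feed)
```

Each feed is then processed as soon as its own day has closed, instead of everything waiting for the last time zone to catch up.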

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow