What is the point of If-Unmodified-Since/If-Modified-Since? Aren't they superseded by ETags?

https://stackoverflow.com/questions/2126807

22-09-2019

Question

There seem to be two distinct ways to implement conditional requests using HTTP headers, both of which can be used for caching, range requests, concurrency control, and so on:

If-Unmodified-Since and If-Modified-Since, where the client sends a timestamp of the resource.
If-Match and If-None-Match, where the client sends an ETag for the resource.

In both cases, the client sends a piece of information it has about the resource, which allows the server to determine whether the resource has changed since the client last saw it. The server then decides whether to execute the request depending on the conditional header supplied by the client.

I don't understand why two separate approaches are available. Surely, ETags supersede timestamps, since the server could quite easily choose to generate ETags from timestamps.

So, my questions are:

In which scenarios might you favour If-Unmodified-Since/If-Modified-Since over ETags?
In which scenarios might you need both?

Solution

I once pondered the same thing, and realized that there is one difference that is quite important: dates can be ordered, ETags cannot.

This means that if some resource was modified a year ago but never since, and we know it, then we can correctly answer an If-Unmodified-Since request for any date within the last year and agree that, sure, it has been unmodified since that date.
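As a rough illustration of why ordering matters, here is a minimal Python sketch of the server-side check. The function name `unmodified_since` and the `LAST_MODIFIED` value are made up for this example; a real server would read the modification time from wherever it stores it.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical modification time of the resource (assumption for this sketch).
LAST_MODIFIED = datetime(2018, 9, 1, tzinfo=timezone.utc)


def unmodified_since(header_value: str,
                     last_modified: datetime = LAST_MODIFIED) -> bool:
    """Return True if the resource has not changed after the client's date.

    Because dates are ordered, any client date at or after the last
    modification passes, not only the exact Last-Modified value.
    """
    client_date = parsedate_to_datetime(header_value)
    return last_modified <= client_date


# Any date after the last change is accepted...
print(unmodified_since("Tue, 01 Jan 2019 00:00:00 GMT"))
# ...while a date before the last change is not.
print(unmodified_since("Mon, 01 Jan 2018 00:00:00 GMT"))
```

The server needs no history of earlier values here: one stored timestamp is enough to answer for every possible client date.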

An ETag is only comparable for identity: either it is the same or it is not. Take the same resource as above, and suppose that during the year the docroot was moved to a new disk and filesystem, giving all files new inodes but preserving modification dates, and that someone had based the ETags on each file's inode number. Then we can't say that the old ETag is still okay without keeping a log of past ETags that are still valid.

So I don't see them as one obsoleting the other. They are for different situations. Either you can easily get a Last-Modified date of all the data in the page you're about to serve, or you can easily get an ETag for what you will serve.

If you have a dynamic webpage with data from lots of db lookups, it might be difficult to tell what the Last-Modified date is without making your database store lots of modification dates. But you can always compute an md5 checksum of the rendered page.
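A minimal sketch of that checksum approach in Python follows. The helper names `make_etag` and `respond` are hypothetical, and the If-None-Match handling is deliberately simplified: it compares a single strong tag only and ignores `*`, weak validators, and comma-separated lists.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the md5 checksum of the rendered page."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def respond(body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, str, bytes]:
    """Return (status, etag, payload) for a GET of the rendered page.

    ETags are compared for identity only: the client's tag either
    matches the current one or it does not.
    """
    etag = make_etag(body)
    if if_none_match is not None and if_none_match.strip() == etag:
        return 304, etag, b""  # client copy is current, send no body
    return 200, etag, body


page = b"<html>rendered from many db lookups</html>"
status, etag, _ = respond(page, None)      # first request: full response
status_again, _, _ = respond(page, etag)   # revalidation: not modified
```

Note that the page still has to be rendered to compute the checksum, so this saves bandwidth rather than server work.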

When supporting these cache protocols I definitely go for only one of them, never both.

OTHER TIPS

There is one rather big difference: I can only use ETags if I have already asked the server for one in the past. Timestamps, OTOH, I can make up as I go along.
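To make that concrete, here is a small Python sketch of a client that builds an If-Modified-Since header for an arbitrary cutoff without ever having fetched the resource. The function name `headers_changed_within` is invented for this example, and the resulting dictionary would be passed to whatever HTTP client you use.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict, Optional


def headers_changed_within(days: int,
                           now: Optional[datetime] = None) -> Dict[str, str]:
    """Build a conditional header asking: changed in the last `days` days?

    No earlier response is needed; the client simply picks a date.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # usegmt=True emits the "GMT" suffix that HTTP dates use.
    return {"If-Modified-Since": format_datetime(cutoff, usegmt=True)}


print(headers_changed_within(7))
```

There is no equivalent for If-None-Match, because a client cannot guess a valid ETag it has never been given.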

Simple reason: backward-compatibility.

Licensed under: CC-BY-SA with attribution
