Packet sniffer - trace back to where content originated

Question

Qwest (which was bought by CenturyLink) was an ISP, and CenturyLink still is one. A company might, for example, use them as a hosting ISP, so that requests such as HTTP requests to www.example.com are actually handled by one of Qwest's/CenturyLink's servers. In that case, if you're downloading from www.example.com, you're actually downloading from a server owned by Qwest/CenturyLink, whose IP address also belongs to Qwest/CenturyLink.

Akamai is a company that provides infrastructure for organizations that want lots of people to be able to download their material. The organization arranges for the domain name of its server to resolve to an IP address belonging to Akamai, so that if you think you're downloading from www.example.com, you're actually downloading from one of Akamai's servers, whose IP address also belongs to Akamai.

Therefore, the source IP address of an inbound packet that's part of the reply to a download request (such as an HTTP GET request) sent to www.example.com might be the IP address of a host belonging to Qwest or Akamai, and that really is the host where the traffic started!
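A minimal sketch of what a sniffer actually sees: the source address sits at a fixed place (bytes 12-15) in the IPv4 header, and it identifies only the host that sent the packet, not whose content is inside it. The function name `ipv4_addresses` is hypothetical, the input is assumed to be a raw IPv4 packet with the link-layer header already stripped (IPv6 is not handled), and the addresses are from the RFC 5737 documentation ranges, not real Qwest or Akamai hosts.

```python
import ipaddress
import struct


def ipv4_addresses(packet: bytes) -> tuple[str, str]:
    """Return (source, destination) from a raw IPv4 packet (hypothetical helper)."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    if packet[0] >> 4 != 4:  # high nibble of the first byte is the IP version
        raise ValueError("not an IPv4 packet")
    # Source address is bytes 12-15, destination is bytes 16-19.
    src, dst = struct.unpack("!4s4s", packet[12:20])
    return str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst))


# Build a bare 20-byte header: version 4 / IHL 5, TTL 64, protocol 6 (TCP).
header = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
    ipaddress.IPv4Address("192.0.2.10").packed,    # sender (e.g. a CDN host)
    ipaddress.IPv4Address("198.51.100.7").packed,  # the downloading client
)
print(ipv4_addresses(header))  # ('192.0.2.10', '198.51.100.7')
```

Nothing in those 20 bytes names www.example.com; the source address alone can only be matched against who owns that address range.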

If it belongs to Qwest, Example Inc.'s Web server is probably running on a Qwest machine with a Qwest IP address, not on a machine owned by Example Inc. using an IP address from a range owned by Example Inc. (Example Inc. might have better things to do with their resources than manage servers and their own range of IP addresses.)

If it belongs to Akamai, that server holds a copy of the material, provided by Example Inc., which contracted with Akamai to provide content caching.

About all you can do to determine whose material is being downloaded is to look at the DNS requests the client made to look up the IP address (so that you see "www.example.com" in the DNS request, rather than only the Qwest/Akamai/whatever IP address returned in the DNS response) and/or, at least for HTTP, the "Host:" header in an HTTP/1.1 request, which also contains the domain name.
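As a rough illustration of the Host: header check, here is a sketch that pulls the Host value out of the start of a request. It assumes unencrypted HTTP/1.x and that you already have the reassembled TCP payload of the client's request (capturing and reassembly are not shown); `http_host` is a hypothetical name, not part of any capture tool's API.

```python
from typing import Optional


def http_host(payload: bytes) -> Optional[str]:
    """Return the Host header of an HTTP/1.x request, or None (hypothetical helper)."""
    head = payload.partition(b"\r\n\r\n")[0]  # keep only the header block
    lines = head.split(b"\r\n")
    # A request line looks like b"GET /path HTTP/1.1"; a status line such as
    # b"HTTP/1.1 200 OK" fails this check, so responses yield None.
    parts = lines[0].split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/1."):
        return None
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":  # header names are case-insensitive
            return value.strip().decode("ascii", errors="replace")
    return None


request = b"GET /file.iso HTTP/1.1\r\nHost: www.example.com\r\nUser-Agent: demo\r\n\r\n"
response = b"HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\ndata"
print(http_host(request))   # www.example.com
print(http_host(response))  # None
```

The response carries no Host: header, so only the request identifies the site; a Host value that includes a port (for example `www.example.com:8080`) is returned as-is.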

And, yes, in the general case, where you start capturing at some arbitrary point in the download process, it is not always possible to figure out the "original" source of the content being downloaded.

The Host: header appears only in the HTTP request, not in the response carrying the data, so if you weren't capturing traffic when the initial HTTP request was sent, or if the download isn't the result of an HTTP request at all, you're out of luck.

The DNS request is made before the download starts, and, because DNS resolvers can cache the results of DNS requests, it might have been made a significant amount of time before the download starts. So, again, if you weren't capturing when the DNS request was made, you're out of luck.
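One way to use captured DNS traffic is to build a table from the addresses in DNS responses to the names the client asked for, then look up the source address of the download. The sketch below parses the DNS message format from RFC 1035 (including name compression) and keeps only IPv4 A records; it assumes you already have the raw UDP payload of each DNS response, and the names `read_name` and `a_records` are hypothetical. Each address is tagged with the queried name rather than the record's owner name, so a CNAME chain (which a CDN may use) still maps back to www.example.com.

```python
import socket
import struct


def read_name(msg: bytes, offset: int) -> tuple[str, int]:
    """Decode a domain name at offset, following compression pointers.

    Returns the name and the offset just past it in its original position.
    """
    labels = []
    end = None
    for _ in range(128):  # guard against pointer loops in malformed messages
        length = msg[offset]
        if (length & 0xC0) == 0xC0:  # 2-byte compression pointer (RFC 1035, 4.1.4)
            if end is None:
                end = offset + 2
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            continue
        offset += 1
        if length == 0:  # zero-length root label ends the name
            return ".".join(labels), (end if end is not None else offset)
        labels.append(msg[offset:offset + length].decode("ascii"))
        offset += length
    raise ValueError("name too long or compression loop")


def a_records(msg: bytes) -> dict[str, str]:
    """Map each IPv4 address in a DNS response's A records to the queried name."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack("!6H", msg[:12])
    offset = 12
    qname = ""
    for _ in range(qdcount):
        qname, offset = read_name(msg, offset)
        offset += 4  # skip QTYPE and QCLASS
    addresses = {}
    for _ in range(ancount):
        _owner, offset = read_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:  # TYPE A; CNAME (5) and others are skipped
            addresses[socket.inet_ntoa(msg[offset:offset + 4])] = qname
        offset += rdlength
    return addresses


def encode_name(name: str) -> bytes:
    """Encode a name as uncompressed DNS labels (for building the sample below)."""
    return b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split(".")) + b"\x00"


# Sample response: one question, one A record whose owner name points back to offset 12.
response = (
    struct.pack("!6H", 0x1234, 0x8180, 1, 1, 0, 0)
    + encode_name("www.example.com") + struct.pack("!HH", 1, 1)
    + b"\xc0\x0c" + struct.pack("!HHIH", 1, 1, 300, 4) + socket.inet_aton("203.0.113.5")
)
dns_map = a_records(response)
print(dns_map.get("203.0.113.5", "unknown"))   # www.example.com
print(dns_map.get("198.51.100.7", "unknown"))  # unknown
```

The second lookup shows the limitation described above: if the DNS exchange happened before the capture started (or was answered from a cache), the table simply has no entry for the download's source address.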