Question

I used sniffex.c as my starting point, and I've spent a few months getting the packet sniffer working the way I would like. It is a good tool for summarizing traffic flow in and out of each computer on my network, but I find myself wanting a little more information about where inbound traffic originated. If I do a whois on the src_ip of a sample inbound packet, most of the time I get information about a host owned by either Qwest Communications Company, LLC or AKAMAI TECHNOLOGIES INC, which does not really provide the information I am interested in.

At this point I am interested in tracking which site the data came from, such as YouTube or ESPN. How can this be done?

A reverse DNS lookup sounds like what I am looking for, but if I take a src_ip that I received a decent chunk of data from and put it into one of the online reverse DNS search forms, all I get is that it is owned by Qwest.
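For reference, the reverse (PTR) lookup those online forms perform can be reproduced inside the sniffer with the standard POSIX `getnameinfo()` call. This is only an illustrative sketch: `reverse_lookup` is a hypothetical helper name, and it assumes an IPv4 address in dotted-quad form, such as the string sniffex.c prints with `inet_ntoa(ip->ip_src)`.

```c
#define _POSIX_C_SOURCE 200112L   /* expose getnameinfo() and inet_pton() */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Hypothetical helper: reverse-resolve a dotted-quad IPv4 address.
 * Returns 0 and fills `host` on success, EAI_NONAME if `ip` is not a
 * valid address, or another getnameinfo() EAI_* code on failure.
 * With flags == 0, getnameinfo() falls back to the numeric address
 * when no PTR record exists. */
int reverse_lookup(const char *ip, char *host, size_t hostlen)
{
    struct sockaddr_in sa;

    assert(host != NULL && hostlen > 0);
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EAI_NONAME;                 /* not a dotted-quad address */
    return getnameinfo((const struct sockaddr *)&sa, sizeof sa,
                       host, (socklen_t)hostlen, NULL, 0, 0);
}
```

For a CDN-served or hosted address, this would most likely return a hostname in Qwest's or Akamai's domain, the same result the online forms gave, for the reasons the answer below explains.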

Edit #1:

OK, I now have a better idea of how to ask this question, thanks to Guy Harris' answer below. As he stated, there should be a "Host:" line in the ASCII data within each packet, which should give me more direct information about the source of this data at a higher level. Now how do I get to that data? Is parsing the ASCII text the best approach, or are there pre-existing functions to get at this data?

Edit #2:

Well, parsing either the payload or header ASCII seems to be a dead end. I found the source for a very useful libpcap application here. This program prints all of the above to a log file. Looking over this data, I find that very few packets have a "Host:" field: only TCP port 80 packets, and then only the first packet in the series. Even then, the only ones with this Host field were served up by the web server on one of the boxes on my network.

So is what I am asking completely impossible to figure out now that the content of many different websites may be cached on a single host?


Solution

Qwest (who were bought by CenturyLink) were an ISP, and CenturyLink still is. A company might, for example, use them as a hosting ISP, so that requests such as HTTP requests to www.example.com are actually handled by one of Qwest's/CenturyLink's servers. If you're downloading from www.example.com, you're then actually downloading from a server owned by Qwest/CenturyLink, whose IP address also belongs to Qwest/CenturyLink.

Akamai are a company who provide infrastructure for organizations that want to make it possible for lots of people to download material; the organization will arrange that the domain name for their server will actually resolve to an IP address for Akamai, so that if you think you're downloading from www.example.com, you're actually downloading from one of Akamai's servers, the IP address of which also belongs to Akamai.

Therefore, the source IP address of an inbound packet that's part of a reply to a download request (such as an HTTP GET request) sent to www.example.com might be the IP address of a host belonging to Qwest or Akamai - and that really is the host where the traffic started!

If it belongs to Qwest, Example Inc.'s Web server is probably actually stored on a Qwest machine with a Qwest IP address, not a machine owned by Example Inc. and using an IP address in a range owned by Example Inc. (Example Inc. might have better things to do with their resources than manage servers and a private range of IP addresses).

If it belongs to Akamai, it has a copy of the material to download, provided by Example Inc., who contracted with Akamai to provide content caching.

About all you can do to determine whose material is actually being downloaded would be to look at, for example, the DNS requests done by the client to determine the IP address (so that you see "www.example.com" in the DNS request, rather than just seeing the Qwest/Akamai/whatever IP address returned for the DNS request) and/or, at least for HTTP, the "Host:" header in an HTTP 1.1 request (which would also contain a domain name).
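As a rough illustration of the DNS approach, the function below pulls the queried name (the QNAME, e.g. "www.example.com") out of a DNS message. It is a sketch under several assumptions: the message travels over UDP port 53, the caller has already located the UDP payload (sniffex.c as written only decodes TCP payloads, so a UDP branch would have to be added), and `dns_first_qname` is a hypothetical name, not a libpcap function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the first question name of a DNS message
 * into `out` as a dotted string. Returns 0 on success, -1 if the message
 * is truncated, has no question, the name does not fit, or the name uses
 * a compression pointer (not expected in a query's question section). */
int dns_first_qname(const uint8_t *msg, size_t len, char *out, size_t outlen)
{
    size_t pos = 12;                  /* the DNS header is 12 bytes long */
    size_t o = 0;

    assert(out != NULL && outlen > 0);
    if (len < 12 || ((msg[4] << 8) | msg[5]) == 0)    /* QDCOUNT == 0 */
        return -1;

    for (;;) {
        if (pos >= len)
            return -1;
        uint8_t lab = msg[pos++];     /* length byte of the next label */
        if (lab == 0)
            break;                    /* root label: end of the name */
        if (lab & 0xC0)
            return -1;                /* pointer or reserved label type */
        if (pos + lab > len)
            return -1;
        if (o != 0) {                 /* dot between labels */
            if (o + 1 >= outlen)
                return -1;
            out[o++] = '.';
        }
        if (o + lab >= outlen)        /* keep room for the terminating NUL */
            return -1;
        memcpy(out + o, msg + pos, lab);
        o += lab;
        pos += lab;
    }
    out[o] = '\0';
    return 0;
}
```

Because the name is read from the client's query rather than from the address in the reply, it reports "www.example.com" even when the answer points at a Qwest or Akamai server.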

And, yes, in the general case, where you start capturing at some arbitrary point in the download process, it is not always possible to figure out the "original" source of the content being downloaded.

The Host: header will appear only in the initial HTTP request, so if you weren't capturing traffic at the time the initial HTTP request was made, or if the download isn't the result of an HTTP request, you're out of luck.
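When the initial request is captured, the Host: value can be pulled out of the TCP payload with a bounded scan. libpcap does not parse application-layer protocols, so this has to be done by hand. A minimal sketch, assuming the inputs are the TCP payload pointer and size that sniffex.c computes (`payload`, `size_payload`) and that `http_host` is a hypothetical helper:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive check that p[0..n) begins with `prefix`. */
static int starts_with_ci(const unsigned char *p, size_t n, const char *prefix)
{
    size_t k = strlen(prefix);
    if (n < k)
        return 0;
    for (size_t i = 0; i < k; i++)
        if (tolower(p[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    return 1;
}

/* Hypothetical helper: find the Host: header in the start of an HTTP/1.x
 * request. The payload is NOT NUL-terminated, so every scan is bounded by
 * `len`. Returns 0 and fills `out` (truncating if needed) on success, or
 * -1 if no Host header appears before the blank line ending the headers. */
int http_host(const unsigned char *payload, size_t len, char *out, size_t outlen)
{
    size_t pos = 0;

    assert(out != NULL && outlen > 0);
    while (pos < len) {
        size_t end = pos;
        while (end < len && payload[end] != '\n')
            end++;
        size_t linelen = end - pos;
        if (linelen > 0 && payload[pos + linelen - 1] == '\r')
            linelen--;                        /* strip the CR of CRLF */
        if (linelen == 0)
            break;                            /* blank line: end of headers */
        if (starts_with_ci(payload + pos, linelen, "Host:")) {
            size_t v = pos + 5, vend = pos + linelen;
            while (v < vend && (payload[v] == ' ' || payload[v] == '\t'))
                v++;                          /* skip optional whitespace */
            size_t n = vend - v;
            if (n >= outlen)
                n = outlen - 1;
            memcpy(out, payload + v, n);
            out[n] = '\0';
            return 0;
        }
        pos = end + 1;
    }
    return -1;
}
```

Since the header appears only in the first segment of a request, a miss on later segments and on server responses is expected, which matches what the asker saw in Edit #2.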

A DNS request would be made before the download starts - and, as DNS resolvers can cache results of DNS requests, it might have been made a significant amount of time before the download starts - so, again, if you weren't capturing at the time the DNS request was made, you're out of luck.
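If you are capturing when the lookup happens, one way to attribute later traffic is to remember which name each returned address was for, then look up the src_ip of inbound packets in that table. This sketch assumes DNS responses over UDP port 53 with a single question, a small fixed-size in-memory table, and hypothetical helper names (`dns_record_answers`, `name_for_addr`); it does not expire entries by TTL or handle IPv6 (AAAA) records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 1024          /* hypothetical fixed table size */

/* One learned mapping; addr is in network byte order, like ip_src.s_addr. */
struct ip_name {
    uint32_t addr;
    char name[256];
};
static struct ip_name table[MAX_ENTRIES];
static size_t n_entries;

/* Advance *pos past an encoded name; returns 0 on success, -1 if truncated. */
static int skip_name(const uint8_t *msg, size_t len, size_t *pos)
{
    while (*pos < len) {
        uint8_t lab = msg[*pos];
        if ((lab & 0xC0) == 0xC0) {            /* 2-byte pointer ends the name */
            *pos += 2;
            return *pos <= len ? 0 : -1;
        }
        if (lab & 0xC0)
            return -1;                         /* reserved label type */
        *pos += 1 + (size_t)lab;
        if (lab == 0)
            return 0;                          /* root label ends the name */
    }
    return -1;
}

/* Record every A record in a DNS response under the question name.
 * CNAME chains are not followed, so an address reached via a CNAME
 * (common with CDNs) is filed under the name the client asked for.
 * Returns the number of addresses recorded, or -1 if malformed. */
int dns_record_answers(const uint8_t *msg, size_t len)
{
    char qname[256];
    size_t pos = 12, o = 0;                    /* question follows the header */
    int added = 0;

    assert(msg != NULL);
    if (len < 12 || !(msg[2] & 0x80))          /* QR bit must mark a response */
        return -1;
    unsigned qd = (unsigned)((msg[4] << 8) | msg[5]);
    unsigned an = (unsigned)((msg[6] << 8) | msg[7]);
    if (qd != 1)                               /* assume a single question */
        return -1;

    for (;;) {                                 /* decode the question name */
        if (pos >= len)
            return -1;
        uint8_t lab = msg[pos++];
        if (lab == 0)
            break;
        if ((lab & 0xC0) || pos + lab > len || o + lab + 1 >= sizeof qname)
            return -1;
        if (o)
            qname[o++] = '.';
        memcpy(qname + o, msg + pos, lab);
        o += lab;
        pos += lab;
    }
    qname[o] = '\0';
    pos += 4;                                  /* QTYPE and QCLASS */

    for (unsigned i = 0; i < an; i++) {        /* keep only type A answers */
        if (skip_name(msg, len, &pos) != 0 || pos + 10 > len)
            return -1;
        unsigned type  = (unsigned)((msg[pos] << 8) | msg[pos + 1]);
        unsigned rdlen = (unsigned)((msg[pos + 8] << 8) | msg[pos + 9]);
        pos += 10;                             /* TYPE, CLASS, TTL, RDLENGTH */
        if (pos + rdlen > len)
            return -1;
        if (type == 1 && rdlen == 4 && n_entries < MAX_ENTRIES) {
            memcpy(&table[n_entries].addr, msg + pos, 4);
            strcpy(table[n_entries].name, qname);
            n_entries++;
            added++;
        }
        pos += rdlen;
    }
    return added;
}

/* Name previously learned for addr (network byte order), or NULL. */
const char *name_for_addr(uint32_t addr)
{
    for (size_t i = n_entries; i-- > 0; )      /* newest entry wins */
        if (table[i].addr == addr)
            return table[i].name;
    return NULL;
}
```

In the packet handler you would call `dns_record_answers` on each DNS response payload and `name_for_addr(ip->ip_src.s_addr)` on inbound packets. Because one Akamai address can serve many sites, the "newest entry wins" rule is only a heuristic, and a NULL result means the lookup was never seen on the wire, which is exactly the limitation described above.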

Licensed under: CC-BY-SA with attribution