Question

I have been working on string operations in a recent 100-level CompSci course. I got the very "original" idea that I might write up a simple domain name generator/checker.

So I did a little homework and discovered that the various whois servers understandably limit the number of queries allowed.

So, I decided to first do a simple DNS check to see whether the domain has any records. If none are found, I then check a MySQL database to make sure the same query hasn't been sent recently. If it hasn't, I fire off a whois query from PHP using fsockopen. So, I was just getting ready to finish up my little script and upload it from my development server to my production server when I found some sites suggesting that various whois servers limit you to only 1,000 queries.
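
For concreteness, here's a rough sketch of the "has this been queried recently?" step I have in mind (the table and column names are just placeholders for illustration, not my actual schema):

<?php

// Sketch only: check whether a whois query for this domain was already sent
// in the last 24 hours, using a hypothetical whois_queries table with
// columns  domain VARCHAR  and  queried_at DATETIME.
function wasQueriedRecently(PDO $db, $domain)
{
    $stmt = $db->prepare(
        'SELECT COUNT(*) FROM whois_queries
         WHERE domain = :domain
           AND queried_at > NOW() - INTERVAL 1 DAY'
    );
    $stmt->execute(array(':domain' => $domain));
    return (int) $stmt->fetchColumn() > 0;
}

// Record the query so repeat lookups within 24 hours are served from cache.
function recordQuery(PDO $db, $domain)
{
    $stmt = $db->prepare(
        'INSERT INTO whois_queries (domain, queried_at) VALUES (:domain, NOW())'
    );
    $stmt->execute(array(':domain' => $domain));
}

?>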

My question:

Am I approaching this appropriately? The simple math suggests that just 10 users, each running 10 searches, with each search generating 10 whois lookups (10 × 10 × 10 = 1,000), would be enough to hit the limit and earn a temporary ban.

Are there any methods of doing bulk queries to the whois server?

Are other sites using some form of client-side JavaScript query or a server-side proxy? I found another similar question here at Stack Overflow suggesting that *NIX systems have access to a terminal command with no such limits. Other questions I have found deal with parsing the data, which is not a concern of mine.

I understand that this is a vague question. I do not want to inappropriately burden the whois servers. I do not expect, nor want, a ready-made code solution. A basic discussion of alternative programmatic strategies to go about this would make me a very satisfied friend :) Anyone have a keyword or two with which I can continue my research?

Solution

The whois Unix command appears to be less limited (https://superuser.com/questions/452751/what-are-the-limits-of-whois-command-on-unix). It might be easiest to do what I assume whois is doing under the covers and open a TCP connection to whois.internic.net on port 43.

<?php

// Open a TCP connection to the InterNIC whois server on port 43.
$fp = fsockopen("whois.internic.net", 43, $errno, $errstr, 10);
if (!$fp) {
    die("Could not connect: $errstr ($errno)\n");
}

// The whois protocol expects the query terminated by CRLF.
fwrite($fp, "hello.com\r\n");

// Read until the server closes the connection.
$response = "";
while (!feof($fp)) {
    $response .= fread($fp, 8192);
}

fclose($fp);
echo $response;

?>

If that's what you're already doing, then that's probably your best bet. I'm guessing the 1,000-query limit likely refers to somebody's web service that does this for you (e.g. whois.com). I think you can make a lot more queries than that if you're doing what I showed above.

(I've made a lot of guesses and assumptions here.)

P.S. A lot of good info here: http://semmyfun.blogspot.com/2010/08/how-does-whois-work-dirty-guide.html

OTHER TIPS

Even though this has already been marked as answered, I'd already typed this up for another post, so I might as well reuse it :-)

As has been said, most whois authorities will throttle (or even block) your traffic if they decide you're making too many requests in a 24-hour period.

Instead, you might want to consider logging in to the FTP site of any of the whois providers worldwide and downloading the various pieces of the database. All of them make this public data available; it's exactly the same data that companies like MaxMind use for their IP-to-geo lookup service, and the same data that sits behind all the 'whois' commands everyone automates.

I currently do this with one of my own servers, which connects (once every 24 hours) using the following shell script. WARNING: this will produce almost 4 GB of data, so make sure you run it on a disk with plenty of space:

#!/bin/bash
# Clean out any previous downloads.
rm -f delegated-afrinic-latest
rm -f delegated-lacnic-latest
rm -f delegated-arin-latest
rm -f delegated-apnic-latest
rm -f delegated-ripencc-latest
rm -f ripe.db.inetnum
rm -f apnic.db.inetnum
rm -f ripe.db.inetnum.gz
rm -f apnic.db.inetnum.gz
# Fetch the delegation statistics from each regional registry.
wget ftp://ftp.afrinic.net/pub/stats/afrinic/delegated-afrinic-latest
wget ftp://ftp.lacnic.net/pub/stats/lacnic/delegated-lacnic-latest
wget ftp://ftp.arin.net/pub/stats/arin/delegated-arin-latest
wget ftp://ftp.apnic.net/pub/stats/apnic/delegated-apnic-latest
wget ftp://ftp.ripe.net/ripe/stats/delegated-ripencc-latest
# Fetch the split inetnum database dumps from RIPE and APNIC.
wget ftp://ftp.ripe.net/ripe/dbase/split/ripe.db.inetnum.gz
ftp -n -v ftp.apnic.net <<END
user anonymous anonymous@anonymous.org
binary
passive
get /apnic/whois-data/APNIC/split/apnic.db.inetnum.gz apnic.db.inetnum.gz
bye
END
# Decompress the downloaded database dumps.
gunzip ripe.db.inetnum.gz
gunzip apnic.db.inetnum.gz

I then have a custom-written program that parses the files out into a custom database structure, which my servers then query.
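
As a rough illustration only (this is not my parser), the delegated-* files use the pipe-delimited RIR statistics format, one record per line in the shape registry|cc|type|start|value|date|status, so a minimal reader could look something like this:

<?php

// Sketch: read a delegated-*-latest file into an array of records.
// Assumes the standard RIR statistics exchange format:
//   registry|cc|type|start|value|date|status
function parseDelegatedFile($path)
{
    $records = array();
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        if ($line[0] === '#') {
            continue; // comment line
        }
        $fields = explode('|', $line);
        if (count($fields) < 7 || !in_array($fields[2], array('ipv4', 'ipv6', 'asn'))) {
            continue; // skip the version and summary header lines
        }
        $records[] = array(
            'registry' => $fields[0], // e.g. "arin"
            'cc'       => $fields[1], // country code
            'type'     => $fields[2], // ipv4, ipv6 or asn
            'start'    => $fields[3], // first address or AS number in the block
            'value'    => $fields[4], // number of addresses (or ASNs) in the block
            'date'     => $fields[5], // allocation date, YYYYMMDD
            'status'   => $fields[6], // allocated, assigned, ...
        );
    }
    return $records;
}

$rows = parseDelegatedFile('delegated-arin-latest');

?>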

Since all the servers mirror each other's data, you should be able to get a full data set from one server. If not, it wouldn't take much to modify the above shell script to download the data from the other registries; they all respond at 'ftp.????' and use the same universal folder structure.

I can't help you with the parser, however, as that contains proprietary code, but the file format (especially if you get the split files) is identical to what you see in typical whois output, so it's very easy to work with.

There is a parser on Google Code (that's where I got the download script) called 'ip-country' (I think); it's designed to let you build your own whois database. The one I've built is slightly more complicated, as it combines other data too (hence why my parser is proprietary).

By downloading and processing your own data like that, you get around any limit imposed by the providers, and the upshot is that it's most likely far faster to query your own data store than to keep firing off requests from your server to the whois servers every time someone enters an IP address.

If you're only interested in whether domains are registered, rather than in the WHOIS details, you'd have a better time using DNS servers to do your checks rather than WHOIS.
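
In PHP, a quick check along those lines can use checkdnsrr(); here's a rough sketch (the helper name is my own, and a missing record is only a hint, not proof, that the name is unregistered):

<?php

// Sketch: treat a domain as "probably registered" if it has NS or A records.
// Some registered domains have no DNS delegation, so a negative result here
// still needs a whois lookup to be sure.
function looksRegistered($domain)
{
    return checkdnsrr($domain, 'NS') || checkdnsrr($domain, 'A');
}

var_dump(looksRegistered('hello.com'));

?>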

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow