Question

I'd like to conduct an experiment, and for it I need a full database of all DNS entries on the Internet.

Is it practical to scan the Internet and fetch all DNS entries? What is the limiting factor: storage, time, or network bandwidth? Any good approaches to start with? (I could always brute-force scan the IP space and do a reverse DNS lookup on every address, but I guess that's not the most efficient way to do it.)


Solution

Downloading databases like RIPE's or ARIN's will not get you the reverse DNS entries you want. In fact, you'll only get the Autonomous Systems and the DNS servers that resolve those ranges, nothing else. Check this one: ftp://ftp.ripe.net/ripe/dbase/ripe.db.gz

Reverse DNS queries will get you only a fraction of all the DNS entries. In fact, no one can have all of them: most domains don't allow AXFR (zone transfer) requests, and making them could even be considered illegal in some countries. To get the complete list of .com/.net/.org domain names you would need to be ICANN or maybe an ICANN reseller, and even then you'll never get the TLDs whose zone files aren't publicly available (those of several countries, for instance).

So the best possible approach is a mix: brute-force reverse IP resolution, become an Internet giant like Google and run your own public DNS resolvers, and try an AXFR request against every domain name you're able to detect.
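The answer doesn't show how the AXFR attempts are made; the following is a minimal sketch of that last piece, assuming the dnspython library and a hypothetical target domain. In practice almost every name server will refuse the transfer, which is exactly the point made above.

```python
# Minimal AXFR (zone transfer) attempt with dnspython (pip install dnspython).
# The target domain is a placeholder; most servers will simply refuse.
import dns.query
import dns.resolver
import dns.zone

def try_axfr(domain):
    """Attempt a zone transfer against each authoritative NS of `domain`.

    Returns the zone on success, or None if every server refuses (the usual case).
    """
    for ns in dns.resolver.resolve(domain, "NS"):
        ns_ip = dns.resolver.resolve(str(ns.target), "A")[0].to_text()
        try:
            return dns.zone.from_xfr(dns.query.xfr(ns_ip, domain, timeout=10))
        except Exception:
            continue  # transfer refused, timed out, or incomplete; try the next NS
    return None

if __name__ == "__main__":
    zone = try_axfr("example.com")          # hypothetical domain
    if zone is not None:
        for name, node in zone.nodes.items():
            print(name, node.to_text(name))
```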

Mixing all these options is the only way to get a significant portion of all the DNS entries, but never 100% of them, and probably not more than 5 to 10%. Forget about brute-forcing WHOIS servers to get the list of domain names; it's forbidden by their terms and conditions.

We're brute-forcing reverse IPv4 resolution right now, because it's the only legal way to do it without being Google. We started two weeks ago.

After two weeks of tuning, we've covered about 20% of the Internet. We've developed a Python script that launches thousands of threads, scanning /24 ranges in parallel from several different nodes.
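The script itself isn't published in the answer; the sketch below shows the basic idea under my own assumptions: a thread pool doing PTR lookups over a single /24 with the standard library's socket.gethostbyaddr(). All names and parameters are illustrative.

```python
# Sketch of a threaded reverse-DNS scan of one /24, standard library only.
import socket
from concurrent.futures import ThreadPoolExecutor

def resolve_ptr(ip):
    """Return (ip, ptr_name), or (ip, None) if there is no PTR record."""
    try:
        return ip, socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip, None

def scan_slash24(prefix, workers=256):
    """Scan e.g. prefix='111.0.0' and yield only the IPs that resolved."""
    ips = [f"{prefix}.{host}" for host in range(256)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for ip, name in pool.map(resolve_ptr, ips):
            if name is not None:
                yield ip, name

if __name__ == "__main__":
    for ip, name in scan_slash24("8.8.8"):
        print(f"{ip}\t{name}")   # same "IP\tname" format described further down
```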

It's much faster than nmap -sL, but it's not as reliable as nmap, so we'll need a "second pass" to fill in the gaps (around 85% of the IPs got resolved on the first attempt). Regular rescanning will also be needed to keep the database complete and consistent.
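As a rough sketch of what that second pass could look like, assuming the "IP\tname" output format described further down (which only contains resolved IPs), you would simply re-queue every address of a range that has no line in the first-pass output:

```python
# Sketch of the "second pass": find the addresses of a /24 that produced no
# line in the first-pass output. Note this re-queues genuine NXDOMAINs too,
# since the result files only keep IPs that resolved.
def missing_ips(prefix, results_path):
    """Return the IPs of prefix's /24 absent from the first-pass result file."""
    seen = set()
    with open(results_path) as fh:
        for line in fh:
            seen.add(line.split("\t", 1)[0])
    return [f"{prefix}.{h}" for h in range(256) if f"{prefix}.{h}" not in seen]
```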

Right now we have several servers each pushing about 2 Mbps of DNS queries (from 300 to 4,000 queries/second per node, mostly depending on the RTT between our servers and the remote DNS servers).

We expect to complete the first pass over the entire IPv4 space in around 30 days.

The text files where we store the preliminary results average about 3 million entries per "class A" range (e.g. 111.0.0.0/8). The files contain just "IP\tname\n" lines, and we only store IPs that resolved.
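Loading one of those per-/8 files back is straightforward; here is a small sketch (the file name is hypothetical):

```python
# Sketch: load one per-/8 result file of "IP\tname\n" lines into a dict.
def load_results(path):
    """Return a dict mapping IP -> PTR name for one "class A" result file."""
    results = {}
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            ip, _, name = line.rstrip("\n").partition("\t")
            if name:
                results[ip] = name
    return results

entries = load_results("111.0.0.0-8.txt")   # hypothetical file name
print(len(entries), "resolved IPs in this /8")
```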

We needed to run a local DNS resolver on every server, because we were degrading our provider's DNS service and it blocked us. We also did a bit of benchmarking of different DNS servers. Forget about BIND: it's too heavy and you'll hardly get more than 300 resolutions/second.
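The answer doesn't say how that benchmarking was done; below is a crude, single-threaded sketch using dnspython that measures PTR lookups per second against one resolver. The resolver IP and the sample range are placeholders, and because it's serial it only gives a lower bound compared to the parallel scanner.

```python
# Crude throughput check: how many PTR lookups per second one resolver sustains.
import time
import dns.resolver
import dns.reversename

def ptr_rate(resolver_ip, ips):
    res = dns.resolver.Resolver(configure=False)
    res.nameservers = [resolver_ip]
    res.lifetime = 2.0                      # give up quickly on dead addresses
    start = time.monotonic()
    for ip in ips:
        try:
            res.resolve(dns.reversename.from_address(ip), "PTR")
        except Exception:
            pass                            # NXDOMAIN/timeouts still count as work
    return len(ips) / (time.monotonic() - start)

sample = [f"8.8.8.{h}" for h in range(256)]          # placeholder sample range
print(ptr_rate("127.0.0.1", sample), "queries/second")  # local resolver assumed
```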

Once we finish the scan we'll publish an article and share the database :)

Follow me on Twitter: @kaperuzito

One conclusion we've already reached is that people should think twice about the names they put in their DNS PTR records. You shouldn't name an IP "payroll", "ldap", "intranet", "test", "sql", "VPN" and so on... and there are millions like that :(
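To illustrate, a small sketch that flags such names in the collected data; the keyword list is taken from the examples above, and load_results is the hypothetical loader sketched earlier:

```python
# Sketch: flag PTR names that leak internal roles.
SENSITIVE = ("payroll", "ldap", "intranet", "test", "sql", "vpn")

def leaky_entries(results):
    """Yield (ip, name) pairs whose PTR name contains a sensitive keyword."""
    for ip, name in results.items():
        lowered = name.lower()
        if any(word in lowered for word in SENSITIVE):
            yield ip, name
```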
