Question

Where can I find code (preferably JavaScript) that strips the www prefix and the second-level domain from URLs?

Example:

www.ynet.co.il -> ynet (stripped 'co.il' - two tokens)
www.nike.com -> nike (stripped 'com' - one token)

etc.

As a second-best option, a full list of second-level domains (preferably as CSV, but any format is fine) would also be welcome.


Solution

If you use Java, Guava can help you here.

You can use InternetDomainName.topPrivateDomain() together with publicSuffix() to solve your problem.
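For readers who want the same idea in JavaScript/TypeScript, here is a minimal sketch of what publicSuffix() and topPrivateDomain() compute. It is not Guava's implementation: SUFFIXES is a hypothetical five-entry excerpt standing in for the real list, wildcard and exception rules are ignored, and siteName is a hypothetical helper that returns the label the question asks for.

```typescript
// Hypothetical tiny excerpt of public suffixes; the real list has thousands of entries.
const SUFFIXES: ReadonlySet<string> = new Set(["com", "il", "co.il", "uk", "co.uk"]);

// Longest suffix of `host` found in SUFFIXES (simple rules only:
// no wildcard or exception handling). Starting at i = 0 tries the
// longest candidate first.
function publicSuffix(host: string): string | null {
  const labels = host.toLowerCase().split(".");
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (SUFFIXES.has(candidate)) return candidate;
  }
  return null;
}

// The public suffix plus one more label, e.g. "ynet.co.il".
function topPrivateDomain(host: string): string | null {
  const suffix = publicSuffix(host);
  if (suffix === null) return null;
  const labels = host.toLowerCase().split(".");
  const suffixLen = suffix.split(".").length;
  if (labels.length <= suffixLen) return null; // host is itself a public suffix
  return labels.slice(labels.length - suffixLen - 1).join(".");
}

// Hypothetical helper: the first label of the top private domain.
// Subdomains such as "www" are dropped automatically.
function siteName(host: string): string | null {
  const top = topPrivateDomain(host);
  return top === null ? null : top.split(".")[0];
}
```

With this excerpt, `siteName("www.ynet.co.il")` returns `"ynet"` (suffix `co.il`, two tokens) and `siteName("www.nike.com")` returns `"nike"` (suffix `com`, one token).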

Guava (like Mozilla Firefox, Chrome and Opera) uses the Public Suffix List for this functionality (the raw data is here).
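The raw data is a plain-text file with one rule per line, `//` comment lines, wildcard rules such as `*.ck` and exception rules such as `!www.ck`. The sketch below, assuming that format, parses the rules and applies the list's matching algorithm: an exception rule wins (minus its leftmost label), otherwise the matching rule with the most labels, otherwise an implicit `*`. SAMPLE is a hypothetical excerpt rather than the real file, and parseRules and registrableDomain are illustrative names.

```typescript
// Hypothetical excerpt in Public Suffix List format (not the real list).
const SAMPLE = `
// comment lines and blank lines are ignored
com
il
co.il
*.ck
!www.ck
`;

interface Rule {
  labels: string[];   // e.g. ["*", "ck"]
  exception: boolean; // true for rules written as "!www.ck"
}

// Each line is read only up to the first whitespace.
function parseRules(text: string): Rule[] {
  const rules: Rule[] = [];
  for (const rawLine of text.split("\n")) {
    const token = rawLine.trim().split(/\s+/)[0];
    if (token === "" || token.startsWith("//")) continue;
    const exception = token.startsWith("!");
    const body = exception ? token.slice(1) : token;
    rules.push({ labels: body.toLowerCase().split("."), exception });
  }
  return rules;
}

// A rule matches when its labels, compared right to left, equal the
// domain's labels ("*" matches any single label).
function matches(rule: Rule, domainLabels: string[]): boolean {
  if (rule.labels.length > domainLabels.length) return false;
  for (let i = 1; i <= rule.labels.length; i++) {
    const r = rule.labels[rule.labels.length - i];
    const d = domainLabels[domainLabels.length - i];
    if (r !== "*" && r !== d) return false;
  }
  return true;
}

// Public suffix plus one label, or null if the domain is itself a suffix.
function registrableDomain(domain: string, rules: Rule[]): string | null {
  const labels = domain.toLowerCase().split(".");
  const matching = rules.filter((rule) => matches(rule, labels));
  const exception = matching.find((rule) => rule.exception);
  let suffixLen: number;
  if (exception) {
    suffixLen = exception.labels.length - 1; // drop the exception's leftmost label
  } else if (matching.length > 0) {
    suffixLen = Math.max(...matching.map((rule) => rule.labels.length));
  } else {
    suffixLen = 1; // implicit "*" rule
  }
  if (labels.length <= suffixLen) return null;
  return labels.slice(labels.length - suffixLen - 1).join(".");
}
```

With SAMPLE, `registrableDomain("www.ynet.co.il", parseRules(SAMPLE))` returns `"ynet.co.il"`, while `"www.www.ck"` yields `"www.ck"` because the exception rule overrides `*.ck`.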

tld.js is a JavaScript library that uses that data as well.

OTHER TIPS

Something like https://gist.github.com/2428561? You can also search Google for 'javascript url parser'.
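Whatever parser you use, the first step is getting the host name out of a full URL. A minimal sketch using the standard URL class (global in browsers and in Node.js 10+), not the linked gist's code; hostWithoutWww is a hypothetical name. It only normalizes the host, so deciding where the suffix starts still needs the Public Suffix List.

```typescript
// Extract the host from a full URL and drop a leading "www." label.
function hostWithoutWww(input: string): string {
  const host = new URL(input).hostname.toLowerCase();
  return host.startsWith("www.") ? host.slice(4) : host;
}
```

Note that a bare host such as `"www.ynet.co.il"` has no scheme, so `new URL` throws a TypeError; prepend `"http://"` first in that case.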

Licensed under: CC-BY-SA with attribution