Question

Where can I find code (preferably JavaScript) to strip the www prefix and the second-level domain names from URLs?

Example:

www.ynet.co.il -> ynet (stripped 'co.il' - two tokens)
www.nike.com -> nike (stripped 'com' - one token)

etc

Failing that, a full list of second-level domains (preferably in CSV, though any format will do) would be welcome as well.

Solution

If you use Java, Guava can help you here.

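You can use InternetDomainName.topPrivateDomain() together with publicSuffix() to solve your problem: for www.ynet.co.il, topPrivateDomain() gives ynet.co.il and publicSuffix() gives co.il, so removing the suffix from the former leaves ynet.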

Guava (as well as Mozilla/Firefox, Chrome and Opera) uses the Public Suffix List for this functionality (the raw data is published at publicsuffix.org).

tld.js is a JavaScript library that uses that data as well.
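
To show the idea behind a suffix-list lookup, here is a minimal self-contained sketch in plain JavaScript. It is not tld.js or Guava code: the function name registrableLabel and the small SAMPLE_SUFFIXES set are made up for illustration, and a real implementation would load the full Public Suffix List and also handle its wildcard and exception rules, which this sketch ignores.

```javascript
// Tiny illustrative sample of public suffixes (an assumption for this sketch);
// a real implementation would load the full Public Suffix List instead.
const SAMPLE_SUFFIXES = new Set(["com", "org", "net", "il", "co.il", "uk", "co.uk"]);

// Hypothetical helper: returns the label directly to the left of the longest
// matching public suffix, or null when nothing in the sample set matches.
function registrableLabel(hostname) {
  // Normalize case, drop a trailing dot, and split into labels.
  const labels = hostname.toLowerCase().replace(/\.$/, "").split(".");
  // Try the longest candidate suffix first. Starting at 1 guarantees that
  // at least one label remains to the left of the suffix.
  for (let i = 1; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (SAMPLE_SUFFIXES.has(candidate)) {
      return labels[i - 1];
    }
  }
  return null;
}
```

With this sample set, registrableLabel("www.ynet.co.il") returns "ynet" and registrableLabel("www.nike.com") returns "nike". Matching the longest suffix first is what makes 'co.il' win over the shorter 'il'.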

Other tips

Something like https://gist.github.com/2428561? Otherwise, search Google for 'javascript url parser'.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow