Pregunta

Where could I find a code (javascript would be the best) to strip out the www and second-level domain names from URLs?

Example:

www.ynet.co.il -> ynet (stripped 'co.il' - two tokens)
www.nike.com -> nike (stripped 'com' - one token)

etc

As a second best - the full list of second-level domains (preferably in CSV or any other format) will be welcomed as well.

¿Fue útil?

Solución

If you use Java, Guava can help you here.

You can use InternetDomainName.topPrivateDomain() together with publicSuffix() to solve your problem.

Guava (as well as Mozilla/Firefox, Chrome and Opera) use the Public Suffix List for this functionality (the raw data is here).

tld.js is a JavaScript library that uses that data as well.

Otros consejos

https://gist.github.com/2428561 something like this? Search for 'javascript url parser' in google

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top