I have URL data in a table. I would like to create a view that shows the top-level domain (TLD), the second-level domain (SLD), and the subdomain of each URL. How can I extract these in ANSI SQL?
The database I am using (Vertica) supports only ANSI SQL and doesn't have convenience functions such as REVERSE.
Here is what I need to extract:
TLD = the top-level domain (.com, .org, .info, .us)
SLD = the second-level domain (twitter, yahoo, facebook, google), the second part of the URL
SUBDOMAIN = the subdomain (www, search.google, search.espn), the first part of the URL (this is the tricky one)
Here is the logic I am using, but I am unable to get the subdomain properly. I would like to reverse the string and take the remainder after extracting the TLD and SLD, but Vertica doesn't support a REVERSE function.
Here is the query and sample data (note: SPLIT_PART splits the string at the character specified):
SELECT COALESCE(SPLIT_PART(URL, '.', 3), SPLIT_PART(URL, '.', 2)) AS tld,
       SPLIT_PART(URL, '.', 2) AS sld,
       SPLIT_PART(URL, '.', 1) AS subdomain
FROM URL_table
The table has two columns, date and URL.
Here are some example URLs:
search.mywebsearch.com (TLD = com, SLD = mywebsearch, subdomain = search)
search.earthlink.net
topix.com
main.welcomescreen.intrepid.com
ad.yieldmanager.com
google.com
news.google.com
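
To make the "remainder after extracting TLD and SLD" idea concrete, here is a rough sketch of one way it might be done without REVERSE: count the dots to find the last two parts, then take whatever precedes them. It is untested against Vertica and rests on assumptions that are not stated above: that URL always holds a bare host name with at least one dot, that the TLD is a single label (so something like co.uk would be split wrongly), and that LENGTH, REPLACE, and SUBSTR are available alongside SPLIT_PART. The column alias dots and the subquery alias t are made up for illustration.

```sql
-- Sketch only: locate the last two labels by counting dots, so no REVERSE is needed.
-- Assumes URL is a bare host name with at least one dot and a single-label TLD.
SELECT URL,
       SPLIT_PART(URL, '.', dots + 1) AS tld,   -- last label
       SPLIT_PART(URL, '.', dots)     AS sld,   -- second-to-last label
       CASE WHEN dots >= 2
            -- Everything before ".sld.tld": drop both labels plus the two dots.
            THEN SUBSTR(URL, 1, LENGTH(URL)
                               - LENGTH(SPLIT_PART(URL, '.', dots + 1))
                               - LENGTH(SPLIT_PART(URL, '.', dots))
                               - 2)
            ELSE ''                             -- no subdomain, e.g. topix.com
       END AS subdomain
FROM (SELECT URL,
             -- Number of dots = length lost when the dots are removed.
             LENGTH(URL) - LENGTH(REPLACE(URL, '.', '')) AS dots
      FROM URL_table) AS t;
```

Under these assumptions, main.welcomescreen.intrepid.com has three dots, so the TLD is the fourth part (com), the SLD is the third part (intrepid), and the subdomain is the first 31 - 3 - 8 - 2 = 18 characters (main.welcomescreen).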