Question

Given the URL of a well-known company (e.g. http://mcdonalds.com/), how would you automatically and reliably find the company name (in this case "McDonald's")?

Thanks

Edit: Someone voted to close this question, so maybe I need to explain the motivation. I have a large list of company URLs and I want to find data about each company using Google Maps. Searching Google Maps with the company name works much better than searching with the URL.

Removing 'http' and 'com' does work in a lot of cases, particularly for well-known companies, but not in all of them. I found that the whois records were not very helpful.

I was hoping there was some kind of public database matching companies to URLs, but haven't come across one so far.


Solution

You would need to create your own lookup table. For the most accurate data, you would have to try to parse this information from the HTML at the URL, e.g. get the HTML page title, or look for the copyright message.
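As a rough sketch of the copyright idea, assuming you have already reduced the page (or just its footer) to plain text: the function name `nameFromCopyright` and the pattern are my own illustration, and real notices vary enough that this will miss or mangle some of them.

```php
<?php
// Hypothetical helper: guess the rights holder from a copyright notice in plain text.
function nameFromCopyright(string $text): ?string
{
    // "Copyright", "(c)" or the copyright sign, an optional year or year range,
    // then everything up to the next "|" or line break is taken as the name.
    $pattern = '/(?:copyright|\(c\)|\x{00A9})[\s\x{00A9}]*(?:\(c\)\s*)?'
             . '(?:\d{4}(?:\s*-\s*\d{4})?)?[\s,]*([^|\n]+)/iu';

    if (!preg_match($pattern, $text, $m)) {
        return null; // no notice found (or the text was not valid UTF-8)
    }

    // Drop a trailing "All rights reserved" and surrounding punctuation.
    $name = (string) preg_replace('/[\s.,]*all rights reserved.*$/iu', '', $m[1]);
    $name = trim($name, " \t.,");

    return $name === '' ? null : $name;
}

echo nameFromCopyright("Copyright (c) 2024 Example Inc"), "\n";
```

Whatever this returns would still need a sanity check (for instance against the domain) before it goes into the lookup table.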

OTHER TIPS

Quite probably they will have it in the <title> element. Parse this and compare it to the website's domain. If there is a significant overlap, that is your match. If not, try some heuristics on the title (for example, assume the name is everything before ">>" or a similar separator).
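The title-versus-domain comparison could be sketched like this. The helper names (`domainLabel`, `normalizeName`, `companyNameFromTitle`), the separator list, and the 60% threshold are all assumptions for illustration; `similar_text()` is just one convenient way to measure overlap.

```php
<?php
// Hypothetical helper: reduce a URL to its first domain label, e.g. "mcdonalds".
// Naive on purpose: it ignores subdomains and suffixes such as "co.uk".
function domainLabel(string $url): string
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    $host = (string) preg_replace('/^www\./', '', $host);
    return explode('.', $host)[0];
}

// Hypothetical helper: lowercase and keep only ASCII letters and digits.
function normalizeName(string $s): string
{
    return (string) preg_replace('/[^a-z0-9]/', '', strtolower($s));
}

// Split the title on common separators and return the piece that overlaps
// most with the domain label, or null if nothing reaches the threshold.
function companyNameFromTitle(string $title, string $url, float $minPercent = 60.0): ?string
{
    $label = domainLabel($url);
    if ($label === '') {
        return null; // URL had no host (e.g. the scheme was missing)
    }

    // Separators: "|", ">>", the right guillemet, ":" and " - " (assumed list).
    $parts = preg_split('/\s*(?:\||>>|\x{00BB}|:|\s-\s)\s*/u', $title, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $best = null;
    $bestPercent = 0.0;
    foreach ($parts as $part) {
        similar_text(normalizeName($part), $label, $percent);
        if ($percent > $bestPercent) {
            $best = trim($part);
            $bestPercent = $percent;
        }
    }

    return $bestPercent >= $minPercent ? $best : null;
}

echo companyNameFromTitle("McDonald's: Burgers, Fries & More", "http://mcdonalds.com/"), "\n";
```

A null result is the signal to fall back to other heuristics or to manual review.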

If it is a larger company, you might also get lucky looking at the NIC entry (aka whois) for their domain.

The whois database may be of some help, though there will always be edge cases that you have to handle with more effort.

If you want to be accurate, I would say Amazon Mechanical Turk.

Try to use cURL and DOMDocument.

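<?php
// Fetch the page with cURL (assumed setup; the URL is the question's example)
$ch = curl_init("http://mcdonalds.com/");
curl_setopt_array($ch, [CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true]);
$result = curl_exec($ch);

// Parse the HTML and print the text of the first <title> element
$dom = new DOMDocument();
@$dom->loadHTML($result); // @ silences warnings about malformed markup
$title = $dom->getElementsByTagName("title");
echo $title->item(0)->nodeValue;
?>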

Take a look at the page's meta tags.
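For example, many sites set the Open Graph `og:site_name` property, and some set the standard `application-name` meta name. A minimal sketch, assuming those two keys are the ones worth checking (the function name `siteNameFromMeta` and the preference order are my own choices):

```php
<?php
// Hypothetical helper: read a site name from common <meta> tags in an HTML string.
function siteNameFromMeta(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for messy HTML.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        // Open Graph tags use "property"; classic meta tags use "name".
        $key = strtolower($meta->getAttribute('property') ?: $meta->getAttribute('name'));
        $content = trim($meta->getAttribute('content'));
        if ($key !== '' && $content !== '' && !isset($found[$key])) {
            $found[$key] = $content;
        }
    }

    // Preference order is an assumption, not a standard.
    foreach (['og:site_name', 'application-name'] as $key) {
        if (isset($found[$key])) {
            return $found[$key];
        }
    }
    return null;
}

$html = '<html><head><meta property="og:site_name" content="Example Corp"></head><body></body></html>';
echo siteNameFromMeta($html), "\n";
```

Plenty of sites set neither tag, so this is best treated as one signal alongside the title.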

You could use the whois information. There should be libraries that let you do that in a clean way. You didn't mention what type of technology you'll be using...

Licensed under: CC-BY-SA with attribution