How to extract the title image of the web page
28-06-2021
Question
I want to extract the title image of a web page using C# in ASP.NET. I checked the window and document objects, but neither has a property for it. So I am looking for a way to extract the title image, like the one shown in the page tab in Chrome.
Solution
using System.Net;

// Download the icon from its conventional location at the site root.
using (WebClient client = new WebClient())
{
    byte[] favicon = client.DownloadData("http://msite.com/favicon.ico");
}
That's using WebClient.DownloadData. You can also use WebClient.DownloadFile if you're looking to store it.
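For the store-it case, WebClient.DownloadFile fetches the resource and writes it to disk in one call. The following is only a minimal sketch: the IconSaver class and its method names are hypothetical, and it assumes the icon URL ends in a file name such as favicon.ico.

```csharp
using System;
using System.IO;
using System.Net;

public static class IconSaver
{
    // Picks a local file name from the last segment of the icon URL,
    // e.g. ".../favicon.ico" becomes "<directory>/favicon.ico".
    public static string LocalPathFor(string iconUrl, string directory)
    {
        string fileName = Path.GetFileName(new Uri(iconUrl).AbsolutePath);
        return Path.Combine(directory, fileName);
    }

    // Downloads the icon and writes it to disk; returns the saved path.
    // Throws WebException if the request fails (for example, on a 404).
    public static string SaveIcon(string iconUrl, string directory)
    {
        string localPath = LocalPathFor(iconUrl, directory);
        using (WebClient client = new WebClient())
        {
            client.DownloadFile(iconUrl, localPath);
        }
        return localPath;
    }
}
```

A call might look like IconSaver.SaveIcon("http://msite.com/favicon.ico", Path.GetTempPath()). Wrapping it in a try/catch for WebException is probably worthwhile, since many sites do not serve an icon at that address.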
A more robust approach would be to download the index page and use an HTML parser to look for the <link> tag that specifies where the icon is supposed to be (the same approach works for apple-touch-icon and similar tags).
BTW, the tags I believe you're looking to parse are:
<!-- StackOverflow's implementation: -->
<link rel="shortcut icon" href="http://cdn.../favicon.ico">
<link rel="apple-touch-icon" href="http://cdn.../apple-touch-icon.png">
<!-- Google's implementation: -->
<meta content="/images/google_favicon_128.png" itemprop="image">
<!-- Facebook's implementation: -->
<link href="http://static.ak.fbcdn.net/.../q9U99v3_saj.ico" rel="shortcut icon">
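To illustrate pulling the icon location out of markup like the above, here is a rough sketch that scans for <link> tags whose rel value mentions an icon. It uses regular expressions from the standard library only to stay self-contained; a real HTML parser would likely be more reliable on messy pages. The IconLinkFinder class and its method names are hypothetical, and the sketch does not handle the <meta itemprop="image"> variant or unquoted attribute values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IconLinkFinder
{
    // Matches a whole <link ...> tag, whatever attributes it carries.
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);

    // Reads one attribute value (double- or single-quoted) from a tag, or null.
    private static string Attr(string tag, string name)
    {
        Match m = Regex.Match(tag,
            @"\b" + name + @"\s*=\s*(?:""([^""]*)""|'([^']*)')",
            RegexOptions.IgnoreCase);
        if (!m.Success) return null;
        return m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
    }

    // Returns the href of every <link> whose rel mentions "icon", in document
    // order. The order of attributes inside each tag does not matter.
    public static List<string> FindIconHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match tag in LinkTag.Matches(html))
        {
            string rel = Attr(tag.Value, "rel");
            string href = Attr(tag.Value, "href");
            if (rel != null && href != null &&
                rel.IndexOf("icon", StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hrefs.Add(href);
            }
        }
        return hrefs;
    }
}
```

Because the match is on the substring "icon", this picks up rel="icon", rel="shortcut icon" and rel="apple-touch-icon" alike; the caller decides which one to prefer.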
Other tips
There is no such thing as a "title image" in the HTML specification. The icon you see in the tab or next to the URL in some browsers is specified using a <link rel="icon"/> element:
<link type="image/x-icon" href="/images/favicon.ico" rel="icon" />
IE may require you to use a slightly different syntax:
<link type="image/x-icon" href="/images/favicon.ico" rel="shortcut icon" />
Parse the page and retrieve the value of the href attribute; this is the path of the icon.
Note also that IE version 8 and below ignore this line completely and instead look for a file named favicon.ico in the root of the site. See this somewhat old article for more information on IE.
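Putting the two lookups together: the href found in the page may be absolute, root-relative or relative to the page, and when no tag is present the root favicon.ico is the natural fallback. A small sketch of that resolution step, where the IconUrlResolver class and its Resolve method are hypothetical names and the href is assumed to have been extracted already:

```csharp
using System;

public static class IconUrlResolver
{
    // href: the value taken from the page's icon <link> tag,
    // or null/empty if the page declares no icon.
    public static Uri Resolve(string pageUrl, string href)
    {
        Uri page = new Uri(pageUrl);
        if (string.IsNullOrWhiteSpace(href))
        {
            // No <link> tag found: fall back to the file at the site root.
            return new Uri(page, "/favicon.ico");
        }
        // Uri(Uri, string) resolves relative hrefs against the page URL
        // and leaves absolute hrefs unchanged.
        return new Uri(page, href.Trim());
    }
}
```

For example, Resolve("http://msite.com/a/b.aspx", "/images/favicon.ico") should yield the icon under the site root, while passing null for href yields the root favicon.ico.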