Determine main article image - HTML Agility Pack

Question 1

There is no general rule that reliably identifies the main image of an article on any website.

First, bear in mind that websites differ from one another and can change at any moment, so trying to build an infallible algorithm is a waste of time in most situations.

If you only have a few websites to parse, you can work out the current layout pattern of each one and parse it with HTML Agility Pack (HAP). For example:

24matins: There is a div with the class "post-header", and its first <img> is the main article image. With HAP you could write:

var web = new HtmlWeb();
var doc = web.Load("http://www.24matins.fr/the-walking-dead-saison-4-le-deces-de-ce-personnage-ne-sera-pas-anodin-40685");
// First <img> that is a direct child of the "post-header" div; null if the layout has changed.
var img = doc.DocumentNode.SelectSingleNode("//div[@class='post-header']/img");
Console.WriteLine(img?.Attributes["src"]?.Value ?? "image not found");

lasemaine..: There is a single div with the class "illustrations", so:

web = new HtmlWeb();
doc = web.Load("http://www.lasemainedansleboulonnais.fr/actualite/la_une/2013/04/04/article__20_ans_prison_meurtre_de_sa_mere_boulogne.shtml");
// Same idea: the <img> directly inside the "illustrations" div, or null if it is missing.
img = doc.DocumentNode.SelectSingleNode("//div[@class='illustrations']/img");
Console.WriteLine(img?.Attributes["src"]?.Value ?? "image not found");
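
If the list of sites grows, the two snippets above could be folded into one lookup keyed by host name. This is only a minimal sketch: the MainImageFinder class, the XPathByHost table and the Find method are names I made up for illustration, and the two XPath entries are simply the ones from the examples above, so they will stop working whenever those sites change their layout.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class MainImageFinder
{
    // Hypothetical per-site table: host name -> XPath of the main article image.
    // The two entries are taken from the examples above; add one per site you support.
    static readonly Dictionary<string, string> XPathByHost =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "www.24matins.fr", "//div[@class='post-header']/img" },
            { "www.lasemainedansleboulonnais.fr", "//div[@class='illustrations']/img" },
        };

    // Returns the image URL, or null when the site is unknown or the pattern no longer matches.
    public static string Find(string articleUrl)
    {
        var host = new Uri(articleUrl).Host;
        if (!XPathByHost.TryGetValue(host, out var xpath))
            return null;

        var doc = new HtmlWeb().Load(articleUrl);
        var img = doc.DocumentNode.SelectSingleNode(xpath);
        return img?.Attributes["src"]?.Value;
    }
}
```

Returning null instead of throwing makes it easy to notice when a site has changed its markup and the table needs updating.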

I would also suggest using the sites' RSS feeds to get the relevant information. They generally include the article pictures and are more likely to follow a recognizable pattern, as you can see at www.24matins.fr/feed/rss-toutes-actualites.
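
As a rough sketch of the feed approach, the code below reads an RSS 2.0 feed with LINQ to XML and prints the image attached to each item. It assumes the feed exposes the picture as a standard <enclosure> element with an image MIME type. I have not verified that the 24matins feed does this (some feeds use other elements instead), so treat the element choice, the FeedImages class and the FindImageUrl helper as assumptions to adapt.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class FeedImages
{
    // Assumption: the picture is an RSS 2.0 <enclosure type="image/..." url="..."/>.
    // Returns null when the item has no such enclosure.
    public static string FindImageUrl(XElement item)
    {
        var enclosure = item.Elements("enclosure")
            .FirstOrDefault(e => ((string)e.Attribute("type") ?? "").StartsWith("image/"));
        return (string)enclosure?.Attribute("url");
    }

    // Prints "title: image URL" for every <item> in the feed.
    public static void Print(string feedUrl)
    {
        var feed = XDocument.Load(feedUrl);
        foreach (var item in feed.Descendants("item"))
        {
            var title = (string)item.Element("title");
            var imageUrl = FindImageUrl(item) ?? "(no image enclosure)";
            Console.WriteLine(title + ": " + imageUrl);
        }
    }
}
```

You would call it with the full feed address, for example FeedImages.Print("http://www.24matins.fr/feed/rss-toutes-actualites").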

Hope it helps.

Question 2

You can scan the HTML of the given URL for the meta tags that social websites use. For Facebook (the Open Graph protocol), for example, it would be:

<meta property="og:image" content="_here_is_URL_of_main_article_image_" />

But, as natenho said, there is no single sure way that will always work.
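
For what it is worth, the meta tag lookup can be sketched with HTML Agility Pack like this. The OpenGraph class and GetImage method are illustrative names, not part of the library, and the method takes the HTML as a string so you can fetch the page however you prefer.

```csharp
using HtmlAgilityPack;

static class OpenGraph
{
    // Returns the content of <meta property="og:image" ...>, or null when the tag
    // is absent or empty (many pages do not declare it at all).
    public static string GetImage(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var meta = doc.DocumentNode.SelectSingleNode("//meta[@property='og:image']");
        var content = meta?.Attributes["content"]?.Value;
        return string.IsNullOrWhiteSpace(content) ? null : content;
    }
}
```

A null result is the cue to fall back to a site-specific pattern or to the RSS feed.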