Question

Q: Is it possible, in C#, to manipulate the HTTP request headers (or use any other technique) when requesting pages from servers like yahoo.com or cnn.com, so that the size of the returned page text (stream) is greatly reduced, i.e. a simplified page without all the extra scripts, images and CSS? Or, even better, can I request only the sub-section of the page I'm interested in? I just need the response to be as small as possible so that it downloads as fast as possible; the page will be processed later.


Solution

It really depends on the site, the services it provides and its configuration. Things worth looking for (not a complete list):

  1. An exposed API that lets you access the data directly, e.g. an XML or JSON response.
  2. Compression: your client has to request it via the appropriate HTTP header, e.g. Accept-Encoding: gzip, deflate, and, needless to say, must know how to decode the response accordingly. SO thread on doing this in C#.
  3. Requesting the mobile version of the site, if the site supports such a thing. How a site exposes that version really depends on the site: some prefix their host names with m., some respond to the User-Agent string, and some use other strategies.
  4. Using the HTTP Range header. This also depends on whether the site supports it. MSDN link for the .NET API.
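A minimal sketch of item 2 with `HttpClient`, assuming .NET Core 3.0 or later. Option A lets `HttpClientHandler.AutomaticDecompression` send `Accept-Encoding` and decompress for you; option B asks for gzip explicitly and decodes the body by hand. The class and method names are illustrative, and the UTF-8 decoding is an assumption (real pages may declare another charset).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class CompressedFetch
{
    // Option A: the handler advertises gzip/deflate and decompresses transparently.
    public static HttpClient CreateAutoDecompressingClient()
    {
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
        return new HttpClient(handler);
    }

    // Option B: request gzip explicitly and decode the body yourself.
    // Intended for a plain `new HttpClient()` (no automatic decompression).
    public static async Task<string> FetchGzipManuallyAsync(HttpClient client, string url)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
        using var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();

        byte[] body = await response.Content.ReadAsByteArrayAsync();
        // The server may ignore the request and send the body uncompressed.
        bool gzipped = response.Content.Headers.ContentEncoding
            .Any(e => e.Equals("gzip", StringComparison.OrdinalIgnoreCase));
        return DecodeBody(body, gzipped);
    }

    // Decodes a possibly gzip-compressed body as UTF-8 text (charset is an assumption).
    public static string DecodeBody(byte[] body, bool gzipped)
    {
        if (!gzipped) return Encoding.UTF8.GetString(body);
        using var input = new MemoryStream(body);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var reader = new StreamReader(gzip, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```

For most code option A is enough: `using var client = CompressedFetch.CreateAutoDecompressingClient();` followed by `await client.GetStringAsync(url)`. Compression only shrinks the bytes on the wire; the decoded HTML is the same size.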

OTHER TIPS

Have a play with tweaking some of the browser capabilities advertised in your HTTP request headers (see here). The response will vary from site to site, but this is how a client tells the server what it is capable of displaying and handling.
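A small sketch of setting capability headers on a single request, assuming `System.Net.Http`. The specific values (`text/html`, `en`, the User-Agent passed in) are illustrative choices, not values any particular site is known to respond to; servers are free to ignore them.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;

public static class CapabilityHeaders
{
    // Builds a GET request that tells the server what this client accepts.
    public static HttpRequestMessage Build(string url, string userAgent)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);

        // Ask for plain HTML only.
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("text/html"));

        // Ask for one language rather than letting the server pick.
        request.Headers.AcceptLanguage.Add(new StringWithQualityHeaderValue("en"));

        // TryAddWithoutValidation avoids format exceptions on unusual User-Agent strings.
        request.Headers.TryAddWithoutValidation("User-Agent", userAgent);
        return request;
    }
}
```

Send it with `await client.SendAsync(CapabilityHeaders.Build(url, ua))` and compare `response.Content.Headers.ContentLength` (when present) across header variations to see whether a given site reacts.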

There is no way, from C# or any other language, to ask a server to render a different amount of data beyond what the server itself supports. That is, there is no generic mechanism to tell a server "don't render inline CSS/JS/images", "don't render ad content", or even "just give me the article text".

Many sites have "mobile" versions with potentially smaller pages, but these likely contain different or less information than the desktop version. You should be able to request the mobile version by using a different URL or by sending a "User-Agent" string corresponding to a phone.
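Both approaches can be sketched as below. The `m.` host rewrite is a hypothetical convention (only some sites use it, so check each one), and the iPhone User-Agent string is an illustrative example rather than one any site is guaranteed to treat as mobile.

```csharp
using System;
using System.Net.Http;

public static class MobileVariant
{
    // Illustrative phone User-Agent; sites sniff UAs differently, so adjust per site.
    public const string PhoneUserAgent =
        "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 " +
        "(KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1";

    // Hypothetical convention: replace a leading "www." with "m." (or prepend "m.").
    public static string ToMobileUrl(string url)
    {
        var builder = new UriBuilder(url);
        string host = builder.Host;
        if (host.StartsWith("m.", StringComparison.OrdinalIgnoreCase))
            return builder.Uri.ToString();
        if (host.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
            host = host.Substring(4);
        builder.Host = "m." + host;
        return builder.Uri.ToString();
    }

    // Alternative: keep the URL and present a phone User-Agent instead.
    public static HttpRequestMessage BuildMobileRequest(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.TryAddWithoutValidation("User-Agent", PhoneUserAgent);
        return request;
    }
}
```

If the rewritten host does not exist, the request fails with a DNS or connection error, so fall back to `BuildMobileRequest` on the original URL.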

Some sites provide data as an RSS feed or through some other means of obtaining it automatically; you may want to check with each site.
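If a site does offer RSS, a feed is usually much smaller than the HTML page. A minimal sketch using `System.Xml.Linq`, assuming an RSS 2.0 feed (Atom feeds use different element names); the feed URL is whatever the site publishes and is not assumed here.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class RssReader
{
    // Downloads a feed and returns its items.
    public static async Task<List<(string Title, string Link)>> FetchAsync(HttpClient client, string feedUrl)
    {
        string xml = await client.GetStringAsync(feedUrl);
        return ParseItems(xml);
    }

    // Extracts title/link pairs from RSS 2.0 <item> elements; missing fields become "".
    public static List<(string Title, string Link)> ParseItems(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("item")
            .Select(item => (Title: (string)item.Element("title") ?? "",
                             Link: (string)item.Element("link") ?? ""))
            .ToList();
    }
}
```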

If you know which portion of the page to download, you may be able to use the Range header on the GET request, but dynamically generated pages may not support it.
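A sketch of requesting only the first `maxBytes` of a page with `RangeHeaderValue`. A `206 Partial Content` status means the server honored the header; a `200 OK` means it ignored it and is sending the whole page. As a separate, client-side technique (not a server feature), the fallback simply stops reading after `maxBytes`. Names are illustrative.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class PartialFetch
{
    // Builds a GET with "Range: bytes=0-(maxBytes - 1)".
    public static HttpRequestMessage BuildPrefixRequest(string url, long maxBytes)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Range = new RangeHeaderValue(0, maxBytes - 1);
        return request;
    }

    // Returns whether the Range header was honored, plus at most maxBytes of the body.
    public static async Task<(bool Honored, byte[] Data)> FetchPrefixAsync(
        HttpClient client, string url, long maxBytes)
    {
        using var request = BuildPrefixRequest(url, maxBytes);
        // ResponseHeadersRead returns as soon as headers arrive, before the body is buffered.
        using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        bool honored = response.StatusCode == HttpStatusCode.PartialContent;

        // Read at most maxBytes either way; disposing the response abandons the rest.
        using Stream body = await response.Content.ReadAsStreamAsync();
        var buffer = new MemoryStream();
        var chunk = new byte[8192];
        while (buffer.Length < maxBytes)
        {
            int toRead = (int)Math.Min(chunk.Length, maxBytes - buffer.Length);
            int read = await body.ReadAsync(chunk, 0, toRead);
            if (read == 0) break;
            buffer.Write(chunk, 0, read);
        }
        return (honored, buffer.ToArray());
    }
}
```

Byte ranges only help when the part you want sits at a predictable offset, which is rarely true for dynamic pages.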

Side notes:
  - Most sites serve CSS/JS as separate files, so a plain GET of the page does not download them anyway.
  - Make sure to check each site's license to see whether there are any limitations.

Licensed under: CC-BY-SA with attribution