Question

I am doing some web scraping for a research project and am hitting bandwidth limitations. Due to the nature of my work, this has to be done through a web browser control (GeckoFX for C#). Because of this, I cannot control which images get loaded.

My question is: on Windows, is there any way to force certain images not to load? I know web pages can be blocked via the hosts file, but that does not work for specific images on a page.

Ideally, such a tool would support regex or wildcard patterns for specifying the blocked image sets.

Solution

You can use Fiddler (or FiddlerCore) as a proxy, which lets you inspect and modify each request. In your case, you could have the script issue an additional HEAD request for each image request, check whether the reported size is acceptable, and fail the original request if it is not.
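
As a rough illustration of the decision a proxy script would make for each request, here is a minimal sketch in Python. It is not Fiddler code. The pattern list, the size limit, the extension list, and the function names are all hypothetical, and the actual HEAD request and the call that fails the request are left to whatever proxy you use. It combines the wildcard blocking asked for in the question with the size check suggested above.

```python
import fnmatch
import re
from urllib.parse import urlsplit

# Hypothetical wildcard patterns for image URLs to block (assumptions, adjust to taste).
BLOCKED_PATTERNS = [
    "*://ads.example.com/*",
    "*/banners/*.png",
    "*.gif",
]

# Hypothetical size limit in bytes for images not blocked by a pattern.
MAX_IMAGE_BYTES = 50_000

# File extensions treated as images (an assumption; a proxy could also
# look at the Content-Type of the HEAD response instead).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp")

# Translate each wildcard pattern into a case-insensitive regex once, up front.
_COMPILED = [
    re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    for pattern in BLOCKED_PATTERNS
]


def is_image_url(url):
    """Guess from the URL path (ignoring the query string) whether this is an image."""
    path = urlsplit(url).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)


def should_block(url, content_length=None):
    """Return True if the proxy should fail this request.

    content_length is the Content-Length reported by a HEAD request,
    or None if the size is unknown.
    """
    if not is_image_url(url):
        return False

    # Match patterns against scheme://host/path, without the query string.
    parts = urlsplit(url)
    target = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if any(regex.match(target) for regex in _COMPILED):
        return True

    # Otherwise block only when the size is known and exceeds the limit.
    return content_length is not None and content_length > MAX_IMAGE_BYTES
```

In a proxy script, `should_block` would be called from the per-request handler, passing the request URL and, for images that no pattern matches, the size obtained from the extra HEAD request. Note that servers do not always send a Content-Length header, so in this sketch an unknown size lets the image through.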

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow