Problem

I have several questions concerning crawlers.

  1. Can I create a crawler that works purely on the web? I mean, a crawler that can be started or stopped from the admin page of a web project.

  2. What is the most convenient language for writing a crawler? I was planning to write it in C#.

  3. The most important one: how do crawlers work? I know that you create them using HttpWebRequest and HttpWebResponse, and I guess that after each page visit the crawler comes back, the code evaluates the result, and then builds a queue of other sites to send the crawler to. So, if this is true, and considering that I will build the crawler into a web project, should I keep the page up at all times, and how big a burden will the crawler place on the server? Will it slow the server down, or is it relatively light work?

I know there are many questions here, and I will really appreciate the answers :)


Solution

1) Absolutely, a crawler can work purely on the web. Your crawler could be an ASP.NET application itself, or your administration page could start or stop a task (the web crawler) running on the server, as sketched below.
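Here's a minimal sketch of that second approach, assuming an ASP.NET Core minimal API; the `/start` and `/stop` endpoints and the `RunCrawler` method are illustrative names I've made up, not part of any framework:

```csharp
// Minimal sketch: admin endpoints that start/stop a crawler task on the server.
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

CancellationTokenSource? cts = null;

app.MapPost("/start", () =>
{
    if (cts != null) return "Crawler is already running.";
    cts = new CancellationTokenSource();
    _ = Task.Run(() => RunCrawler(cts.Token)); // fire-and-forget background task
    return "Crawler started.";
});

app.MapPost("/stop", () =>
{
    cts?.Cancel();
    cts = null;
    return "Crawler stopped.";
});

app.Run();

// Placeholder crawl loop; a real crawler would fetch and process pages here.
static async Task RunCrawler(CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        try { await Task.Delay(1000, token); } // stand-in for real crawl work
        catch (TaskCanceledException) { break; }
    }
}
```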

2) VB.NET or C# works. They both have extensive libraries for working with the web.

3) I'd imagine what you're looking for is a recursive function. First, choose a starting page on the internet (one that contains a lot of links). For each link within that page, run the crawler's main method again, and so on. You'll probably want to limit how "deep" you crawl, and you'll likely want to do some work within each page as well.
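For instance, here's a minimal sketch of that recursion in C#, using HttpClient in place of HttpWebRequest/HttpWebResponse; the start URL, depth limit, and regex-based link extraction are illustrative choices, not a production design:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class Crawler
{
    static readonly HttpClient Client = new HttpClient();
    // Remember visited URLs so the recursion doesn't revisit the same pages.
    static readonly HashSet<string> Visited = new HashSet<string>();

    static async Task Crawl(string url, int depth)
    {
        if (depth <= 0 || !Visited.Add(url)) return; // depth limit + dedupe

        string html;
        try { html = await Client.GetStringAsync(url); }
        catch (HttpRequestException) { return; } // skip unreachable pages

        // "Do some work within each page" goes here, e.g. indexing the HTML.
        Console.WriteLine($"Visited {url}");

        // Naive link extraction; a real crawler would use an HTML parser.
        foreach (Match m in Regex.Matches(html, "href=\"(https?://[^\"]+)\""))
            await Crawl(m.Groups[1].Value, depth - 1); // recurse one level deeper
    }

    static async Task Main() => await Crawl("https://example.com", 2);
}
```

The queue you described in your question is the other common shape: instead of recursing, you push discovered links onto a work queue and loop over it, which avoids deep call stacks on large crawls.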
