What do you call a spidering technique where the spider visits all links on the first level, then all links on the second level?

StackOverflow https://stackoverflow.com/questions/1636098

  •  06-07-2019

Question

I forgot the name for the case where a web spider first visits all the links it sees on the first level, then visits all the links it sees on the second level, and so on.

There is a name for this technique, but I can't remember it.

Anyway, this is very exhaustive and obviously inefficient. Is there a better way?

I remember reading a paper over the summer about crawling web pages efficiently (DSL or something like that; I don't know what it stands for). In summary, it discussed a method to "determine which URLs are likely to hold relevant information and which URLs are to be ignored, like register, new account links, etc."

I didn't read it in much detail. If any of this rings a bell, please post a link.

Solution

Sounds like 'breadth-first search' (BFS), as opposed to 'depth-first search' (DFS). In the former you examine all your options laterally, so to speak, finishing one level before moving to the next, whereas in the latter you drill as deep as you can down each path before backing up. That's AI terminology; I'm not sure if it's in vogue with web tool designers. Anyway, BFS consumes a lot of memory, because it has to hold every link of the current level in its queue, but it is usually employed when you want to find an 'optimal result', something (in your terms) at the shallowest level possible. DFS tends to use a lot less memory, but it may reach a page by a long path before it finds a shorter one, so it can miss better solutions.

If you are just trying to catalog all the links, use DFS: both orders end up visiting every reachable page, and DFS usually does it with less memory. If you are trying to find something at the shallowest link depth, use BFS.
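
To make the difference in visiting order concrete, here is a minimal sketch. It assumes the site is already available as an in-memory dictionary (the `LINKS` graph and the `crawl` function are made up for illustration; a real spider would fetch and parse pages instead). The only difference between the two strategies is which end of the frontier the next page is taken from.

```python
from collections import deque

# Hypothetical link graph: each page maps to the links found on it.
LINKS = {
    "home": ["about", "blog"],
    "about": ["team"],
    "blog": ["post1", "post2"],
    "team": [],
    "post1": ["home"],  # a link back to the start, to show cycle handling
    "post2": [],
}


def crawl(start, breadth_first=True):
    """Return pages in the order they are visited."""
    frontier = deque([start])
    visited = set()
    order = []
    while frontier:
        # BFS takes the oldest entry (queue); DFS takes the newest (stack).
        page = frontier.popleft() if breadth_first else frontier.pop()
        if page in visited:
            continue
        visited.add(page)
        order.append(page)
        for link in LINKS.get(page, []):
            if link not in visited:
                frontier.append(link)
    return order


print(crawl("home", breadth_first=True))
# ['home', 'about', 'blog', 'team', 'post1', 'post2']
print(crawl("home", breadth_first=False))
# ['home', 'blog', 'post2', 'post1', 'about', 'team']
```

The breadth-first run finishes the first level (`about`, `blog`) before touching the second, while the depth-first run follows `blog` all the way down before it comes back to `about`.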

OTHER TIPS

Breadth-first search.

In graph theory, breadth-first search (BFS) is a strategy for searching in a graph when search is limited to essentially two operations: (a) visit and inspect a node of a graph; (b) gain access to visit the nodes that neighbor the currently visited node. The BFS begins at a root node and inspects all the neighboring nodes. Then for each of those neighbor nodes in turn, it inspects their neighbor nodes which were unvisited, and so on. Compare it with the depth-first search.

http://en.wikipedia.org/wiki/Breadth-first_search
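
As a rough illustration of the description above, the sketch below runs BFS from a root page and records the level at which each page is first reached. The `SITE` graph, the `link_depths` function, and the optional `max_depth` cutoff are assumptions made for this example, not part of the quoted definition.

```python
from collections import deque

# Hypothetical site: each URL path maps to the links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/d"],
    "/c": ["/"],
    "/d": [],
}


def link_depths(links, root, max_depth=None):
    """Breadth-first search that returns {page: shallowest depth from root}."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        # Optionally stop expanding pages that already sit at the depth limit.
        if max_depth is not None and depths[page] >= max_depth:
            continue
        for neighbor in links.get(page, []):
            if neighbor not in depths:  # only unvisited neighbors are queued
                depths[neighbor] = depths[page] + 1
                queue.append(neighbor)
    return depths


print(link_depths(SITE, "/"))
# {'/': 0, '/a': 1, '/b': 1, '/c': 2, '/d': 2}
print(link_depths(SITE, "/", max_depth=1))
# {'/': 0, '/a': 1, '/b': 1}
```

Because pages are taken from the queue in the order they were discovered, the first time a page is seen is always along a shortest chain of links, which is why the recorded depth is the shallowest one.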

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow