Question

I am looking for a web crawler that works correctly with MOSS 2007 and SP 2010. Basically, I want this crawler to fetch a SharePoint (MOSS 2007 or SP 2010) site and store it locally. Such a crawler (also called a "web robot" or "web spider") grabs web pages, including resources such as images and CSS, downloads them locally, and adjusts any resource hyperlinks to point to the locally downloaded copies.

I found some resources, samples, and tools, but none of them is specific to SharePoint. Besides, the main problem I am facing is how to specify the depth level of crawling.

For example: I have a SharePoint Web Application that consists of 3 SiteCollections, and each SiteCollection contains 9 levels of SubSites. Assume that I only want to crawl 5 levels of the SubSites of the second SiteCollection. The URL of that exact SiteCollection would of course have to be provided, but how can I then restrict the crawl to only 5 levels of SubSites?
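
To make the depth requirement concrete, here is a minimal sketch of what I have in mind, written in Python only for illustration. It assumes that each SubSite level corresponds to one extra URL path segment below the SiteCollection URL, which is only an approximation (page and library URLs such as `/Pages/default.aspx` add segments too). The names `subsite_depth`, `crawl`, and `get_links` are hypothetical; `get_links` stands in for whatever code downloads a page and extracts its hyperlinks.

```python
from collections import deque
from urllib.parse import urlsplit


def subsite_depth(root_url, url):
    """Return how many path segments `url` lies below `root_url`.

    Returns None when `url` is on another host or outside the root path.
    Assumption: one path segment roughly equals one SubSite level.
    """
    root = urlsplit(root_url)
    target = urlsplit(url)
    if (root.scheme, root.netloc.lower()) != (target.scheme, target.netloc.lower()):
        return None
    root_parts = [p for p in root.path.split("/") if p]
    target_parts = [p for p in target.path.split("/") if p]
    # Compare whole segments so "/sites/two" does not match "/sites/twofold".
    if target_parts[:len(root_parts)] != root_parts:
        return None
    return len(target_parts) - len(root_parts)


def crawl(root_url, get_links, max_depth):
    """Breadth-first walk from `root_url`, skipping URLs deeper than `max_depth`.

    `get_links(url)` is a caller-supplied function (hypothetical here) that
    returns the absolute URLs linked from `url`.
    """
    seen = {root_url}
    queue = deque([root_url])
    visited = []
    while queue:
        current = queue.popleft()
        visited.append(current)
        for link in get_links(current):
            depth = subsite_depth(root_url, link)
            if depth is None or depth > max_depth or link in seen:
                continue
            seen.add(link)
            queue.append(link)
    return visited
```

With something like this, passing the URL of the second SiteCollection as `root_url` and `max_depth=5` would express the limit I am after; what I do not know is how to make such a crawler work reliably against SharePoint.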

I am not asking about developing a web part for this purpose. My question is: how can I create a web application that can access and crawl any online SharePoint site (MOSS 2007 or SP 2010) down to a given depth level?

I would greatly appreciate any input!

Solution

Have you had a look at SharePoint Workspace 2010 to save local copies of content?

Licensed under: CC-BY-SA with attribution
Not affiliated with sharepoint.stackexchange