Question

I'm not sure whether what I want is achievable. Note that I'm not a SharePoint Search specialist. We have multiple content sources, and everything is running fine.

Recently, we added a new site which is massive (1.5TB!), and unfortunately we need to crawl its content (just once every few months, as the data is static). My problem is that once I start a full crawl of this site, it kills the incremental crawls of the other content sources, which HAVE to run every 10 minutes and normally finish in about 5 minutes.

We have other servers available where I can add more crawl components, but as far as I know, that won't solve the problem; it will just alleviate it a bit by bringing down the crawl times.

How can I specify that Server 1 should be responsible for Content Source 1 and Server 2 for the rest? Is that possible? Also note that we have FAST Search, but I'm not sure if that can solve the problem either. Any feedback would be appreciated.


Solution

This isn't really possible without having multiple farms each with a separate Search Service Instance.

I would suggest you basically split out your 1.5TB of content into a separate farm with a dedicated Search instance. You can even use HOSTS entries on the crawl servers so they have their own local dedicated WFE servers for the crawl process to iterate through:

Farm 1 - Main Farm (business as usual)
Farm 2 - 1.5TB Content and Specialist Search Farm

You can then federate the services you need between the two.
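
As an illustration of the HOSTS idea above, a crawl server's hosts file could point the big site's host name at a WFE reserved for crawling. The host name and IP address below are placeholders, not values from the original setup:

```text
# C:\Windows\System32\drivers\etc\hosts on the crawl server
# 10.0.0.25 = hypothetical dedicated WFE, bigsite.example.com = hypothetical site host name
10.0.0.25    bigsite.example.com
```

With an entry like this, crawl traffic for that host name would go to the dedicated WFE instead of the load-balanced address that users hit.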

The alternative (which is a bit less documented but arguably better) is to use the "Request Management" service (introduced in SharePoint 2013), which allows you to route specific service requests to specific servers. I don't really know much about how that all works, though.

OTHER TIPS

I would do the following:

  1. Prioritize the incremental crawl as "high".
  2. Prioritize the new content source as "low" (or the lowest level your version offers).
  3. Limit the requests to the new site with crawler impact rules.
  4. Start the crawl on Friday night and pause it on Monday; repeat until you have the main chunk indexed (use PowerShell to automate).

With this, the impact should not be that severe.
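
The weekend window from step 4 can be sketched as a small decision function. This is only an illustration in Python of the scheduling logic, assuming a window from Friday 20:00 to Monday 06:00 and made-up status names ("idle", "crawling", "paused", "done"); none of these values come from SharePoint. A scheduled PowerShell script would map the returned action to the matching start, pause, or resume call on the content source, which is not shown here.

```python
from datetime import datetime

# Assumed window (hypothetical values): Friday 20:00 until Monday 06:00.
# datetime.weekday() returns 0 for Monday, so Friday is 4.
START_WEEKDAY, START_HOUR = 4, 20
END_WEEKDAY, END_HOUR = 0, 6


def in_crawl_window(now: datetime) -> bool:
    """Return True if `now` lies inside the weekend crawl window."""
    day, hour = now.weekday(), now.hour
    if day == START_WEEKDAY:
        return hour >= START_HOUR
    if day == END_WEEKDAY:
        return hour < END_HOUR
    # Saturday (5) and Sunday (6) are fully inside the window.
    return day in (5, 6)


def desired_action(now: datetime, status: str) -> str:
    """Decide what to do with the big crawl.

    `status` is one of the hypothetical labels "idle", "crawling",
    "paused", or "done". Returns "start", "resume", "pause", or "none".
    """
    if status == "done":
        return "none"
    if in_crawl_window(now):
        if status == "idle":
            return "start"
        if status == "paused":
            return "resume"
        return "none"
    # Outside the window, only a running crawl needs attention.
    return "pause" if status == "crawling" else "none"


if __name__ == "__main__":
    print(desired_action(datetime.now(), "paused"))
```

Running a check like this every few minutes keeps the big crawl confined to the weekend without anyone having to remember to pause it on Monday.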

Licensed under: CC-BY-SA with attribution
Not affiliated with sharepoint.stackexchange