I have a question that I'm not sure about. Please note that I am not a SharePoint search expert. We have multiple content sources and everything has been running fine.

Recently we added a huge new site (1.5TB!) which, unfortunately, we need to crawl (only once every few months, since the data is mostly static). My problem is that as soon as I start a full crawl of this site, it kills the incremental crawl of another content source, which has to run every 10 minutes and normally finishes in about 5 minutes.

We have other servers we could add more crawl components to, but as far as I know that won't solve the problem; it would only ease it a little by freeing up crawl time.

How can I specify that server 1 is responsible for content source 1 and server 2 for the rest? Is that even possible? Also note that we have FAST Search, but I'm not sure whether it can solve this. Any feedback is appreciated.


Solution

This isn't really possible without having multiple farms each with a separate Search Service Instance.

I would suggest you basically split out your 1.5TB of content into a separate farm with a dedicated Search instance. You can even use HOSTS entries on the crawl servers so they have their own local dedicated WFE servers for the crawl process to iterate through:

Farm 1 - Main Farm (business as usual)
Farm 2 - 1.5TB Content and Specialist Search Farm

You can then federate the services you need between the two.
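The HOSTS trick can be scripted. A minimal sketch, assuming the crawled web application answers on bigcontent.contoso.com and that 10.0.0.21 is a WFE reserved for crawling (both are placeholders); run it on each crawl server in Farm 2 so the crawler bypasses the load balancer and talks to its own WFE:

```powershell
# Point the crawled host name at a dedicated local WFE on this crawl server.
# Host name and IP address are placeholders for your own environment.
Add-Content -Path "$env:windir\System32\drivers\etc\hosts" `
            -Value "10.0.0.21`tbigcontent.contoso.com"
```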

The alternative (which is a bit less documented but arguably more ideal) is to use the "Request Management" service, which allows you to route specific service requests to specific servers. I don't really know too much about how that all works, though.
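For completeness, here is a rough sketch of what such routing could look like with the SharePoint 2013 Request Management cmdlets. The web application URL, the machine name "WFE2" and the UserAgent pattern are assumptions that would need verifying in your environment:

```powershell
# Rough sketch only (SharePoint 2013+): send requests whose UserAgent looks
# like the search crawler to a dedicated machine pool.
$wa = Get-SPWebApplication "http://bigcontent.contoso.com"
$rm = Get-SPRequestManagementSettings -Identity $wa

# Pool containing the server(s) that should handle crawler traffic
$target = Get-SPRoutingMachineInfo -RequestManagementSettings $rm -Name "WFE2"
$pool   = Add-SPRoutingMachinePool -RequestManagementSettings $rm `
              -Name "CrawlerPool" -MachineTargets $target

# Rule: anything the crawler requests is routed to that pool
$crit = New-SPRequestManagementRuleCriteria -Property UserAgent -MatchType Regex -Value "MS Search"
Add-SPRoutingRule -RequestManagementSettings $rm -Name "Crawler to CrawlerPool" `
    -Criteria $crit -MachinePool $pool
```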

Other tips

I would do the following:

  1. Prioritize the incremental crawl's content source as "high"
  2. Prioritize the new content source as "low"
  3. Limit the requests to the new site with crawler impact rules
  4. Start the crawl on Friday night and pause it on Monday; repeat until you have the main chunk indexed (use PowerShell to automate this, see the sketch below)

With this, the impact should not be that severe.
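A minimal PowerShell sketch of tips 1, 2 and 4, assuming content sources named "Incremental Source" and "Big Static Site" (both placeholders) and that your version exposes the CrawlPriority property; content source priorities only offer Normal and High, so "low" in practice means leaving the big source at Normal:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa = Get-SPEnterpriseSearchServiceApplication
$inc = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Incremental Source"
$big = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Big Static Site"

# Tips 1 and 2: favour the frequent incremental crawl over the big static site
$inc.CrawlPriority = [Microsoft.Office.Server.Search.Administration.CrawlPriority]::High
$inc.Update()
$big.CrawlPriority = [Microsoft.Office.Server.Search.Administration.CrawlPriority]::Normal
$big.Update()

# Tip 4: run this part from a scheduled task on Friday night
if ($big.CrawlStatus -eq [Microsoft.Office.Server.Search.Administration.CrawlStatus]::Paused) {
    $big.ResumeCrawl()      # continue where the previous weekend left off
}
elseif ($big.CrawlStatus -eq [Microsoft.Office.Server.Search.Administration.CrawlStatus]::Idle) {
    $big.StartFullCrawl()   # first run
}

# ...and from a second scheduled task on Monday morning, pause it again:
# $big.PauseCrawl()
```

Both halves can be wired up as Windows Task Scheduler jobs that call powershell.exe with the respective script.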
