10-12-2019
Question
I have a question I'm not sure about. Please note that I am not a SharePoint search expert. We have multiple content sources, and everything has been running fine.
Recently we added a huge new site (1.5 TB!) that unfortunately needs to be crawled, though only once every few months, since the data is static. My problem is that as soon as I start a full crawl of this site, it kills the incremental crawl of another content source, which has to run every 10 minutes and normally completes in about 5 minutes.
We have other servers we could use to add more crawl components, but as far as I know that won't solve the problem; it would only ease it a little by freeing up crawl time.
How can I specify that server 1 should handle content source 1 and server 2 the rest? Is that possible? Also note that we have FAST Search, but I'm not sure whether it can solve the problem. Any feedback would be appreciated.
Solution
This isn't really possible without having multiple farms each with a separate Search Service Instance.
I would suggest you basically split out your 1.5TB of content into a separate farm with a dedicated Search instance. You can even use HOSTS entries on the crawl servers so they have their own local dedicated WFE servers for the crawl process to iterate through:
- Farm 1 - Main Farm (business as usual)
- Farm 2 - 1.5TB Content and Specialist Search Farm
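As a sketch of the HOSTS approach (the hostname and IP below are hypothetical placeholders), the crawl server in the search farm could pin the content web application's URL to a dedicated local WFE, so crawl traffic never competes with end-user traffic on the load-balanced front ends:

```
# C:\Windows\System32\drivers\etc\hosts on the crawl server
# Route crawl requests for the content web app to a dedicated WFE
# (replace IP and hostname with your own environment's values)
10.0.0.21   portal.contoso.com
```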
You can then federate the services you need between the two.
The alternative (which is less well documented but arguably more elegant) is to use the "Request Management" service, which allows you to route specific service requests to specific servers. I don't know too much about how that works, though.
Other tips
I would do the following:
- Prioritize the incremental crawl as "High"
- Prioritize the new content source as "Low"
- Limit the requests to the new site with crawler impact rules
- Start the crawl on Friday night and pause it on Monday; repeat until you have the main chunk indexed (use PowerShell to automate)
With this, the impact should not be that severe.
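The weekend-crawl automation from the last tip could be sketched with the SharePoint server PowerShell cmdlets roughly as follows. The content source name "Big Static Site" is a placeholder, and the two halves would be scheduled separately (e.g. via Windows Task Scheduler) for Friday night and Monday morning:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell

# Friday night: start the full crawl of the big, static content source.
$ssa = Get-SPEnterpriseSearchServiceApplication
$cs  = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
          -Identity "Big Static Site"   # placeholder name
if ($cs.CrawlStatus -eq "Idle") {
    $cs.StartFullCrawl()
}

# Monday morning (a separate scheduled script on the same pattern):
# $cs.PauseCrawl()
# The paused crawl can later be resumed with $cs.ResumeCrawl().
```

Pausing rather than stopping the crawl means it picks up where it left off the following weekend, so the full index is built up incrementally without ever blocking the 10-minute incremental crawls during the work week.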