Question

We are transferring 3 million archived invoices from an SFTP server to our first-ever SharePoint Online Record Center. Moving the files into SharePoint is not the problem; getting them partitioned into folders and crawled by Search is. At the apparent throttled rate of 5,000-6,000 files per day, processing all 3 million through the Drop Off Library/Content Organizer would take well over a year. The PDFs are uniformly small (60-180 KB), and this Record Center is dedicated to a single content type: Invoices. How should we do this at this scale? Thank you.


Solution 2

We ended up using Dell Boomi to move the files from SFTP directly into the target record library, bypassing the Drop Off Library. We configured Boomi to start a new folder in the library every 2,500 items, as the Content Organizer would. The one difference: when a session was interrupted for any reason, Boomi could not top off the partially filled folder left by the previous session. This is not an issue for us. We ran these sessions during off-hours to minimize the impact on SharePoint during business hours.
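The folder-filling rule described above, including the "top off" behavior that Boomi lacked, can be sketched in a few lines. This is only an illustration of the logic, not the Boomi process or a SharePoint API call: the function name `assign_folders`, the `batch-NNNNN` folder naming, and the two "resume" parameters are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

FOLDER_CAPACITY = 2500  # items per folder, as described in the answer


def assign_folders(
    file_names: Iterable[str],
    last_folder_index: int = 0,
    last_folder_count: int = 0,
    capacity: int = FOLDER_CAPACITY,
) -> Iterator[Tuple[str, str]]:
    """Yield (folder_name, file_name) pairs, filling each folder to capacity.

    last_folder_index and last_folder_count describe where a previous
    session stopped, so a new session can top off a partial folder
    instead of starting a fresh one. Folder names are hypothetical.
    """
    index, count = last_folder_index, last_folder_count
    for name in file_names:
        if count >= capacity:
            # Current folder is full: move on to the next one.
            index += 1
            count = 0
        yield f"batch-{index:05d}", name
        count += 1
```

To resume after an interruption, the caller would need to record (or look up) the last folder used and how many items it holds, then pass those two values into the next session.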

Once the 3 million file backlog was delivered (which still took more than a month), we reinstated the Content Organizer and the Drop Off Library to handle new daily documents. We learned from Microsoft Support that Microsoft throttles all workflows to 4,000-5,000 items per day per workflow (the Content Organizer uses the same resource as SharePoint Designer 2010 and 2013 workflows). That rate works for us on average throughout the year. I added a reusable SPD 2010 workflow to my custom content type to populate metadata and activated it on the Drop Off Library. It does not appear to interfere with, or count against, the daily budget of 4,000-5,000 files moving from the Drop Off Library to the destination record library.
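To put the throttle in perspective, here is a rough back-of-the-envelope calculation of how long the 3 million file backlog would have taken at the quoted workflow rates. The helper name `days_to_clear` is made up for this example, and it assumes the daily budget is fully used every day.

```python
import math


def days_to_clear(backlog: int, items_per_day: int) -> int:
    """Whole days needed to process a backlog at a fixed daily rate."""
    if items_per_day <= 0:
        raise ValueError("items_per_day must be positive")
    return math.ceil(backlog / items_per_day)


if __name__ == "__main__":
    BACKLOG = 3_000_000
    for rate in (4000, 5000):
        days = days_to_clear(BACKLOG, rate)
        # Report days and an approximate number of months (30-day months).
        print(f"{rate} items/day: {days} days (~{days / 30:.0f} months)")
```

At 4,000-5,000 items per day this comes to 600-750 days, which is why the backlog had to bypass the Content Organizer.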

Finally, I created a custom Search Results query at the site collection level, scoped to this one site and including the Drop Off Library. I had to make the Drop Off Library searchable, since search is turned off for it by default. It is working as expected.

UPDATE July 7, 2020: Since about June 4, 2020, our daily throttle budget has dropped from 5,000 to 500 items per day, which is too low for our purposes. Microsoft Support confirmed today that this is a new policy since May. We are waiting to hear whether this is temporary or whether there is a workaround.

OTHER TIPS

When dealing with 3 million invoices, I would bypass the Content Organizer and implement the logic in a PowerShell script, for example partitioning the files by year and month.
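The tip suggests PowerShell; the sketch below uses Python only to illustrate the partitioning logic, which would carry over to any scripting language. It assumes each invoice comes with a known date (for example parsed from the file name or taken from its metadata, which the original does not specify), and the `YYYY/MM` folder layout and function names are hypothetical. Uploading to SharePoint is deliberately left out.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def folder_for(invoice_date: date) -> str:
    """Return a year/month folder path such as '2019/03' (assumed layout)."""
    return f"{invoice_date.year:04d}/{invoice_date.month:02d}"


def partition_by_month(
    invoices: Iterable[Tuple[str, date]],
) -> Dict[str, List[str]]:
    """Group file names by the year/month folder of their invoice date."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for file_name, invoice_date in invoices:
        groups[folder_for(invoice_date)].append(file_name)
    return dict(groups)
```

One thing to check before relying on this layout: a busy month could still produce a very large folder, so it may be worth combining date partitioning with a per-folder item cap.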

Licensed under: CC-BY-SA with attribution