Question

I have a collection of approximately 625,000 objects.

Nightly via cron, I retrieve an updated TSV file of this data from a data provider and need to re-import it into my MongoDB.

What's the best way to do this without interrupting the website/service and the users who are consuming the data? Is mongoimport --upsert directly on the collection the best way? It's just so slow.

Should I be thinking about other methods involving copies of my collection?

Solution

Is mongoimport --upsert directly on the collection the best way? It's just so slow.

This will work, but in my experience mongoimport has been slow and unwieldy. It also has some other serious limitations: no ability to track progress, no place for checking/validation, and no log of what it's doing.

Should I be thinking about other methods involving copies of my collection?

The alternative is to write a script that does the import itself. In most languages this is pretty trivial.

This will give you the ability to track progress and handle bad data. If you find the import particularly slow, you may also be able to fork the process and split up the work.
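As a rough illustration of such a script, here is a minimal Python sketch using pymongo's bulk_write with upserts. It is only one way to do it, and several details are assumptions rather than anything from the question: the TSV is assumed to have a header row, the unique key column is assumed to be called "id", and the helper names (read_rows, is_valid, batched, import_tsv) are made up for this example.

```python
import csv
from itertools import islice


def read_rows(path):
    """Yield one dict per TSV row, keyed by the header line."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f, delimiter="\t")


def is_valid(row, key):
    """Reject rows with extra columns, missing columns, or an empty key."""
    # DictReader stores surplus columns under the key None and fills
    # missing columns with the value None.
    return None not in row and None not in row.values() and bool(row.get(key))


def batched(rows, size):
    """Group any iterable into lists of at most `size` items."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_tsv(collection, path, key="id", batch_size=1000):
    """Upsert every valid row into `collection`, printing progress per batch."""
    # Imported here so the helpers above can be used without pymongo installed.
    from pymongo import UpdateOne

    done = skipped = 0
    for chunk in batched(read_rows(path), batch_size):
        ops = []
        for row in chunk:
            if not is_valid(row, key):
                skipped += 1  # bad data: count it (or log it) and move on
                continue
            # Assumption: the provider's key column doubles as _id.
            ops.append(UpdateOne({"_id": row[key]}, {"$set": row}, upsert=True))
        if ops:
            # Unordered, so one failing document does not block the rest.
            collection.bulk_write(ops, ordered=False)
        done += len(ops)
        print(f"{done} upserted, {skipped} skipped")
    return done, skipped
```

It would be called with something like import_tsv(MongoClient().mydb.items, "nightly.tsv"), where the database, collection, and file names are placeholders. Because each batch is independent, the same loop could be split across several worker processes if a single one turns out to be too slow.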

About speed

Do remember that you're updating 625k objects. Even if you can get 1k updates/second, that's still a little over 10 minutes (625 seconds) to update. If you run mongostat or check your monitoring during the import process, you should get some idea of how much work is being done.
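The estimate is simple enough to check for whatever rate your monitoring reports. A tiny sketch, where the function name and the rate are illustrative inputs, not measurements:

```python
def estimated_minutes(total_docs, updates_per_second):
    """Wall-clock minutes for an import at a steady update rate."""
    return total_docs / updates_per_second / 60


# 625,000 documents at 1,000 updates/second
print(round(estimated_minutes(625_000, 1_000), 1))  # 10.4
```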

Licensed under: CC-BY-SA with attribution