Question

The short question is in the title: I work with the mongo shell, which is in safe mode by default, and I want to gain better performance by deactivating this behaviour.

Long question, for those who want to know the context: I am working on a huge data set with documents like

{
_id:ObjectId("azertyuiopqsdfghjkl"),
stringdate:"2008-03-08 06:36:00"
}

and some other fields. There are about 250M documents like that (the whole database with its indexes weighs 36 GB). I want to convert the date into a real ISODate field. I searched a bit for a way to write an update query like

db.data.update({},{$set:{date:new Date("$stringdate")}},{multi:true})

but did not find how to make this work, so I resigned myself to making a script that takes the documents one after the other and makes an update to set a new field which takes new Date(stringdate) as its value. The query uses the _id, so the default index is used.
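
The per-document script described above could look roughly like the sketch below. The single update query cannot work as written because new Date("$stringdate") is evaluated in the shell on the literal text "$stringdate", not on each document's field. The function names parseStringDate and convertDates are made up for illustration, and the sketch assumes the strings are UTC timestamps in exactly the "YYYY-MM-DD HH:MM:SS" layout shown in the sample document; adjust if the data is in local time.

```javascript
// Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS" into a Date.
// ASSUMPTION: the stored strings are UTC. Returns null for anything else.
function parseStringDate(s) {
  var m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(s);
  if (!m) {
    return null; // leave malformed values for a later pass
  }
  // Months are zero-based in Date.UTC, hence the "- 1".
  return new Date(Date.UTC(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6]));
}

// Hypothetical driver: visit every document that has no `date` field yet
// and set it with an update keyed on _id (so the default index is used).
// `coll` is a shell collection object such as db.data.
function convertDates(coll) {
  var updated = 0;
  coll.find({ date: { $exists: false } }).forEach(function (doc) {
    var d = parseStringDate(doc.stringdate);
    if (d !== null) {
      coll.update({ _id: doc._id }, { $set: { date: d } });
      updated += 1;
    }
  });
  return updated; // number of documents that received a date
}
```

In the shell this would be called as convertDates(db.data). Documents whose string does not match the expected layout are skipped, so they can be found afterwards by querying for a missing date field.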

The problem is that it takes a very long time. I already figured out that if only I had inserted empty date objects when I created the database I would now get better performance, since there is the problem of data relocation when a new field is added. I also set an index on a relevant field to process the database chunk by chunk. Finally, I ran several concurrent mongo clients on both the server and my workstation to make sure that the limiting factor is database lock availability and not some other factor such as CPU or network cost.
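
As an illustration of the chunk-by-chunk approach, here is a minimal sketch of how a range on the indexed field could be divided among concurrent clients. The post does not say which field is indexed or what type it has, so the numeric range and the helper names splitRange and chunkQuery are assumptions made for the example.

```javascript
// Hypothetical helper: split the half-open interval [min, max) into at most
// `parts` contiguous sub-ranges of (nearly) equal width.
function splitRange(min, max, parts) {
  var ranges = [];
  var step = Math.ceil((max - min) / parts);
  for (var lo = min; lo < max; lo += step) {
    ranges.push({ lo: lo, hi: Math.min(lo + step, max) });
  }
  return ranges;
}

// Hypothetical helper: build the query one client would use for its chunk,
// ASSUMING `field` is the indexed field and holds numeric values.
function chunkQuery(field, range) {
  var q = {};
  q[field] = { $gte: range.lo, $lt: range.hi };
  return q;
}
```

Each client would then run its update loop over db.data.find(chunkQuery(field, range)) for its own range, so that no two clients touch the same documents.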

I monitored the whole thing with mongotop, mongostat and the web monitoring interfaces, which confirmed that the write lock is taken 70% of the time. I am a bit disappointed MongoDB does not have a more precise granularity on its write lock: why not allow concurrent write operations on the same collection as long as there is no risk of interference? Now that I think about it, I should have sharded the collection across a dozen shards, even while staying on the same server, because there would have been an individual lock on each shard.

But since I can't change the current database structure right now, I looked for a way to improve performance so that I spend at least 90% of my time writing to mongo (up from 70% currently). I figured out that, since I run my script in the default mongo shell, every update is followed by a call to getLastError(). I don't want that call, because there is a 99.99% chance of success, and even in case of failure I can still run an aggregation query after the end of the big process to retrieve the few exceptions.

I don't think I would gain that much performance by deactivating the getLastError calls, but I think it is worth trying.

I took a look at the documentation and found confirmation of the default behavior, but not the procedure for changing it. Any suggestion?

Solution

I work with the mongo shell, which is in safe mode by default, and I want to gain better performance by deactivating this behaviour.

You can use db.getLastError({w:0}) ( http://docs.mongodb.org/manual/reference/method/db.getLastError/ ) to do what you want but it won't help.

This is because for one:

making a script that takes the documents one after the other and makes an update to set a new field which takes new Date(stringdate) as its value.

When the shell is used in a non-interactive mode, such as within a loop, it doesn't actually call getLastError(). As such, lowering your write concern to 0 will do nothing.

I already figured out that if only I had inserted empty date objects when I created the database I would now get better performance, since there is the problem of data relocation when a new field is added.

I did tell people, when they asked about this stuff, to add those fields in case of movement, but instead they listened to the guy who said "leave them out! They use space!".

I shouldn't feel smug but I do. That's an unfortunate side effect of being right when you were told you were wrong.

mongostat and the web monitoring interfaces, which confirmed that the write lock is taken 70% of the time

That's because of all the movement in your documents, kinda hard to fix that.

I am a bit disappointed mongodb does not have a more precise granularity on its write lock

The write lock doesn't actually denote the concurrency of MongoDB; this is another common misconception that stems from transactional SQL technologies.

Write locks in MongoDB are mutexes, for one.

Not only that, but there are numerous rules which dictate that operations will yield to queued operations under certain circumstances, one being how many operations are waiting, another being whether the data is in RAM or not, and more.

Unfortunately, I believe you have got yourself stuck between a rock and a hard place, and there is no easy way out. This does happen.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow