문제

I happen to be using innodb, read-committed.

My simple question is this relative to a transaction:

I have a table (TreeNodeId) which holds a set of 4 different nodekeys, that represent all extant nodes in my system that relate to available paths to webpages. Each key represents an item in the database, and each row in the table represents various combinations in which items are used.

At the beginning of a transaction, based on the items being changed, I make a single query for all rows in TreeNodeId that reflect some extant combination of my one or 2 items.

Will this single query be internally consistent, even if it fetches 10,000 rows? Is it possible for the db query set to get the first 100 rows, and then for some other simultaneous transaction to commit new or deleted rows that would cause the remaing results to be inconsistent?

Andy

도움이 되었습니까?

해결책

If you isolation level is read 'committed' it will only return results that have been 'committed' by the transaction log. So if you start a query that is in isolation level 'committed' at that point in time the sql transaction log will only give you transactions that had posted to it's log as committed. If in the middle of the select someone posts records they will be seen as 'uncommitted' at that point in time till they end their operation and will be 'committed'. However even if you change the level to 'uncommitted' you should not get data as it is in mid stream, you should get what is available to the engine at the moment you began your operation according to the transaction logs.

Committed versus uncommitted will get records in the system at the moment of select that are there based on your select. So if I had say 3,000,000 records and 200,000 records inserting but they were committing one at a time and only 100,000 had committed and 100,000 were aware of operation in the logs but not committed yet.

Committed would give me 3,100,000 and Uncommitted would give 3,200,000. However there are schools of thought and I just got into a discussion yesterday with someone on this.... Uncommitted will give you the uncommitted results and are known as 'dirty reads' in that you are reading logs that are not set yet(you rebel). You are saying "Hey database I don't care what you got incoming that is finalized I WANT IT NOW." When you say committed you are saying: "Database I only want qualified data, if something is not finalized I don't want it."

Advantages with each:

  1. Uncommitted you will not LOCK anything. You are basically saying to the system: "Don't lock anything out, just let me go through the system freely getting what is there and I don't care if you change something. I want it at moment of operation." If something is trying to insert or update when you perform this it WILL NOT LOCK IT.

  2. Committed will not lock anything except that which is in process to commit till your operation has been completed. You are safe in knowing your data returned is finalized but your run the risk of BLOCKING transactions trying to insert or post. Your are essentially telling the database: "Wait for me to finish before continuing your commits on tables I am accessing. I need my data accurate so hands off till I am done". This will potentially lock data while it is performing the reads on a table you are gathering from. This is not that common as most selects are near instant but on huge systems that are transactional based on posting thousands of records a second it is a BIG CONCERN.

Honestly in my discussion I favored uncommitted and the other person favored committed. I argued it is far more acceptable to get dirty data than stop production inserts. They argued that phantom reads and other instances were worse. This is an opinion and SQL systems are designed around inserts and selects but seldom can you do both exceptionally fast without taking a little away from the other. My answer if you want accurate reporting is do nightly backups, SSIS packages, binary collections, or something similar in an isolation level such as snapshot or committed and put that data somewhere. Let that data have been set in a way that we know it is finalized and it is locked so it may not be changed later and report off of that. Don't report off of production data hot and make it a point to tell everyone to do that. That is bad practice in and of itself to tell people to report off of live data performing inserts and updates in real time.

Will it hurt if you are a small mom and pop store with only 5 or 10 people using the database, probably not. Will it hurt if you are little bigger and have 50 people accessing the same database but it is about 100 gigs and semi transactional in that you get trickle's of data during the day. Still probably not. Will it hurt if you have 200 people and multiple servers and databases and a main transactional database brain storing the composite of all the data. ABSOLUTELY, don't read from a main production database with intense operations if it's main purpose is to get data to store.

EDIT to further point from real world example:

That is why usually at the top of most operations where I am not using table variables (declare @Table table) I set this: "set transaction isolation level read uncommitted". Will I be using this intensely every time I query? LOL, I hope not. In fact Full disclosure it may NEVER EVER help me from this point on because I isolate my data a lot with temp tables for huge transaction reporting. But I will not be getting yelled at by others I have a long running transaction blocking their inserts. You will also see a lot of people do this: "select * from table (nolock)" I Generally give code like this to lesser query designers as it embeds the nolock hint with the query. If I tell everyone to do this they will make it policy.

You do not have to do this and in fact some people will maybe follow me and claim this is wrong and post their side. I do it MOSTLY FOR PRODUCTION PROTECTION and anyone that tells me that is wrong I would like to hear why they like to lock tables and report off of them in production versus getting their data in or updated in real time first. I would have a hard time going to a manager and saying: "You know that huge account you were waiting to post 2 million records on and know the instance it was done. Well John down the hall really wanted to run this query that takes an hour to run because it was sloppily designed. He chose to use committed and is hitting some of the tables doing inserts so we are getting occasional locks. Well I think it is more important he get his report than we get business." I wonder what the manager would tell me back?

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top