Question

The title sums it up.

I've learned and always heard that indexes on tables improve CRUD operations. A developer I met last weekend told me that he does not like indexes because they are bad. Yes, "bad" does not clarify anything, but we did not have time to discuss it further (we were at a party).

Anyway, maybe because of my lack of experience, I do not know of a scenario where indexes can cause trouble during CRUD operations, but maybe there are a few out there. I'm asking this question to find out if there are any.


Solution

Well, I think you are mixing up a few concepts:

  1. An index improves the performance of READ OPERATIONS (those done with SELECT) while increasing the processing time of INSERT/UPDATE OPERATIONS (so indexes don't improve all CRUD operations, as you've heard). Each time you insert a new row, every index on the table has to be updated too, so if you have too many indexes you increase the time of insertions, and sometimes also of updates (if the update changes an indexed column).

  2. An index uses space, a lot of space if you have a lot of rows.

  3. Picking the best index to use is not usually a problem for the system, and I don't think it is a real performance killer. But you should look for redundant indexes, as they use space and add time to inserts/updates.

    For this, you should know how your DB engine works with indexes. As an example of redundant indexes: in MySQL, if you have one index on the (name, surname) fields and another on name, the latter is redundant because it is included in the first one (name appears first, in the same order; an index on surname alone would not be included).

    Also, you should test how your DB is going to interpret the query and which indexes it is going to use (in MySQL, you can run EXPLAIN followed by the query you are testing).

  4. Finally, indexes are one of the most important features of databases, and indexing can't be 'bad' by itself. Normally the problem is forgetting to add some particular index rather than having too many indexes, though the latter can happen too.
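The leftmost-prefix rule from point 3 can be checked without a MySQL server. The sketch below swaps in SQLite (via Python's standard `sqlite3` module) and its `EXPLAIN QUERY PLAN` statement as an assumption; the `people` table and `idx_name_surname` index are hypothetical, and the exact wording of the plan rows varies between SQLite versions.

```python
import sqlite3

# Throwaway in-memory database; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, surname TEXT, phone TEXT)")
conn.execute("CREATE INDEX idx_name_surname ON people (name, surname)")


def plan(sql, params=()):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Filtering on the leftmost column can seek into the composite index...
by_name = plan("SELECT * FROM people WHERE name = ?", ("Ada",))
# ...but filtering only on the second column cannot, so the table is scanned.
by_surname = plan("SELECT * FROM people WHERE surname = ?", ("Lovelace",))

print(by_name)     # e.g. a "SEARCH ... USING INDEX idx_name_surname (name=?)" row
print(by_surname)  # e.g. a "SCAN people" row
```

This is why a separate index on name alone adds write cost without giving the planner a new access path.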

Other answers

Having too many indexes can indeed cause performance problems.

If many indexes have very similar statistics it is possible that the optimizer cannot reliably decide on the most useful choice of indexes. (I learned this when working with a database where almost every column was indexed.)

In that case, we reduced the number of indexes significantly by removing indexes on columns that would seldom be used. This greatly improved the performance of our queries.

In addition, the excess indexes (1) used more space for little benefit and (2) consumed more server resources to keep them all updated.
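The space cost is easy to observe. As a sketch, assuming SQLite in place of the database in this answer, the code fills a hypothetical table with synthetic rows and compares SQLite's `PRAGMA page_count` before and after indexing every non-key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, surname TEXT, city TEXT)")
# Hypothetical synthetic data: 20,000 rows.
conn.executemany(
    "INSERT INTO people (name, surname, city) VALUES (?, ?, ?)",
    ((f"name{i}", f"surname{i % 500}", f"city{i % 50}") for i in range(20_000)),
)
conn.commit()


def pages():
    """Number of database pages currently allocated (table plus all indexes)."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


pages_before = pages()
# Index every non-key column, as in an "index everything" schema.
for col in ("name", "surname", "city"):
    conn.execute(f"CREATE INDEX idx_people_{col} ON people ({col})")
pages_after = pages()

print(pages_before, pages_after)  # the second number is clearly larger
```

Each of those extra pages also has to be maintained on every insert, which is the second cost mentioned above.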

So, yes, indexes can really help your performance, but you need to be reasonable in how many you create. Focus on the indexes that seem most useful to you.

Additional Information: Many database vendors include tools to help you analyze the value and usage of the indexes. For example:

  1. MySQL: http://dev.mysql.com/doc/refman/5.7/en/using-explain.html discusses how to use EXPLAIN to determine which indexes a query uses.
  2. PostgreSQL: http://www.postgresql.org/docs/9.1/static/monitoring-stats.html describes the statistics views, such as pg_stat_user_indexes.
  3. Microsoft SQL Server: https://msdn.microsoft.com/en-us/library/ms188755.aspx documents the sys.dm_db_index_usage_stats view, which reports statistics such as seeks, scans, updates, and the time of last use.

Trying to stay database-neutral:

Reading, filtering

Indexes radically speed up ordering and filtering operations on a table, often by a factor of 1000 or more on large tables. Compared to a phone book, an index lets you look up a single person directly, because it's already sorted alphabetically. If the phone book were just an unordered list of a million names with their phone numbers, you'd spend a month finding a single phone number.
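To put numbers on the phone book analogy, here is a small sketch in which a sorted Python list stands in for an index (real indexes are usually B-trees rather than flat lists, but the effect is similar: the number of steps grows logarithmically, not linearly). The names are hypothetical.

```python
# A hypothetical "phone book" of one million names, already sorted like an index.
names = [f"person{i:07d}" for i in range(1_000_000)]
target = "person0765432"


def linear_scan(items, key):
    """Check entries one by one, as you would in an unordered list."""
    comparisons = 0
    for pos, item in enumerate(items):
        comparisons += 1
        if item == key:
            return pos, comparisons
    return -1, comparisons


def binary_search(items, key):
    """Halve the search range at each step, which sorted order allows."""
    lo, hi, comparisons = 0, len(items), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if items[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(items) and items[lo] == key
    return (lo if found else -1), comparisons


print(linear_scan(names, target))    # (765432, 765433)
print(binary_search(names, target))  # position 765432 after roughly 20 comparisons
```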

Inserting

As a natural consequence of keeping an index organized, it adds overhead to any change you perform on the data. To continue the phone book analogy, if you add a name, you have to insert it in the correct alphabetical order, which takes more time and work than just appending the record to the end of the table.
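A toy model shows where the extra work comes from. The hypothetical `ToyTable` class below is not how any real engine stores data: it appends rows to an unordered "heap" list and keeps one sorted list of `(value, row_id)` pairs per indexed column, so every insert pays one cheap append plus one ordered insert per index.

```python
import bisect


class ToyTable:
    """Hypothetical heap table plus sorted secondary indexes (illustration only)."""

    def __init__(self, indexed_columns):
        self.rows = []  # heap: rows are simply appended in arrival order
        # One sorted list of (value, row_id) pairs per indexed column.
        self.indexes = {col: [] for col in indexed_columns}

    def insert(self, row):
        row_id = len(self.rows)
        self.rows.append(row)  # cheap: always at the end
        for col, index in self.indexes.items():
            # Extra work per index: find the sorted position and shift entries.
            bisect.insort(index, (row[col], row_id))
        return row_id


book = ToyTable(indexed_columns=["surname"])
for surname, first in [("Smith", "John"), ("Adams", "Jane"), ("Young", "Ann")]:
    book.insert({"surname": surname, "name": first})

print([r["surname"] for r in book.rows])  # ['Smith', 'Adams', 'Young']
print(book.indexes["surname"])            # [('Adams', 1), ('Smith', 0), ('Young', 2)]
```

Adding more entries to `indexed_columns` multiplies the per-insert work, which is the over-indexing cost described in the other answers.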

Updating

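An index will vastly improve the speed at which you find the data to update, but if you change a value in an indexed column, the index entry has to move to its new position to keep the index in the correct order. With a clustered index, where the table itself is stored in index order, the row itself may have to physically move.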
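In the same toy terms (a sorted Python list of `(value, row_id)` pairs standing in for an index; the helper `update_key` is hypothetical), an update to an indexed value is a removal from the old position followed by an insertion at the new one.

```python
import bisect

# Hypothetical sorted index of (surname, row_id) pairs, as in a phone book.
index = [("Adams", 1), ("Smith", 0), ("Young", 2)]


def update_key(index, old_key, new_key, row_id):
    """Move one entry when the indexed value changes (toy model)."""
    pos = bisect.bisect_left(index, (old_key, row_id))
    if pos == len(index) or index[pos] != (old_key, row_id):
        raise KeyError((old_key, row_id))
    del index[pos]                           # remove it from its old position
    bisect.insort(index, (new_key, row_id))  # re-insert where the new value sorts
    return index.index((new_key, row_id))


new_pos = update_key(index, "Adams", "Zane", 1)
print(index)    # [('Smith', 0), ('Young', 2), ('Zane', 1)]
print(new_pos)  # 2
```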

Deleting

Again, the index will help you find the record very quickly, compared to looking for the correct record in the entire table. Normally, a delete won't reorganize the index - it'll just leave a hole where the row was, though this may be different between database servers.
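The "find the record quickly" part of a delete can be checked with a query plan. As a sketch, assuming SQLite through Python's `sqlite3` module and a hypothetical `people` table, a DELETE filtered on an indexed column is planned as an index search, while one filtered on an unindexed column scans the whole table (plan wording varies by SQLite version).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute("CREATE INDEX idx_people_name ON people (name)")


def plan(sql, params=()):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


with_index = plan("DELETE FROM people WHERE name = ?", ("Ada",))
without_index = plan("DELETE FROM people WHERE phone = ?", ("555-0100",))

print(with_index)     # e.g. a "SEARCH people USING ... INDEX idx_people_name (name=?)" row
print(without_index)  # e.g. a "SCAN people" row
```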

In summary

Changing data in an indexed table will take longer, while selecting data will be much faster with proper indexing. Like @ypercube says, over-indexing not only slows down change operations, it also forces the server to choose the right index among many, which can take a long time if there are a thousand choices to go through.

There are fringe cases where you may not want to index a table: For instance, when you need to insert a large number of records, and you have no interest in filtering or ordering those records once you read them. I would, for instance, consider this for a fact table used (non-incrementally) for an OLAP cube - it gets populated once, and read in its entirety once without any particular sort order.
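A quick way to see why an index is pure overhead in this case: if the consumer reads the whole table without filtering or ordering, the planner has no reason to touch the index. The sketch assumes SQLite, a hypothetical `fact_sales` table, and synthetic rows; the index is maintained on every insert during the load but never used by the full read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (day INTEGER, product_id INTEGER, amount REAL)")
# Hypothetical index someone added "just in case".
conn.execute("CREATE INDEX idx_fact_sales_day ON fact_sales (day)")

# Hypothetical synthetic rows standing in for a one-off bulk load.
rows = [(d, p, float(d * p % 97)) for d in range(365) for p in range(100)]
with conn:  # one transaction; every row is also added to idx_fact_sales_day
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

# The consumer reads everything once, with no WHERE and no ORDER BY.
full_read = "SELECT day, product_id, amount FROM fact_sales"
details = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + full_read)]
loaded = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

print(loaded)   # 36500
print(details)  # e.g. a "SCAN fact_sales" row: the index is never consulted
```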

Lots of good answers already. I just want to add a rule of thumb and a worst case scenario.

Rule of thumb: if an index is rarely used by seek operations, it can be considered "bad" and should be revised or removed.
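SQL Server's sys.dm_db_index_usage_stats reports real seek counts; SQLite has no such view, so the hypothetical helper `unused_indexes` below is only a rough stand-in. It asks SQLite's planner, via `EXPLAIN QUERY PLAN`, which indexes a sample workload would use, and reports the declared indexes that none of the queries touch. It says nothing about how often queries actually run.

```python
import re
import sqlite3

# Matches the index name in plan rows such as "SEARCH people USING INDEX idx (name=?)".
INDEX_RE = re.compile(r"USING (?:COVERING )?INDEX (\w+)")


def unused_indexes(conn, table, workload):
    """Hypothetical helper: indexes on `table` that no query in `workload` would use."""
    declared = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ? "
        "AND name NOT LIKE 'sqlite_autoindex%'", (table,))}
    used = set()
    for sql, params in workload:
        for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params):
            used.update(INDEX_RE.findall(row[-1]))
    return declared - used


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute("CREATE INDEX idx_people_name ON people (name)")
conn.execute("CREATE INDEX idx_people_phone ON people (phone)")

workload = [("SELECT * FROM people WHERE name = ?", ("Ada",))]
print(unused_indexes(conn, "people", workload))  # {'idx_people_phone'}
```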

Worst-case scenario: in SQL Server, a clustered index built on a non-sequential GUID column. Because new keys land at random positions instead of at the end, frequent inserts can cause page splits and physical data reallocation.
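The difference between sequential and random keys can be sketched with a sorted Python list standing in for a clustered index. This toy model does not simulate pages or SQL Server internals. It counts how many inserts land somewhere other than the end, which is where a real engine has to make room among existing entries. The random GUIDs come from a seeded generator so the run is repeatable.

```python
import bisect
import random
import uuid


def middle_inserts(keys):
    """Count keys that land before the end of a sorted sequence (toy clustered index)."""
    ordered, middle = [], 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos != len(ordered):
            middle += 1  # existing entries must shift to make room
        ordered.insert(pos, key)
    return middle


n = 1_000
rng = random.Random(42)  # seeded so the run is repeatable
random_guids = [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(n)]
sequential_ids = list(range(n))

print(middle_inserts(sequential_ids))  # 0: every new key goes at the end
print(middle_inserts(random_guids))    # almost all of the 1000 inserts land mid-sequence
```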

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange