
My question is about the use of indexes.

  1. Should I create indexes right from the start, or only when a performance problem arises?

  2. We can also create a temporary index while executing a query. What are the pros and cons of this technique?


Should I create indexes right from the start, or only when a performance problem arises?

Indexing strategy tends to evolve as usage patterns emerge. That said, there are also strategies and design guidelines that can be applied up front.

  • Choose a good clustering key. You can usually determine the appropriate clustered index at design time, based on the expected pattern of inserts to a table. If a compelling case emerges for a change in the future, so be it.

  • Create your primary key and other unique constraints. These will be enforced by unique indexes.

  • Create your foreign keys and associated non-clustered indexes. Foreign keys are your most frequently referenced join columns, so index them from the start.

  • Create indexes for any obviously highly selective queries, that is, query patterns you already know will be highly selective and likely to use lookups rather than scans.
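As a rough illustration of the up-front steps above, here is a minimal T-SQL sketch. The tables, columns, and constraint names are hypothetical and only there to show the shape: a clustered primary key, a unique constraint, and a foreign key with its own non-clustered index.

```sql
-- Hypothetical schema; all names are assumptions for illustration only.
CREATE TABLE dbo.Customer
(
    CustomerId int IDENTITY(1,1) NOT NULL,
    Email      nvarchar(256)     NOT NULL,
    -- Clustering key chosen at design time (ever-increasing insert pattern)
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId),
    -- Unique constraint, enforced by a unique index
    CONSTRAINT UQ_Customer_Email UNIQUE (Email)
);

CREATE TABLE dbo.CustomerOrder
(
    OrderId    int IDENTITY(1,1) NOT NULL,
    CustomerId int               NOT NULL,
    OrderDate  datetime2(0)      NOT NULL,
    CONSTRAINT PK_CustomerOrder PRIMARY KEY CLUSTERED (OrderId),
    CONSTRAINT FK_CustomerOrder_Customer
        FOREIGN KEY (CustomerId) REFERENCES dbo.Customer (CustomerId)
);

-- SQL Server does not create an index for a foreign key automatically,
-- so the join column gets its own non-clustered index.
CREATE NONCLUSTERED INDEX IX_CustomerOrder_CustomerId
    ON dbo.CustomerOrder (CustomerId);
```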

Beyond the above, take a gradual and holistic approach to implementing new indexes. By holistic, I mean assess the potential benefit and impact to all queries and existing indexes when evaluating an addition.

A not uncommon problem in SQL Server circles is overindexing, as a result of guidance from the missing index DMVs and SSMS hints. Neither of these tools evaluates existing indexes, and both will merrily suggest you create a new 6 column index rather than add a single column to an existing 5 column index.

-- If you have this
CREATE NONCLUSTERED INDEX [IX_MyTable_MyIndex] ON [dbo].[MyTable]
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
);

-- But your query would benefit from the addition of a column
-- (DROP_EXISTING rebuilds the existing index in place with the new definition)
CREATE NONCLUSTERED INDEX [IX_MyTable_MyIndex] ON [dbo].[MyTable]
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
    , [col6] ASC
)
WITH (DROP_EXISTING = ON);

-- SSMS will suggest you create this instead
CREATE NONCLUSTERED INDEX [IX_MyTable_AnotherIndexWithTheSameColumnsAsTheExistingIndexPlusCol6] ON [dbo].[MyTable]
(
    [col1] ASC
    , [col2] ASC
    , [col3] ASC
    , [col4] ASC
    , [col5] ASC
    , [col6] ASC
);
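Before acting on a suggestion like the one above, it can help to list what already exists on the table. The following is a minimal sketch against the SQL Server catalog views, reusing the [dbo].[MyTable] name from the example. It only lists the columns per index; spotting an overlap is still a manual comparison.

```sql
-- List every index on dbo.MyTable with its key and included columns, in order.
SELECT
    i.name                AS index_name,
    i.type_desc           AS index_type,
    ic.key_ordinal        AS key_position,   -- 0 for included columns
    c.name                AS column_name,
    ic.is_included_column AS is_included
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.MyTable')
ORDER BY i.name, ic.is_included_column, ic.key_ordinal;
```

If a suggested index starts with the same key columns as an existing one, extending the existing index is usually worth considering first.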

Kimberly Tripp has some excellent material on indexing strategy that, while SQL Server focused, is applicable to other platforms. For the SQL Server folk, there are some handy tools for identifying duplicates like the example above.

We can also create a temporary index while executing a query. What are the pros and cons of this technique?

This usually only applies for rarely run queries, typically ETL. You need to assess:

  1. Does creating the index save more query execution time than the index takes to build?
  2. Does the maintenance overhead of leaving the index in place outweigh the time taken to create and drop it each time it's needed?
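For a rarely run ETL step, the create/use/drop pattern might look like the following T-SQL sketch. The staging table and index names are hypothetical, and the timing output is only there so the two questions above can be answered with numbers rather than guesses.

```sql
-- Hypothetical staging table and column; names are assumptions.
SET STATISTICS TIME ON;   -- report elapsed time for each statement

-- 1. Build the index only for this run (this cost counts against the benefit).
CREATE NONCLUSTERED INDEX IX_StagingOrders_CustomerId
    ON dbo.StagingOrders (CustomerId);

-- 2. Run the rarely executed query that benefits from the index.
SELECT s.CustomerId, COUNT(*) AS OrderCount
FROM dbo.StagingOrders AS s
GROUP BY s.CustomerId;

-- 3. Drop the index so it adds no maintenance overhead to later loads.
DROP INDEX IX_StagingOrders_CustomerId ON dbo.StagingOrders;

SET STATISTICS TIME OFF;
```

Comparing the total of steps 1 to 3 against the same query without the index gives a direct answer to the first question.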


There are risks associated with both approaches:

Option a) Index from the start, and risk not realizing that you have created a number of indexes which are never used. These add some overhead (most noticeably to queries that modify data, but also to the optimization of SELECT statements, which has to identify the best index).

You will need to discipline yourself to identify indexes that are no longer being used and remove them (PostgreSQL can report this; unfortunately MySQL by comparison is very weak at this out of the box).
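On PostgreSQL, one way to get a first list of candidates is the pg_stat_user_indexes statistics view. This is a sketch rather than a complete audit: the counters only cover the period since statistics were last reset, and an index with zero scans may still be enforcing a primary key or unique constraint.

```sql
-- Indexes that have not been scanned since statistics were last reset,
-- largest first.
SELECT
    schemaname,
    relname      AS table_name,
    indexrelname AS index_name,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```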

Option b) Don't add indexes until people start complaining, or until your diagnostic tools flag that certain queries are slow and could be improved.

The risk you introduce is that the time window between noticing you need the index and having to add it may be too small.

PostgreSQL does support building indexes CONCURRENTLY, which does reduce some of the stress from this sudden-index-add-requirement, but there are some caveats noted in the manual.
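A minimal sketch of a concurrent build, assuming a hypothetical orders table. Two of the caveats from the manual are worth showing: the statement cannot run inside a transaction block, and a failed build leaves an invalid index behind that has to be dropped and recreated.

```sql
-- Hypothetical table and column names.
-- Must be run outside a transaction block.
CREATE INDEX CONCURRENTLY idx_orders_customer_id
    ON orders (customer_id);

-- If the build failed part-way, the index exists but is marked invalid.
SELECT indexrelid::regclass AS invalid_index
FROM pg_index
WHERE NOT indisvalid;
```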

Option (b) tends to be my preference, but I think a hybrid of both options is probably the best solution. It has to do with your confidence level as to whether you think an index will actually be used.

What makes this a particularly complex discussion is that it is usually easy to change indexes, but harder to change the schema. I do not want to promote the delayed reaction of option (b) as an excuse to be reckless.

In addition to Mark's answer:

You can get a feel for this by having realistic test data at expected quantities. I've seen many, many (too many) cases where a query runs OK with 1,000 rows but not with the million in production.
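If no production-sized data is available yet, synthetic rows are a reasonable stand-in. Here is a sketch for PostgreSQL using generate_series. The table, columns, and value ranges are invented for illustration, and uniformly random values will not reproduce real data skew.

```sql
-- Hypothetical test table; names and distributions are assumptions.
CREATE TABLE orders_test
(
    order_id    bigserial   PRIMARY KEY,
    customer_id integer     NOT NULL,
    order_date  timestamptz NOT NULL
);

-- Generate one million rows spread over roughly 10,000 customers
-- and the last 365 days.
INSERT INTO orders_test (customer_id, order_date)
SELECT (random() * 10000)::integer,
       now() - random() * interval '365 days'
FROM generate_series(1, 1000000);

-- Refresh planner statistics so plans reflect the new data volume.
ANALYZE orders_test;
```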

If you can, work on a copy of production later on.

Of course, I've seen the odd problem that appears only in production because of usage patterns, even when everything else is identical.

Temporary indexes? Outside of ETL load patterns, if you need them once you'll need them again. Don't forget: an index create/drop is a logged write, which means more load.

Just to add a few things.

  • Temporary indexes are a terrible idea, unless the index is on a temp table.
  • Indexes take up much more space (as well as other overhead) than people realize. Therefore, create them conservatively.

This is my approach.

  1. Similar to Mark, make indexes where they make sense, but don't overdo it.
  2. You don't have to wait until performance is slow to create new indexes. Whenever you write new SQL, check its query plan (preferably against your prod database). You should be able to see whether a new index is required.
  3. Don't be afraid to put > 0 or > '' in your WHERE clauses for index columns your query does not otherwise filter on.

    1. For example, let's say you have an index on (A, B, C, D), but you only have values for A, B, and D. There is no reason you can't do:
    select * from blah
    where A = 'one'
    and B = 'two'
    and C >= ''     -- to match index (note: this excludes rows where C is NULL)
    and D = 'four'
    -- This will use your existing index. No need to create a redundant one.
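Whether the extra predicate changes the plan depends on the engine and its optimizer, so it is worth comparing plans rather than assuming. Here is a minimal sketch, assuming MySQL or PostgreSQL (where EXPLAIN prints the chosen plan) and reusing the blah table from the example:

```sql
-- Plan without the padding predicate on C
EXPLAIN
SELECT * FROM blah
WHERE A = 'one' AND B = 'two' AND D = 'four';

-- Plan with the padding predicate on C
EXPLAIN
SELECT * FROM blah
WHERE A = 'one' AND B = 'two' AND C >= '' AND D = 'four';
```

If both plans already use the (A, B, C, D) index, the padding predicate adds nothing and can be left out.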

I will try to answer only the first question. If you can estimate, even roughly, how many records you'll have in your tables after a certain amount of time, then I'd say it's better to design some indexes from the beginning. Use test tools or test scripts to automate as many as possible of the application calls you expect to be used most often, and you'll see which table scans can be avoided from the start.

It will be guesswork at the beginning, but in time, as you gather proper usage statistics, you'll get a clearer picture.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange