Is a Non-Clustered index needed when referencing a small table with a Foreign Key

https://dba.stackexchange.com/questions/286033

16-03-2021
|

Question

I'm not a DBA, excuse me if this question sounds dumb. Hope I can get help from an experienced DBA, because I'm questioning this for a long time.

Assume there are 2 tables, one table with a lot of records, and one table is just static data with 2 records (for example: a Yes/No table).

As example take these 2 tables:

Table_Yes_No (only 2 records: Yes and No. Of course they have an incrementing clustered PK)
Table_Form (Very big table with lots of records, and one column has FK to the PK of Table_Yes_No).

Now my question; Is it worth putting an index on the FK (reference from Table_Form to the Table_Yes_No table)? The column will only contain 1 or 2 (but not sorted, because no index). Is it worth indexing such a FK?

An example in SQLFiddle: http://sqlfiddle.com/#!18/51614/3/0

That column will be queried anyway, the question is will the index help the performance or not. The small table is static data, will never be changed. The big table will be queried heavily, and there will be also a lot of CRUDs on it.

Solution

Short answer

No.

Long answer

No, and adding the index could be detrimental to performance.

Something that factors heavily into whether or not a secondary/non-clustered index is even used is how selective it is (and the searches you are trying to perform are). Y/N has two values - it's ability to be selective is going to depend on the proportion of Y to N. If they are evenly split, and there is no rhyme/reason to Y/N with relation to the clustered index, they will not be selective and the query optimizer will almost always ignore that index.

The potential slowdown occurs if you issue a delete on the larger table or an update against that Y/N column. There's no efficient way to update the secondary index as it is organized by Y/N - so the only option is to basically scan the entire index looking for the record to update/delete. Depending on the number of rows in the main table and the size of the clustered index, this could be an issue.

Caveats

If there are few Ys or few Ns, and you need to locate those records quickly, you can use a filtered index to create a very small index that contains only records for the sparsely populated value. This can be beneficial and has a much lower footprint than a traditional non-clustered index would in that situation.

If you're just trying to enforce Y/N (or some other easily understood code set), that can be handled easily through a check constraint, which will have a lower overhead than the FK.

ADD CONSTRAINT CK_<ColumnName>_Is_YesNo CHECK (<ColumnName> IN ('Y','N'))

OTHER TIPS

Placing the foreign key will cause internally a key lookup into the one page that will be required by your Yes/No table. This will be trivial for the most part and will not consume much cpu, but the code path of a simple check constraint will likely be faster and wont involve any locks or latches.

For the scenario you have where the values of the column is basically a boolean value, you should really use just a check constraint.

Also, I don't see much reason for the Yes/No table to exist if it is essentially a primitive.

Indexes on foreign keys are prescriptive. IE they are not there to optimize SELECT queries, they are there to prevent excessive locking and scanning when updating the primary key table.

So an exception to the general rule is fine for foregn keys to tables which are essentially static (ie updated, if at all, during maintence windows), supporting the index with a foreign key is optional. And indexes on such a foreign key column would follow the normal rules for any non-clustered index, and would depend on the usefulness of the index for the workload.

Note that an index on a low-cardinality fk might be very useful. Eg

select count(*) from T where SomeFlag='Y'

would benefit greatly from an index on SomeFlag.

If you will delete (or update the id) records from small table index helps.

If you dont index big table delete will lock and scan the whole table.

If there is an index delete statement will also lock the table but scan the index which is cheaper.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange