Proper way to add a new column to a database table

https://dba.stackexchange.com/questions/248977

13-02-2021

Question

What is the proper approach when we add a new column to an existing table?

For example, I already have columns like Foo1, Foo2, Bar1, and Bar2. Now I want to add a new column called Foo3.

What is the standard approach (if there is such a thing) when I want to add a column with a similar name?

I see two choices:

1. Create a temp table with the new structure, where the new column sits next to the columns with the same name; copy the data from the existing table into the new table; drop the existing table; and rename the temp table. It's a somewhat complex process, but it makes the database fields more readable.
2. Add the new column at the end. This is a simpler operation, but the column names may not be as clearly understood if they are at the very end.

For reference, we are using Database Projects to source-control database changes and to give app developers a nicer GUI for making database changes. We are also using some kind of ORM to interact with the database, so nobody queries the database using database object names directly.

Update: I have a couple of indexes on some of the existing columns, but the columns with similar names (including the one I want to add) are not part of any index.

Solution

As someone who used to try to keep columns together like this, I highly suggest that you go with option #2.

The debatable upside of keeping the columns visually together (in the SSMS column list / when you SELECT * from the table) is absolutely dwarfed by the downsides of option #1.

As the tables get larger, doing this "copy and replace" operation is going to take longer and longer, and utilize more resources (I/O, memory, CPU).
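For contrast, here is option #2 as a minimal sketch. It is an assumption-laden stand-in: it uses Python's built-in `sqlite3` module and an in-memory SQLite database rather than SQL Server, and the table name `X` is hypothetical. In SQLite, `ALTER TABLE ... ADD COLUMN` only rewrites the table definition stored in the schema and leaves existing rows untouched, so its cost does not grow with the table. SQL Server's cost depends on the column definition (adding a nullable column is typically a metadata-only change), so treat the timing claim as SQLite-specific.

```python
import sqlite3

# SQLite stand-in for SQL Server; "X" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (Id INTEGER PRIMARY KEY, Foo1 TEXT, Foo2 TEXT, Bar1 TEXT, Bar2 TEXT)")
conn.execute("INSERT INTO X (Foo1, Foo2, Bar1, Bar2) VALUES ('a', 'b', 'c', 'd')")

# Option #2: a single statement. SQLite updates the stored schema text;
# the existing rows are not copied or rewritten.
conn.execute("ALTER TABLE X ADD COLUMN Foo3 TEXT")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(X)")]
print(columns)   # the new column is appended after Bar2

# Rows that existed before the ALTER simply read NULL for the new column.
existing = conn.execute("SELECT Foo1, Foo3 FROM X").fetchone()
print(existing)
```

The trade-off is exactly the one in the question: `Foo3` lands after `Bar2` in the physical column order, but no data moves.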

Additionally, you really need to make sure you don't lose any data during the copy and replace, so using the least concurrency-friendly isolation level (SERIALIZABLE) is going to be necessary. This means that any attempt to modify the existing table will be blocked until the operation is complete. So you can add long periods of blocking to the list of downsides.

SSDT publish uses this sort of code automatically when re-ordering columns like that (from my blog post on the subject):

BEGIN TRANSACTION;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SET XACT_ABORT ON;

CREATE TABLE [dbo].[tmp_ms_xx_Post] (
    [Id]           INT          IDENTITY (1, 1) NOT NULL,
    [CommentCount] INT          NULL,
    [PostType]     VARCHAR (10) NOT NULL
);

IF EXISTS (SELECT TOP 1 1 
           FROM   [dbo].[Post])
    BEGIN
        SET IDENTITY_INSERT [dbo].[tmp_ms_xx_Post] ON;
        INSERT INTO [dbo].[tmp_ms_xx_Post] ([Id], [PostType])
        SELECT [Id],
               [PostType]
        FROM   [dbo].[Post];
        SET IDENTITY_INSERT [dbo].[tmp_ms_xx_Post] OFF;
    END

DROP TABLE [dbo].[Post];

EXECUTE sp_rename N'[dbo].[tmp_ms_xx_Post]', N'Post';

COMMIT TRANSACTION;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

On top of that, if you're doing this manually (not with SSDT schema compare), then it's really easy to make a mistake (putting data into the wrong column, copy/paste errors, etc.), especially on a table with a lot of columns.
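If you do write the copy-and-replace by hand, a couple of guards help against those mistakes. The sketch below is a hypothetical SQLite translation of the SSDT script above (Python `sqlite3`, in-memory database), not SSDT's actual behavior: it names the columns explicitly on both sides of the `INSERT ... SELECT`, checks that the row count survived before dropping anything, and rolls the whole thing back on any error. SQLite lets you insert explicit values into an `INTEGER PRIMARY KEY`, so there is no `IDENTITY_INSERT` step.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE Post (Id INTEGER PRIMARY KEY, PostType TEXT NOT NULL)")
conn.executemany("INSERT INTO Post (PostType) VALUES (?)", [("question",), ("answer",)])


def copy_and_replace(conn):
    """Rebuild Post with a new CommentCount column placed before PostType."""
    conn.execute("BEGIN IMMEDIATE")  # hold the write lock for the whole rebuild
    try:
        before = conn.execute("SELECT COUNT(*) FROM Post").fetchone()[0]
        conn.execute(
            "CREATE TABLE tmp_Post ("
            "Id INTEGER PRIMARY KEY, CommentCount INTEGER, PostType TEXT NOT NULL)"
        )
        # Explicit column lists on both sides: never rely on SELECT * or positional order.
        conn.execute("INSERT INTO tmp_Post (Id, PostType) SELECT Id, PostType FROM Post")
        after = conn.execute("SELECT COUNT(*) FROM tmp_Post").fetchone()[0]
        if after != before:
            raise RuntimeError(f"row count mismatch: {before} != {after}")
        conn.execute("DROP TABLE Post")
        conn.execute("ALTER TABLE tmp_Post RENAME TO Post")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # leave the original table untouched
        raise


copy_and_replace(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(Post)")]
rows = conn.execute("SELECT Id, CommentCount, PostType FROM Post ORDER BY Id").fetchall()
print(columns)
print(rows)
```

Even with these checks, every row is still read and written once, so the cost and locking concerns above remain.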

OTHER TIPS

The most common approach would be to just add the column to the table and accept that the columns in the table aren't going to be grouped together by function.

First off, depending on exactly what we're talking about, having a bunch of similarly named columns may imply that you need to pause and consider refactoring your data model. If table 'X' actually has columns Foo1 and Foo2 and you're looking to add a Foo3, that may imply that you need a separate X_Foo table to store all the Foos related to an X. If the columns aren't that closely named, but you have a bunch of columns that all relate to the same concept and your table has enough columns that this would be hard to spot by scanning the column list, it may be worth creating a 1:1 child table to group the related attributes together. If you have an Orders table, for example, it probably makes sense to have an Order_Fulfillment table with the couple dozen attributes related to fulfilling an order and an Order_Return table with the couple dozen attributes related to processing a return, rather than a single Orders table with hundreds of columns.
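Here is a minimal sketch of the X_Foo idea, again using Python's `sqlite3` with SQLite as a stand-in. The `Seq` and `FooValue` column names and the sample data are hypothetical. Existing Foo1/Foo2 values become rows in the child table, and a "Foo3" is then just another row rather than a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (Id INTEGER PRIMARY KEY, Foo1 TEXT, Foo2 TEXT)")
conn.execute("INSERT INTO X (Id, Foo1, Foo2) VALUES (1, 'red', 'blue'), (2, 'green', NULL)")

# One row per (X, Foo) pair instead of one column per Foo.
conn.execute("""
    CREATE TABLE X_Foo (
        X_Id     INTEGER NOT NULL REFERENCES X (Id),
        Seq      INTEGER NOT NULL,  -- which Foo this was (1, 2, 3, ...)
        FooValue TEXT    NOT NULL,
        PRIMARY KEY (X_Id, Seq)
    )""")

# Move the existing values into rows, skipping NULLs.
conn.execute("""
    INSERT INTO X_Foo (X_Id, Seq, FooValue)
    SELECT Id, 1, Foo1 FROM X WHERE Foo1 IS NOT NULL
    UNION ALL
    SELECT Id, 2, Foo2 FROM X WHERE Foo2 IS NOT NULL""")

# "Adding Foo3" is now an INSERT, not an ALTER TABLE.
conn.execute("INSERT INTO X_Foo (X_Id, Seq, FooValue) VALUES (1, 3, 'yellow')")

foos_for_1 = [r[0] for r in conn.execute(
    "SELECT FooValue FROM X_Foo WHERE X_Id = 1 ORDER BY Seq")]
foos_for_2 = [r[0] for r in conn.execute(
    "SELECT FooValue FROM X_Foo WHERE X_Id = 2 ORDER BY Seq")]
print(foos_for_1, foos_for_2)
```

Dropping Foo1 and Foo2 from X would be a later step, once nothing reads them any more.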
Assuming the data model itself doesn't need to be adjusted, if you are worried that the column name alone isn't sufficient to tell people what it contains, one option is to add descriptions to your columns to give people browsing through the tables more context.
Even better, if you maintain a separate, up-to-date data model in your favorite ERD tool and make it easier for people to browse that ERD rather than poking around in the database looking for information, your ERD can present the columns in whatever logical order you want, independent of the physical model.
If you design your system so that all access to the data is through views (i.e., rather than an Orders table, you have an Orders_base table and a view named Orders that everyone uses), you can get the best of both worlds: the column is added to the end of the physical table but appears in the middle of the view that everyone uses. As a side effect, many other things, such as future data model refactoring or managing permissions on the data, become easier.
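The view approach can be sketched in a few lines. This assumes SQLite via Python's `sqlite3` as a stand-in; SQLite has no `CREATE OR ALTER VIEW`, so the view is dropped and recreated, whereas on SQL Server you would redefine it in place. The column names are borrowed from the question. The physical table gets the cheap append-at-the-end change, while the view puts `Foo3` next to its siblings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders_base (Id INTEGER PRIMARY KEY, Foo1 TEXT, Foo2 TEXT, Bar1 TEXT, Bar2 TEXT)")
conn.execute("INSERT INTO Orders_base (Foo1, Foo2, Bar1, Bar2) VALUES ('f1', 'f2', 'b1', 'b2')")
# Everyone queries the view, never the base table.
conn.execute("CREATE VIEW Orders AS SELECT Id, Foo1, Foo2, Bar1, Bar2 FROM Orders_base")

# Physical change: append the column at the end (option #2).
conn.execute("ALTER TABLE Orders_base ADD COLUMN Foo3 TEXT")

# Logical change: redefine the view with Foo3 grouped with Foo1 and Foo2.
conn.execute("DROP VIEW Orders")
conn.execute("CREATE VIEW Orders AS SELECT Id, Foo1, Foo2, Foo3, Bar1, Bar2 FROM Orders_base")

physical_columns = [row[1] for row in conn.execute("PRAGMA table_info(Orders_base)")]
cursor = conn.execute("SELECT * FROM Orders")
view_columns = [d[0] for d in cursor.description]  # column names as the view exposes them
print(physical_columns)
print(view_columns)
```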

Licensed under: CC-BY-SA with attribution
