What's a good use case for SELECT * in production code?

https://dba.stackexchange.com/questions/253873

18-02-2021

Question

Out of habit, I never use SELECT * in production code (I only use it with ad-hoc scrap queries, typically when learning the schema of an object). But I ran across a case now where I'm tempted to use it but would feel cheap if I did.

My use case is inside a stored procedure where a local temp table is created that should always match the underlying table used to create it, whenever the stored procedure runs. The temp table is populated much later on, so a quick hack to create the temp table without being verbose would be SELECT * INTO #TempTable FROM RealTable WHERE 1 = 0, especially for a table with hundreds of columns.

If the consumer of my stored procedure is agnostic to dynamic result sets, then are there any issues with me selling my services to SELECT *?

Solution

I generally abhor SELECT * in production code, and I've been in a situation where its use led to massive amounts of rework later. Your case does look like a fair use of it though.
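For reference, the empty-clone pattern from the question can be sketched as below. The table definition is hypothetical (the question does not show RealTable's columns); the catalog query at the end is only there so you can inspect what SELECT ... INTO actually produced.

```sql
-- Hypothetical stand-in for RealTable; the real column list is not given in the question.
CREATE TABLE dbo.RealTable
(
    Id        int IDENTITY(1,1) PRIMARY KEY,
    Name      nvarchar(50) NOT NULL,
    CreatedAt datetime2    NOT NULL DEFAULT SYSUTCDATETIME()
);

-- WHERE 1 = 0 matches no rows, so only the column layout is copied.
SELECT *
  INTO #TempTable
  FROM dbo.RealTable
 WHERE 1 = 0;

-- Inspect the columns the temp table ended up with.
SELECT c.name, t.name AS type_name, c.max_length, c.is_nullable, c.is_identity
  FROM tempdb.sys.columns AS c
  JOIN tempdb.sys.types   AS t ON t.user_type_id = c.user_type_id
 WHERE c.object_id = OBJECT_ID('tempdb..#TempTable')
 ORDER BY c.column_id;
```

SELECT ... INTO carries over column names, data types, nullability and (for a simple single-table source like this) the IDENTITY property, but not the primary key, indexes or default constraints. If the source table has an identity column, a later INSERT into the temp table has to account for it.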

The place where I find SELECT * to be a must - and its evil cousin "INSERT INTO tbl" without a column list - is in an archiving situation, where rows are being moved to another table that must have the same structure.

INSERT INTO SalesOrderArchive  -- Note no column list
SELECT *
  FROM SalesOrder
 WHERE OrderDate < @OneYearAgo

DELETE FROM SalesOrder
 WHERE OrderDate < @OneYearAgo

If a new column is added to SalesOrder in the future, but not to SalesOrderArchive, the INSERT will fail. Which sounds bad, but it's actually a really good thing! Because the alternative is much worse. If all the columns were listed on the INSERT and the SELECT, then the INSERT would succeed, and so would the following DELETE (which is effectively "DELETE *"). Production code that succeeds doesn't get any attention, and it may be a long time before someone notices that the new column is not being archived, but being silently deleted altogether.
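A minimal sketch of that failure mode, assuming simplified two-column versions of the tables (the real SalesOrder layout is not shown in the answer, and DiscountCode is an invented column). GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Simplified, hypothetical versions of the two tables.
CREATE TABLE dbo.SalesOrder        (OrderId int NOT NULL, OrderDate date NOT NULL);
CREATE TABLE dbo.SalesOrderArchive (OrderId int NOT NULL, OrderDate date NOT NULL);
INSERT INTO dbo.SalesOrder (OrderId, OrderDate) VALUES (1, '20000101');
GO

-- Later, someone adds a column to the live table only.
ALTER TABLE dbo.SalesOrder ADD DiscountCode varchar(20) NULL;
GO

-- The archiving job. The INSERT now supplies three columns to a two-column
-- table, which SQL Server rejects with a "column name or number of supplied
-- values does not match table definition" error.
DECLARE @OneYearAgo date = DATEADD(year, -1, GETDATE());

INSERT INTO dbo.SalesOrderArchive  -- no column list, on purpose
SELECT *
  FROM dbo.SalesOrder
 WHERE OrderDate < @OneYearAgo;

DELETE FROM dbo.SalesOrder
 WHERE OrderDate < @OneYearAgo;
GO

-- Check whether the old row is still in the live table.
SELECT COUNT(*) AS RowsStillInSalesOrder FROM dbo.SalesOrder;
```

With explicit column lists (OrderId, OrderDate) on both the INSERT and the SELECT, the same job would run cleanly and DiscountCode would be lost when the row is deleted.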

OTHER TIPS

What’s a good use of select * in production?

IMO only things like this:

create table #foo(a int, b int, c int, d int)
...
select * from #foo

with q as
(
  select a, b, c
  from ...
)
select *
from q

ie when the * is bound to an explicit column list that's declared in the same batch, and is used just to avoid repeating the column list multiple times.

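When the * refers to a table or view, there are some nasty complexities about cached metadata: for example, a non-schema-bound view defined with SELECT * keeps the column list it had when it was created until the view is refreshed or recreated. And it doesn't really save you typing, as SSMS will allow you to drag-and-drop the complete column list.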
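The cached-metadata problem can be reproduced with a small sketch like this one. The table and view names are made up for illustration; sp_refreshview is the system procedure that rebuilds a view's stored metadata.

```sql
-- Hypothetical table and view, for illustration only.
CREATE TABLE dbo.Widget (Id int NOT NULL, Name varchar(20) NOT NULL);
GO
CREATE VIEW dbo.vWidget AS
SELECT * FROM dbo.Widget;
GO

-- The base table gains a column after the view was created.
ALTER TABLE dbo.Widget ADD Price money NULL;
GO

-- The view still exposes the column list captured at CREATE VIEW time.
SELECT name FROM sys.columns WHERE object_id = OBJECT_ID('dbo.vWidget');
GO

-- Refreshing the view re-expands the * against the current table definition.
EXEC sys.sp_refreshview N'dbo.vWidget';
GO
SELECT name FROM sys.columns WHERE object_id = OBJECT_ID('dbo.vWidget');
```

Anyone relying on a SELECT * view to mirror its base table therefore needs a refresh step as part of every schema change.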

A valid use of SELECT * occurs when introduced by EXISTS.

WHERE EXISTS (SELECT * FROM ...

Like all uses of SELECT *, this has the unwelcome (and unnecessary here) side-effect of preventing WITH SCHEMABINDING from being added if the code is inside e.g. a function.
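As a sketch of the schema-binding point, here is a hypothetical schema-bound function that uses EXISTS with a constant instead of *. The table, column and function names are invented for the example. Schema-bound objects also require two-part object names, which is why the table is written as dbo.DemoOrder.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.DemoOrder
(
    OrderId    int NOT NULL PRIMARY KEY,
    CustomerId int NOT NULL
);
GO

-- EXISTS only tests for the presence of a row, so SELECT 1 behaves the same
-- as SELECT * here. Per the answer above, writing SELECT * instead would
-- stop WITH SCHEMABINDING from being accepted.
CREATE FUNCTION dbo.HasOrders (@CustomerId int)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE
               WHEN EXISTS (SELECT 1
                              FROM dbo.DemoOrder AS o
                             WHERE o.CustomerId = @CustomerId)
               THEN 1
               ELSE 0
           END;
END;
GO

SELECT dbo.HasOrders(42) AS HasOrders;
```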

See also:

Bad habits to kick: using SELECT * / omitting the column list by Aaron Bertrand
When Select * Doesn’t Matter by Erik Darling.

I'll be bold and say it: There are use cases for SELECT * in production code. Any time you can say "I want any changes to the table to be immediately reflected in a change to my result output" is a case for doing it.

Let me give a few examples:

Use Case #1 - Make a view that mirrors the main table, except it filters out the sensitive corporate-only data.

In this case, you want the view to SELECT *. The output is supposed to be a reflection of the main table. Having the output layout match the table layout is a feature, not a bug.

...

Use Case #2 - Make a stored proc that copies some records you're about to update into a temporary holding table, in case the process experiences problems (it's kinda flaky).

In this case, having output that doesn't match the table itself is terrible - it means you might not be able to use the data in the temporary holding table to get back to where you started from.

What’s a good use of select * in production?

Really, there isn't. In development, in the ways you mentioned (select top n *), it makes sense, but otherwise it builds bad habits and can lead to issues.

You've touched on the main reason not to use it: the structure of the underlying table could change. Creating a temp table the way you describe is fine, but where it could bite you is later in your proc, if, after populating the temp table with some INSERT commands, you return the result set with SELECT *. Your end application would then receive columns it isn't expecting.

You could use SELECT ... INTO #TEMP FROM ... to create the temp table on the fly as well. It's kind of hard to know which is best for your situation.

Most other issues with SELECT * don't seem to matter for your use case:

Creating a view (could break when the table changes)
Binding problems
Table scan (since you actually do want all columns)
Retrieving unnecessary columns (since you seem to want them all, again) and their ordinal position

My answer to your question is "Most/Many use cases"

Your use case highlights why * isn't horrible. Your schema will change. Why set yourself up to touch every single stored procedure because you need to add a field?

If you are adding a field, why would my UI care? If you are deleting a field, it doesn't matter whether you use * or an explicit comma-separated column list; either way you are introducing a breaking change.

"If the consumer of my stored procedure is agnostic to dynamic result sets, ..."

Herein lies the true problem with using select *. If your consumer is written poorly, then you could run into problems. For instance, if you are formatting Fields[13] as a date...

Some cases where I would shy away from select *:

Latency mitigation - My consumer is distant and they only need a relatively small subset of fields in the table(s)

Joins - Complex query involving multiple tables. Consumer probably doesn't need all the fields from all the tables. In fact, there are duplicate key columns that should probably be omitted.

I know my consumers write poor code - If you are developing a public API, I suppose it's possible you would want to honor the open/closed principle and add a new API method each time the field set changes. To me, this seems like a bad approach to the problem...

As with anything in programming, there is a time and place. There is always the ivory tower set. Listen to their thoughts with an open mind. Then make up your own mind.

It depends on the failure mechanism you prefer.

If a newly added column can always be safely ignored (e.g. select firstname, lastname from customers where id = @x), then never use select *: the more columns you touch, the less chance that a helpful index can be found or created.

If silently discarding a newly added column would be bad (any type of data replication), then use an unspecified select (insert into archiveTable select * from theTable where isOld = 1). The code failure here is not a bug, it is a feature, especially when you control both the database and the code reading the table.

Reversing the failure mechanisms here is bad: you don't want your name lookup to fail if you add a new favourite colour to the person, and you don't want to skip archiving the person's favourite colour because the column was added after you wrote the archival code.

A note about joins: SQL Server's optimizer drops columns that are not referenced later in the query, so

select firstname, email
  from (select * from profile where isArchived = 0) p
  join (select * from emailaddresses where mostRecentSend < dateadd(day, -6, getdate())) e
    on p.id = e.profileId

does not carry anything through the plan other than firstname and email plus the columns needed for the join and the filters (id, profileId, isArchived and mostRecentSend).

Licensed under: CC-BY-SA with attribution