Question

I've recently started my role as a DBA and my initial task has been to take ownership and redeploy a database provided by a 3rd party to help us conform to GDPR legislation.

As a consequence, I've not really been able to look at much beyond the project itself. I'm now at the stage where my database is ready to go live, although there are a few caveats.

Some of my existing code creates large temporary tables (70m+ rows), and I know my dev database has grown over the course of this project.

My question is: for this and subsequent projects that work with large datasets, should I use tempdb for temporary storage, or create staging tables within the database I migrate to and drop them afterwards?

The reason I ask is that my live tempdb is currently less than 1 GB in size, whereas dev reached around 30 GB. If I continue with temporary tables, I would want to grow tempdb in advance to prevent waits on autogrowth.


Solution

Based on the additional details provided, I would recommend using staging tables rather than temporary tables.

The benefits are:

  1. If there is an error, you can more easily view the data as it was mid-process for troubleshooting. You can also more easily control the existence and deletion of the data (with regard to GDPR) than when it sits in the sometimes nebulous tempdb.
  2. You can simplify the code by utilizing persistent tables and just truncating them after each successful processing.
  3. You won't be thrashing your tempdb and possibly affecting other things relying on the tempdb at the same time.
  4. As Jonathon mentioned, it gives you a more stable tempdb size that isn't inflated just because of a once-a-week process.
  5. The schema more clearly illustrates how the data is used, instead of doing a lot of data processing under the veil of tempdb. This is more of a personal preference, but I think it would be more clear for future maintenance / maintainers.
  6. I can't think of any huge downsides to this approach, unless your tempdb sits on much faster disks and the process would slow down on the drives used for a normal database.
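
As a rough illustration of points 1 and 2, here is a minimal T-SQL sketch of a persistent staging table that is truncated only after a successful run. The `staging` schema, the `staging.CustomerLoad` table, its columns, and the `dbo.Customer` source are all hypothetical names chosen for the example, not anything from the question.

```sql
-- Hypothetical names throughout: staging schema, staging.CustomerLoad, dbo.Customer.
-- One-time setup: a permanent staging table in the user database.
CREATE SCHEMA staging;
GO
CREATE TABLE staging.CustomerLoad
(
    CustomerId INT           NOT NULL,
    Email      NVARCHAR(320) NULL
);
GO

-- Each run: start from an empty table, load, process, then clear.
TRUNCATE TABLE staging.CustomerLoad;

BEGIN TRY
    INSERT INTO staging.CustomerLoad (CustomerId, Email)
    SELECT CustomerId, Email
    FROM dbo.Customer;   -- assumed source table

    -- Processing steps that read staging.CustomerLoad would go here.

    -- Clear only after success, so a failed run leaves its rows for inspection.
    TRUNCATE TABLE staging.CustomerLoad;
END TRY
BEGIN CATCH
    -- Rows remain in staging.CustomerLoad for troubleshooting; re-raise the error.
    THROW;
END CATCH;
```

Note that with this pattern a failed run leaves personal data in the staging table until the next run or a manual cleanup, so under GDPR you would probably want a defined window for clearing it.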

Downsides provided by David Browne:

  1. "The downsides are that writing to user databases generates more IO, as Tempdb is allowed to not flush tables and log records to guarantee recoverability. And this is exacerbated if your database is in Full recovery, and further if using an AG"

If you go this route, a common choice is to move the staging data into a separate staging database altogether. That might help mitigate some of the negatives, since the staging database wouldn't necessarily have to be in an AG or use the Full recovery model.
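
A minimal sketch of that idea, assuming a hypothetical database called `StagingDb` and leaving file locations and sizes at their defaults (in practice you would likely size the files up front for the 70m+ row loads):

```sql
-- Hypothetical database name; file placement and sizing are left at defaults here.
CREATE DATABASE StagingDb;
GO

-- SIMPLE recovery: no log backups are needed, and bulk loads can qualify
-- for minimal logging, which reduces log volume for staging work.
ALTER DATABASE StagingDb SET RECOVERY SIMPLE;
GO

-- Code in the main database would then reference staging tables with
-- three-part names, for example StagingDb.dbo.CustomerLoad (hypothetical).
```

The trade-off is that the staging database cannot be restored to a point in time, which is usually acceptable when its contents can be rebuilt from the source.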

Licensed under: CC-BY-SA with attribution