Question

I am trying to find out the amount of transaction-log data generated by SQL Server 2016 during every hour or every day. By "data generated" I mean how much data (in bytes/KBs/etc) was written to disk every hour (or every day).

Is there a way to find this out?

Our database is in the FULL recovery model and we do have regular transaction log backups. So I am of the opinion that querying the backup metadata in msdb may help achieve this. Does that work? Will it give me correct and reliable results?
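For reference, a minimal sketch of the msdb approach, aggregating log-backup sizes per hour. The table and column names (`msdb.dbo.backupset`, `backup_size`, `type = 'L'`) are real, but the 7-day window and MB conversion are illustrative choices:

```sql
-- Sketch: sum log-backup sizes per hour from msdb backup history.
-- backup_size is the uncompressed size; use compressed_backup_size
-- if you want the on-disk (and roughly on-wire) size instead.
SELECT
    bs.database_name,
    DATEADD(HOUR, DATEDIFF(HOUR, 0, bs.backup_start_date), 0) AS backup_hour,
    SUM(bs.backup_size) / 1024.0 / 1024.0 AS log_backup_mb
FROM msdb.dbo.backupset AS bs
WHERE bs.type = 'L'                                   -- 'L' = log backup
  AND bs.backup_start_date >= DATEADD(DAY, -7, GETDATE())
GROUP BY bs.database_name,
         DATEADD(HOUR, DATEDIFF(HOUR, 0, bs.backup_start_date), 0)
ORDER BY bs.database_name, backup_hour;
```

Note this only approximates hourly generation: a backup's size reflects all log produced since the previous log backup, so the granularity is limited by your backup schedule.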

A second option would be to look at the amount of read/write I/O happening against the transaction log files. Can this work? If yes, how can I do that? Are there any SQL Server DMVs providing such information? What about Windows tools, such as Performance Monitor counters?
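There is indeed a DMV for this: `sys.dm_io_virtual_file_stats` exposes cumulative I/O per database file since instance startup. A sketch, filtering to log files only (the snapshot-and-diff scheme is an assumption about how you'd operationalize it):

```sql
-- Sketch: cumulative bytes written to transaction log files since
-- instance startup. Snapshot this periodically (e.g. hourly, via an
-- Agent job writing into a table) and diff consecutive snapshots to
-- get per-hour write volume.
SELECT
    DB_NAME(vfs.database_id)                   AS database_name,
    mf.name                                    AS log_file,
    vfs.num_of_bytes_written / 1024.0 / 1024.0 AS mb_written_since_startup,
    vfs.num_of_writes
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
WHERE mf.type_desc = 'LOG';
```

On the Windows side, the `SQLServer:Databases` Performance Monitor object has a `Log Bytes Flushed/sec` counter per database that measures the same activity continuously.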

If at all possible, I would prefer the second option, because it doesn't require transaction log backups and can therefore be used even with databases in the SIMPLE recovery model. So, my question is: is it possible?

Are there any other alternatives? Such as SQL Server tools or views that are readily available?

Please note, I need this data because we are trying to estimate the amount of network I/O that will be required if we create a (near) real-time replica of our databases in the cloud. So, I thought we should somehow measure the amount of I/O attributed to the transaction log files. Is my assumption correct that the required network I/O will be equal to the amount of I/O to the transaction log files? Is this how SQL Server transactional replication works, i.e. by sending VLFs to the replicated site?


Solution

Lots of questions in there.

First, are we talking about transactional replication? I know you said that, but we need to be certain that you aren't talking about Always On Availability Groups, log shipping, or some other solution for getting a "copy" of your database. Answered: yes, transactional replication - I will refer to it as just "replication".

No, replication doesn't copy VLFs. I think it is extremely difficult to get anything usable from the transaction log.

First of all, consider the replication architecture:

An Agent job, the Log Reader, reads the transaction log, generates DML statements from it, and stores them in the distribution database. So the first question is whether the Log Reader runs locally or not.

Then the next Agent job, the Distribution Agent, reads these DML commands from the distribution database and applies them to each subscriber. And as you probably realize, there is no 1:1 mapping between the size of the log records (binary data) and the DML commands generated from those log records (text). And, of course, the question is where the Distribution Agent is running.
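One way to see that size mismatch directly, once replication is running, is to measure the command payloads stored in the distribution database. A hedged sketch, assuming a default-named `distribution` database; this measures the distributor side, not the raw log-record size:

```sql
-- Sketch: approximate replicated-command volume per hour by summing the
-- stored command payloads in the distribution database. Assumes the
-- distribution database is named 'distribution' (the default).
SELECT
    DATEADD(HOUR, DATEDIFF(HOUR, 0, t.entry_time), 0) AS command_hour,
    COUNT(*)                                          AS command_count,
    SUM(DATALENGTH(c.command)) / 1024.0 / 1024.0      AS approx_command_mb
FROM distribution.dbo.MSrepl_commands     AS c
JOIN distribution.dbo.MSrepl_transactions AS t
  ON t.publisher_database_id = c.publisher_database_id
 AND t.xact_seqno            = c.xact_seqno
GROUP BY DATEADD(HOUR, DATEDIFF(HOUR, 0, t.entry_time), 0)
ORDER BY command_hour;
```

Keep in mind this only covers commands still retained in the distribution database (they are purged per the distribution retention settings), so it is a rough gauge rather than a complete history.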

And the final question is where the subscriber is. So, you should draw a picture with the involved databases, showing where there is network between the databases and where the Agent jobs are running.

And there's more: some operations aren't carried over with transactional replication. One example is an index rebuild. It can generate a massive amount of log records (under the full recovery model), but such an operation isn't executed on the subscriber - so it is ignored by the Log Reader.

And if you only want to replicate a subset of the database: how would you, from the log, determine which log records apply to the tables/rows in that subset?

I'm mainly raising issues here, just to point out how easily you can fall into an oversimplified "solution" that ends up being misleading. This MS article talks about a few options that can be usable once you have implemented replication.

Disclaimer: I'm no replication expert, so feel free to correct me, y'all.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange