Are there performance problems in querying non varbinary(max) fields in a table containing varbinary(max) data?

StackOverflow https://stackoverflow.com/questions/2911677

Question

I created a table to insert all the documents of my application. It is a simple table (let's call it DOC_DATA) that has 3 fields: DOC_ID, FileSize, Data. Data is varbinary(max).

I then have many tables (CUSTOMERS_DOCUMENTS, EMPLOYEES_DOCUMENTS, ...) that contain other data (like "document description", "Created by", "Customer ID" ...). My case is not exactly like this, but this example lets me explain the problem more clearly. All these tables have a FK to DOC_DATA.DOC_ID.

When the user searches for a customer document he will run a query similar to this:

select CD.*, DD.FileSize
from DOC_DATA DD
join CUSTOMERS_DOCUMENTS CD ON CD.DOC_ID = DD.DOC_ID

My question is: will the performance of this query be bad because we are also reading a field from a table that is potentially huge (the DOC_DATA table can contain many GB of data), or is this not a problem?

The alternative solution is to put the FileSize field in all the main tables (CUSTOMERS_DOCUMENTS, EMPLOYEES_DOCUMENTS, ...). Of course a join has a small impact on performance. I am not asking whether to join or not in general, but whether to join a HUGE table when I am not interested in its HUGE fields.

Please note: I am not designing a new system, I am maintaining a legacy system, so here I am not discussing which is the best design in general, but just which is the best option in this case.


Solution

I see no reason why the performance of your query would suffer due to the presence of those large columns. Performance issues would come up when you read that data, specifically when you require the database engine to return the document itself, but you are not doing so in this query.

Internally, for the various (max) data types, SQL Server keeps a value in the row itself only when it is small enough to fit (up to 8,000 bytes, under the default table settings). Larger values, which is the usual case for stored documents, go to a separate set of LOB pages, and the row holds only a small pointer (16 bytes or so) to them. Thus, if you're not reading that column, those pages do not need to be accessed, and you don't incur the disk I/O hit.
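
One way to check this on your own data is to look at the I/O counters for the query and at how DOC_DATA's pages are split between in-row and LOB storage. The following is only a sketch: it assumes the tables live in the dbo schema (the question does not say), and the numbers it reports will depend on your document sizes and table settings.

```sql
-- Report logical/physical reads (including LOB reads) per table
-- in the Messages output for the statements that follow.
SET STATISTICS IO ON;

SELECT CD.*, DD.FileSize
FROM DOC_DATA AS DD
JOIN CUSTOMERS_DOCUMENTS AS CD ON CD.DOC_ID = DD.DOC_ID;

SET STATISTICS IO OFF;

-- Compare how many pages DOC_DATA uses for in-row data versus LOB data.
-- 'dbo' is an assumed schema name; index_id 0 = heap, 1 = clustered index.
SELECT in_row_data_page_count,
       lob_used_page_count,
       row_overflow_used_page_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(N'dbo.DOC_DATA')
  AND index_id IN (0, 1);

-- Optional: ask SQL Server to store (max) values out of row even when
-- they are small. Existing values are only moved when they are next updated.
EXEC sp_tableoption N'dbo.DOC_DATA', 'large value types out of row', 1;
```

If the "lob logical reads" reported for DOC_DATA stay at 0 for the first query, the document pages are not being touched. The last statement is only worth considering if many documents are small enough to be stored in-row and are bloating the rows the join has to scan.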

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow