Cluster design: if I expect to insert data into several tables every week, is it a bad idea to cluster them?

dba.stackexchange https://dba.stackexchange.com/questions/274194

  •  06-03-2021

Question

I have some tables that I join that I think would be perfect for putting in a cluster. But I also expect to insert data into them every week. Not a huge amount of data, something like 5-20 records a week. My understanding is that clusters are good for tables that you join and select from, but not so good if you plan to use DML statements on them.

My question is: when the Oracle documentation says that clustering isn't efficient to use with DML statements, is it referring to updating and deleting hundreds of records? Or would a small number of inserted records also fall under this guideline? Essentially: does the scale of DML statements affect the efficiency of clustering, or is it more of a binary situation? For example, if I were updating one record every day, should I avoid putting my tables in a cluster?
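
(For reference, a multi-table index cluster of the kind described in the question is created roughly as below in Oracle; the table names, columns, and SIZE estimate are invented for illustration.)

    -- Minimal sketch of a multi-table index cluster keyed on the join
    -- column. Names and the SIZE value are hypothetical.
    CREATE CLUSTER customer_cluster (customer_id NUMBER)
        SIZE 512;   -- rough estimate of bytes stored per cluster key

    CREATE INDEX customer_cluster_idx ON CLUSTER customer_cluster;

    CREATE TABLE customers (
        customer_id NUMBER PRIMARY KEY,
        name        VARCHAR2(100)
    ) CLUSTER customer_cluster (customer_id);

    CREATE TABLE orders (
        order_id    NUMBER PRIMARY KEY,
        customer_id NUMBER NOT NULL,
        order_date  DATE
    ) CLUSTER customer_cluster (customer_id);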

Was it helpful?

Solution

I have yet to see a real-world user scenario where the benefit of using a cluster instead of regular tables (or IOTs) with joins, namely saving a bit of disk, I/O, or block accesses, is so significant that it is worth the hassle of dealing with it.

5-20 records per week: that is nothing. Paper and pencil can do that.

FYI: the data dictionary tables use a few clusters for identifiers. These identifiers never change: they are inserted and deleted, but never updated. In some environments, 5-20 records are inserted/deleted in a matter of seconds or minutes (due to dynamically creating and dropping objects) without causing any problems. So 5-20 records per week will not be a problem. The question is: do you really want to use something that is almost never used, that may not even improve performance noticeably (or may even make it worse), but requires extra attention?
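
(If you want to see those dictionary clusters for yourself, the data dictionary views list them. This assumes access to the DBA_* views; the ALL_*/USER_* variants work the same way.)

    -- Clusters that exist in the database, including the ones SYS ships
    -- with (C_OBJ#, C_USER#, and friends):
    SELECT owner, cluster_name, cluster_type
    FROM   dba_clusters
    ORDER  BY owner, cluster_name;

    -- Which tables live in each cluster:
    SELECT owner, cluster_name, table_name
    FROM   dba_tables
    WHERE  cluster_name IS NOT NULL
    ORDER  BY owner, cluster_name, table_name;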

Other tips

caveat emptor

Whenever you have a schema design idea, run benchmarks to (dis)prove its usefulness.

For me, a non-standard schema design needs a proven, significant benefit before it is implemented.

For your really, really tiny data volume, I expect you would save only a few jiffies per year.

Again, run benchmarks.
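
(A minimal benchmark skeleton in SQL*Plus or SQLcl might look like the following. It assumes the clustered tables from the earlier sketch plus plain heap copies, here called customers_heap and orders_heap, loaded with the same data; those names are invented. The numbers to compare are the elapsed time and the "consistent gets" statistic.)

    SET TIMING ON
    SET AUTOTRACE TRACEONLY STATISTICS

    -- Clustered layout
    SELECT c.name, o.order_id, o.order_date
    FROM   customers c JOIN orders o USING (customer_id)
    WHERE  c.customer_id = 42;

    -- Plain heap layout, same data
    SELECT c.name, o.order_id, o.order_date
    FROM   customers_heap c JOIN orders_heap o USING (customer_id)
    WHERE  c.customer_id = 42;

    SET AUTOTRACE OFF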

TL;DR When to use a clustered table? Never (unless proven otherwise).

Clustering and partitioning exist to create locality of reference in huge datasets.

Clustering stores together all rows associated with each key value. Depending on the RDBMS, it may be applied table by table, as an index whose leaves are the rows themselves, or across multiple tables, where the data for each key value in several tables is kept together. With clustering, the table is still huge.
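
(In Oracle terms, the per-table form is an index-organized table, as mentioned in the accepted answer, and the multi-table form is the cluster sketched earlier. A minimal IOT sketch, with invented names:)

    -- Single-table clustering: an index-organized table keeps the rows
    -- in the leaves of the primary-key index, so rows with adjacent
    -- keys are stored together.
    CREATE TABLE order_items (
        order_id NUMBER,
        line_no  NUMBER,
        sku      VARCHAR2(20),
        qty      NUMBER,
        CONSTRAINT order_items_pk PRIMARY KEY (order_id, line_no)
    ) ORGANIZATION INDEX;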

Partitioning puts the table in different spaces, so it acts like many small tables. For instance, at the exchanges we partitioned by trading day. This is great for speeding up queries and churn, since old partitions are quiescent. When partitioning on date, it is also very handy for efficiently purging data and recycling the space for a new partition key value.
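
(A sketch of that pattern in Oracle, with invented names: range/interval partitioning by day, where purging a day is a partition-level operation rather than a large DELETE.)

    -- One partition per trading day; the INTERVAL clause makes Oracle
    -- create new daily partitions automatically as data arrives.
    CREATE TABLE trades (
        trade_id    NUMBER,
        trading_day DATE,
        symbol      VARCHAR2(12),
        price       NUMBER
    )
    PARTITION BY RANGE (trading_day)
    INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
    (PARTITION p_start VALUES LESS THAN (DATE '2021-01-01'));

    -- Purging an old day is a metadata operation, not a big DELETE:
    -- ALTER TABLE trades DROP PARTITION FOR (DATE '2021-01-04');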

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange