Question

The question says it all.

Example: I'm planning to shard a database table. The table contains customer orders which are flagged as "active", "done" and "deleted". I also have three shards, one for each flag.

As far as I understand a row has to be moved to the right shard, when the flag is changed.

Am I right? What's the best way to do this? Can triggers be used?

I thought about not moving the row immediately, but only at the end of the day/week/month, but then it is not determined, in which shard a rows with a specific flag resides and searches have to be done always over all shards.

EDIT: Some clarification:

In general I have to choose on a criterum to decide, in which shard a row resides. In this case I want it to be the flag described above, because it's the most natural way to shard this kind of data. (In my opinion) There is only a limited number of active orders which is accessed very often. There is a large number of finished orders, which are seldom accessed and there's a very huge number of data rows which are almost never accessed.

If I want to now where a specific data row resides I dont have to search all shards. If the user wants to load an active order, I know already in which database I have to look.

Now the flag, which is my sharding criterium, changes and I want to know the best way to deal with this case. If I'd just keep the record in its original database, eventually all data would accumulate in a single table.

Was it helpful?

Solution

Sharding usually refer to separating them in different databases on different servers. Oracle can do what you want using a feature called partitioned tables.

If you're using triggers (after/before_update/insert), it would be an immediate move, other methods would result in having different types of data in the first shard (active), until it is cleaned-up.

I would also suggest doing this by date (like a monthly job that moves anything that's inactive and older than a month to another "Archive" Database).

I'd like to ask you to reconsider doing this if you're doing it to increase performance (Unless you have terabytes of data in this table). Please tell us why you want to shard and we'll all think about ways to solve your problem.

OTHER TIPS

In my opinion keeping all active record in single shard may not be a good idea. In such sharding strategy all IOs will be performed on single database instance leaving all other highly underutilized.

Alternate sharding strategy can be to distribute the newly created rows among the shards using some kind of hash function. This will allow

  • Quick look up of row
  • Distribute IO on all the shard instances.
  • No need to move the data from one shard to another (except the case when you want to increase the number of shards).
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top