Is relational database able to leverage the way of consistent hashing to do the partition table?

https://stackoverflow.com/questions/7087000

26-12-2020
|

Pergunta

Assume we have a user table to be partitioned by user id as integer 1,2,3...n . Can I use the way of consistent hashing used to partition the table?

The benefit would be if the number of partitions is increased or decreased, old index can be the same.

Question A.

Is it a good idea to use consistent hashing algorithm to do the partition table?

Question B.

Any relational database has this built in supported?

I guess some nosql database already use it.

But database here refer to relational database.

I just encountered this question in an interview. In the first reaction I just answered mod by length, but then be challenged if partitioning the table into more pieces, then it will cause problems.

Solução 2

About oracle hash partition, part from oracle help doc

After some research, oracle actually do support consistent hashing by the default hash partitioning. Though how it did is a secret and not published. But it actually leverage the way HashMap, but hidden some partitions. So when you add/remove partition, very less work for oracle to adjust the data in different partitions. The algorithms only ensures evenly splitting data into partitions of numbers power of 2 such as 4. So if it's not, then merge/split some partitions.

The magic is like if to increase from four partitions to five, it actually spilts one partition into two. If to decrease from four partitions into three, it actually merges two partitions into one.

If anyone has more insight, add a more detailed answer.

Hash Partitioning Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to the partitioning key that you identify. The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size.

Hash partitioning is the ideal method for distributing data evenly across devices. Hash partitioning is also an easy-to-use alternative to range partitioning, especially when the data to be partitioned is not historical or has no obvious partitioning key.

Note:

You cannot change the hashing algorithms used by partitioning.

About MYSQL hash partition, part from mysql help doc

It provides two partition function One is partition by HASH. The other is partition by KEY.

Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. MySQL Cluster uses MD5() for this purpose; for tables using other storage engines, the server employs its own internal hashing function which is based on the same algorithm as PASSWORD(). The syntax rules for CREATE TABLE ... PARTITION BY KEY are similar to those for creating a table that is partitioned by hash.

The major differences are listed here:

•KEY is used rather than HASH.

•KEY takes only a list of one or more column names. Beginning with MySQL 5.1.5, the column or columns used as the partitioning key must comprise part or all of the table's primary key, if the table has one.

CREATE TABLE k1 (
    id INT NOT NULL PRIMARY KEY,
    name VARCHAR(20)
)
PARTITION BY KEY()
PARTITIONS 2;

If there is no primary key but there is a unique key, then the unique key is used for the partitioning key:

CREATE TABLE k1 (
    id INT NOT NULL,
    name VARCHAR(20),
    UNIQUE KEY (id)
)
PARTITION BY KEY()
PARTITIONS 2;

However, if the unique key column were not defined as NOT NULL, then the previous statement would fail.

But it don't tell how it partitions, will have to look into code.

Outras dicas

After I researched some wiki reference pages like Partition (database)

I believe my idea belongs to Composite partitioning .

Composite partitioning allows for certain combinations of the above partitioning schemes, by for example first applying a range partitioning and then a hash partitioning. Consistent hashing could be considered a composite of hash and list partitioning where the hash reduces the key space to a size that can be listed.

It also introduces some concepts like Consistent hashing, and Hash table

But some link like Partition (database) is kind of old. If some one can find more latest reference that will be better. My answer is incomplete indeed. Hope some one can answer it better!

UPDATE

Looks like Jonathan Ellis already mentioned in his blog, The Cassandra distributed database supports two partitioning schemes now: the traditional consistent hashing scheme, and an order-preserving partitioner. http://spyced.blogspot.com/2009/05/consistent-hashing-vs-order-preserving.html

From Tom White's blog. A sample implemtation in java of consistent hashing

import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash<T> {
    private final HashFunction hashFunction;
    private final int numberOfReplicas;
    private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

    public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
            Collection<T> nodes) {
        this.hashFunction = hashFunction;
        this.numberOfReplicas = numberOfReplicas;
        for (T node : nodes) {
            add(node);
        }
    }

    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hashFunction.hash(node.toString() + i), node);
        }
    }

    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hashFunction.hash(node.toString() + i));
        }
    }

    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hashFunction.hash(key);
        if (!circle.containsKey(hash)) {
            SortedMap<Integer, T> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }
}

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow