Ways to implement data versioning in Cassandra

https://stackoverflow.com/questions/4183945

10-10-2019

Question

Can you share your thoughts on how you would implement data versioning in Cassandra?

Suppose that I need to version records in a simple address book. (Address book records are stored as rows in a ColumnFamily.) I expect that the history:

will be used infrequently
will be read all at once, to present it in a "time machine" fashion
will not hold more than a few hundred versions of a single record
will not expire.

I'm considering the following approaches:

Convert the address book to a Super Column Family and store multiple versions of each address book record in one row, keyed by time stamp, as super columns.
Create a new Super Column Family to store old records or changes to the records. Such a structure would look as follows:

```
{
    'address book row key': {
        'time stamp1': {
            'first name': 'new name',
            'modified by': 'user id',
        },
        'time stamp2': {
            'first name': 'new name',
            'modified by': 'user id',
        },
    },
    'another address book row key': {
        'time stamp': {
            ....
```
Store versions as serialized (JSON) objects in a new ColumnFamily, representing sets of versions as rows and individual versions as columns. (Modeled on Simple Document Versioning with CouchDB.)
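To make the serialized-JSON option and the "time machine" read more concrete, here is a minimal in-memory sketch. It uses plain Python dicts rather than a Cassandra client, and the names `record_versions`, `save_version`, `version_at`, and the sample key `contact_42` are hypothetical, chosen only for illustration.

```python
import bisect
import json

# Hypothetical in-memory stand-in for a ColumnFamily (not a Cassandra API):
# row key -> {column name (timestamp) -> serialized JSON value}
record_versions = {}


def save_version(record_key, timestamp, record):
    """Store one version of a record as a JSON string under its timestamp."""
    row = record_versions.setdefault(record_key, {})
    row[timestamp] = json.dumps(record, sort_keys=True)


def version_at(record_key, timestamp):
    """Return the record as of `timestamp`, or None if it did not exist yet."""
    row = record_versions.get(record_key, {})
    stamps = sorted(row)  # a dict is unordered by key, so sort explicitly
    index = bisect.bisect_right(stamps, timestamp)
    if index == 0:
        return None
    return json.loads(row[stamps[index - 1]])


# Made-up sample data for illustration only.
save_version('contact_42', 100, {'first name': 'Ann', 'modified by': 'user_1'})
save_version('contact_42', 200, {'first name': 'Anna', 'modified by': 'user_2'})

print(version_at('contact_42', 150))
```

In this sketch each version is a full snapshot, so a "time machine" view only needs the newest column at or before the requested time stamp.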

Solution

If you can add the assumption that address books typically have fewer than 10,000 entries in them, then using one row per address book time line in a super column family would be a decent approach.

A row would look like:

{'address_book_18f3a8': {
  1290635938721704: {'entry1': 'entry1_stuff', 'entry2': 'entry2_stuff'},
  1290636018401680: {'entry1': 'entry1_stuff_v2', ...},
  ...
}}

where the row key identifies the address book, each super column name is a time stamp, and the subcolumns represent the address book's contents for that version.

This would allow you to read the latest version of an address book with only one query and also write a new version with a single insert.

The reason I suggest this only for address books with fewer than 10,000 entries is that a super column must be completely deserialized when you read even a single subcolumn. Overall, that is not too bad in this case, but it's something to keep in mind.
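The one-row-per-address-book layout can be sketched in memory as follows. This is only a model built from nested dicts, not Cassandra client code; `address_books`, `write_version`, and `read_latest` are hypothetical names, and the sample values reuse the placeholders from the row shown above.

```python
# Hypothetical in-memory stand-in for a super column family (not a Cassandra API):
# row key -> {super column name (timestamp) -> {subcolumn -> value}}
address_books = {}


def write_version(book_key, timestamp, entries):
    """Add a new version as one super column; older versions stay untouched."""
    address_books.setdefault(book_key, {})[timestamp] = dict(entries)


def read_latest(book_key):
    """Return (timestamp, entries) for the newest version of an address book."""
    row = address_books[book_key]
    latest = max(row)  # the highest timestamp is the newest super column
    return latest, row[latest]


write_version('address_book_18f3a8', 1290635938721704,
              {'entry1': 'entry1_stuff', 'entry2': 'entry2_stuff'})
write_version('address_book_18f3a8', 1290636018401680,
              {'entry1': 'entry1_stuff_v2'})

timestamp, entries = read_latest('address_book_18f3a8')
print(timestamp, entries['entry1'])
```

Note that `read_latest` hands back the whole version, which mirrors the point about super columns: the full set of subcolumns is loaded even if only one entry is needed.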

An alternative approach would be to use a single row per version of the address book, and use a separate CF with a time line row per address book like:

{'address_book_18f3a8': {1290635938721704: some_uuid1, 1290636018401680: some_uuid2, ...}}

Here, some_uuid1 and some_uuid2 correspond to the row key for those versions of the address book. The downside to this approach is that it requires two queries every time the address book is read. The upside is that it lets you efficiently read only select parts of an address book.
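A rough in-memory sketch of this two-column-family layout is shown below. It again uses plain dicts instead of a Cassandra client, and `timelines`, `versions`, `write_version`, and `read_entries` are hypothetical names introduced for illustration.

```python
import uuid

# Hypothetical in-memory stand-ins for the two column families.
timelines = {}  # address book key -> {timestamp -> version row key}
versions = {}   # version row key -> {entry name -> entry value}


def write_version(book_key, timestamp, entries):
    """Store a version in its own row, then point the time line at it."""
    version_key = str(uuid.uuid4())
    versions[version_key] = dict(entries)
    timelines.setdefault(book_key, {})[timestamp] = version_key
    return version_key


def read_entries(book_key, entry_names):
    """Read selected entries from the latest version using two lookups."""
    timeline = timelines[book_key]
    version_key = timeline[max(timeline)]  # lookup 1: find the version row
    row = versions[version_key]            # lookup 2: read from that row
    return {name: row[name] for name in entry_names if name in row}


write_version('address_book_18f3a8', 1290635938721704,
              {'entry1': 'entry1_stuff', 'entry2': 'entry2_stuff'})
write_version('address_book_18f3a8', 1290636018401680,
              {'entry1': 'entry1_stuff_v2', 'entry2': 'entry2_stuff'})

print(read_entries('address_book_18f3a8', ['entry1']))
```

The two lookups in `read_entries` correspond to the two queries mentioned above, while the `entry_names` argument illustrates reading only selected parts of a version.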

Other suggestions

HBase (http://hbase.apache.org/) has this functionality built in. Give it a try.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow