Question

I'm trying to get my head around Cassandra/Pycassa db design.

With Mongoengine, you can refer to another class using "ReferenceField", as follows:

from mongoengine import *

class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)

class Post(Document):
    title = StringField(max_length=120, required=True)
    author = ReferenceField(User)

As far as I can tell from the documentation, the Pycassa equivalent is something like this, but I don't know how to create a reference from the Post class author field to the User class:

from pycassa.types import *
from pycassa.pool import ConnectionPool
from pycassa.columnfamilymap import ColumnFamilyMap
import uuid

class User(object):
    key = LexicalUUIDType()
    email = UTF8Type()
    first_name = UTF8Type()
    last_name = UTF8Type()

class Post(object):
    key = LexicalUUIDType()
    title = UTF8Type()
    author = ???

What is the preferred way to do something like this? Obviously I could just put the User key in the Post author field, but I'm hoping there's some better way where all this is handled behind the scenes, like with Mongoengine.

Solution

@jterrace is correct: you're probably going about this the wrong way. With Cassandra, you tend to be less concerned with objects, how they relate, and how to normalize them. Instead, ask yourself "What queries do I need to be able to answer efficiently?" and then pre-build the answers to those queries. This usually involves a mixture of denormalization and the "wide row" model. I highly suggest reading some articles online about data modeling for Cassandra.
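To make "pre-build the answers" a bit more concrete, here is a minimal sketch. It does not talk to Cassandra at all: plain Python dicts stand in for column families (row key mapping to a dict of columns), and the names posts, posts_by_author, and add_post are made up for illustration. It assumes the two queries you care about are "show one post with its author's name" and "list all posts by a given author".

```python
import uuid

# Plain dicts stand in for column families: row key -> {column name: value}.
# These names are hypothetical; nothing here is part of pycassa.
posts = {}            # one row per post
posts_by_author = {}  # one wide row per author, one column per post


def add_post(author_key, author_name, title):
    """Write the post and pre-build the answer to both queries at write time."""
    post_key = uuid.uuid4()
    # Denormalization: the author's name is copied into the post row,
    # so displaying a post needs no second lookup.
    posts[post_key] = {
        "title": title,
        "author_key": author_key,
        "author_name": author_name,
    }
    # Wide row: the author's row gains one column per post
    # (column name = post key, column value = title).
    posts_by_author.setdefault(author_key, {})[post_key] = title
    return post_key


alice = uuid.uuid4()
first = add_post(alice, "Alice Smith", "Hello Cassandra")
second = add_post(alice, "Alice Smith", "Wide rows")
```

The cost of this design is that every write touches two places, and a change to the author's name would have to be propagated to the copies. That is the usual trade for making each read a single row lookup.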

With that said, pycassa's ColumnFamilyMap is just a thin wrapper that can cut down on boilerplate, nothing more. It does not attempt to provide support for anything complicated because it doesn't know what kinds of queries you need to be able to answer. So, specifically, you could store the matching User's LexicalUUID in the author field, but pycassa will not automatically fetch that User object for you when you fetch the Post object.
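Storing the key and fetching the User yourself might look roughly like the sketch below. To keep it runnable without a cluster, InMemoryColumnFamily is a hypothetical stand-in, not a pycassa class; it only offers insert(key, columns) and get(key), loosely modeled on how a column family is used. The helper get_post_with_author is also made up for illustration.

```python
import uuid


class InMemoryColumnFamily(object):
    """Hypothetical stand-in for a column family: row key -> dict of columns."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, columns):
        # Merge the given columns into the row, creating the row if needed.
        self._rows.setdefault(key, {}).update(columns)

    def get(self, key):
        # Return a copy of the row's columns; raises KeyError if the row is missing.
        return dict(self._rows[key])


users = InMemoryColumnFamily()
posts = InMemoryColumnFamily()

user_key = uuid.uuid4()
users.insert(user_key, {"email": "alice@example.com",
                        "first_name": "Alice",
                        "last_name": "Smith"})

post_key = uuid.uuid4()
# The "reference" is nothing more than the User's row key stored as a column value.
posts.insert(post_key, {"title": "Hello Cassandra", "author": user_key})


def get_post_with_author(key):
    """Fetch a post, then resolve its author with a second, explicit lookup."""
    post = posts.get(key)
    post["author"] = users.get(post["author"])
    return post
```

Calling get_post_with_author(post_key) returns the post's columns with the author column replaced by the User's columns. The point is that the second read is your code's job; nothing resolves it behind the scenes.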

OTHER TIPS

I think you're really misunderstanding the data model for Cassandra. You should read Cassandra Data Model before continuing.

pycassa has no notion of "objects" like you have defined above. There are only column families, row key types, and column types. There is no such thing as a reference from one column family to another in Cassandra.

Licensed under: CC-BY-SA with attribution