I responded to this on the pycassa mailing list as well (please try not to post in multiple places), but I'll copy the answer for anybody else who sees this:
multiget is a very expensive operation for Cassandra: each row it requests can require a couple of disk seeks. pycassa automatically splits the query up into smaller chunks, but the total cost is still high.
If you're trying to read the whole column family, use get_range() instead.
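As a rough sketch of the get_range() approach (the keyspace name, column family name, server address, and both helper function names are placeholders I made up for illustration, not anything from your setup):

```python
def open_column_family(keyspace, cf_name, servers=("localhost:9160",)):
    """Connect with pycassa. keyspace, cf_name and servers are placeholders."""
    # Imported lazily so iter_all_rows() below can be tried without pycassa installed.
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool(keyspace, server_list=list(servers))
    return ColumnFamily(pool, cf_name)


def iter_all_rows(cf, buffer_size=1024):
    """Yield (row_key, columns) for every row in the column family."""
    # With no start/finish keys, get_range() walks the whole column family,
    # fetching rows from Cassandra in pages of buffer_size.
    for key, columns in cf.get_range(buffer_size=buffer_size):
        yield key, columns
```

You would then loop with something like `for key, columns in iter_all_rows(open_column_family("MyKeyspace", "MyCF")): ...`. Note that get_range() returns at most 100 columns per row by default, so pass a larger column_count if your rows are wider than that.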
If you're just trying to read a subset of the rows in that column family (based on some attribute) and you need to do this frequently, you need to use a different data model.
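One common way to do that, sketched here under assumptions since I don't know your schema, is to maintain a second column family that acts as an index: one wide row per attribute value, with one column per matching row key. The column family, the function names, and the `summary` payload below are all hypothetical:

```python
def index_row(index_cf, attribute_value, row_key, summary=""):
    """Record that row_key has the given attribute value.

    index_cf is assumed to be a pycassa ColumnFamily whose row keys are
    attribute values and whose column names are keys of the matching rows.
    """
    # Storing a small payload as the column value (denormalizing) lets
    # readers avoid a follow-up multiget on the main column family.
    index_cf.insert(attribute_value, {row_key: summary})


def rows_with_attribute(index_cf, attribute_value, limit=1000):
    """Return {row_key: summary} for up to `limit` rows with this attribute."""
    # This reads a single row, so it avoids the per-row disk seeks of
    # fetching many separate rows with multiget.
    return index_cf.get(attribute_value, column_count=limit)
```

You write to the index whenever you write the main row. With a real pycassa ColumnFamily, get() raises NotFoundException if no row exists for that attribute value, so callers may want to catch it.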
Since you're new to this, I would spend some time learning about data modeling in Cassandra: http://wiki.apache.org/cassandra/DataModel. (Note: most of these examples use CQL3, which pycassa does not support. If you want to work with CQL3 instead, use the new DataStax Python driver: https://github.com/datastax/python-driver)