Question

I would like a table something like the following:

CREATE TABLE ProductFamilies (
  ID varchar,
  PriceLow int,
  PriceHigh int,
  MassLow int,
  MassHigh int,
  MnfGeo int,
  MnfID bigint,
  Data varchar,
  PRIMARY KEY (ID)
);

There are 13 fields in total. Most of these represent buckets. Data is a JSON string of product family IDs, which are then used in a subsequent query. Given how Cassandra works, the column names under the hood will be the values, and I want to filter on them.

I wish to run queries as follows:

SELECT Data FROM MyApp.ProductFamilies
WHERE ID IN (?, ?, ?)
  AND PriceLow >= ? AND PriceHigh <= ?
  AND MassLow >= ? AND MassHigh <= ?
  AND MnfGeo >= ? AND MnfGeo <= ?;
  1. I read that Cassandra can only execute WHERE predicates against composite row keys or indexed columns. Is this still true? If so, I would have to make the columns preceding Data part of the PK.
  2. Is it still the case that one has to include all columns from left to right and cannot skip any?
  3. Are there any suboptimal points in my design?
  4. I would like to add a column "Materials", which is an array of the possible materials in a product family. Think pizza toppings, queried with "WHERE Materials IN ('Pineapple')". Without creating a separate inverted index of materials and then manually intersecting it with the results of the query above, is there any other [more elegant] way of handling this in Cassandra?

Solution

Basically, to support your queries you need either

create column family ProductFamilies with 
comparator='CompositeType(UTF8Type, Int32Type, Int32Type, Int32Type, Int32Type, Int32Type, LongType, UTF8Type)' 
and key_validation_class='UTF8Type'

or

CREATE TABLE ProductFamilies (
  ID varchar,
  PriceLow int,
  PriceHigh int,
  MassLow int,
  MassHigh int,
  MnfGeo int,
  MnfID bigint,
  Data varchar,
  PRIMARY KEY (ID, PriceLow, PriceHigh, MassLow, MassHigh, MnfGeo, MnfID, Data)
);
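A rough way to picture this layout: within one ID partition, each row becomes a cell whose name is the tuple of its clustering values, and cells are kept sorted component by component. The Python sketch below is a simplified model under that assumption (it ignores timestamps and the exact on-disk encoding), and it assumes every filter column, MassHigh included, is part of the clustering key. `cell_name` and the sample rows are made up for illustration.

```python
# Simplified model (assumption): inside one partition, a row's cell name is the
# tuple of its clustering values, and cells are sorted by comparing tuples.
CLUSTERING = ("PriceLow", "PriceHigh", "MassLow", "MassHigh", "MnfGeo", "MnfID", "Data")


def cell_name(row):
    """Build the composite name for a row dict, in clustering-column order."""
    return tuple(row[col] for col in CLUSTERING)


# Hypothetical sample rows for a single ID.
rows = [
    {"PriceLow": 20, "PriceHigh": 30, "MassLow": 1, "MassHigh": 5, "MnfGeo": 3, "MnfID": 7, "Data": "[1,2]"},
    {"PriceLow": 10, "PriceHigh": 40, "MassLow": 9, "MassHigh": 12, "MnfGeo": 1, "MnfID": 4, "Data": "[3]"},
    {"PriceLow": 10, "PriceHigh": 25, "MassLow": 2, "MassHigh": 8, "MnfGeo": 2, "MnfID": 5, "Data": "[4,5]"},
]

# Python tuples compare lexicographically, which roughly mirrors how a
# composite comparator orders int/long/text components.
partition = sorted(cell_name(r) for r in rows)
for name in partition:
    print(name)
```

The sort order is driven by PriceLow first, then PriceHigh, and so on, which is why the order of columns in the PRIMARY KEY matters for the queries below.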

Now you can run the query

SELECT Data FROM MyApp.ProductFamilies
WHERE ID IN (?, ?, ?)
  AND PriceLow >= ? AND PriceHigh <= ?
  AND MassLow >= ? AND MassHigh <= ?
  AND MnfGeo >= ? AND MnfGeo <= ?;

This works provided you don't skip any column from left to right (even a column you don't want to filter on needs at least a wildcard, *) and all your values are stored in the column names rather than in the column values.
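The left-to-right rule can be sketched as a small check. This is a hypothetical helper, not a Cassandra API, and it assumes the clustering order PriceLow, PriceHigh, MassLow, MassHigh, MnfGeo, MnfID, Data: the restricted columns must form a prefix of that order, every column in the prefix except the last must be an equality, and only the last one may be a range.

```python
# Hypothetical check modelling slice restrictions on sorted composite names.
CLUSTERING = ["PriceLow", "PriceHigh", "MassLow", "MassHigh", "MnfGeo", "MnfID", "Data"]


def is_contiguous_restriction(restrictions):
    """restrictions maps a column name to "eq" or "range"."""
    restricted = [c for c in CLUSTERING if c in restrictions]
    # No skipping: the restricted columns must be the leading columns.
    if restricted != CLUSTERING[:len(restricted)]:
        return False
    # Every restricted column except the last must be an equality.
    return all(restrictions[c] == "eq" for c in restricted[:-1])


print(is_contiguous_restriction({"PriceLow": "eq", "PriceHigh": "eq", "MassLow": "range"}))  # True
print(is_contiguous_restriction({"PriceLow": "range", "PriceHigh": "range"}))  # False
print(is_contiguous_restriction({"PriceHigh": "eq"}))  # False: PriceLow skipped
```

Trailing columns can be left unrestricted, since fixing a prefix still selects one sorted run of cells.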

One more thing you should understand about composite columns is that a column slice must be contiguous. So PriceLow >= 10 AND PriceLow <= 40 will return a contiguous slice, but further filtering that result set on MassLow and the other columns will not work, because the matching columns no longer form a contiguous slice. By contrast, PriceLow = 10 AND MassLow >= 10 AND MassLow <= 20 should work (tested with phpcassa), as it results in a contiguous slice once again.
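To see why contiguity matters, here is a toy Python model (an assumption, not Cassandra code) of one partition whose cells are named by (PriceLow, PriceHigh, MassLow) and kept sorted. It checks whether the matching positions form one unbroken run, and uses `bisect` to take a single slice when every leading column is fixed by an equality and only the next one is a range.

```python
from bisect import bisect_left, bisect_right

# Toy partition (assumption): cells named (PriceLow, PriceHigh, MassLow), sorted.
cells = sorted([
    (10, 20, 5), (10, 20, 15), (10, 30, 12), (25, 30, 11),
    (25, 40, 30), (40, 50, 18), (50, 60, 14),
])


def is_contiguous(indices):
    """True if the matching positions form one unbroken run in the sorted cells."""
    if not indices:
        return True
    return indices == list(range(indices[0], indices[-1] + 1))


# Range on the first component only: one contiguous run.
price_only = [i for i, (pl, _, _) in enumerate(cells) if 10 <= pl <= 40]
# The same range plus a MassLow filter: the matches have a gap.
price_and_mass = [i for i, (pl, _, ml) in enumerate(cells)
                  if 10 <= pl <= 40 and 10 <= ml <= 20]
# Equalities on PriceLow and PriceHigh, range on MassLow: one bisect slice.
lo = bisect_left(cells, (10, 20, 10))
hi = bisect_right(cells, (10, 20, 20))

print(price_only, is_contiguous(price_only))          # [0, 1, 2, 3, 4, 5] True
print(price_and_mass, is_contiguous(price_and_mass))  # [1, 2, 3, 5] False
print(cells[lo:hi])                                   # [(10, 20, 15)]
```

The gap at position 4 in the second case is exactly what a single column slice cannot express.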

Otherwise, create one or more secondary indexes on any of your columns. You can then query on column values, provided every query includes at least one indexed column with an equality condition. http://www.datastax.com/docs/1.1/ddl/indexes
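Conceptually, such a query uses the equality on the indexed column to pick candidate rows and then checks the remaining predicates against each candidate. The Python sketch below is a simplified in-memory model of that idea, not Cassandra's implementation; the row keys, `build_index` and `query` are hypothetical, with MnfGeo chosen as the indexed column for illustration.

```python
from collections import defaultdict

# Hypothetical rows keyed by ID.
rows = {
    "fam1": {"MnfGeo": 1, "PriceLow": 10, "MassLow": 5},
    "fam2": {"MnfGeo": 2, "PriceLow": 30, "MassLow": 8},
    "fam3": {"MnfGeo": 1, "PriceLow": 50, "MassLow": 2},
}


def build_index(column):
    """Map each value of `column` to the set of row keys that hold it."""
    index = defaultdict(set)
    for key, row in rows.items():
        index[row[column]].add(key)
    return index


mnf_geo_index = build_index("MnfGeo")


def query(geo, min_price):
    """Equality lookup on the indexed MnfGeo, then filter on PriceLow."""
    candidates = mnf_geo_index.get(geo, set())
    return sorted(k for k in candidates if rows[k]["PriceLow"] >= min_price)


print(query(1, 20))  # ['fam3']
```

Only the equality on the indexed column narrows the search; the range on PriceLow is a filter over the candidates.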

Regarding your Materials question: as far as I know, if it is going to be a multivalued column, there is no alternative to maintaining an inverted index.
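For reference, the manual approach from the question looks roughly like this: an application-maintained inverted index from material to product family IDs, intersected with the IDs returned by the bucket query. This is a hypothetical in-memory sketch; `materials_index`, `families_with_all` and the sample IDs are made up for illustration.

```python
# Hypothetical inverted index: material -> set of product family IDs,
# maintained by the application alongside ProductFamilies.
materials_index = {
    "Pineapple": {"fam1", "fam3"},
    "Ham": {"fam1", "fam2"},
    "Olive": {"fam2", "fam3"},
}


def families_with_all(materials):
    """IDs of families containing every requested material (set intersection)."""
    sets = [materials_index.get(m, set()) for m in materials]
    return set.intersection(*sets) if sets else set()


# IDs returned by the main bucket query (illustrative values).
bucket_matches = {"fam1", "fam2"}

print(sorted(families_with_all(["Pineapple"]) & bucket_matches))  # ['fam1']
print(sorted(families_with_all(["Pineapple", "Ham"])))            # ['fam1']
```

For an "any of these materials" query, a union of the index sets would replace the intersection.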

It would be great if @jbellis could verify this.

OTHER TIPS

If you specify the exact PK you are looking up, as you propose here (id IN ...), you can use whatever expressions you like in the remaining predicates. There are no restrictions.

List collections are supported starting in 1.2.0, which is scheduled for release at the end of October. Indexed querying of collection contents may or may not be supported.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow