After further investigation, I found the solution here:
https://issues.apache.org/jira/browse/CASSANDRA-5867
Basically, CqlStorage supports complex types. For that, the value has to be represented as a tuple nested inside the output tuple, carrying the data type name as a string in its first element. For a list, this is how it is done:
# python
@outputSchema("flat_bag:bag{}")
def flattenBag(bag):
    # Prefix the flattened items with the type marker 'list' expected by CqlStorage
    return ('list',) + tuple([long(item) for tup in bag for item in tup])
Thus, in grunt:
# pig
CassandraAggregate = FOREACH GroupedRelation
    GENERATE TOTUPLE(TOTUPLE('my_id', $0.my_id),
                     TOTUPLE('date', ISOToUnix($0.createdAt))),
             TOTUPLE(COUNT($1), py_f.flattenBag($1.grouped_id));
DUMP CassandraAggregate;
(((my_id,30021),(date,1357084800000)),(2,(list, 60128490006325819,62726281032786005)))
(((my_id,31120),(date,1357084800000)),(1,(list, 60128411174143024)))
(((my_id,31120),(date,1357084800000)),(1,(list, 60128411146211875,63645100121476995,6012841114621187563645100121476995)))
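The shape of the list tuple can be checked outside Pig before wiring the UDF in. The following is only a minimal sketch for plain Python 3: the name flatten_bag and the use of int() instead of Jython's long() are my own substitutions, and the bag is assumed to reach the UDF as a list of one-element tuples.

```python
def flatten_bag(bag):
    """Flatten a bag of tuples into one tuple tagged with the 'list' marker."""
    # int() stands in for Jython 2's long(); Python 3 integers are unbounded.
    return ('list',) + tuple(int(item) for tup in bag for item in tup)


# Assumed input shape: a Pig bag seen as a list of tuples.
# The two ids are the ones from the first row of the DUMP output above.
sample_bag = [(60128490006325819,), (62726281032786005,)]
result = flatten_bag(sample_bag)
print(result)
```

The first element is the type marker and the remaining elements are the list items, which matches the (list, ...) tuples in the DUMP output.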
This is then stored into Cassandra using the usual URL-encoded prepared statement.
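As far as I understand it, the prepared statement is passed in the output_query parameter of the cql:// location, and characters such as = and ? have to be URL-encoded. Here is a small sketch of producing the encoded string with the Python 3 standard library. The keyspace, table and column names (my_ks, my_table, cnt, ids) are made up for illustration, so adapt them to your schema.

```python
from urllib.parse import quote_plus

# Hypothetical prepared statement: names are placeholders, not from the original setup.
statement = "UPDATE my_ks.my_table SET cnt = ?, ids = ?"

# quote_plus turns spaces into '+' and escapes '=', '?' and ','.
encoded = quote_plus(statement)

# Assumed location format for CqlStorage: cql://<keyspace>/<table>?output_query=<encoded>
location = "cql://my_ks/my_table?output_query=" + encoded
print(location)
```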
Hope this will be of some help.