Question

Say I have:

cur.execute("CREATE TABLE data_by_year ( device_id int, \
        site_id text, year_id int, event_time timestamp, value float, \
        PRIMARY KEY ((device_id, site_id, year_id),event_time))")

And I want to query all devices for years 2014 and 2013.

result=cur.execute("select distinct device_id, site_id, year_id,\
    from data_by_year where device_id IN (324535, 32453l),\
    and site_id in  and year_id IN (2014)")

Obvously this statement has many issues but it's the best example I could come up with. My beef is with the "where device_id IN (324535, 32453l)". In reality I will not know all the various devices so I want to grab them "ALL". How do I do this?

I'm dealing with time series minute data so I felt that one year was a reasonable partition.

Was it helpful?

Solution

knifewine's answer is correct, but if you're going to be executing this query frequently (and want good performance), I suggest using a second table:

CREATE TABLE all_device_data_by_year (
    site_id text,
    year_id int,
    device_id int,
    event_time timestamp,
    value float,
    PRIMARY KEY ((site_id, year_id), device_id, event_time)
)

You might want to partition by day/month instead of year, depending on the number of devices.

Regarding automatic query paging support in the python driver, it's available right now in the 2.0 branch. I should have a 2.0-beta release ready soon.

OTHER TIPS

You can grab everything using ALLOW FILTERING, but should be aware that this is costly in terms of performance because all nodes will need to answer back:

select distinct device_id, site_id, year_id from data_by_year ALLOW FILTERING;

The performance issue could be mitigated a bit by including a limit clause, but this won't allow you to page through all the data. If you want paging, you may want to use the datastax java driver with the paging feature (or wait for paging to land in the datastax python driver).

If none of the above will work for your use case, redesigning your table may be a better option (and possibly involving a secondary index but that can incur performance penalties as well).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top