time series data reading performance

https://stackoverflow.com/questions/18011261

04-06-2022

Question

How should I store sensor time series data in Cassandra?

Here I have checked the read performance of two layouts.

In a Cassandra composite column family, a single row key holds 10000 time series records. Query: select * from deviceidcomposite where did='Dev001' limit 5000

case 1:

Devid(row key)

   20120702105554 colname1=value
   20120702105554 colname2=value
   20120702105554 colname3=value
   20120702105554 colname4=value
   20120703105555 colname1=value
   20120703105555 colname2=value
   20120703105555 colname3=value
   20120703105555 colname4=value



When we use CQL3 to read 5000 time series records from a single row key, it takes nearly 3 minutes for 4 columns.

case 2:

Standard column family

    deviceidcolumnname1(row key)

      20120703105552=value
      20120703105553=value
      20120703105554=value
      20120703105555=value
      ..
      ..
    deviceidcolumnname2(row key)

      20120703105552=value
      20120703105553=value
      20120703105554=value
      20120703105555=value
      ..
      ..
    deviceidcolumnname3(row key)

      20120703105552=value
      20120703105553=value
      20120703105554=value
      20120703105555=value
      ..
      ..
    deviceidcolumnname4(row key)
      20120703105552=value
      20120703105553=value   
      20120703105554=value
      20120703105555=value
      ..
      ..

(20120703105552 is a timestamp in yyyyMMddHHmmss format)

Using the Thrift API, I read the values of one particular column name, or of all column names, for one day (5000 time series records) and for one month. Compared with CQL this takes less time, but it still takes nearly 2 minutes. With this layout, reading a single column name for one month is very quick.
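The column names in both layouts are fixed-width yyyyMMddHHmmss timestamps, so sorting them as plain strings also sorts them chronologically. The short Python sketch below shows how such a name can be parsed and how the lower and upper bounds for a one-day range could be computed. The helper names `parse_column_name` and `day_bounds` are made up for illustration and are not part of any Cassandra client.

```python
from datetime import datetime, timedelta

# yyyyMMddHHmmss, the format of column names such as 20120703105552
TS_FORMAT = "%Y%m%d%H%M%S"


def parse_column_name(name):
    """Turn a column name like '20120703105552' into a datetime."""
    return datetime.strptime(name, TS_FORMAT)


def day_bounds(moment):
    """Return (inclusive start, exclusive end) column names for moment's day."""
    start = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return start.strftime(TS_FORMAT), end.strftime(TS_FORMAT)


ts = parse_column_name("20120703105552")
lower, upper = day_bounds(ts)
print(ts, lower, upper)
```

Because every name has the same width, a string comparison such as `lower <= name < upper` selects exactly the columns that belong to that day.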

Which one is the right time series model?

Is there any better way to improve my read performance?

Solution

I do not think your problem is the data model (which I suggested in your previous question).

Simple answer: do not use limit!

Limit requires a concerted effort to decide WHICH 5000 rows will be returned as the result set. This will cause a serious performance drop.

If you need to limit the number of results, use your WHERE clause (column slices). They can be evaluated by each node individually - the opposite of "limit"!
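As a rough sketch of what such a column slice could look like, the Python snippet below builds a CQL range query on the table from the question. It assumes the timestamp part of the composite column is exposed as a clustering column named `ts`; that name is hypothetical, since the question does not show the table definition. The `%s` placeholders follow the style the DataStax Python driver accepts for simple statements, but the snippet only builds the query and does not connect to a cluster.

```python
def build_slice_query(device_id, start_ts, end_ts):
    """Build a range (slice) query for one device instead of using LIMIT.

    'deviceidcomposite' and 'did' come from the question; the clustering
    column name 'ts' is an assumption made for this example.
    """
    query = (
        "SELECT * FROM deviceidcomposite "
        "WHERE did = %s AND ts >= %s AND ts < %s"
    )
    params = (device_id, start_ts, end_ts)
    return query, params


# One day of readings for device Dev001, bounded by timestamps
query, params = build_slice_query("Dev001", "20120703000000", "20120704000000")
print(query)
print(params)
```

The result size is then bounded by the time range itself, so a smaller range (an hour instead of a day, say) is the way to get fewer rows back.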

Also, I think I answered the previous question that this one follows up on. It would only be fair to mark that answer accordingly if (and only if) you found it useful. Thanks.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow