This is my use case:

  • 10,000 IoT devices
  • Each IoT device sends its environmental data (temperature, pressure, etc.) every 30 seconds
  • Data of each device will be kept for 24 hours

The main requirement is to retrieve the latest data for a specific device (by device_id). Retrieval will be step-wise: only 100 records are read at a time. If the user wants more, they request the previous 100 records, and so on.

Also, multiple users may try to retrieve data for multiple devices at the same time.

My concerns are:

  • How to structure the DB (I am specifically looking at AWS's DynamoDB)
  • Creating a table for each device would be elegant; however, it seems that creating 10,000 tables is not recommended
  • If I put all 10,000 devices in one huge table, reads from it would be very inefficient

Could someone advise on the best design strategy for this case?

Thank you.

Solution

Why NoSQL?

Any RDBMS on modern hardware can easily support your requirements.
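
To put rough numbers on that claim, here is a back-of-the-envelope estimate using only the figures from the question. The variable names are mine, and row size is deliberately left out since the question doesn't state it.

```python
# Back-of-the-envelope load estimate from the figures in the question.
DEVICES = 10_000
SEND_INTERVAL_S = 30   # each device reports every 30 seconds
RETENTION_H = 24       # data is kept for 24 hours

inserts_per_second = DEVICES / SEND_INTERVAL_S
rows_per_device = RETENTION_H * 3600 // SEND_INTERVAL_S
rows_retained = DEVICES * rows_per_device

print(f"{inserts_per_second:.0f} inserts/s")   # 333 inserts/s
print(f"{rows_per_device} rows per device")    # 2880 rows per device
print(f"{rows_retained:,} rows retained")      # 28,800,000 rows retained
```

A few hundred inserts per second and under 30 million live rows is a modest workload, but that is an estimate, not a measurement.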

Basic Setup

This is meant to be RDBMS agnostic. You'll want to tweak it to the RDBMS you choose.

create table iot_devices (
  device_id      int identity primary key,
  gis_location   rdbms dependent, -- yes, databases can support longitude/latitude information.
  serial#        varchar(20),
  other_data     what-ever-you-need
);

create table iot_data (
  device_id   int not null references iot_devices (device_id),
  date_time   timestamp not null, -- needs time-of-day precision; in some RDBMSs DATE stores only the day
  temp        number(4,1), -- make sure to COMMENT whether the units are Celsius, Fahrenheit, or Kelvin
  pH          number(3,1) check ( pH between -14.0 and 14.0)
  -- other data
);

Retrieval

Now, your primary SELECT statement is this:

select *
from iot_data a
where a.device_id = ?
order by a.device_id, a.date_time desc
-- syntax for limit/paging clause is RDBMS specific
;

The question mark is a placeholder so that you can BIND a value. It also prevents SQL injection. Need multiple devices? Look into array binding.
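
As a rough sketch of the "100 records at a time" requirement, here is one way to page with bound parameters. Assumptions on my part: SQLite (via Python's sqlite3) is used only so the example runs anywhere, `LIMIT` is SQLite's paging syntax, `fetch_page` is a hypothetical helper, and the "continue below the last row seen" approach (keyset paging) is my choice, not the only option.

```python
import sqlite3
from datetime import datetime, timedelta

PAGE_SIZE = 100  # the question reads 100 records at a time


def fetch_page(conn, device_id, before=None, page_size=PAGE_SIZE):
    """Return up to page_size rows for one device, newest first.

    `before` is the date_time of the oldest row already shown;
    pass None to get the latest rows. (Hypothetical helper.)
    """
    sql = "select device_id, date_time, temp from iot_data where device_id = ?"
    params = [device_id]
    if before is not None:
        sql += " and date_time < ?"  # keyset paging: continue below the last row seen
        params.append(before)
    sql += " order by device_id, date_time desc limit ?"  # LIMIT is SQLite syntax
    params.append(page_size)
    return conn.execute(sql, params).fetchall()


# Demo data: 250 readings for device 1, 30 seconds apart.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table iot_data ("
    " device_id integer not null,"
    " date_time text not null,"  # ISO-8601 text sorts chronologically
    " temp real)"
)
start = datetime(2024, 1, 1)
readings = [
    (1, (start + timedelta(seconds=30 * i)).isoformat(sep=" "), 20.0)
    for i in range(250)
]
conn.executemany("insert into iot_data values (?, ?, ?)", readings)

page1 = fetch_page(conn, 1)                        # latest 100
page2 = fetch_page(conn, 1, before=page1[-1][1])   # previous 100
page3 = fetch_page(conn, 1, before=page2[-1][1])   # remaining 50
print(len(page1), len(page2), len(page3))          # 100 100 50
```

This assumes a device never reports two rows with the same date_time; if it can, the paging key needs a tie-breaker column.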

From experience, this index should help speed up that specific query:

create index iot_data_ix1 on iot_data (device_id, date_time desc);
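
Rather than take my word for it, ask the database whether the index is actually used. The sketch below does this with SQLite's `explain query plan`, purely as a stand-in: every RDBMS has its own EXPLAIN syntax and output, so treat the column types and the `limit 100` here as SQLite-specific assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table iot_data ("
    " device_id integer not null,"
    " date_time text not null,"
    " temp real)"
)
conn.execute("create index iot_data_ix1 on iot_data (device_id, date_time desc)")

# Ask the planner how it would run the primary SELECT.
plan = conn.execute(
    "explain query plan "
    "select * from iot_data a "
    "where a.device_id = ? "
    "order by a.device_id, a.date_time desc limit 100",
    (42,),
).fetchall()

# The last column of each plan row is the human-readable detail.
plan_text = " | ".join(row[-1] for row in plan)
uses_index = "iot_data_ix1" in plan_text
print(plan_text)
print("uses iot_data_ix1:", uses_index)
```

If the plan text mentions a temporary sort step, the index order doesn't match the ORDER BY and is worth revisiting.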

Removing Expired Data

Removing "Old data" will cause headaches due to the amount of data that needs to be removed. With PARTITIONING, you can DROP large segments of data very quickly.

I suggest range partitioning with an interval of 1 day, dropping the 3rd day's partition every day. A rolling 24-hour window spans today's and yesterday's partitions, so anything older can go. Some RDBMSs need the partition created ahead of time; others can create it on the fly.
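
The daily rotation boils down to some date arithmetic, sketched below. The partition naming scheme (`iot_data_pYYYYMMDD`) and both function names are made up for illustration; the actual CREATE/DROP PARTITION statements are RDBMS specific and intentionally not shown.

```python
from datetime import date, timedelta


def partition_name(day: date) -> str:
    """Hypothetical naming scheme: one partition per calendar day."""
    return f"iot_data_p{day:%Y%m%d}"


def rotation_plan(today: date) -> dict:
    """Which partitions a daily job would create, keep, and drop."""
    return {
        # Pre-create tomorrow for RDBMSs that cannot create on the fly.
        "create": partition_name(today + timedelta(days=1)),
        # A rolling 24-hour window spans today and yesterday.
        "keep": [
            partition_name(today),
            partition_name(today - timedelta(days=1)),
        ],
        # The 3rd day back holds only expired data.
        "drop": partition_name(today - timedelta(days=2)),
    }


plan = rotation_plan(date(2024, 1, 15))
print(plan["create"])  # iot_data_p20240116
print(plan["keep"])    # ['iot_data_p20240115', 'iot_data_p20240114']
print(plan["drop"])    # iot_data_p20240113
```

Because yesterday's partition still holds some rows older than 24 hours, queries that must honor the cutoff exactly would also filter on date_time.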

Final

After you have settled on a setup, there are three things you must do:

  • Benchmark
  • Benchmark
  • Benchmark

Without it, you won't know what the system's limits really are, and you won't know whether a change to the system has improved or degraded the design.
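
For a feel of what a before/after benchmark looks like, here is a toy harness. It is only a sketch under heavy assumptions: in-memory SQLite, a tiny data set, integer seconds standing in for date_time, and hypothetical helper names. Real numbers have to come from your own RDBMS, hardware, and data volume.

```python
import sqlite3
import time


def build_db(devices=50, readings=500):
    """Small in-memory data set; date_time is seconds since start (simplification)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table iot_data ("
        " device_id integer not null,"
        " date_time integer not null,"
        " temp real)"
    )
    conn.executemany(
        "insert into iot_data values (?, ?, ?)",
        ((d, 30 * i, 20.0) for d in range(devices) for i in range(readings)),
    )
    return conn


def time_query(conn, devices=50, repeats=100):
    """Average seconds per 'latest 100 rows' query, plus the last row count."""
    sql = (
        "select * from iot_data where device_id = ? "
        "order by device_id, date_time desc limit 100"
    )
    rows = []
    start = time.perf_counter()
    for i in range(repeats):
        rows = conn.execute(sql, (i % devices,)).fetchall()
    elapsed = time.perf_counter() - start
    return elapsed / repeats, len(rows)


conn = build_db()
before, n_before = time_query(conn)   # no index yet
conn.execute("create index iot_data_ix1 on iot_data (device_id, date_time desc)")
after, n_after = time_query(conn)     # same query, with the index

print(f"no index:   {before * 1e6:.0f} us/query")
print(f"with index: {after * 1e6:.0f} us/query")
```

The point is the method: measure, change one thing, measure again.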

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange