Suitable NoSQL DB design for a fleet of IoT devices [closed]
13-01-2021
Question
This is my use case:
- 10,000 IoT devices
- Each IoT device sends its environmental data (temperature, pressure, etc.) every 30 seconds
- Data of each device will be kept for 24 hours
The main requirement is to retrieve the latest data of a specific device (by device_id). Retrieval will be step-wise: only 100 records are read at a time. If the user wants more, they request the previous 100 records, and so on.
Also, it is possible that multiple users try to retrieve data of multiple devices at the same time.
My concerns are:
- How to structure the DB (I am specifically looking at AWS's DynamoDB)
- Creating a table for each device would be elegant; however, it seems that creating 10,000 tables is not recommended.
- If I put all 10,000 devices in one huge table, reads from it would be very inefficient.
Could someone advise on the best design strategy for this case?
Thank you.
Solution
Why NoSQL?
Any RDBMS on modern hardware can easily support your requirements.
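To put rough numbers on that claim, here is a back-of-the-envelope calculation using only the figures from the question (10,000 devices, one reading every 30 seconds, 24 hours of retention). The variable names are mine; this is a sizing sketch, not a benchmark.

```python
# Back-of-the-envelope sizing from the numbers in the question.
DEVICES = 10_000
SEND_INTERVAL_S = 30
RETENTION_S = 24 * 60 * 60  # 24 hours

inserts_per_second = DEVICES / SEND_INTERVAL_S    # sustained write rate
rows_per_device = RETENTION_S // SEND_INTERVAL_S  # readings kept per device
rows_retained = DEVICES * rows_per_device         # rows inside the 24-hour window

print(f"{inserts_per_second:.0f} inserts/s")
print(f"{rows_per_device:,} rows per device")
print(f"{rows_retained:,} rows retained")
```

That is roughly 333 inserts per second and 28.8 million live rows, with each device contributing 2,880 rows at most.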
Basic Setup
This is meant to be RDBMS agnostic. You'll want to tweak it to the RDBMS you choose.
create table iot_devices (
    device_id    int identity primary key,  -- auto-increment syntax is RDBMS specific
    gis_location rdbms dependent,           -- yes, databases can support longitude/latitude information
    serial_no    varchar(20),               -- avoid '#' in names; several RDBMS reject it unquoted
    other_data   what-ever-you-need
);
create table iot_data (
    device_id int not null references iot_devices (device_id),
    date_time date not null,    -- must carry the time of day; if your DATE type is date-only, use its timestamp/datetime type
    temp      numeric(4,1),     -- make sure to COMMENT whether the units are Celsius, Fahrenheit, or Kelvin
    pH        numeric(3,1) check (pH between -14.0 and 14.0)
    -- other data
);
Retrieval
Now, your primary SELECT statement is this:
select *
from iot_data a
where a.device_id = ?
order by a.device_id, a.date_time desc
-- syntax for limit/paging clause is RDBMS specific
;
The question mark is a placeholder so that you can BIND a value. It also prevents SQL injection.
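For the "100 records at a time" requirement, one possible approach is keyset paging: remember the date_time of the oldest row already shown and ask for rows older than that. The sketch below uses Python's sqlite3 module purely as a stand-in RDBMS; the function name fetch_page, the "limit ?" clause, and the text timestamps are my assumptions, and your RDBMS will have its own paging syntax. It also assumes a device never reports two rows with the same date_time.

```python
import sqlite3

PAGE_SIZE = 100

def fetch_page(conn, device_id, before=None, page_size=PAGE_SIZE):
    """Return up to page_size rows for one device, newest first.

    `before` is the date_time of the oldest row already shown;
    None means "start from the latest reading".
    """
    sql = "select device_id, date_time, temp from iot_data where device_id = ?"
    params = [device_id]
    if before is not None:
        sql += " and date_time < ?"
        params.append(before)
    sql += " order by date_time desc limit ?"
    params.append(page_size)
    # Every value is bound; nothing is concatenated into the SQL text.
    return conn.execute(sql, params).fetchall()

# Demo data: 250 fake readings for device 1, 30 seconds apart.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table iot_data (device_id int not null, date_time text not null, temp real)"
)
conn.executemany(
    "insert into iot_data values (1, datetime('2021-01-13 00:00:00', ?), 20.0)",
    [(f"+{i * 30} seconds",) for i in range(250)],
)

page1 = fetch_page(conn, 1)                           # latest 100
page2 = fetch_page(conn, 1, before=page1[-1][1])      # previous 100
page3 = fetch_page(conn, 1, before=page2[-1][1])      # remaining 50
print(len(page1), len(page2), len(page3))
```

Compared with an offset-based paging clause, this keeps every page an index range scan starting at a known key, and new readings arriving between requests do not shift the pages.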
Need multiple devices? Look into Array Binding.
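Array binding support differs between drivers. Where it is not available (Python's sqlite3, used here as a stand-in, is one such case), a common fallback is to generate one placeholder per value so that every device_id is still bound. The helper name latest_for_devices and the sample rows are made up for illustration.

```python
import sqlite3

def latest_for_devices(conn, device_ids):
    """Return (device_id, latest date_time) for each requested device."""
    device_ids = list(device_ids)
    if not device_ids:
        return []
    # One "?" per id: the ids themselves are still bound, not concatenated.
    placeholders = ", ".join("?" for _ in device_ids)
    sql = (
        "select device_id, max(date_time) as latest "
        "from iot_data "
        f"where device_id in ({placeholders}) "
        "group by device_id order by device_id"
    )
    return conn.execute(sql, device_ids).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table iot_data (device_id int not null, date_time text not null, temp real)"
)
conn.executemany(
    "insert into iot_data values (?, ?, ?)",
    [
        (1, "2021-01-13 10:00:00", 20.5),
        (1, "2021-01-13 10:00:30", 20.6),
        (2, "2021-01-13 10:00:00", 18.1),
        (3, "2021-01-13 10:00:30", 25.0),
    ],
)

result = latest_for_devices(conn, [1, 3])
print(result)
```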
From experience, this index should help speed up that specific query:
create index iot_data_ix1 on iot_data (device_id, date_time desc);
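Rather than take the index on faith, ask the database for its query plan. Every RDBMS has its own command for this; the sketch below uses SQLite's "explain query plan" through Python's sqlite3 module as a stand-in and simply checks that the plan mentions iot_data_ix1. The plan text format is SQLite specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table iot_data (device_id int not null, date_time text not null, temp real)"
)
conn.execute("create index iot_data_ix1 on iot_data (device_id, date_time desc)")

plan_rows = conn.execute(
    "explain query plan "
    "select * from iot_data a where a.device_id = ? "
    "order by a.device_id, a.date_time desc limit 100",
    (42,),
).fetchall()

# The human-readable description is the last column of each plan row.
plan_text = " | ".join(row[-1] for row in plan_rows)
uses_index = "iot_data_ix1" in plan_text
print(plan_text)
print("uses iot_data_ix1:", uses_index)
```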
Removing Expired Data
Removing old data row by row will cause headaches because of the sheer volume that has to be deleted (about 28.8 million new rows arrive per day). With PARTITIONING, you can DROP large segments of data very quickly.
I suggest range partitioning on date_time with an interval of 1 day, and dropping the oldest (3rd) day's partition every day. Today's and yesterday's partitions together always cover the last 24 hours.
Some RDBMSs need each PARTITION created ahead of time; others can create the PARTITION on the fly.
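As an illustration of the daily routine, here is a small generator for the two statements a scheduled job would run. It assumes PostgreSQL-style declarative partitioning, with iot_data created as "partition by range (date_time)"; the partition naming scheme and the function names are my own invention, and other RDBMSs use different DDL.

```python
from datetime import date, timedelta

def partition_name(day):
    """Hypothetical naming scheme: iot_data_YYYYMMDD."""
    return f"iot_data_{day:%Y%m%d}"

def daily_maintenance_sql(today):
    """Statements to run once per day: pre-create tomorrow, drop the expired day."""
    tomorrow = today + timedelta(days=1)
    day_after = tomorrow + timedelta(days=1)
    expired = today - timedelta(days=2)  # today and yesterday are still needed
    create = (
        f"create table if not exists {partition_name(tomorrow)} "
        f"partition of iot_data "
        f"for values from ('{tomorrow}') to ('{day_after}');"
    )
    drop = f"drop table if exists {partition_name(expired)};"
    return [create, drop]

statements = daily_maintenance_sql(date(2021, 1, 13))
for stmt in statements:
    print(stmt)
```

Dropping a partition is a metadata operation, so it avoids the long-running delete of a full day of rows.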
Final
After you have settled on a setup, there are three things you must do:
- Benchmark
- Benchmark
- Benchmark
Without benchmarking, you won't know what the system's limits really are, and you won't know whether a change to the system has improved or degraded the design.
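A benchmark does not have to be elaborate to be useful. The sketch below times a bulk insert and the "latest 100 rows for one device" query against an in-memory SQLite database, which is only a stand-in: the absolute numbers mean nothing for your real RDBMS and hardware, and the data volume here (100 devices, 100 readings each) is deliberately tiny. The function names are hypothetical; the point is the shape of the harness.

```python
import sqlite3
import time

def build_db(devices=100, readings=100):
    """Create the table and index, then time a bulk insert of fake readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table iot_data (device_id int not null, date_time int not null, temp real)"
    )
    conn.execute("create index iot_data_ix1 on iot_data (device_id, date_time desc)")
    rows = [(d, t * 30, 20.0) for d in range(devices) for t in range(readings)]
    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("insert into iot_data values (?, ?, ?)", rows)
    return conn, len(rows), time.perf_counter() - start

def time_latest_page(conn, device_id, repeats=200):
    """Average time of the 'latest 100 rows for one device' query."""
    sql = "select * from iot_data where device_id = ? order by date_time desc limit 100"
    start = time.perf_counter()
    for _ in range(repeats):
        rows = conn.execute(sql, (device_id,)).fetchall()
    return len(rows), (time.perf_counter() - start) / repeats

conn, inserted, insert_s = build_db()
page_len, query_s = time_latest_page(conn, device_id=42)
print(f"inserted {inserted} rows in {insert_s:.3f} s")
print(f"page of {page_len} rows in {query_s * 1e6:.0f} microseconds on average")
```

Run the same harness before and after every schema, index, or configuration change, against realistic data volumes, and compare the numbers.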