Question

I'm new to ClickHouse and stuck on how to structure the table for importing nested JSON data.

Take, for example, JSON data that looks like the following when the fields are populated:

"FirewallMatchesActions": [
  "allow"
],
"FirewallMatchesRuleIDs": [
  "1234abc"
],
"FirewallMatchesSources": [
  "firewallRules"
],

or

"FirewallMatchesActions": [
  "allow",
  "block"
],
"FirewallMatchesRuleIDs": [
  "1234abc",
  "1235abb"
],
"FirewallMatchesSources": [
  "firewallRules"
],

But there may also be JSON data in which they are not populated:

"FirewallMatchesActions": [],
"FirewallMatchesRuleIDs": [],
"FirewallMatchesSources": [],

What would the ClickHouse table structure (the CREATE TABLE statement) look like?


Solution

ClickHouse supports both Array and Nested column types.

For your case, Array looks sufficient:

CREATE TABLE json_import (
  TimeStamp DateTime DEFAULT now(),

  /* other columns */

  FirewallMatchesActions Array(String),
  FirewallMatchesRuleIDs Array(String),
  FirewallMatchesSources Array(String)
) ENGINE = MergeTree()
ORDER BY (TimeStamp);

/* insert test data */

INSERT INTO json_import (FirewallMatchesActions, FirewallMatchesRuleIDs, FirewallMatchesSources)
VALUES (['allow'], ['1234abc', '1235abb'], ['firewallRules']), 
       (['allow', 'block'], ['1234abc'], ['firewallRules']), 
       ([], [], []);

/* select data */
SELECT *
FROM json_import;

/* result
┌───────────TimeStamp─┬─FirewallMatchesActions─┬─FirewallMatchesRuleIDs─┬─FirewallMatchesSources─┐
│ 2020-06-12 06:06:17 │ ['allow']              │ ['1234abc','1235abb']  │ ['firewallRules']      │
│ 2020-06-12 06:06:17 │ ['allow','block']      │ ['1234abc']            │ ['firewallRules']      │
│ 2020-06-12 06:06:17 │ []                     │ []                     │ []                     │
└─────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┘
*/
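Because the source data is already JSON, it can also be loaded through ClickHouse's JSONEachRow input format (one JSON object per line, with keys matching column names). The sketch below is a minimal illustration under stated assumptions: the source records are Python dicts, an absent or null array should be stored as an empty array, and the helper name to_json_each_row is hypothetical. TimeStamp is left out of each row so that, with input_format_defaults_for_omitted_fields enabled, ClickHouse should fill it from its DEFAULT now().

```python
import json
from typing import Any, Iterable, Iterator

# Array columns from the question; each maps to Array(String) in json_import.
ARRAY_COLUMNS = (
    "FirewallMatchesActions",
    "FirewallMatchesRuleIDs",
    "FirewallMatchesSources",
)


def to_json_each_row(records: Iterable[dict[str, Any]]) -> Iterator[str]:
    """Yield one JSON object per line, in the shape JSONEachRow expects.

    Hypothetical helper: a missing key or a null value becomes [] so that
    every row carries a valid Array(String) value.
    """
    for record in records:
        row = {}
        for column in ARRAY_COLUMNS:
            value = record.get(column)
            # Treat absent/null arrays like the empty-array case in the question.
            row[column] = [str(item) for item in value] if value else []
        # TimeStamp is deliberately omitted so ClickHouse can apply DEFAULT now().
        yield json.dumps(row)


if __name__ == "__main__":
    sample = [
        {"FirewallMatchesActions": ["allow"],
         "FirewallMatchesRuleIDs": ["1234abc"],
         "FirewallMatchesSources": ["firewallRules"]},
        {"FirewallMatchesActions": ["allow", "block"],
         "FirewallMatchesRuleIDs": ["1234abc", "1235abb"],
         "FirewallMatchesSources": ["firewallRules"]},
        {"FirewallMatchesActions": [],
         "FirewallMatchesRuleIDs": [],
         "FirewallMatchesSources": []},
    ]
    for line in to_json_each_row(sample):
        print(line)
```

The printed lines could then be piped into something like clickhouse-client --query "INSERT INTO json_import FORMAT JSONEachRow". Normalizing missing arrays on the client side keeps every row uniform, so the populated and empty cases from the question land in the same three Array(String) columns.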

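To reduce storage consumption, consider using the LowCardinality type for columns with few distinct values (for example, Array(LowCardinality(String))) and column compression codecs (data encodings set with the CODEC clause).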
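As a rough intuition for why LowCardinality saves space: it dictionary-encodes values, storing each distinct string once plus a small integer position per row instead of repeating the string. The toy sketch below illustrates generic dictionary encoding in plain Python; the function names are hypothetical, and this is a conceptual illustration, not ClickHouse's actual on-disk format.

```python
from typing import Iterable


def dictionary_encode(values: Iterable[str]) -> tuple[list[str], list[int]]:
    """Toy dictionary encoding: distinct values plus per-row indexes."""
    dictionary: list[str] = []
    positions: dict[str, int] = {}
    indexes: list[int] = []
    for value in values:
        if value not in positions:
            # First occurrence: append the string to the dictionary.
            positions[value] = len(dictionary)
            dictionary.append(value)
        # Every row stores only the small integer position.
        indexes.append(positions[value])
    return dictionary, indexes


def dictionary_decode(dictionary: list[str], indexes: list[int]) -> list[str]:
    """Rebuild the original values from the dictionary and indexes."""
    return [dictionary[i] for i in indexes]


if __name__ == "__main__":
    # Flattened FirewallMatchesActions values: few distinct strings, many rows.
    actions = ["allow", "allow", "block", "allow", "block", "allow"]
    dictionary, indexes = dictionary_encode(actions)
    print(dictionary)  # ['allow', 'block']
    print(indexes)     # [0, 0, 1, 0, 1, 0]
    assert dictionary_decode(dictionary, indexes) == actions
```

The fewer distinct values a column has (action names and rule sources are good candidates), the smaller the dictionary relative to the number of rows, which is where the savings come from.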


Additional info:

Nested Data Structures in ClickHouse

Reducing Clickhouse Storage Cost with the LowCardinality Type – Lessons from an Instana Engineer

New Encodings to Improve ClickHouse Efficiency

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange