How to sum the values of a json column filtered with regex?
-
06-03-2021 - |
Question
How to sum the values of a json column filtered with regex?
Columns:
A.name
: Varchar with name product
B.products
: JSON with time active By product
Query:
select
A.name as product,
sum(((REGEXP_MATCH(B.products->>'status', 'Active:.(.*?)\"'))[1])::float) as metric
from
tbl_accounts A
inner join tbl_products B on
A.identifier = B.identifier
where
B.products->>'status' like '%Active%'
group by
A.name,
B.products
Explanation data
JSON 1: {"times": ["Stoped: 49.05", "Active: 23.26"]}
JSON 2: {"times": ["Stoped: 59.05", "Active: 33.26"]}
Desired Output: 56.52
Output:
ERROR: could not identify an equality operator for type json
Solution
The immediate reason for the error message is that the data type json
has no equality operator. See:
- How to query a json column for empty objects?
- How to remove known elements from a JSON[] array in PostgreSQL?
You have:
...
group by
A.name,
B.products -- type json!?
You can do that, using jsonb
instead of json
, where an equality operator is defined. But do you really want to group by B.products
? (Same JSON documents?) Maybe you meant to write B.products->>'status'
(Same status?) Or just GROUP BY A.name
?
Aside: there may also be a simpler way to extract numbers that with REGEXP_MATCH()
. You would have to define possible values B.products->>'status'
and disclose the exact intention of the expression.
If you are at liberty to do so, it's typically best to store numbers in a separate key or even separate table column ....
jsonpath
query in Postgres 12 or later
Your added sample values suggest you might be able to use jsonpath
in Postgres 12 or later. Based on jsonb
(not json
).
Note: this is a proof of concept. If possible, normalize the table design and store numbers in a dedicated table column. Much simpler and more efficient.
Index
jsonpath
operators can also be supported with a (default) jsonb_ops
GIN index. I narrow down the scope with the expression products->'times'
:
CREATE INDEX products_times_gin_idx ON products USING gin ((products->'times'));
Index only helps for selective queries where not most rows have to be processed anyways!
Basic query to filter qualifying rows with jsonpath
Can use above index.
SELECT *
FROM products B
WHERE B.products->'times' @? '$[*] ? (@ starts with "Active: ")';
jsonpath
expression explained:
$[*]
... look at each array element of outer nesting level
?
... run the following test
(@ starts with "Active: ")
... Does element value start with 'Active:'?
... unnest and return only qualifying JSON array elements
SELECT *
FROM accounts A
JOIN products B USING (identifier)
, jsonb_path_query(B.products->'times', '$[*] ? (@ starts with "Active: ")') act
WHERE B.products->'times' @? '$[*] ? (@ starts with "Active: ")' -- optional, to use idx
;
... get results as text
SELECT *
FROM accounts A
JOIN products B USING (identifier)
, jsonb_array_elements_text(jsonb_path_query_array(B.products->'times', '$[*] ? (@ starts with "Active: ")')) act
WHERE B.products->'times' @? '$[*] ? (@ starts with "Active: ")' -- optional, to use idx
;
See:
... and aggregate the number part
Arriving at your final query:
SELECT A.name as product, sum(right(act::text, -8)::float) -- -8 = length('Active: ')
FROM accounts A
JOIN products B USING (identifier)
, jsonb_array_elements_text(jsonb_path_query_array(B.products->'times', '$[*] ? (@ starts with "Active: ")')) act
WHERE B.products->'times' @? '$[*] ? (@ starts with "Active: ")' -- optional, to use idx
GROUP BY 1;
db<>fiddle here
Related: