How to sum the values of a json column filtered with regex?

https://dba.stackexchange.com/questions/274273

06-03-2021

Question

Columns:

A.name: varchar holding the product name

B.products: JSON holding the active time per product

Query:

select
    A.name as product,
    sum(((REGEXP_MATCH(B.products->>'status', 'Active:.(.*?)\"'))[1])::float) as metric
from
    tbl_accounts A
inner join tbl_products B on
    A.identifier = B.identifier
where
    B.products->>'status' like '%Active%'
group by
    A.name,
    B.products

Explanation data

JSON 1: {"times": ["Stoped: 49.05", "Active: 23.26"]}

JSON 2: {"times": ["Stoped: 59.05", "Active: 33.26"]}

Desired Output: 56.52

Output:

ERROR: could not identify an equality operator for type json

Solution

The immediate cause of the error message is that the data type json has no equality operator.

You have:

...
group by
    A.name,
    B.products  -- type json!?

You can do that, using jsonb instead of json, where an equality operator is defined. But do you really want to group by B.products? (Same JSON documents?) Maybe you meant to write B.products->>'status' (Same status?) Or just GROUP BY A.name?
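The difference between the two types can be reproduced without any tables. This is a minimal sketch with made-up literal values (the key "a" is only an illustration, not part of the question's data):

```sql
-- Grouping by json fails, because json has no equality operator:
-- SELECT j FROM (VALUES ('{"a": 1}'::json)) t(j) GROUP BY j;
-- ERROR:  could not identify an equality operator for type json

-- jsonb defines equality, so grouping works.
-- Both literals normalize to the same jsonb value and land in one group.
SELECT j, count(*) AS ct
FROM  (VALUES ('{"a": 1}'::jsonb), ('{"a":1}'::jsonb)) t(j)
GROUP  BY j;
```

If the column itself has to stay json, casting in the GROUP BY clause (B.products::jsonb) would have the same effect, at the cost of a conversion per row.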

Aside: there may also be a simpler way to extract numbers than with REGEXP_MATCH(). You would have to define the possible values of B.products->>'status' and disclose the exact intention of the expression.
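As an illustration only, and assuming every string has the exact shape 'Active: <number>' (an assumption, since the possible values are not defined), a few alternatives to REGEXP_MATCH() could look like this:

```sql
-- elem is a hypothetical sample string, not a column from the question.
SELECT split_part(elem, ': ', 2)::numeric       AS via_split_part  -- text after the first ': '
     , right(elem, -8)::numeric                 AS via_right       -- drop the 8 leading characters of 'Active: '
     , substring(elem FROM '[0-9.]+')::numeric  AS via_substring   -- first run of digits and dots
FROM  (VALUES ('Active: 23.26')) t(elem);
```

All three only work while the format stays fixed. Any deviation (another prefix, a missing space) makes the cast fail or return a wrong number.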

If you are at liberty to do so, it's typically best to store numbers in a separate key or even separate table column ....
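A sketch of what such a layout might look like. The table product_times and its columns state and duration are hypothetical names chosen for this example, and numeric is an assumed type:

```sql
-- Hypothetical normalized layout: one row per state and measurement
-- instead of strings packed into a JSON array.
CREATE TEMP TABLE product_times (
   identifier int     NOT NULL
 , state      text    NOT NULL  -- 'Active' or 'Stoped' (spelled as in the sample data)
 , duration   numeric NOT NULL
);

INSERT INTO product_times (identifier, state, duration) VALUES
   (1, 'Stoped', 49.05), (1, 'Active', 23.26)
 , (1, 'Stoped', 59.05), (1, 'Active', 33.26);

-- No string parsing needed: filter and sum directly.
SELECT identifier, sum(duration) AS metric
FROM   product_times
WHERE  state = 'Active'
GROUP  BY identifier;
-- identifier | metric
--          1 |  56.52
```

With this design a plain B-tree index on (state) or (identifier, state) can support the filter.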

`jsonpath` query in Postgres 12 or later

Your added sample values suggest you might be able to use jsonpath in Postgres 12 or later. Based on jsonb (not json).

Note: this is a proof of concept. If possible, normalize the table design and store numbers in a dedicated table column. Much simpler and more efficient.

Index

jsonpath operators can also be supported with a (default) jsonb_ops GIN index. I narrow down the scope with the expression products->'times':

CREATE INDEX products_times_gin_idx ON products USING gin ((products->'times'));

The index only helps for selective queries. It does not help when most rows have to be processed anyway.

Basic query to filter qualifying rows with `jsonpath`

Can use above index.

SELECT *
FROM   products B
WHERE  B.products->'times' @? '$[*] ? (@ starts with "Active: ")';

jsonpath expression explained:

$[*] ... look at each array element of outer nesting level
? ... run the following test
(@ starts with "Active: ") ... Does the element value start with "Active: " (including the trailing space)?
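The expression can be tried in isolation against a literal array taken from the sample data. A small sketch that needs no tables:

```sql
-- Does any array element start with "Active: "?  Returns true.
SELECT '["Stoped: 49.05", "Active: 23.26"]'::jsonb
       @? '$[*] ? (@ starts with "Active: ")' AS has_active;

-- Return the matching elements themselves, one row per match.
-- Here: a single row holding the jsonb string "Active: 23.26".
SELECT jsonb_path_query(
          '["Stoped: 49.05", "Active: 23.26"]'::jsonb
        , '$[*] ? (@ starts with "Active: ")'
       ) AS act;
```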

... unnest and return only qualifying JSON array elements

SELECT *
FROM   accounts A
JOIN   products B USING (identifier)
     , jsonb_path_query(B.products->'times', '$[*] ? (@ starts with "Active: ")') act
WHERE  B.products->'times' @? '$[*] ? (@ starts with "Active: ")' -- optional, to use idx
;

... get results as text

SELECT *
FROM   accounts A
JOIN   products B USING (identifier)
     , jsonb_array_elements_text(jsonb_path_query_array(B.products->'times', '$[*] ? (@ starts with "Active: ")')) act
WHERE  B.products->'times' @? '$[*] ? (@ starts with "Active: ")' -- optional, to use idx
;

See:

How to turn JSON array into Postgres array?

... and aggregate the number part

Arriving at your final query:

SELECT A.name as product, sum(right(act::text, -8)::float)  -- -8 = length('Active: ')
FROM   accounts A
JOIN   products B USING (identifier)
     , jsonb_array_elements_text(jsonb_path_query_array(B.products->'times', '$[*] ? (@ starts with "Active: ")')) act
WHERE  B.products->'times' @? '$[*] ? (@ starts with "Active: ")' -- optional, to use idx
GROUP  BY 1;
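To reproduce the desired output, the query can be run against the two sample documents. The following is a sketch under assumptions: temporary tables named like the ones above, a made-up product name 'Widget', both documents belonging to the same account, and a cast to numeric instead of float so that the sum is exact:

```sql
CREATE TEMP TABLE accounts (identifier int PRIMARY KEY, name varchar);
CREATE TEMP TABLE products (identifier int, products jsonb);

INSERT INTO accounts VALUES (1, 'Widget');  -- hypothetical product name
INSERT INTO products VALUES
   (1, '{"times": ["Stoped: 49.05", "Active: 23.26"]}')
 , (1, '{"times": ["Stoped: 59.05", "Active: 33.26"]}');

SELECT A.name AS product
     , sum(right(act.val, -8)::numeric) AS metric  -- strip the 8 characters of 'Active: '
FROM   accounts A
JOIN   products B USING (identifier)
     , jsonb_array_elements_text(
          jsonb_path_query_array(B.products->'times', '$[*] ? (@ starts with "Active: ")')
       ) AS act(val)
GROUP  BY A.name;
-- product | metric
-- Widget  |  56.52
```

The explicit column alias act(val) is only a readability choice here. The optional WHERE clause is left out because two rows do not benefit from an index.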


Update all values for given key nested in JSON array of objects

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange
