Question

How to sum the values of a json column filtered with regex?

Columns:

A.name: varchar holding the product name

B.products: JSON holding the active time per product

Query:

select
    A.name as product,
    sum(((REGEXP_MATCH(B.products->>'status', 'Active:.(.*?)\"'))[1])::float) as metric
from
    tbl_accounts A
inner join tbl_products B on
    A.identifier = B.identifier
where
    B.products->>'status' like '%Active%'
group by
    A.name,
    B.products

Sample data

JSON 1: {"times": ["Stoped: 49.05", "Active: 23.26"]}

JSON 2: {"times": ["Stoped: 59.05", "Active: 33.26"]}

Desired Output: 56.52

Output:

ERROR: could not identify an equality operator for type json

Solution

The immediate reason for the error message is that the data type json has no equality operator.

You have:

...
group by
    A.name,
    B.products  -- type json!?

You can do that by using jsonb instead of json, since jsonb defines an equality operator. But do you really want to group by B.products (identical JSON documents)? Maybe you meant to write B.products->>'status' (same status), or just GROUP BY A.name?
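
The difference between the two types can be reproduced without any tables. The following is a minimal sketch with made-up literal values; the derived table t and its column j exist only for illustration:

```sql
-- jsonb defines an equality operator, so comparing and grouping both work
SELECT '{"a": 1}'::jsonb = '{"a": 1}'::jsonb AS same_document;

SELECT j, count(*)
FROM  (VALUES ('{"a": 1}'::jsonb), ('{"a": 1}'::jsonb)) t(j)
GROUP  BY j;

-- json has no equality operator, so the same GROUP BY raises:
-- ERROR: could not identify an equality operator for type json
SELECT j, count(*)
FROM  (VALUES ('{"a": 1}'::json), ('{"a": 1}'::json)) t(j)
GROUP  BY j;
```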

Aside: there may also be a simpler way to extract numbers than with REGEXP_MATCH(). You would have to define the possible values of B.products->>'status' and disclose the exact intention of the expression.

If you are at liberty to do so, it's typically best to store numbers in a separate key or even a separate table column.
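
As a rough illustration of the "separate key" idea, here is a sketch that assumes a hypothetical document layout in which each number is stored as a JSON number under its own key. The CTE name sample and the layout are assumptions, not the asker's actual schema:

```sql
-- Hypothetical layout: numbers stored under their own keys, not inside strings
WITH sample(identifier, products) AS (
   VALUES (1, '{"times": {"Stoped": 49.05, "Active": 23.26}}'::jsonb)
        , (1, '{"times": {"Stoped": 59.05, "Active": 33.26}}'::jsonb)
   )
SELECT sum((products #>> '{times,Active}')::numeric) AS active_total  -- 56.52
FROM   sample;
```

With this layout no pattern matching is needed at all; the path operator #>> returns the value as text and a plain cast does the rest.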

jsonpath query in Postgres 12 or later

Your added sample values suggest you might be able to use jsonpath in Postgres 12 or later. This is based on jsonb (not json).

Note: this is a proof of concept. If possible, normalize the table design and store numbers in a dedicated table column. Much simpler and more efficient.

Index

jsonpath operators can also be supported with a (default) jsonb_ops GIN index. I narrow down the scope with the expression products->'times':

CREATE INDEX products_times_gin_idx ON products USING gin ((products->'times'));

The index only helps for selective queries, where most rows do not have to be processed anyway!

Basic query to filter qualifying rows with jsonpath

This query can use the index above.

SELECT *
FROM   products B
WHERE  B.products->'times' @? '$[*] ? (@ starts with "Active: ")';

jsonpath expression explained:

$[*] ... look at each array element of outer nesting level
? ... run the following test
(@ starts with "Active: ") ... does the element value start with 'Active: '?
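
To see the path expression at work in isolation, it can be applied to literal arrays. This is a small sketch using the sample values from the question; the column alias has_active is made up:

```sql
-- At least one element starts with 'Active: '
SELECT '["Stoped: 49.05", "Active: 23.26"]'::jsonb
       @? '$[*] ? (@ starts with "Active: ")' AS has_active;   -- true

-- No element qualifies
SELECT '["Stoped: 49.05"]'::jsonb
       @? '$[*] ? (@ starts with "Active: ")' AS has_active;   -- false
```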

... unnest and return only qualifying JSON array elements

SELECT *
FROM   accounts A
JOIN   products B USING (identifier)
     , jsonb_path_query(B.products->'times', '$[*] ? (@ starts with "Active: ")') act
WHERE  B.products->'times' @? '$[*] ? (@ starts with "Active: ")' -- optional, to use idx
;

... get results as text

SELECT *
FROM   accounts A
JOIN   products B USING (identifier)
     , jsonb_array_elements_text(jsonb_path_query_array(B.products->'times', '$[*] ? (@ starts with "Active: ")')) act
WHERE  B.products->'times' @? '$[*] ? (@ starts with "Active: ")' -- optional, to use idx
;
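
The reason for the extra step is that jsonb_path_query() returns jsonb values, and the text form of a jsonb string keeps its double quotes. A minimal sketch on a literal array shows the difference; the column alias elem is an assumption added for readability:

```sql
-- jsonb element: casting to text keeps the JSON double quotes
SELECT pg_typeof(act.elem) AS type, act.elem::text AS as_text
FROM   jsonb_path_query('["Active: 23.26"]'::jsonb, '$[*]') act(elem);

-- text element: plain string without quotes, ready for string functions
SELECT pg_typeof(act.elem) AS type, act.elem AS as_text
FROM   jsonb_array_elements_text(
          jsonb_path_query_array('["Active: 23.26"]'::jsonb, '$[*]')) act(elem);
```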

... and aggregate the number part

Arriving at your final query:

SELECT A.name as product, sum(right(act::text, -8)::float)  -- -8 = length('Active: ')
FROM   accounts A
JOIN   products B USING (identifier)
     , jsonb_array_elements_text(jsonb_path_query_array(B.products->'times', '$[*] ? (@ starts with "Active: ")')) act
WHERE  B.products->'times' @? '$[*] ? (@ starts with "Active: ")' -- optional, to use idx
GROUP  BY 1;
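
For a quick check without creating tables, the same query can be run against inline sample data. This is only a sketch: the CTEs stand in for the real tables, the product name 'product_a' and the shared identifier are invented, and numeric is swapped in for float so the sum is exact:

```sql
WITH accounts(identifier, name) AS (
   VALUES (1, 'product_a')   -- hypothetical account row
   )
, products(identifier, products) AS (
   VALUES (1, '{"times": ["Stoped: 49.05", "Active: 23.26"]}'::jsonb)
        , (1, '{"times": ["Stoped: 59.05", "Active: 33.26"]}'::jsonb)
   )
SELECT A.name AS product
     , sum(right(act.elem, -8)::numeric) AS metric   -- -8 = length('Active: ')
FROM   accounts A
JOIN   products B USING (identifier)
     , jsonb_array_elements_text(
          jsonb_path_query_array(B.products->'times', '$[*] ? (@ starts with "Active: ")')) act(elem)
GROUP  BY 1;
-- product_a | 56.52
```

If the real column is of type json, casting it with B.products::jsonb before applying the path functions should work the same way.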

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange