How can I perform an AND on an unknown number of booleans in postgresql?

https://stackoverflow.com/questions/683842

22-08-2019
|

Question

I have a table with a foreign key and a boolean value (and a bunch of other columns that aren't relevant here), as such:

CREATE TABLE myTable
(
  someKey integer,
  someBool boolean
);

insert into myTable values (1, 't'),(1, 't'),(2, 'f'),(2, 't');

Each someKey could have 0 or more entries. For any given someKey, I need to know if a) all the entries are true, or b) any of the entries are false (basically an AND).

I've come up with the following function:

CREATE FUNCTION do_and(int4) RETURNS boolean AS
$func$
declare
    rec record;
    retVal boolean = 't'; -- necessary, or true is returned as null (it's weird)
begin
    if not exists (select someKey from myTable where someKey = $1) then
        return null; -- and because we had to initialise retVal, if no rows are     found true would be returned
    end if;

    for rec in select someBool from myTable where someKey = $1 loop
        retVal := rec.someBool AND retVal;
    end loop;

    return retVal;
end;
$func$ LANGUAGE 'plpgsql' VOLATILE;

... which gives the correct results:

select do_and(1) => t
select do_and(2) => f
select do_and(3) => null

I'm wondering if there's a nicer way to do this. It doesn't look too bad in this simple scenario, but once you include all the supporting code it gets lengthier than I'd like. I had a look at casting the someBool column to an array and using the ALL construct, but I couldn't get it working... any ideas?

Solution

No need to redefine functions PostgreSQL already provides: bool_and() will do the job:

select bool_and(someBool)
  from myTable
  where someKey = $1
  group by someKey;

(Sorry, can't test it now)

OTHER TIPS

Similar to the previous one, but in one query, this will do the trick, however, it is not clean nor easily-understandable code:

SELECT someKey, 
  CASE WHEN sum(CASE WHEN someBool THEN 1 ELSE 0 END) = count(*)
                    THEN true 
                    ELSE false END as boolResult
FROM  table
GROUP BY someKey

This will get all the responses at once, if you only want one key just add a WHERE clause

I just installed PostgreSQL for the first time this week, so you'll need to clean up the syntax, but the general idea here should work:

return_value = NULL

IF EXISTS
(
     SELECT
          *
     FROM
          My_Table
     WHERE
          some_key = $1
)
BEGIN
     IF EXISTS
     (
          SELECT
               *
          FROM
               My_Table
          WHERE
               some_key = $1 AND
               some_bool = 'f'
     )
          SELECT return_value = 'f'
     ELSE
          SELECT return_value = 't'
END

The idea is that you only need to look at one row to see if any exist and if at least one row exists you then only need to look until you find a false value to determine that the final value is false (or you get to the end and it's true). Assuming that you have an index on some_key, performance should be good I would think.

(Very minor side-point: I think your function should be declared STABLE rather than VOLATILE, since it just uses data from the database to determine its result.)

As someone mentioned, you can stop scanning as soon as you encounter a "false" value. If that's a common case, you can use a cursor to actually provoke a "fast finish":

CREATE FUNCTION do_and(key int) RETURNS boolean
  STABLE LANGUAGE 'plpgsql' AS $$
DECLARE
  v_selector CURSOR(cv_key int) FOR
    SELECT someBool FROM myTable WHERE someKey = cv_key;
  v_result boolean;
  v_next boolean;
BEGIN
  OPEN v_selector(key);
  LOOP
    FETCH v_selector INTO v_next;
    IF not FOUND THEN
      EXIT;
    END IF;
    IF v_next = false THEN
      v_result := false;
      EXIT;
    END IF;
    v_result := true;
  END LOOP;
  CLOSE v_selector;
  RETURN v_result;
END
$$;

This approach also means that you are only doing a single scan on myTable. Mind you, I suspect you need loads and loads of rows in order for the difference to be appreciable.

You can also use every, which is just an alias to bool_and:

select every(someBool)
from myTable
where someKey = $1
group by someKey;

Using every makes your query more readable. An example, show all persons who just eat apple every day:

select personId
from personDailyDiet
group by personId
having every(fruit = 'apple');

every is semantically the same as bool_and, but it's certainly clear that every is more readable than bool_and:

select personId
from personDailyDiet
group by personId
having bool_and(fruit = 'apple');

Maybe count 'all' items with somekey=somevalue and use it in a boolean comparison with the count of all 'True' occurences for somekey?

Some non-tested pseudo-sql to show what i mean...

select foo1.count_key_items = foo2.count_key_true_items
from
   (select count(someBool) as count_all_items from myTable where someKey = '1') as foo1,
   (select count(someBool) as count_key_true_items from myTable where someKey = '1' and someBool) as foo2

CREATE FUNCTION do_and(int4)
  RETURNS boolean AS
$BODY$
  SELECT
    MAX(bar)::bool
  FROM (
    SELECT
      someKey,
      MIN(someBool::int) AS bar
    FROM
      myTable
    WHERE
      someKey=$1
    GROUP BY
      someKey

    UNION

    SELECT
      $1,
      NULL
  ) AS foo;
$BODY$
  LANGUAGE 'sql' STABLE;

In case you don't need the NULL value (when there aren't any rows), simply use the query below:

SELECT
  someKey,
  MIN(someBool::int)::bool AS bar
FROM
  myTable
WHERE
  someKey=$1
GROUP BY
  someKey

SELECT  DISTINCT ON (someKey) someKey, someBool
FROM    myTable m
ORDER BY
        someKey, someBool NULLS FIRST

This will select the first ordered boolean value for each someKey.

If there is a single FALSE or a NULL, it will be returned first, meaning that the AND failed.

If the first boolean is a TRUE, then all other booleans are also TRUE for this key.

Unlike the aggregate, this will use the index on (someKey, someBool).

To return an OR, just reverse the ordering:

SELECT  DISTINCT ON (someKey) someKey, someBool
FROM    myTable m
ORDER BY
        someKey, someBool DESC NULLS FIRST

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow