Well, first off, what do you think the difference is between a boolean answer and a flag? True/false vs. true/false: same thing. Higher-level languages waste a whole bunch of bits in a variable so that basically one of those bits holds the boolean result. In reality, only a very inefficient compiler actually generates and wastes all of those bits in registers. Quite often a single compare, or a series of compares, followed by conditional branches is used, and no gprs are consumed to implement that high-level construct. Other times, depending on the complexity of the boolean expression, a gpr definitely is used: boolean alu operations and other gprs are consumed to compute the result, then there is a final compare against zero and a branch if zero (or not zero) to complete the task. (Why else would you compute a boolean if you weren't going to compare it and do something based on the comparison? An optimizer would remove all of that code otherwise.)
The typical approach is the four flags that trivially fall out of an alu operation: zero, negative, carry (a.k.a. unsigned overflow, a.k.a. borrow), and signed overflow. NZCV. Then a laundry list of conditional branch instructions. You get the efficiency of the condition being computed on any alu operation. Most alu operations do burn a register for the output even if you didn't care about that output, but a compare instruction (subtract without saving the result) is usually present and is sufficient for most conditionals. Sometimes, if you are lucky, you get a test instruction (AND without saving the result). Most of the time you know the comparison going in, and it is only one comparison and one conditional branch. On occasion you can set the flags once and then do two or more conditional branches in a row without having to re-compute the flags, since they are preserved through a branch that is not taken. That is the exception, not the rule. The flags being a freebie is probably the reason this is the popular approach.
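To make the "flags fall out of the alu" point concrete, here is a rough C sketch of a compare (subtract, keep only the flags) plus a few entries from the branch laundry list. The names `flags_t`, `cmp_flags`, and the `cond_*` helpers are made up for illustration, and the carry convention (carry set means no borrow on a subtract) is an assumption; some processors invert it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag set for illustration; not any specific processor's layout. */
typedef struct {
    bool n; /* negative: top bit of the result */
    bool z; /* zero: result is all zeros */
    bool c; /* carry out; ASSUMED convention: set means "no borrow" on subtract */
    bool v; /* signed overflow */
} flags_t;

/* Compare: compute a - b, keep only the flags, throw the result away. */
static flags_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    /* Carry out of a + ~b + 1, which is the same thing as a >= b unsigned. */
    f.c = (a >= b);
    /* Overflow: operands had different signs and the result's sign differs from a's. */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

/* A few entries from the conditional-branch laundry list. */
static bool cond_eq(flags_t f) { return f.z; }        /* equal */
static bool cond_hs(flags_t f) { return f.c; }        /* unsigned >= */
static bool cond_lt(flags_t f) { return f.n != f.v; } /* signed <  */
static bool cond_ge(flags_t f) { return f.n == f.v; } /* signed >= */

/* One compare, several conditions read from the same flags. */
static void check_compare_examples(void)
{
    flags_t f = cmp_flags(1u, 2u);
    assert(!cond_eq(f));
    assert(!cond_hs(f)); /* 1 is not >= 2 unsigned */
    assert(cond_lt(f));  /* 1 < 2 signed */
    assert(!cond_ge(f));
}
```

Note that `check_compare_examples` sets the flags once and then evaluates several conditions against them, which is the "two or more conditional branches in a row" case.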
It is also quite reasonable to have a laundry list of compare instructions instead: set flag if a == b, set flag if a < b, set flag if a <= b, and so on. Then you only need a branch if flag set and a branch if flag clear instruction on the back end. There is one processor I know of that does it that way. You wouldn't want to waste a whole gpr to store that one flag bit, although it might be reasonable to do so for various reasons; the one I am thinking of does not do that.
There is one I know of where the psr is a gpr, which means it really isn't a gpr because it is special, but it is used/accessed like a gpr. So your alu output drops those flags in that register, and there is not a laundry list of conditional branches. Instead I think there are two: branch if bit X in register Y is set, and branch if bit X in register Y is not set. (It may be worse than that: SKIP if bit X in Y is set, or skip if bit X in Y is not set.) And you have to gang up one or more of those in a row for the more complicated branches (branch if greater or equal, etc.).
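Here is a rough sketch of what "ganging up" bit tests looks like, written in C rather than for any real instruction set. The bit positions in the `FLAG_*` enum, and the helpers `bit_set`, `sub_with_status`, and `signed_ge`, are all hypothetical; the carry convention (set means no borrow) is again an assumption. Signed greater-or-equal is N == V, which takes two single-bit tests in a row when all you have is "branch if bit set" and "branch if bit clear".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions of the flags inside a status register word. */
enum { FLAG_Z = 0, FLAG_C = 1, FLAG_N = 2, FLAG_V = 3 };

/* Stand-in for "branch/skip if bit X in register Y is set". */
static bool bit_set(uint32_t reg, unsigned bit)
{
    assert(bit < 32);
    return ((reg >> bit) & 1u) != 0;
}

/* Subtract that also drops the flags into a register-like word. */
static uint32_t sub_with_status(uint32_t a, uint32_t b, uint32_t *status)
{
    uint32_t r = a - b;
    uint32_t s = 0;
    if (r == 0)                     s |= 1u << FLAG_Z;
    if (a >= b)                     s |= 1u << FLAG_C; /* ASSUMED: set = no borrow */
    if (r >> 31)                    s |= 1u << FLAG_N;
    if (((a ^ b) & (a ^ r)) >> 31)  s |= 1u << FLAG_V;
    *status = s;
    return r;
}

/* Signed a >= b using only single-bit tests: the condition is N == V. */
static bool signed_ge(int32_t a, int32_t b)
{
    uint32_t status;
    (void)sub_with_status((uint32_t)a, (uint32_t)b, &status);
    if (bit_set(status, FLAG_N)) {
        return bit_set(status, FLAG_V);  /* N = 1, so V must be 1 too */
    }
    return !bit_set(status, FLAG_V);     /* N = 0, so V must be 0 too */
}
```

With a full laundry list of conditional branches this would be one instruction after the compare; here it is at least two bit tests, which is the cost being described.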
There is one that I know of that does not have any flags; it basically has a compare and jump if equal and a compare and jump if not equal, both register based. You have to synthesize all the other conditions yourself: signed or unsigned overflow, the n bit, etc. That burns both gprs and instruction cycles, which is very inefficient. I can see the beauty in it and at the same time hate the pain involved in having to burn registers and so many cycles. I assume the goal was to avoid having to carry processor state flags from one instruction to another and have the pipeline deal with that (the trade-off being more pipeline hazards due to the intermediate results of all the math involved in synthesizing the alu flags).
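As a rough illustration of that synthesis, here is a C sketch that recovers the carry and signed overflow of an add using nothing but ordinary alu operations, leaving a 0 or 1 in a register that a compare-and-jump against zero could then test. The function names are made up, and this is one of several ways to do it, not what any particular compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Carry out of a + b, built from AND/OR/NOT/shift only (no flags):
 * both top bits set, or one top bit set and the sum's top bit clear. */
static uint32_t add_carry_out(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    return ((a & b) | ((a | b) & ~sum)) >> 31;
}

/* Signed overflow of a + b: operands share a sign and the sum's sign differs. */
static uint32_t add_signed_overflow(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    return (~(a ^ b) & (a ^ sum)) >> 31;
}

/* 64-bit add from 32-bit halves: the carry costs extra instructions and a register. */
static void add64(uint32_t a_lo, uint32_t a_hi,
                  uint32_t b_lo, uint32_t b_hi,
                  uint32_t *r_lo, uint32_t *r_hi)
{
    assert(r_lo != 0 && r_hi != 0);
    uint32_t carry = add_carry_out(a_lo, b_lo); /* lives in a gpr, not a flag */
    *r_lo = a_lo + b_lo;
    *r_hi = a_hi + b_hi + carry;
}
```

On a machine with a carry flag, `add64` is an add followed by an add-with-carry; here the carry alone takes several alu operations and a scratch register, which is the pain point mentioned above.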
On pretty much any processor you can do all of the boolean work using gprs and alu operations, resulting in a register that is either zero or not. Then you do that final one or two instructions, depending on the processor, to jump if that register is zero. You don't have to use the laundry list of conditional branches if you don't want to.
The bottom line is that it is a huge waste to store single-bit results in gprs. I hope you can see that this is inefficient, so your argument, which keeps coming back to using gprs, is IMO flawed. Flags of some sort are efficient simply because they don't use gprs, be it no flag (compare and jump in one instruction), one flag, or multiple flags. Do it yourself vs. lots of compares and few branches vs. lots of branches and free compares: each has its pros and cons. I think the four-flag approach is the most popular because the flags are an alu freebie, and because we have been habitually doing it that way for so long.