Why do branch instructions use the status register?

Question 1

If you wanted to "store the boolean result in a register", you'd need separate instructions for <, <=, >, >=, etc. You'd also need one set for signed comparison and another for unsigned comparison.

In the x86 instruction set, there is one instruction for all of these, cmp (which sets a bunch of flags), then you use the appropriate jxx or setxx instruction to look at the flags that interest you (e.g., for the conditional jumps, you use jb, jbe/jna, ja, jae/jnb for unsigned ("below"/"above"), and jl, jle/jng, jg, jge/jnl for signed ("less"/"greater")).

Another feature of this approach is that you can actually check the flags after you do a mutation operation, like carrying additions or borrowing subtractions. Example (assume esi points to a 128-bit number that's being added to another 128-bit quantity that edi points to):

add [edi], [esi]
adc [edi+4], [esi+4]
adc [edi+8], [esi+8]
adc [edi+12], [esi+12]
jc overflow

That's just 5 instructions (well, hypothetically; in reality, x86 instructions can't have both operands be addresses, one has to be a register, which means that loading that register will take more instructions). Using a "boolean result in a register" approach sounds like it'd be more complicated (but I guess not by much, if you're using an instruction set that allows a three-way addition).

Question 2

Chris' answer is right on. I would only add that, besides not wanting to tie up a large register just to hold the result of a comparison, flag bits fall naturally out of the arithmetic register operations, and a specially designated flag register is a good place for them.

Whenever an addition is performed between two registers, there could be overflow into the carry bit, and a subtraction is just a variation on addition.

The high-order bit of a register, if the number is in twos-complement, is the sign bit.

Also, after every addition/subtraction, some special hardware detects if the result is all zero, and that's another bit to go in the flags.

All the arithmetic comparisons boil down to combinations of these, so they can be easily used for conditional branching, long-integer math, etc.

My favorite example of base-level hardware is Harry Porters's Relay Computer. In there you can see how a flag register really helps to minimize hardware and simplify the instruction set.

Question 3

Well first of what do you think the difference is between a boolean answer and a flag? True/false vs true/false? same same. Higher level languages waste a whole bunch of bits in a variable so that basically one of those bits holds the boolean result. The reality is that only a very inefficient compiler actually generates and wastes all of those bits in registers. Quite often a single or series of compares and branch on conditionals are used and the gprs are not consumed in order to implement that high level language. Other times depending on the complexity of the boolean then a gpr is definitely used and the boolean alu operations and other gprs are consumed to compute that boolean result, then there is a final compare if zero and branch if zero or not to complete the task (why else would you compute a boolean if you werent going to compare it and do something based on the comparison? an optimizer would remove all of that code otherwise).

The typical approach is the four flags that trivially fall out of an alu operation, zero, negative, carry (a.k.a. unsigned overflow, a.k.a. borrow), and signed overflow. NZCV. Then a laundry list of branch on conditional instructions. You have the efficiency of the conditional being computed on any alu operation. Most of the alu operations do burn a register output even if you didnt care about that output. But often a compare instruction (subtract without saving the result) is sufficient for most conditionals and is present. Sometimes if you are lucky you get a test instruction (AND without saving the result). Most of the time you know the comparison going in and it is only one comparison one conditional branch. On occasion there is a time where you can set the flags once, then do two or more conditional branches in a row, not having to re-compute the condition flags, they are preserved through the failed branch. That is the exception not the rule. The flags being a freebie is probably the reason this is the popular approach.

It is quite reasonable to have a laundry list of compare instructions, set flag if a == b, set flag if a < b, set flag if a<=b and so on. Then you only have a branch if flag set and branch if flag clear instruction on the backend. There is one processor I know of that does it that way. You wouldnt want to waste a whole gpr to store that one flag bit but it might be reasonable to do so for various reasons, the one I am thinking of does not do that.

There is one I know of that the psr is a gpr, which means it really isnt a gpr because it is special, but it is used/accessed like a gpr. So your alu output drops those flags in the gpr, there is not a laundry list of conditional branches, instead I think there are two, branch if bit X in register Y is set or branch if bit X in register Y is not set. (it may be worse than that SKIP if bit x in y is set, or skip if bit x in y is not set) And you have to gang up one or more of those in a row for the more complicated branches (branch if equal or greater, etc).

There is one that I know of that does not have any flags, it basically has a compare and jump if equal and compare and jump if not equal. register based. You have to synthesize all the other conditions, signed or unsigned overflow, the n bit, etc. Burning both gprs and instruction cycles, very inefficient. I can see the beauty in it at the same time hate the pain involved in having to burn registers and so many cycles. I assume the goal was to avoid having to carry processor state flags from one instruction to another and have the pipeline deal with that (the trade off was more pipeline hazards due to the intermediate results of all the math involved to synthesize the alu flags).

Pretty much any processor you can do all of the boolean work using gprs and the alu operations, resulting in a register that is either zero or not, then you can do that final one or two instructions depending on the processor to jump if that register is zero. You dont have to use the laundry list of branch on compares if you dont want to.

The bottom line is it is a huge waste to store single bit results in gprs. I hope you can understand that is inefficient, so you argument keeps referring to using gprs is IMO flawed. Flags of some sort is are efficient simply because they dont use gprs. Be it no flag (compare and jump in one instruction) one flag, or multiple flags. The do it yourself vs lots of compares and few branches vs lots of branches and free compares all have their pros and cons. I think the four flag approach is the most popular because the flags are an alu freebie, and, because we have been habitually doing it that way for so long.