“Boolean” operations in Python (ie: the and/or operators)

https://stackoverflow.com/questions/3826473

26-09-2019

Question

This method searches for the first group of word characters (ie: [a-zA-Z0-9_]), returning the first matched group or None in case of failure.

import re

def test(str):
    m = re.search(r'(\w+)', str)
    if m:
        return m.group(1)
    return None

The same function can be rewritten as:

def test2(str):
    m = re.search(r'(\w+)', str)
    return m and m.group(1)

This works the same, and is documented behavior; as this page clearly states:

The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.

However, since "and" is a boolean operator (the manual even calls it one), I expected it to return a boolean. As a result, I was astonished when I found out that (and how) this worked.

What are other use cases of this, and/or what is the rationale for this rather unintuitive implementation?

Solution

What are other use cases of this,

Conciseness (and therefore clarity, as soon as you get used to it, since after all it does not sacrifice readability at all!-) any time you need to check something and then use either that something or another value, depending on whether it is true. With "x and y" you get x itself when x is false and y otherwise; with "x or y" it is the reverse: x itself when x is true and y otherwise. (I'm very deliberately saying "true" and "false" rather than the actual constants True and False, since I'm talking about every object, not just bool!-)
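As a rough illustration of that "use it or use something else" pattern, here is a minimal sketch. The helper names display_name and first_item are made up for this example and do not come from the question.

```python
def display_name(name):
    # "or": keep name when it is truthy, otherwise fall back to a default.
    return name or "anonymous"

def first_item(items):
    # "and": return items itself when it is empty (falsy), else its first element.
    return items and items[0]

print(display_name("Ada"))   # Ada
print(display_name(""))      # anonymous
print(first_item([3, 4]))    # 3
print(first_item([]))        # []
```

Note that first_item([]) hands back the empty list itself rather than None or False, which is exactly the "returns the operand" behavior under discussion.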

Vertical space on any computer screen is limited, and, given the choice, it's better spent on useful readability aids (docstrings, comments, strategically placed empty lines to separate blocks, ...) than on turning, say, a line such as:

inverses = [x and 1.0/x for x in values]

into six such as:

inverses = []
for x in values:
    if x:
        inverses.append(1.0/x)
    else:
        inverses.append(x)

or more cramped versions thereof.
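To see what the one-liner actually produces, here is a small check with made-up sample values (the list below is an assumption for illustration). It also compares the result against the same selection written as a conditional expression.

```python
values = [2.0, 0.0, 4.0, 0.5]  # sample data, assumed for illustration

# The one-liner: a falsy x (here 0.0) is passed through unchanged.
inverses = [x and 1.0 / x for x in values]

# The same selection spelled with a conditional expression.
inverses_explicit = [1.0 / x if x else x for x in values]

print(inverses)                       # [0.5, 0.0, 0.25, 2.0]
print(inverses == inverses_explicit)  # True
```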

and/or what is the rationale for this rather unintuitive implementation?

Far from this being "unintuitive", beginners were regularly tripped up by the fact that some languages (like standard Pascal) did not specify the order of evaluation or the short-circuiting nature of "and" and "or". One of the differences between Turbo Pascal and the language standard, which back in the day made Turbo the most popular Pascal dialect of all time, was exactly that Turbo implemented "and" and "or" much like Python did later (and the C language did earlier...).
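The short-circuiting and left-to-right order mentioned here can be observed directly in Python. The following is only a sketch: run_and, run_or and the inner operand helper are hypothetical names that record which operands actually get evaluated.

```python
def run_and(left, right):
    calls = []

    def operand(label, value):
        calls.append(label)  # record that this operand was evaluated
        return value

    result = operand("left", left) and operand("right", right)
    return result, calls

def run_or(left, right):
    calls = []

    def operand(label, value):
        calls.append(label)
        return value

    result = operand("left", left) or operand("right", right)
    return result, calls

print(run_and(False, True))  # (False, ['left'])
print(run_and(True, False))  # (False, ['left', 'right'])
print(run_or(True, False))   # (True, ['left'])
print(run_or(False, True))   # (True, ['left', 'right'])
```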

OTHER TIPS

What are other use cases of this,

No.

what is the rationale for this rather unintuitive implementation?

"unintuitive"? Really? I'd disagree.

Let's think.

"a and b" is falsified if a is false. So the first false value is sufficient to know the answer. Why bother transforming a to another boolean? It's already false. How much more false is False? Equally false, right?

So a's value -- when equivalent to False -- is false enough, so that's the value of the entire expression. No further conversion or processing. Done.

When a's value is equivalent to True then b's value is all that's required. No further conversion or processing. Why transform b to another boolean? Its value is all we need to know. If it's anything like True, then it's true enough. How much more true is True?

Why create spurious additional objects?

Same analysis for or.

Why Transform to Boolean? It's already true enough or false enough. How much more True can it get?

Try This.

>>> False and 0
False
>>> True and 0
0
>>> (True and 0) == False
True

While (True and 0) is actually 0, it's equal to False. That's false-enough for all practical purposes.

If it's a problem, then bool(a and b) will force the explicit conversion.
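A quick way to check the "no spurious additional objects" point is to compare identities. In this sketch the names empty and payload are arbitrary; the result of "and"/"or" is one of the operands itself, and bool() is what produces an actual bool.

```python
empty = []
payload = {"id": 7}

# The result is one of the operand objects, not a fresh True/False.
print((empty and payload) is empty)    # True
print((payload and empty) is empty)    # True
print((empty or payload) is payload)   # True

# Explicit conversion when a real bool is needed.
print(bool(empty and payload))         # False
print(bool(payload or empty))          # True
```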

Basically a and b returns the operand that has the same truth value as the whole expression.

It might sound a bit confusing but just do it in your head: If a is False, then b does not matter anymore (because False and anything will always be False), so it can return a right away.

But when a is True then only b matters, so it returns b right away without even looking.

This is a very common and very basic optimization many languages do.
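The "without even looking" part can be made visible with an object that refuses to be truth-tested. This is a sketch only: NoTruthValue and and_ are hypothetical names, and and_ is a plain-Python emulation of the rule, not how the interpreter implements it.

```python
class NoTruthValue:
    """Hypothetical object that refuses to be truth-tested."""
    def __bool__(self):
        raise TypeError("truth value requested")

def and_(a, make_b):
    # Emulates "a and b": b is produced only if a is truthy, and is never tested.
    return make_b() if a else a

sentinel = NoTruthValue()

print((1 and sentinel) is sentinel)           # True: right operand returned untested
print(and_(1, lambda: sentinel) is sentinel)  # True
print(and_(0, lambda: sentinel))              # 0

try:
    sentinel and 1  # the left operand, by contrast, must be tested
except TypeError as exc:
    print("left operand was tested:", exc)
```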

I didn't find this surprising, and in fact expected it to work when I originally tried it.

While not all values are bools, note that in effect, all values are boolean--they represent a truth value. (In Python, a bool is--in effect--a value which only represents true or false.) The number 0 isn't a bool, but it explicitly (in Python) has a boolean value of False.

In other words, the boolean operator "and" doesn't always return a bool, but it always returns a boolean value: one that represents true or false, even if it also has other information logically attached to it (e.g. a string).
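To make "every value has a truth value" concrete, here is a small sketch. The Basket class is a made-up example of a user-defined type that becomes false when empty by defining __len__.

```python
class Basket:
    """Hypothetical container; an empty Basket counts as false via __len__."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

falsy = [0, 0.0, "", [], {}, None, Basket([])]
truthy = [1, -0.5, "0", [0], {"k": None}, Basket(["apple"])]

print([bool(v) for v in falsy])    # all False
print([bool(v) for v in truthy])   # all True

# The result of "and" keeps its extra information (here, a string).
label = "spam"
print(repr(label and label.upper()))  # 'SPAM'
print(repr("" and "unused"))          # ''
```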

Maybe this is retroactive justification; I'm not sure, but either way it seems natural for Python's boolean operators to behave as they do.

When to use it?

In your example, test2 feels clearer to me. I can tell what they both do equally: the construction in test2 doesn't make it any harder to understand. All else equal, the more concise code in test2 is--marginally--more quickly understood. That said, it's a trivial difference, and I don't prefer either enough that I'd jump to rewrite anything.

It can be similarly useful in other ways:

a = {
    "b": a and a.val,
    "c": b and b.val2,
    "d": c and c.val3,
}

This could be rewritten differently, but this is clear, straightforward and concise.
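For anyone who wants to run that pattern, here is one possible setup. The SimpleNamespace objects are stand-ins invented for this sketch, with b deliberately set to None to show the guard at work, and the dict is named summary here only to avoid rebinding a.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the objects a, b and c in the snippet above.
a = SimpleNamespace(val=1)
b = None                       # missing object: "b and b.val2" never touches b.val2
c = SimpleNamespace(val3="x")

summary = {
    "b": a and a.val,
    "c": b and b.val2,
    "d": c and c.val3,
}
print(summary)  # {'b': 1, 'c': None, 'd': 'x'}
```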

Don't go overboard; "a() and b() or c()" as a substitute for "a() ? b() : c()" is dangerous and confusing, since you'll end up with c() if b() is false. If you're writing a ternary expression, use the ternary syntax, even though it's hideously ugly: b() if a() else c().
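The pitfall is easy to reproduce. In this sketch, pick_and_or and pick_conditional are hypothetical helper names; the two only disagree when the "true" branch happens to be a falsy value such as 0.

```python
def pick_and_or(cond, if_true, if_false):
    # The risky idiom: falls through to if_false whenever if_true is falsy.
    return cond and if_true or if_false

def pick_conditional(cond, if_true, if_false):
    # The conditional expression picks based on cond alone.
    return if_true if cond else if_false

print(pick_and_or(True, 5, 99))       # 5
print(pick_conditional(True, 5, 99))  # 5
print(pick_and_or(True, 0, 99))       # 99  (surprising)
print(pick_conditional(True, 0, 99))  # 0
```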

I think that while this notation 'works', it represents a poor coding style that hides logic and will confuse more experienced programmers, who carry the 'baggage' of knowing how the majority of other languages work.

In most languages the return type of a function is determined by the type of function, unless it's been explicitly overloaded. For example, a 'strlen'-type function is expected to return an integer, not a string.

Inline functions such as the core arithmetic and logic functions (+-/*|&!) are even more restrained because they also have a history of formal math theory behind them. (Think about all the arguments about order of operations for these functions.)

To have fundamental functions return anything but their most common data type (either logic or numeric) should be classified as purposeful obfuscation.

In just about every common language '&' or '&&' or 'AND' is a logic or Boolean function. Behind the scenes, optimizing compilers might use short-circuiting logic like the above in LOGIC FLOW but not in DATA STRUCTURE modification (any optimizing compiler that changed the value this way would have been considered broken). But if the value is expected to be used in a variable for further processing, it should be of the logic or Boolean type, because that's the 'formal' type for these operators in the majority of circumstances.

Licensed under: CC-BY-SA with attribution
