Question

I'm working on a small set of scripts in Python, and I came to this:

line = "a b c d e f g"
a, b, c, d, e, f, g = line.split()

I'm aware that these are implementation decisions, but shouldn't Python offer (or does it already offer) something like:

_, _, var_needed, _, _, another_var_needed, _ = line.split()

as Prolog does, in order to avoid the well-known singleton variables?

I'm not sure, but wouldn't it avoid unnecessary allocation? Or does creating references to the results of the split call not count as overhead?

EDIT:

Sorry, my point here is: in Prolog, as far as I know, in a predicate like:

test(L, N) :-
    test(L, 0, N).
test([], N, N).
test([_|T], M, N) :-
    V is M + 1,
    test(T, V, N).

The variable represented by _ is not accessible, so I suppose the reference to the value that exists in the list [_|T] is not even created.

But in Python, if I use _, I can still read the last value assigned to _, and I suppose the assignment occurs for each of the _ targets, which may be considered overhead.

My question is whether there should be (or whether there already is) a syntax to avoid such unnecessary assignments.

Solution

_ is a perfectly valid variable name and yes, you can use a variable multiple times in an unpacking operation, so what you've written will work. _ will end up with the last value assigned in the line. Some Python programmers do use it this way.

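_ also has special meanings elsewhere: the interactive interpreter binds it to the result of the last expression evaluated, and gettext-based code conventionally uses it as the name of the translation function. That can confuse some readers, so some programmers avoid it as a throwaway name.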

There's no way to avoid the allocation with str.split(): it always splits the whole line, and the resulting strings are always allocated. It's just that, in this case, some of them don't live very long. But then again, who does?

You can avoid some allocations with, say, re.finditer():

import re

fi = re.finditer(r"\S+", line)
next(fi)
next(fi)
var_needed = next(fi).group()
next(fi)
next(fi)
another_var_needed = next(fi).group()
# we don't care about the last match so we don't ask for it

But next() returns a Match object and so it'll be allocated (and immediately discarded since we're not saving it anywhere). So you really only save the final allocation. If your strings are long, the fact that you're getting a Match object and not a string could save some memory and even time, I guess; I think the matched string is not sliced out of the source string until you ask for it. You could profile it to be sure.
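If you do want to profile it, something along these lines could be a starting point. This is only a sketch: the helper names `with_split` and `with_finditer` and the test input (seven tokens of 10,000 characters each) are assumptions made up for the comparison, and the timings will vary with the interpreter and the shape of your real data.

```python
import re
import timeit

# Hypothetical input: seven long tokens, so any slicing cost is visible.
line = " ".join(ch * 10_000 for ch in "abcdefg")

def with_split(text):
    # Allocates all seven token strings, then keeps two of them.
    parts = text.split()
    return parts[2], parts[5]

def with_finditer(text):
    # Allocates a Match object per token, but only slices out two strings.
    matches = re.finditer(r"\S+", text)
    next(matches)
    next(matches)
    first = next(matches).group()
    next(matches)
    next(matches)
    second = next(matches).group()
    return first, second

for func in (with_split, with_finditer):
    seconds = timeit.timeit(lambda: func(line), number=200)
    print(f"{func.__name__}: {seconds:.4f}s for 200 runs")
```

Measure with strings that resemble your actual input before drawing any conclusion from the numbers.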

You could even generalize the above into a function that returns only the desired tokens from a string:

import re

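def get_tokens(text, *toknums):
    """Yield only the whitespace-separated tokens at the given 0-based positions."""
    toknums = set(toknums)
    maxtok = max(toknums)
    # \S+ matches a whole run of non-whitespace characters, i.e. one token
    for i, m in enumerate(re.finditer(r"\S+", text)):
        if i in toknums:
            yield m.group()
        elif i > maxtok:
            # past the last requested token: stop scanning
            break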

var1, var2 = get_tokens("a b c d e f g", 2, 5)

But it still ain't exactly pretty.

Other tips

In fact, _ is a valid identifier in Python, and people often use it to accept a value that won't be needed again, so your code is great Python already.

There is no syntax in Python that exactly corresponds to the _ in Prolog. When you use _ in Python, it is a real variable that really holds a reference to a value.

That said, there are many, many objects allocated and deallocated when running a Python program. Even if you could control a few places like you've shown, it's a drop in the bucket, and wouldn't affect your program's resource usage enough to matter.

As far as syntax goes

_, _, var_needed, _, _, another_var_needed, _ = line.split()

is valid Python. I would say it's even idiomatic, although a bit unusual.
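If the repeated `_` targets bother you, one alternative (not something the answers above propose, just a sketch with the standard library) is to pick the tokens by position with `operator.itemgetter`. The split still allocates every token, but nothing is bound to a throwaway name:

```python
from operator import itemgetter

line = "a b c d e f g"

# itemgetter(2, 5) builds a callable that returns (seq[2], seq[5]).
pick = itemgetter(2, 5)
var_needed, another_var_needed = pick(line.split())

print(var_needed, another_var_needed)  # c f
```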

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow