Question

This is more of an 'interesting' phenomenon I encountered in a Python module that I'm trying to understand than a request for help (though a solution would also be useful).

>>> import fuzzy
>>> s = fuzzy.Soundex(4)
>>> a = "apple"
>>> b = a
>>> sdx_a = s(a)
>>> sdx_a
'A140'
>>> a
'APPLE'
>>> b
'APPLE'

Yeah, so the fuzzy module totally violates the immutability of strings in Python. Is it able to do this because it is a C extension? And does this constitute an error in CPython as well as in the module, or even a security risk?

Also, can anyone think of a way to get around this behaviour? I would like to be able to keep the original capitalisation of the string.

Cheers,

Alex

Solution

This bug was resolved back in February; update your version.

To answer your question, yes, there are several ways to modify immutable types at the C level. The security implications are unknown, and possibly even unknowable, at this point.
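To make that concrete, here is a minimal sketch of one such route that needs no compiled extension at all. It assumes CPython, where ctypes hands C functions a pointer to a bytes object's internal buffer. The variable names are made up for illustration, and this is a demonstration of the hazard, not something to do in real code.

```python
import ctypes

# Build the bytes object at run time so no shared literal constant is touched.
original = bytes(bytearray(b"apple"))
alias = original  # a second name bound to the very same object

# Assumption: CPython's ctypes passes the address of the object's internal
# buffer, so memmove writes straight through the "immutable" bytes object.
ctypes.memmove(original, b"APPLE", len(original))

print(original, alias)  # both names refer to the object that was overwritten
```

Both names observe the change because no new object was created; the bytes were overwritten in place, which appears to be what happened to a and b in the question.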

Other tips

I don't have the fuzzy module available to test right now, but the following creates a string with a new identity:

>>> a = "hello"
>>> b = ''.join(a)
>>> b
'hello'
>>> id(a), id(b)
(182894286096, 182894559280)

I don't know much about CPython, but it looks like fuzzy.c declares char *cs = s, where s is the input to __call__. It then mutates cs[i], which obviously mutates s[i] as well, and therefore the original string. This is definitely a bug in fuzzy, and you should file it on the project's Bitbucket. As Greg's answer said, using ''.join(a) will create a new copy.
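A rough Python analogy of that aliasing, with a bytearray standing in for the C char buffer. This is only a sketch of the pattern described, not the actual fuzzy.c code, and upper_in_place is a hypothetical helper.

```python
def upper_in_place(buf):
    """Uppercase ASCII letters in buf, mutating it (like writing cs[i] in C)."""
    for i, byte in enumerate(buf):
        if 97 <= byte <= 122:  # ASCII 'a'..'z'
            buf[i] = byte - 32


# Aliasing: cs is just another name for s, like `char *cs = s`.
s = bytearray(b"apple")
cs = s
upper_in_place(cs)
aliased_result = bytes(s)  # the caller's buffer was changed

# Copying first: cs2 is a private buffer, so s2 is left alone.
s2 = bytearray(b"apple")
cs2 = bytearray(s2)
upper_in_place(cs2)
copied_result = bytes(s2)  # the caller's buffer is unchanged
```

The fix on the C side would presumably follow the second pattern: copy the input into a scratch buffer before uppercasing it.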

If it changes an immutable string, that's a bug. You can work around it by passing in a throwaway uppercased copy:

s(a.upper())
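If you call the encoder in many places, the same idea can be wrapped in a small helper. This is a sketch only: encode_preserving_input is a made-up name, and it assumes the encoder uppercases its input anyway, so handing it an uppercased scratch copy should not change the result.

```python
def encode_preserving_input(encoder, text):
    """Call encoder on a throwaway uppercased copy of text.

    Any in-place modification the encoder makes lands on the scratch
    string, so the caller's original keeps its capitalisation.
    """
    scratch = text.upper()
    return encoder(scratch)
```

With the objects from the question this would be called as encode_preserving_input(s, a).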
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow