String immutability in CPython violated
Question
This is more of an 'interesting' phenomenon that I encountered in a Python module and am trying to understand than a request for help (though a solution would also be useful).
>>> import fuzzy
>>> s = fuzzy.Soundex(4)
>>> a = "apple"
>>> b = a
>>> sdx_a = s(a)
>>> sdx_a
'A140'
>>> a
'APPLE'
>>> b
'APPLE'
Yeah, so the fuzzy module totally violates the immutability of strings in Python. Is it able to do this because it is a C-extension? And does this constitute an error in CPython as well as the module, or even a security risk?
Also, can anyone think of a way to get around this behaviour? I would like to be able to keep the original capitalisation of the string.
Cheers,
Alex
Solution
This bug was resolved back in February; update your version.
To answer your question, yes, there are several ways to modify immutable types at the C level. The security implications are unknown, and possibly even unknowable, at this point.
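As a rough illustration of the "C level" point, here is a minimal sketch that uses ctypes to overwrite one byte of a bytes object in place. It is not what fuzzy does internally; it relies on an assumed CPython detail (the payload of a bytes object starting at bytes.__basicsize__ - 1 from the object's address), and it is meant only as a demonstration, never something to use in real code.

```python
import ctypes

# Build the object at runtime so that no shared literal constant is touched.
data = bytes(bytearray(b"apple"))
alias = data  # a second name for the same object

# Assumption (CPython-specific): the character buffer is the last field of
# the bytes struct, which reserves one byte for it, so it starts here.
offset = bytes.__basicsize__ - 1

# Overwrite the first byte in place, bypassing every Python-level check.
ctypes.memmove(id(data) + offset, b"A", 1)

print(data, alias)
```

On a typical CPython build both names should then show the changed first byte, which is the same effect seen with a and b in the question.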
Other tips
I don't have the fuzzy module available to test right now, but the following creates a string with a new identity:
>>> a = "hello"
>>> b = ''.join(a)
>>> b
'hello'
>>> id(a), id(b)
(182894286096, 182894559280)
I don't know much about CPython, but it looks like in fuzzy.c it declares char *cs = s, where s is the input to __call__. It then mutates cs[i], which will obviously mutate s[i] and therefore the original string. This is definitely a bug with Fuzzy and you should file it on the bitbucket. As Greg's answer said, using ''.join(a) will create a new copy.
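The copy idea can be wrapped in a small helper so the original string is never handed to the encoder. This is only a sketch: call_on_copy and fake_encoder are hypothetical names, and fake_encoder is a pure-Python stand-in that records what it receives, since the real fuzzy.Soundex is not assumed to be installed here.

```python
def call_on_copy(encoder, text):
    """Call encoder on a rebuilt copy of text instead of on text itself."""
    # ''.join(text) reassembles the string character by character. In
    # CPython this gives a new object when text has more than one character;
    # empty and one-character strings may come back as the same object.
    scratch = ''.join(text)
    return encoder(scratch)


received = []  # records the objects the stand-in encoder was given


def fake_encoder(value):
    # Hypothetical stand-in for s = fuzzy.Soundex(4); it does not mutate.
    received.append(value)
    return value.upper()


original = "apple"
result = call_on_copy(fake_encoder, original)
print(result, original, received[0] is original)
```

With the real module, the call would presumably look like call_on_copy(s, a), so any in-place change would land on the scratch copy and a would keep its capitalisation.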
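If it changes the immutable string, it's a bug. You can work around it by passing an uppercased copy, so that any in-place change lands on the temporary string rather than on your original: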
s(a.upper())