Python security: Danger of uncollected variables out of scope

Question 1

Any security system is only as strong as its weakest link.

It's difficult to tell what the weakest link is in your current system, since you haven't really given any details on the overall architecture, but if you're actually using Python code like you posted in the question (let's call this myscript.py)...

#!/usr/bin/python
encrypted_username = 'feh9876\xhu378\x&457(oy\x'
encrypted_password = 'dee\x\xhuie\xhjfirihy\x^\xhjfkekl'
# MySQLdb.connect(host, username, password, database)
db = MySQLdb.connect(self.mysql_host,
                     c-obj.stream_callabck(encrypted_username),
                     c-obj.stream_callback(encrypted_password),
                     self.mysql_database)

...then regardless of how or where you decrypt the password, any user can come along and run a script like this...

import MySQLdb

def my_connect(*args, **kwargs):
    print args, kwargs
    return MySQLdb.real_connect(*args, **kwargs)

MySQLdb.real_connect = MySQLdb.connect
MySQLdb.connect = my_connect
execfile('/path/to/myscript.py')

...which will print out the plaintext password, so implementing the decryption in C is like putting ten deadbolts on the front door, but leaving the window wide open.

If you want a good answer on how to secure your system, you'll have to provide some more information on the overall architecture, and what attack vectors you're trying to prevent.

If someone manages to hack root, you're pretty much screwed, but are better ways to conceal the password from non-root users.

However, if you're satisfied that the machine you're running this code on is secure (in the sense that it can't be accessed by any 'unauthorized' users), then none of this password obfuscation stuff is necessary - you may as well just put the cleartext password directly into the Python source code.

Update

Regarding architecture, I meant, how many separate servers are you running, what responsibilities do they have, and how are they meant to communicate with each other, and/or the outside world?

Assuming the primary goal is to prevent unauthorized access to the MySQL server, and assuming MySQL runs on a different server to the Python script, then why are you more concerned about someone gaining access to the server running the Python script, and getting the password for the MySQL server, rather than gaining access to the MySQL server directly?

If you're using a 'salt' as a decryption key for the encrypted MySQL password, then how does an authorized user pass that value to the system? Do they have to login to the server via, say, ssh, and run the script from the commandline, or it this something accessible via, say, a webserver?

Either way, if someone does compromise the system running the Python script, they merely have to wait until the next authorized user comes along, and 'sniff' the 'salt' they enter.

Question 2

Even if you call gc.collect and those strings are deallocated, they might still remain in memory. Also, strings are immutable, which means you have no (standard) way of overwriting them. Also note that if you have performed operations on those strings some copies of them might be lying around.

So don't use strings if possible.

You need to overwrite the memory (and even then, the memory might be dumped somewhere, like into a page file). Use a byte-array and overwrite the memory when you're done.

Question 3

If no other references to the value exist, the your gc.collect normally destroys the object.

However, something as simple as string interning or cacheing may keep an unexpected reference, leaving the value alive in memory. Python has a number of implementations (PyPy, Jython, PyPy) that do different things internally. The language itself makes very few guarantees about whether or when the value would actually get erased from memory.

In your example, you also use name mangling. Because the mangling is easily reproduced by hand, this doesn't add any security at all.

One further thought: It isn't clear what your security model is. If the attacker can call your decrypt function and run arbitrary code in the same process, what would prevent them from wrapping decrypt to keep a code of the inputs and outputs.