You are correct - when python is waiting on C code execution the GIL is released, and that is how you can get some speedup. But only one line of python can be executed at a time. Note that this is a CPython (implementation) detail, and not strictly speaking part of the language python itself. For example, Jython and IronPython have no GIL and can fully exploit multiprocessor systems.
If you need truly concurrent programming in CPython, you should be looking at multiprocessing rather than threading.