Python C extension - maintaining state

Question 1

First, I'll answer the question you actually asked.

Create a struct State in C, just as you would if Python weren't involved.

If you're not going to be copying these around (you only pass them by struct State *), then you can just do (intptr_t)theStatePtr to get an id for Python. Of course you do need to be careful that the lifetime of the Python object never extends past the lifetime of the C object, but that's doable.
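To make the pointer-as-id idea concrete, here is a minimal sketch from the Python side using ctypes. The State fields and the step function are hypothetical stand-ins; in a real extension the struct would be allocated and owned by your C code, and the integer would come from the (intptr_t) cast.

```python
import ctypes


class State(ctypes.Structure):
    """Stand-in for a C `struct State`; the fields are hypothetical."""
    _fields_ = [("counter", ctypes.c_int), ("total", ctypes.c_double)]


# The owner keeps the memory alive. The integer id below is only valid
# for as long as this object exists, which is the lifetime rule above.
owner = State(counter=0, total=0.0)

# Equivalent in spirit to (intptr_t)theStatePtr: a plain integer handle.
state_id = ctypes.addressof(owner)


def step(handle, amount):
    """Recover the struct from the integer handle and mutate it in place."""
    state = State.from_address(handle)  # no copy; views the same memory
    state.counter += 1
    state.total += amount


step(state_id, 2.5)
step(state_id, 1.5)
print(owner.counter, owner.total)
```

Note that nothing stops you from passing a stale or made-up integer to step, which is exactly why the lifetime rule matters.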

If you do need to copy/move the struct around for some reason, or you need more help managing state (e.g., treating the Python ids as weak references), pick the appropriate collection (hash table, tree, array, etc.) for your use case, then pass the key to Python as an id.
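Here is a rough sketch of the key-as-id variant. In a real extension the table would live in C; it is written in Python here only to keep it short. StateTable and its methods are hypothetical names, and a dict stands in for whichever collection fits your use case.

```python
import itertools


class StateTable:
    """Maps opaque integer keys to state records (hypothetical helper)."""

    def __init__(self):
        self._states = {}
        self._next_key = itertools.count(1)  # keys are never reused

    def add(self, state):
        key = next(self._next_key)
        self._states[key] = state
        return key

    def get(self, key):
        # A stale key yields None instead of a dangling pointer.
        return self._states.get(key)

    def remove(self, key):
        self._states.pop(key, None)


table = StateTable()
key = table.add({"counter": 0})
table.get(key)["counter"] += 1
count_before_removal = table.get(key)["counter"]

table.remove(key)
stale = table.get(key)  # the id now behaves like a dead weak reference
```

Because the key is looked up rather than dereferenced, the state can be moved or freed without invalidating memory that Python still holds an id for.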

However, I think you may be optimizing the wrong part here. Passing the object back and forth is nothing—it's just a pointer copy. Refcounting may be an issue, but it rarely is, and the benefits you get from lifecycle management are usually worth it. The part that may kill performance is your C code continually converting a bunch of Python integers to C ints, etc. If this is your problem, just create a C struct with the C state, and wrap it in a Python object that doesn't expose any of the internals into Python.

Finally, do you actually need any optimization here at all? If you're doing CPU-intensive work, I'll bet the real work so completely overshadows the cost of the Python object access that the latter won't even show up in profiling. If you haven't profiled yet, that's absolutely positively the first thing you should do, because the right answer here may well be "don't bother doing anything".
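If you haven't profiled before, a minimal sketch with the standard library's cProfile looks something like this. The update_state function is a made-up placeholder for your CPU-intensive work, not anything from your code.

```python
import cProfile
import io
import pstats


def update_state(state, n):
    """Hypothetical CPU-bound work that touches a state object."""
    for i in range(n):
        state["total"] += i * i
    return state


profiler = cProfile.Profile()
profiler.enable()
result = update_state({"total": 0}, 100_000)
profiler.disable()

# Collect the report, sorted by cumulative time, into a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

If the state handling doesn't show up near the top of a report like this, there is nothing there worth optimizing.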

Taking that a step further: if the only reason you're writing this code in C is optimization, are you sure you even need that? Dealing with memory management in C is annoying and error-prone, and dealing with it in a C extension module for Python is even more so. Doing it for the first time, when you don't already know how it works, is almost a guaranteed recipe for spending all your time chasing down segfaults and leaks rather than writing your actual code. So I would try the following in order, profiling each and only moving down the list if it's too slow:

1. Just write the algorithm in Python, and use your existing CPython interpreter.
2. Make sure you've got an optimal algorithm.
3. Try PyPy instead of CPython.
4. Get Cython and try compiling your Python code with as few changes as possible.
5. Modify your code to take advantage of Cython features like static types, direct calls to C functions, etc., as appropriate.
6. Write the lower-level code in C, and the mid-level code (the stuff that tracks your state objects and presents a wrapper to Python) either in Cython or in Python with ctypes.
7. Write the whole lower and mid level in C, using your favorite interface mechanism. That is still probably not the native C API, unless you've got a lot of experience and are doing something pretty simple.
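To give a feel for the "Python with ctypes" option, here is a minimal sketch that calls a C function through a thin Python wrapper. It assumes a POSIX system and uses strlen from the C library purely as a stand-in for your own compiled code; c_strlen is a hypothetical wrapper name.

```python
import ctypes

# Assumption: a POSIX system, where CDLL(None) exposes the C library
# already loaded into the process. On Windows you would load a DLL by name.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Thin Python wrapper: encode, call into C, return a Python int."""
    return libc.strlen(text.encode("utf-8"))


length = c_strlen("state")
print(length)
```

With your own library you would load it by path instead and declare argtypes and restype for each function you export, so the conversion rules live in one place.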

Question 2

Check out Cython for easy Python-to-C bridging. The documentation there has plenty of examples -- I linked to a page that explains how to build a state object of some sort and explains the memory issues.

Here's an example of an AIO binding (GitHub) written in Cython/Pyrex, which does some fairly fancy I/O. In my experience, we've rolled custom objects that marshal down to disk in compressed format using I/O routines such as this. In memory, the Cython code takes care of whatever is visible to Python (e.g., a custom-rolled socket object).

My best advice is for you to search around for .pyx examples and you'll find some stuff that should inspire a solution for you.

I'll also agree with the other posters: ask yourself whether moving to C is really necessary, since extension types are going to add complexity to your whole system.