Question

I'm using Python's C API (2.7) from C++ to convert a Python tree structure into a C++ tree. The code goes as follows:

  • The Python tree is implemented recursively as a class with a list of children. The leaf nodes are plain integers (not class instances).

  • I load a module and invoke a python method from C++, using code from here, which returns an instance of the tree, python_tree, as a PyObject in C++.

  • I recursively traverse the obtained PyObject. To obtain the list of children, I do this:

    PyObject* attr = PyString_FromString("children");
    PyObject* list = PyObject_GetAttr(python_tree,attr);
    for (int i=0; i<PyList_Size(list); i++) {
        PyObject* child = PyList_GetItem(list,i); 
        ...
    

Pretty straightforward, and it works, until I eventually hit a segmentation fault, at the call to PyObject_GetAttr (Objects/object.c:1193, but I can't see the API code). It seems to happen on the visit to the last leaf node of the tree.

I'm having a hard time determining the problem. Are there any special considerations for doing recursion with the C API? I'm not sure if I need to be using Py_INCREF/Py_DECREF, or using these functions or something. I don't fully understand how the API works to be honest. Any help is much appreciated!

EDIT: Some minimal code:

void VisitTree(PyObject* py_tree) throw (Python_exception)
{
    PyObject* attr = PyString_FromString("children");
    if (PyObject_HasAttr(py_tree, attr)) // segfault on last visit
    {
        PyObject* list = PyObject_GetAttr(py_tree,attr);
        if (list)
        {
            int size = PyList_Size(list);
            for (int i=0; i<size; i++)
            {
                PyObject* py_child = PyList_GetItem(list,i);
                PyObject *cls = PyString_FromString("ExpressionTree");
                // check if child is class instance or number (terminal)
                if (PyInt_Check(py_child) || PyLong_Check(py_child) || PyString_Check(py_child)) 
                    ;// terminal - do nothing for now
                else if (PyObject_IsInstance(py_child, cls))
                    VisitTree(py_child);
                else
                    throw Python_exception("unrecognized object from python");
            }
        }
    }
}

Solution

One can identify several problems with your Python/C code:

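  • PyObject_IsInstance takes a class, not a string, as its second argument. Given a string, it fails and returns -1 with a TypeError set, which the else if then treats as true.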

  • There is no code dedicated to reference counting. New references, such as those returned by PyObject_GetAttr are never released, and borrowed references obtained with PyList_GetItem are never acquired before use. Mixing C++ exceptions with otherwise pure Python/C aggravates the issue, making it even harder to implement correct reference counting.

  • Important error checks are missing. PyString_FromString can fail when there is insufficient memory; PyList_GetItem can fail if the list shrinks in the meantime; PyObject_GetAttr can fail in some circumstances even after PyObject_HasAttr succeeds.
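
To make the reference-counting point concrete, here is a minimal sketch of the two ownership rules. It deliberately does not use the real API: MockObject, mock_incref, mock_decref, get_new_reference and get_borrowed_reference are hypothetical stand-ins, so the example compiles without Python.h. The only facts taken from the real API are that PyObject_GetAttr returns a new reference and PyList_GetItem returns a borrowed one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for PyObject: nothing but a reference count.
struct MockObject {
    long refcount;
    MockObject() : refcount(1) {}  // the creator owns one reference
};

inline void mock_incref(MockObject* o) { ++o->refcount; }
inline void mock_decref(MockObject* o) { --o->refcount; }  // a real object is freed at 0

// Models a "new reference" call such as PyObject_GetAttr: the count is
// raised on the caller's behalf, so the caller must release it later.
MockObject* get_new_reference(MockObject* o) {
    mock_incref(o);
    return o;
}

// Models a "borrowed reference" call such as PyList_GetItem: the count is
// untouched, and the list still holds the only reference.
MockObject* get_borrowed_reference(const std::vector<MockObject*>& list,
                                   std::size_t i) {
    return list[i];
}

// Correct handling of both kinds. Returns true if every count is back
// to its starting value afterwards.
bool balanced_usage() {
    MockObject attr_target, item;
    std::vector<MockObject*> list(1, &item);

    MockObject* owned = get_new_reference(&attr_target);     // count is now 2
    MockObject* borrowed = get_borrowed_reference(list, 0);  // count is still 1
    mock_incref(borrowed);  // acquire before use so the item stays alive
    // ... use both objects here ...
    mock_decref(borrowed);  // give back what was acquired
    mock_decref(owned);     // release the new reference

    return attr_target.refcount == 1 && item.refcount == 1;
}

// The pattern from the question: a new reference that is never released.
long leaked_references() {
    MockObject attr_target;
    get_new_reference(&attr_target);  // result dropped without a decref
    return attr_target.refcount - 1;  // references nobody will release
}
```

In this model balanced_usage() leaves both counts where they started, while leaked_references() reports one outstanding reference per call, which is what each unreleased PyObject_GetAttr result amounts to.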

Here is a rewritten (but untested) version of the code, featuring the following changes:

  • The utility function GetExpressionTreeClass obtains the ExpressionTree class from the module that defines it. (Fill in the correct module name for my_module.)

  • Guard is an RAII-style class that releases the Python object when leaving the scope. This small and simple class makes reference counting exception-safe, and its constructor handles NULL objects itself by throwing. boost::python defines layers of functionality in this style, and I recommend taking a look at it.

  • All Python_exception throws are now accompanied by setting the Python exception info. The catcher of Python_exception can therefore use PyErr_Print or PyErr_Fetch to print the exception or otherwise find out what went wrong.

The code:

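// RAII guard: owns one reference and releases it when the scope exits.
class Guard {
  PyObject *obj;
  Guard(const Guard&);             // non-copyable: a copy would release twice
  Guard& operator=(const Guard&);
public:
  explicit Guard(PyObject *obj_): obj(obj_) {
    if (!obj)  // NULL signals a failed API call
      throw Python_exception("NULL object");
  }
  ~Guard() {
    Py_DECREF(obj);
  }
};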

PyObject *GetExpressionTreeClass()
{
  PyObject *module = PyImport_ImportModule("my_module");
  Guard module_guard(module);
  return PyObject_GetAttrString(module, "ExpressionTree");
}

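void VisitTree(PyObject* py_tree) throw (Python_exception)
{
  // New reference to the class; the guard releases it on every exit path.
  PyObject *cls = GetExpressionTreeClass();
  Guard cls_guard(cls);

  // New reference, or NULL with a Python exception set.
  PyObject* list = PyObject_GetAttrString(py_tree, "children");
  if (!list && PyErr_ExceptionMatches(PyExc_AttributeError)) {
    PyErr_Clear();  // no such attribute: nothing to visit
    return;
  }
  Guard list_guard(list);

  Py_ssize_t size = PyList_Size(list);
  if (size < 0)  // children is not a list; the Python exception is already set
    throw Python_exception("children is not a list");

  for (Py_ssize_t i = 0; i < size; i++) {
    // Borrowed reference: acquire it before use, then let the guard release it.
    PyObject* child = PyList_GetItem(list, i);
    Py_XINCREF(child);
    Guard child_guard(child);

    // check if child is class instance or number (terminal)
    if (PyInt_Check(child) || PyLong_Check(child) || PyString_Check(child))
      continue; // terminal - do nothing for now

    // PyObject_IsInstance returns 1, 0, or -1 on error.
    int is_tree = PyObject_IsInstance(child, cls);
    if (is_tree < 0)  // the Python exception is already set
      throw Python_exception("isinstance check failed");
    if (is_tree)
      VisitTree(child);
    else {
      PyErr_Format(PyExc_TypeError, "unrecognized %s object", Py_TYPE(child)->tp_name);
      throw Python_exception("unrecognized object from python");
    }
  }
}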
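
As a quick way to convince yourself that a guard of this kind keeps the counts balanced even when an exception is thrown, here is a small sketch that can be compiled without Python.h. MockObject, MockGuard, visit and refcount_after_visit are hypothetical names invented for the illustration, and std::runtime_error stands in for Python_exception. Copying is disabled so a copy cannot release the object a second time.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for PyObject: nothing but a reference count.
struct MockObject { long refcount; };

// Same idea as Guard, applied to MockObject: release one reference
// whenever the scope is left, normally or through an exception.
class MockGuard {
    MockObject* obj;
    MockGuard(const MockGuard&);             // declared, never defined
    MockGuard& operator=(const MockGuard&);  // declared, never defined
public:
    explicit MockGuard(MockObject* obj_) : obj(obj_) {
        if (!obj) throw std::runtime_error("NULL object");
    }
    ~MockGuard() { --obj->refcount; }  // models Py_DECREF
};

// Models one loop iteration of VisitTree: acquire the borrowed child,
// guard it, then either finish normally or throw.
void visit(MockObject* node, bool fail) {
    ++node->refcount;       // models Py_XINCREF on a borrowed reference
    MockGuard guard(node);  // released during stack unwinding as well
    if (fail) throw std::runtime_error("unrecognized object");
}

// Returns the count left behind after one visit.
long refcount_after_visit(bool fail) {
    MockObject node = {1};
    try {
        visit(&node, fail);
    } catch (const std::runtime_error&) {
        // swallowed here only so the count can be inspected
    }
    return node.refcount;
}
```

Both refcount_after_visit(false) and refcount_after_visit(true) leave the count at its starting value of 1, because the destructor runs on either exit path.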
Licensed under: CC-BY-SA with attribution