ctypes: Memory management when returning pointers in COM Server

https://stackoverflow.com/questions/20546303

01-09-2022

Question

I've been experiencing some strange issues since the transition from Win XP to Server 2008. I tried to fix them, but I'm still not sure how memory management works through COM when returning pointers to structures.

Let's say I need to return something of type POINTER(MyStruct) in a function of a COM server written in Python. Within the function, I create the object:

struct = MyStruct()
struct.field = 4

then I return

return POINTER(MyStruct)(struct)

Do I have to keep a python reference to struct to avoid freeing the memory on the server before the marshalling takes place? If I actually do it, the COM client crashes. If I don't, sometimes data contained in these structs gets corrupted after reception at the client.
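A minimal sketch of what that return expression does on the ctypes side, outside of COM. It assumes a hypothetical MyStruct with a single integer field, since the real layout is not shown here. The point it illustrates is that the pointer does not copy the structure; it refers to the buffer owned by the Python object.

```python
from ctypes import POINTER, Structure, addressof, c_int

class MyStruct(Structure):
    # Hypothetical layout, for illustration only.
    _fields_ = [("field", c_int)]

struct = MyStruct()
struct.field = 4

# Same construction as in the return statement above.
ptr = POINTER(MyStruct)(struct)

# No copy is made: the pointer targets the buffer owned by `struct`.
same_buffer = addressof(ptr.contents) == addressof(struct)

# Writing through the pointer is therefore visible on the original object.
ptr.contents.field = 9
value_after_write = struct.field
```

That buffer belongs to Python, and ctypes releases it once nothing references the structure any more, so any code that uses the raw address later depends on that lifetime.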

I guess I'm doing something wrong here but I couldn't figure out what by reading the ctypes and comtypes documentation.

EDIT1: I just found this post, which seems to be related because the content of a structure is being overwritten there as well. The answer suggests what I was expecting, too, namely that the memory is freed "accidentally". However, the answer does not explain how to solve this.

As I explained before, if I keep the reference like

self.struct = struct

the client crashes.

EDIT2: I'm posting the COM interface definition and the Python method signature at eryksun's request. In my question, I've simplified the problem a bit to make it easier to get an overview. The actual method returns a pointer to an array of structs:

IOPCItemMgt::ValidateItems
HRESULT ValidateItems(
[in] DWORD dwCount,
[in, size_is(dwCount)] OPCITEMDEF * pItemArray,
[in] BOOL bBlobUpdate,
[out, size_is(,dwCount)] OPCITEMRESULT ** ppValidationResults,
[out, size_is(,dwCount)] HRESULT ** ppErrors
);

Regarding the pointer to pointer (**), the interface specification says:

You will note the syntax size_is(,dwCount) in the IDL used in combination with pointers to pointers. This indicates that the returned item is a pointer to an actual array of the indicated type, rather than a pointer to an array of pointers to items of the indicated type.
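The layout described in that passage can be sketched with plain ctypes, outside of COM. The Item structure and its single field are hypothetical stand-ins for OPCITEMRESULT, and caller and callee are simulated in one script, so this only illustrates the "pointer to an actual array" shape, not the marshalling itself.

```python
from ctypes import POINTER, Structure, addressof, c_uint32, cast, pointer

class Item(Structure):
    # Hypothetical stand-in for OPCITEMRESULT.
    _fields_ = [("hServer", c_uint32)]

count = 3

# Callee side: one contiguous block holding `count` structs.
results = (Item * count)()
for i in range(count):
    results[i].hServer = 10 + i

# Caller side: an Item* variable, initially NULL, passed by address (Item**).
out = POINTER(Item)()
pp_results = pointer(out)

# The callee stores the address of the first element in the caller's variable.
pp_results[0] = cast(results, POINTER(Item))

# The caller walks the array through its single pointer.
values = [out[i].hServer for i in range(count)]
```

An array of pointers to items would instead need `count` separate pointer slots, each referring to its own struct.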

And this is the python method:

def ValidateItems(self, count, p_item_array, update_blob):

Assume that there is a ctypes struct called OpcDa.tagOPCITEMRESULT().

I create an array of these structs by calling

validation_results = (OpcDa.tagOPCITEMRESULT * count)()
errors = (HRESULT * count)()

and after setting the fields of all array elements, I return the pointers like this:

return POINTER(OpcDa.tagOPCITEMRESULT)(validation_results), POINTER(HRESULT)(errors)

EDIT3: I want to sum up the comments to this post and what I've found out so far:

As eryksun suggested, a simplified return statement at least results in the same behavior and problems, but is more readable:

return validation_results, errors

In the meantime, I did some experiments. I tried the low level implementation as eryksun suggested.

def ValidateItems(self, this, count, p_item_array, update_blob, p_validation_results, p_errors):
    (...)
    p_validation_results[0].contents = (OpcDa.tagOPCITEMRESULT * count)()
    p_errors[0].contents = (HRESULT * count)()
    (...)
    for index (...):
        val_result = OpcDa.tagOPCITEMRESULT()
        p_validation_results[0][index] = val_result
        p_validation_results[0][index].hServer = server_item_handle

In the loop where I fill in the array elements, I overwrite the contents with a new element, just because I was desperate. Interestingly, with this code I could see the memory corruption already on the server, whereas the earlier code only revealed the corruption on the client side.

When index=0, hServer gets assigned its value. When I check the value, it's fine.
When index=1, but before the assignment to [0][1].hServer, the value of [0][0].hServer is still fine.
When index=1, but after the assignment to [0][1].hServer, the value of [0][0].hServer has been corrupted in the same way as mentioned before.
When index=2 and after the assignment to [0][2].hServer, the value of [0][1].hServer is fine.

This means that only hServer of the first array element gets partially overwritten, and only after the second element has been assigned a new value.

I assume that the memory for val_result from the first loop iteration gets freed and overwritten somehow, although I thought that the assignment some_pointer[0] = new_value actually copies the contents, as this post suggests.
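The copy behaviour of that assignment can be checked in isolation. This is a small sketch with a hypothetical one-field Item structure in place of OpcDa.tagOPCITEMRESULT, and with an ordinary ctypes array as the target rather than memory handed over by COM.

```python
from ctypes import POINTER, Structure, c_uint32, cast

class Item(Structure):
    # Hypothetical stand-in for OpcDa.tagOPCITEMRESULT.
    _fields_ = [("hServer", c_uint32)]

array = (Item * 2)()
ptr = cast(array, POINTER(Item))

val_result = Item()
val_result.hServer = 7

# Item assignment copies the struct's bytes into element 0 of the target.
ptr[0] = val_result

# Changing or dropping the source afterwards does not touch the copy.
val_result.hServer = 99
copied_value = ptr[0].hServer
del val_result
value_after_del = array[0].hServer
```

In this isolated case the element keeps its value after val_result is gone, so the lifetime that matters is that of the target buffer, not of the object that was copied in.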

But now it gets even stranger. When I keep val_result in a Python list, e.g.

self.items.append(val_result)

the corruption on the server side is gone. But, I get the COMError on the client again.

The problem is that this mysterious COMError is not caused by a (catchable) error in the server. Everything seems to work fine, so it must be caused by the internals of COM marshalling.

Any suggestions how to proceed or to get some more insights on what happens inside COM?

Solution

Wow, I was about to give up when I got an answer from the Microsoft forum with a link to an older thread, which pointed me in the right direction. The thread starter actually solved their problem by using a SAFEARRAY, but if I'm not mistaken, you cannot simply return a SAFEARRAY when a pointer to an array is requested. You'd have to change the interface, which I couldn't. At least it did not work for me.

However, there was one line in his code snippet that got me thinking:

*ppServerId = (long*) CoTaskMemAlloc((*pSize) * sizeof(long));

The explicit call to CoTaskMemAlloc seems to be the counterpart to CoTaskMemFree, the call that actually crashes on the client side. My thinking was: if the freeing routine in a third-party C++ application that works correctly for a lot of customers crashes, then the allocation on my side is probably wrong or missing.

So I searched the comtypes sources for calls to CoTaskMemAlloc but couldn't find any, except in one test case.

So I got it working by explicitly allocating, with CoTaskMemAlloc, the memory for anything that is returned from a COM method via a pointer rather than by value. That includes strings (c_wchar_p), structs, arrays, and strings inside structs.

So I wrote these three helper functions (which could actually be simplified a bit):

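from ctypes import POINTER, c_size_t, c_void_p, c_wchar, c_wchar_p, cast, memmove, sizeof, windll

# Declare the signature explicitly: the default c_int return type would
# truncate the returned pointer on 64-bit Windows.
windll.ole32.CoTaskMemAlloc.restype = c_void_p
windll.ole32.CoTaskMemAlloc.argtypes = [c_size_t]

def make_com_array(item_type, item_count):
    # Uninitialized block for item_count elements, owned by the COM task allocator.
    array_mem = windll.ole32.CoTaskMemAlloc(sizeof(item_type) * item_count)
    return cast(array_mem, POINTER(item_type))

def make_com_string(text, typ=c_wchar_p):
    text = unicode(text)  # Python 2; on Python 3 this would be str(text)
    size = (len(text) + 1) * sizeof(c_wchar)  # +1 for the terminating NUL
    mem = windll.ole32.CoTaskMemAlloc(size)
    ptr = cast(mem, typ)
    memmove(mem, text, size)
    return ptr

def make_com_object(object_type):
    # Uninitialized block for a single struct of the given type.
    size = sizeof(object_type)
    mem = windll.ole32.CoTaskMemAlloc(size)
    ptr = cast(mem, POINTER(object_type))
    return ptr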

Then, I used it wherever I had to return any pointer.
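One possible hardening of the array helper, as a sketch rather than a tested drop-in: CoTaskMemAlloc returns NULL on failure and leaves the block uninitialized, so this variant checks the result and zero-fills the memory. The names load_task_allocator, make_zeroed_com_array and Item are hypothetical. The non-Windows branch uses malloc/free purely as a stand-in so the sketch can be run outside Windows; it has nothing to do with COM.

```python
import ctypes
import sys

def load_task_allocator():
    """Return (alloc, free) with explicit pointer-sized signatures."""
    if sys.platform == "win32":
        ole32 = ctypes.windll.ole32
        alloc, free = ole32.CoTaskMemAlloc, ole32.CoTaskMemFree
    else:
        # Stand-in for demonstration only; not a COM allocator.
        libc = ctypes.CDLL(None)
        alloc, free = libc.malloc, libc.free
    alloc.restype = ctypes.c_void_p
    alloc.argtypes = [ctypes.c_size_t]
    free.restype = None
    free.argtypes = [ctypes.c_void_p]
    return alloc, free

task_alloc, task_free = load_task_allocator()

def make_zeroed_com_array(item_type, item_count):
    """Allocate item_count zero-initialized items and return POINTER(item_type)."""
    size = ctypes.sizeof(item_type) * item_count
    mem = task_alloc(size)
    if not mem:
        raise MemoryError("allocation of %d bytes failed" % size)
    # Zero-fill so that fields which are never assigned are 0 / NULL.
    ctypes.memset(mem, 0, size)
    return ctypes.cast(mem, ctypes.POINTER(item_type))

class Item(ctypes.Structure):
    # Hypothetical struct with one handle and one embedded pointer.
    _fields_ = [("hServer", ctypes.c_uint32), ("pBlob", ctypes.c_void_p)]
```

Zero-filling matters mostly for embedded pointers: a field that is never assigned is then NULL rather than a random address that the receiving side might try to free.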

So the ValidateItems method approximately looks like this now:

def ValidateItems(self, count, p_item_array, update_blob):

    validation_results = make_com_array(OpcDa.tagOPCITEMRESULT, count)
    errors = make_com_array(HRESULT, count)

    (...)
    for index (...):
        validation_results[index].hServer = server_item_handle

    return validation_results, errors

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow