Should I initialize C structs via parameter, or by return value? [closed]

https://softwareengineering.stackexchange.com/questions/290232

09-10-2020

Question

The company I work at is initializing all of their data structures through an initialize function like so:

//the structure
typedef struct{
  int a,b,c;  
} Foo;

//the initialize function
void InitializeFoo(Foo* const foo){
   foo->a = x; //derived here based on other data
   foo->b = y; //derived here based on other data
   foo->c = z; //derived here based on other data
}

//initializing the structure  
Foo foo;
InitializeFoo(&foo);

I've gotten some push back trying to initialize my structs like this:

//the structure
typedef struct{
  int a,b,c;  
} Foo;

//the initialize function
Foo ConstructFoo(int a, int b, int c){
   Foo foo;
   foo.a = a; //part of parameter input (inputs derived outside of function)
   foo.b = b; //part of parameter input (inputs derived outside of function)
   foo.c = c; //part of parameter input (inputs derived outside of function)
   return foo;
}

//initialize (or construct) the structure
Foo foo = ConstructFoo(x,y,z);

Is there an advantage to one over the other?
Which one should I do, and how would I justify it as a better practice?

Solution

In the 2nd approach you will never have a half-initialised Foo. Putting all the construction in one place seems more sensible and more obvious.

But... the 1st way isn't so bad, and is often used in many areas (there's even a discussion of the best way to dependency-inject, either property-injection like your 1st way, or constructor injection like the 2nd). Neither is wrong.

So if neither is wrong and the rest of the company uses approach #1, then you should be fitting in with the existing codebase and not trying to mess it up by introducing a new pattern. This is really the most important factor at play here: play nice with your new friends, and don't try to be that special snowflake who does things differently.

OTHER TIPS

Both approaches bundle the initialization code into a single function call. So far, so good.

However, there are two issues with the second approach:

1. The second one does not actually construct the resulting object, it initializes another object on the stack, which is then copied over to the final object. This is why I would see the second approach as slightly inferior. The push-back you have received is likely due to this extraneous copy.

   This is even worse when you derive a class Derived from Foo (structs are largely used for object orientation in C): With the second approach, the function ConstructDerived() would call ConstructFoo(), copy the resulting temporary Foo object over into the superclass slot of a Derived object; finish the initialization of the Derived object; only to have the resulting object copied over again on return. Add a third layer, and the entire thing becomes completely ridiculous.

2. With the second approach, the ConstructClass() functions do not have access to the address of the object under construction. This makes it impossible to link up objects during construction, as it is needed when an object needs to register itself with another object for a callback.
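The point about the object's address can be illustrated with a minimal sketch. The names Registry, Listener, Listener_init, Listener_make and Registry_notify are hypothetical and do not come from the question; they only show why an out-parameter initializer can register the object while a return-by-value constructor cannot.

```c
#include <stddef.h>

typedef struct Listener Listener;

/* Hypothetical registry that remembers one listener by address. */
typedef struct {
    Listener *registered;   /* object to call back later */
} Registry;

struct Listener {
    int events_seen;
};

/* Out-parameter style: the function receives the final address of the
   object, so it can hand that address to the registry. */
void Listener_init(Listener *self, Registry *registry) {
    self->events_seen = 0;
    registry->registered = self;
}

/* Return-by-value style: only the address of the local temporary is
   known here. Registering &tmp would leave a dangling pointer once the
   function returns, so registration must be left to the caller. */
Listener Listener_make(void) {
    Listener tmp;
    tmp.events_seen = 0;
    return tmp;
}

/* The registry calls back through the stored address. */
void Registry_notify(Registry *registry) {
    if (registry->registered != NULL) {
        registry->registered->events_seen++;
    }
}
```

With Listener_init, a later Registry_notify call reaches the caller's own object. With Listener_make, the caller would have to perform the registration as a separate step after the object has reached its final location.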

Finally, not all structs are fully fledged classes. Some structs effectively just bundle a bunch of variables together, without any internal restrictions on the values of these variables. typedef struct Point { int x, y; } Point; would be a good example of this. For these, the use of an initializer function seems overkill. In these cases, designated initializers and the compound literal syntax may be convenient (both are C99):

Point p = { .x = 7, .y = 9 };

Point foo(...) {
    //other stuff

    return (Point){ .x = n, .y = n*n };
}

Depending upon the contents of the structure and the particular compiler being used, either approach could be faster. A typical pattern is that structures meeting certain criteria can get returned in registers; for functions returning other structure types the caller is required to allocate space for the temporary structure somewhere (typically on the stack) and pass its address as a "hidden" parameter; in cases where a function's return is stored directly to a local variable whose address is not held by any outside code, some compilers may be able to pass the address of that variable directly.

If a structure type satisfies a particular implementation's requirements to be returned in registers (e.g. being either no larger than one machine word, or filling precisely two machine words), having a function return the structure may be faster than passing the address of a structure, especially since exposing the address of a variable to outside code that might keep a copy of it could preclude some useful optimizations. If a type does not satisfy such requirements, the generated code for a function that returns a struct will be similar to that for a function that accepts a destination pointer; the calling code would likely be faster for the form that takes a pointer, but that form may lose some optimization opportunities.
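As a rough way to compare the two forms, here is a sketch with a small two-field struct. The type Pair and the functions Pair_make and Pair_init are hypothetical. Whether Pair is actually returned in registers is an assumption that depends on the platform ABI and the compiler, so the generated assembly (for example from gcc -O2 -S) is the only reliable guide.

```c
/* Two word-sized fields: on some ABIs a struct like this qualifies for
   return in registers, on others it is returned through a hidden
   pointer supplied by the caller. */
typedef struct {
    long first;
    long second;
} Pair;

/* Return-by-value form: the address of the caller's variable is never
   exposed to this function. */
Pair Pair_make(long a, long b) {
    Pair p = { a, b };
    return p;
}

/* Out-parameter form: the caller passes the destination explicitly,
   which exposes the address of its variable. */
void Pair_init(Pair *out, long a, long b) {
    out->first = a;
    out->second = b;
}
```

Both functions produce the same values; the difference lies only in the calling convention and in what the compiler is allowed to assume about the caller's variable.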

It's too bad C doesn't provide a means of saying that a function is forbidden from keeping a copy of a passed-in pointer (semantics similar to a C++ reference), since passing such a restricted pointer would gain the direct performance advantages of passing a pointer to a pre-existing object, but at the same time avoid the semantic costs of requiring a compiler to consider a variable's address "exposed".

One argument in favor of the "output-parameter" style is that it allows the function to return an error code.

#include <stdlib.h>  // for malloc

struct MyStruct {
    int x;
    char *y;
    // ...
};

int MyStruct_init(struct MyStruct *out) {
    // ...
    char *c = malloc(n);
    if (!c) {
        return -1;
    }
    out->y = c;
    return 0;  // Success!
}

Considering some set of related structs, if initialization can fail for any of them, it can be worthwhile to have all of them use the out-parameter style for consistency's sake.
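A fuller sketch of the error-code pattern might look like the following. The names Record, Record_init and Record_destroy, and the choice to copy a caller-supplied string, are assumptions made for illustration; they are not part of the answer above.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int x;
    char *y;   /* heap-allocated copy owned by the struct */
} Record;

/* Returns 0 on success and -1 if the allocation fails.
   On failure *out is left untouched. */
int Record_init(Record *out, int x, const char *text) {
    size_t n = strlen(text) + 1;   /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, text, n);
    out->x = x;
    out->y = copy;
    return 0;
}

/* Releases the memory acquired by Record_init. */
void Record_destroy(Record *r) {
    free(r->y);
    r->y = NULL;
}
```

A caller would then write something like `if (Record_init(&r, 7, "hello") != 0) { /* handle failure */ }`, which is awkward to express when the struct itself is the return value.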

I'm presuming that your focus is on initialization via output parameter vs. initialization via return, not the discrepancy in how construction arguments are supplied.

Note that the first approach could allow Foo to be opaque (although not with the way you currently use it), and that's usually desirable for long-term maintainability. You could consider, for example, a function that allocates an opaque Foo struct without initializing it. Or perhaps you need to re-initialize a Foo struct that was previously initialized with different values.
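One way the opaque variant could look is sketched below. The function names Foo_alloc, Foo_init, Foo_sum and Foo_free are hypothetical, and the two parts would normally live in a separate header and source file; they are shown in one block only to keep the sketch self-contained.

```c
#include <stdlib.h>

/* --- What would go in foo.h: callers see only an incomplete type. --- */
typedef struct Foo Foo;

Foo *Foo_alloc(void);                          /* allocate, uninitialized */
void Foo_init(Foo *foo, int a, int b, int c);  /* initialize or re-initialize */
int  Foo_sum(const Foo *foo);                  /* example accessor */
void Foo_free(Foo *foo);

/* --- What would go in foo.c: the layout stays private. --- */
struct Foo {
    int a, b, c;
};

Foo *Foo_alloc(void) {
    return malloc(sizeof(Foo));   /* may return NULL */
}

void Foo_init(Foo *foo, int a, int b, int c) {
    foo->a = a;
    foo->b = b;
    foo->c = c;
}

int Foo_sum(const Foo *foo) {
    return foo->a + foo->b + foo->c;
}

void Foo_free(Foo *foo) {
    free(foo);
}
```

Because callers never see the struct layout, they cannot declare a Foo by value, so a return-by-value constructor is not available for an opaque type. The pointer-based initializer also covers re-initializing an existing object with new values.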

Licensed under: CC-BY-SA with attribution
