Valid use of goto for error management in C?

https://stackoverflow.com/questions/788903

16-09-2019
|

Question

This question is actually a result of an interesting discussion at programming.reddit.com a while ago. It basically boils down to the following code:

int foo(int bar)
{
    int return_value = 0;
    if (!do_something( bar )) {
        goto error_1;
    }
    if (!init_stuff( bar )) {
        goto error_2;
    }
    if (!prepare_stuff( bar )) {
        goto error_3;
    }
    return_value = do_the_thing( bar );
error_3:
    cleanup_3();
error_2:
    cleanup_2();
error_1:
    cleanup_1();
    return return_value;
}

The usage of goto here appears to be the best way to go, resulting in the cleanest and most efficient code of all possibilities, or at least so it seems to me. Quoting Steve McConnell in Code Complete:

The goto is useful in a routine that allocates resources, performs operations on those resources, and then deallocates the resources. With a goto, you can clean up in one section of the code. The goto reduces the likelihood of your forgetting to deallocate the resources in each place you detect an error.

Another support for this approach comes from the Linux Device Drivers book, in this section.

What do you think? Is this case a valid use for goto in C? Would you prefer other methods, which produce more convoluted and/or less efficient code, but avoid goto?

Solution

FWIF, I find the error handling idiom you gave in the question's example to be more readable and easier to understand than any of the alternatives given in the answers so far. While goto is a bad idea in general, it can be useful for error handling when done in a simple and uniform manner. In this situation, even though it's a goto, it's being used in well-defined and more or less structured manner.

OTHER TIPS

As a general rule, avoiding goto is a good idea, but the abuses that were prevalent when Dijkstra first wrote 'GOTO Considered Harmful' don't even cross most people's minds as an option these days.

What you outline is a generalizable solution to the error handling problem - it is fine with me as long as it is carefully used.

Your particular example can be simplified as follows (step 1):

int foo(int bar)
{
    int return_value = 0;
    if (!do_something(bar)) {
        goto error_1;
    }
    if (!init_stuff(bar)) {
        goto error_2;
    }
    if (prepare_stuff(bar))
    {
        return_value = do_the_thing(bar);
        cleanup_3();
    }
error_2:
    cleanup_2();
error_1:
    cleanup_1();
    return return_value;
}

Continuing the process:

int foo(int bar)
{
    int return_value = 0;
    if (do_something(bar))
    {   
        if (init_stuff(bar))
        {
            if (prepare_stuff(bar))
            {
                return_value = do_the_thing(bar);
                cleanup_3();
            }
            cleanup_2();
        }
        cleanup_1();
    }
    return return_value;
}

This is, I believe, equivalent to the original code. This looks particularly clean since the original code was itself very clean and well organized. Often, the code fragments are not as tidy as that (though I'd accept an argument that they should be); for example, there is frequently more state to pass to the initialization (setup) routines than shown, and therefore more state to pass to the cleanup routines too.

I'm surprised nobody has suggested this alternative, so even though the question has been around a while I'll add it in: one good way of addressing this issue is to use variables to keep track of the current state. This is a technique that can be used whether or not goto is used for arriving at the cleanup code. Like any coding technique, it has pros and cons, and won't be suitable for every situation, but if you're choosing a style it's worth considering - especially if you want to avoid goto without ending up with deeply nested ifs.

The basic idea is that, for every cleanup action that might need to be taken, there is a variable from whose value we can tell whether the cleanup needs doing or not.

I'll show the goto version first, because it is closer to the code in the original question.

int foo(int bar)
{
    int return_value = 0;
    int something_done = 0;
    int stuff_inited = 0;
    int stuff_prepared = 0;


    /*
     * Prepare
     */
    if (do_something(bar)) {
        something_done = 1;
    } else {
        goto cleanup;
    }

    if (init_stuff(bar)) {
        stuff_inited = 1;
    } else {
        goto cleanup;
    }

    if (prepare_stuff(bar)) {
        stufF_prepared = 1;
    } else {
        goto cleanup;
    }

    /*
     * Do the thing
     */
    return_value = do_the_thing(bar);

    /*
     * Clean up
     */
cleanup:
    if (stuff_prepared) {
        unprepare_stuff();
    }

    if (stuff_inited) {
        uninit_stuff();
    }

    if (something_done) {
        undo_something();
    }

    return return_value;
}

One advantage of this over some of the other techniques is that, if the order of the initialisation functions is changed, the correct cleanup will still happen - for instance, using the switch method described in another answer, if the order of initialisation changes, then the switch has to be very carefully edited to avoid trying to clean up something wasn't actually initialised in the first place.

Now, some might argue that this method adds a whole lot of extra variables - and indeed in this case that's true - but in practice often an existing variable already tracks, or can be made to track, the required state. For example, if the prepare_stuff() is actually a call to malloc(), or to open(), then the variable holding the returned pointer or file descriptor can be used - for example:

int fd = -1;

....

fd = open(...);
if (fd == -1) {
    goto cleanup;
}

...

cleanup:

if (fd != -1) {
    close(fd);
}

Now, if we additionally track the error status with a variable, we can avoid goto entirely, and still clean up correctly, without having indentation that gets deeper and deeper the more initialisation we need:

int foo(int bar)
{
    int return_value = 0;
    int something_done = 0;
    int stuff_inited = 0;
    int stuff_prepared = 0;
    int oksofar = 1;


    /*
     * Prepare
     */
    if (oksofar) {  /* NB This "if" statement is optional (it always executes) but included for consistency */
        if (do_something(bar)) {
            something_done = 1;
        } else {
            oksofar = 0;
        }
    }

    if (oksofar) {
        if (init_stuff(bar)) {
            stuff_inited = 1;
        } else {
            oksofar = 0;
        }
    }

    if (oksofar) {
        if (prepare_stuff(bar)) {
            stuff_prepared = 1;
        } else {
            oksofar = 0;
        }
    }

    /*
     * Do the thing
     */
    if (oksofar) {
        return_value = do_the_thing(bar);
    }

    /*
     * Clean up
     */
    if (stuff_prepared) {
        unprepare_stuff();
    }

    if (stuff_inited) {
        uninit_stuff();
    }

    if (something_done) {
        undo_something();
    }

    return return_value;
}

Again, there are potential criticisms of this:

Don't all those "if"s hurt performance? No - because in the success case, you have to do all of the checks anyway (otherwise you're not checking all the error cases); and in the failure case most compilers will optimise the sequence of failing if (oksofar) checks to a single jump to the cleanup code (GCC certainly does) - and in any case, the error case is usually less critical for performance.
Isn't this adding yet another variable? In this case yes, but often the return_value variable can be used to play the role that oksofar is playing here. If you structure your functions to return errors in a consistent way, you can even avoid the second if in each case:
```
int return_value = 0;

if (!return_value) {
    return_value = do_something(bar);
}

if (!return_value) {
    return_value = init_stuff(bar);
}

if (!return_value) {
    return_value = prepare_stuff(bar);
}
```
One of the advantages of coding like that is that the consistency means that any place where the original programmer has forgotten to check the return value sticks out like a sore thumb, making it much easier to find (that one class of) bugs.

So - this is (yet) one more style that can be used to solve this problem. Used correctly it allows for very clean, consistent code - and like any technique, in the wrong hands it can end up producing code that is long-winded and confusing :-)

The problem with the goto keyword is mostly misunderstood. It is not plain-evil. You just need to be aware of the extra control paths that you create with every goto. It becomes difficult to reason about your code and hence its validity.

FWIW, if you look up developer.apple.com tutorials, they take the goto approach to error handling.

We do not use gotos. A higher importance is laid on return values. Exception handling is done via setjmp/longjmp -- whatever little you can.

There's nothing morally wrong about the goto statement any more than there is something morally wrong with (void)* pointers.

It's all in how you use the tool. In the (trivial) case you presented, a case statement can achieve the same logic, albeit with more overhead. The real question is, "what's my speed requirement?"

goto is just plain fast, especially if you're careful to make sure that it compiles to a short jump. Perfect for applications where speed is a premium. For other applications, it probably makes sense to take the overhead hit with if/else + case for maintainability.

Remember: goto doesn't kill applications, developers kill applications.

UPDATE: Here's the case example

int foo(int bar) { 
     int return_value = 0 ; 
     int failure_value = 0 ;

     if (!do_something(bar)) { 
          failure_value = 1; 
      } else if (!init_stuff(bar)) { 
          failure_value = 2; 
      } else if (prepare_stuff(bar)) { 
          return_value = do_the_thing(bar); 
          cleanup_3(); 
      } 

      switch (failure_value) { 
          case 2: cleanup_2(); 
          case 1: cleanup_1(); 
          default: break ; 
      } 
}

GOTO is useful. It's something your processor can do and this is why you should have access to it.

Sometimes you want to add a little something to your function and single goto let's you do that easily. It can save time..

In general, I would regard the fact that a piece of code could be most clearly written using goto as a symptom that the program flow is likely more complicated than is generally desirable. Combining other program structures in weird ways to avoid the use of goto would attempt to treat the symptom, rather than the disease. Your particular example might not be overly difficult to implement without goto:

  do {
    .. set up thing1 that will need cleanup only in case of early exit
    if (error) break;
    do
    {
      .. set up thing2 that will need cleanup in case of early exit
      if (error) break;
      // ***** SEE TEXT REGARDING THIS LINE
    } while(0);
    .. cleanup thing2;
  } while(0);
  .. cleanup thing1;

but if the cleanup was only supposed to happen when the function failed, the goto case could be handled by putting a return just before the first target label. The above code would require adding a return at the line marked with *****.

In the "cleanup even in normal case" scenario, I would regard the use of goto as being clearer than the do/while(0) constructs, among other things because the target labels themselves practically cry out "LOOK AT ME" far moreso than the break and do/while(0) constructs. For the "cleanup only if error" case, the return statement ends up having to be in just about the worst possible place from a readability standpoint (return statements should generally be either at the beginning of a function, or else at what "looks like" the end); having a return just before a target label meets that qualification much more readily than having one just before the end of a "loop".

BTW, one scenario where I sometimes use goto for error-handling is within a switch statement, when the code for multiple cases shares the same error code. Even though my compiler would often be smart enough to recognize that multiple cases end with the same code, I think it's clearer to say:

 REPARSE_PACKET:
  switch(packet[0])
  {
    case PKT_THIS_OPERATION:
      if (problem condition)
        goto PACKET_ERROR;
      ... handle THIS_OPERATION
      break;
    case PKT_THAT_OPERATION:
      if (problem condition)
        goto PACKET_ERROR;
      ... handle THAT_OPERATION
      break;
    ...
    case PKT_PROCESS_CONDITIONALLY
      if (packet_length < 9)
        goto PACKET_ERROR;
      if (packet_condition involving packet[4])
      {
        packet_length -= 5;
        memmove(packet, packet+5, packet_length);
        goto REPARSE_PACKET;
      }
      else
      {
        packet[0] = PKT_CONDITION_SKIPPED;
        packet[4] = packet_length;
        packet_length = 5;
        packet_status = READY_TO_SEND;
      }
      break;
    ...
    default:
    {
     PACKET_ERROR:
      packet_error_count++;
      packet_length = 4;
      packet[0] = PKT_ERROR;
      packet_status = READY_TO_SEND;
      break;
    }
  }

Although one could replace the goto statements with {handle_error(); break;}, and although one could use a do/while(0) loop along with continue to process the wrapped conditional-execute packet, I don't really think that's any clearer than using a goto. Further, while it might be possible to copy out the code from PACKET_ERROR everywhere the goto PACKET_ERROR is used, and while a compiler might write out the duplicated code once and replace most occurrences with a jump to that shared copy, the using the goto makes it easier to notice places which set the packet up a little differently (e.g. if the "execute conditionally" instruction decides not to execute).

I personally am a follower of the "The Power of Ten - 10 Rules for Writing Safety Critical Code".

I will include a small snippet from that text that illustrates what I believe to be a good idea about goto.

Rule: Restrict all code to very simple control flow constructs – do not use goto statements, setjmp or longjmp constructs, and direct or indirect recursion.

Rationale: Simpler control flow translates into stronger capabilities for verification and often results in improved code clarity. The banishment of recursion is perhaps the biggest surprise here. Without recursion, though, we are guaranteed to have an acyclic function call graph, which can be exploited by code analyzers, and can directly help to prove that all executions that should be bounded are in fact bounded. (Note that this rule does not require that all functions have a single point of return – although this often also simplifies control flow. There are enough cases, though, where an early error return is the simpler solution.)

Banishing the use of goto seems bad but:

If the rules seem Draconian at first, bear in mind that they are meant to make it possible to check code where very literally your life may depend on its correctness: code that is used to control the airplane that you fly on, the nuclear power plant a few miles from where you live, or the spacecraft that carries astronauts into orbit. The rules act like the seat-belt in your car: initially they are perhaps a little uncomfortable, but after a while their use becomes second-nature and not using them becomes unimaginable.

I agree that the goto cleanup in reverse order given in the question is the cleanest way of cleaning things up in most functions. But I also wanted to point out that sometimes, you want your function to clean up anyway. In these cases I use the following variant if if ( 0 ) { label: } idiom to go to the right point of the cleaning up process:

int decode ( char * path_in , char * path_out )
{
  FILE * in , * out ;
  code c ;
  int len ;
  int res = 0  ;
  if ( path_in == NULL )
    in = stdin ;
  else
    {
      if ( ( in = fopen ( path_in , "r" ) ) == NULL )
        goto error_open_file_in ;
    }
  if ( path_out == NULL )
    out = stdout ;
  else
    {
      if ( ( out = fopen ( path_out , "w" ) ) == NULL )
        goto error_open_file_out ;
    }

  if( read_code ( in , & c , & longueur ) )
    goto error_code_construction ;

  if ( decode_h ( in , c , out , longueur ) )
  goto error_decode ;

  if ( 0 ) { error_decode: res = 1 ;}
  free_code ( c ) ;
  if ( 0 ) { error_code_construction: res = 1 ; }
  if ( out != stdout ) fclose ( stdout ) ;
  if ( 0 ) { error_open_file_out: res = 1 ; }
  if ( in != stdin ) fclose ( in ) ;
  if ( 0 ) { error_open_file_in: res = 1 ; }
  return res ;
 }

Seems to me that cleanup_3 should do its cleanup, then call cleanup_2. Similarly, cleanup_2 should do it's cleanup, then call cleanup_1. It appears that anytime you do cleanup_[n], that cleanup_[n-1] is required, thus it should be the responsibility of the method (so that, for instance, cleanup_3 can never be called without calling cleanup_2 and possibly causing a leak.)

Given that approach, instead of gotos, you would simply call the cleanup routine, then return.

The goto approach isn't wrong or bad, though, it's just worth noting that it's not necessarily the "cleanest" approach (IMHO).

If you are looking for the optimal performance, then I suppose that the goto solution is best. I only expect it to be relevant, however, in a select few, performance critical, applications (e.g., device drivers, embedded devices, etc). Otherwise, it's a micro-optimization that has lower priority than code clarity.

I think that the question here is fallacious with respect to the given code.

Consider:

do_something(), init_stuff() and prepare_stuff() appear to know if they have failed, since they return either false or nil in that case.
Responsibility for setting up state appears to be the responsibility of those functions, since there is no state being set up directly in foo().

Therefore: do_something(), init_stuff() and prepare_stuff() should be doing their own cleanup. Having a separate cleanup_1() function that cleans up after do_something() breaks the philosophy of encapsulation. It's bad design.

If they did their own cleanup, then foo() becomes quite simple.

On the other hand. If foo() actually created its own state that needed to be torn down, then goto would be appropriate.

Here's what I've preferred:

bool do_something(void **ptr1, void **ptr2)
{
    if (!ptr1 || !ptr2) {
        err("Missing arguments");
        return false;
    }
    bool ret = false;

    //Pointers must be initialized as NULL
    void *some_pointer = NULL, *another_pointer = NULL;

    if (allocate_some_stuff(&some_pointer) != STUFF_OK) {
        err("allocate_some_stuff step1 failed, abort");
        goto out;
    }
    if (allocate_some_stuff(&another_pointer) != STUFF_OK) {
        err("allocate_some_stuff step 2 failed, abort");
        goto out;
    }

    void *some_temporary_malloc = malloc(1000);

    //Do something with the data here
    info("do_something OK");

    ret = true;

    // Assign outputs only on success so we don't end up with
    // dangling pointers
    *ptr1 = some_pointer;
    *ptr2 = another_pointer;
out:
    if (!ret) {
        //We are returning an error, clean up everything
        //deallocate_some_stuff is a NO-OP if pointer is NULL
        deallocate_some_stuff(some_pointer);
        deallocate_some_stuff(another_pointer);
    }
    //this needs to be freed every time
    free(some_temporary_malloc);
    return ret;
}

I prefer using technique described in the following example ...

struct lnode *insert(char *data, int len, struct lnode *list) {
    struct lnode *p, *q;
    uint8_t good;
    struct {
            uint8_t alloc_node : 1;
            uint8_t alloc_str : 1;
    } cleanup = { 0, 0 };

    // allocate node.
    p = (struct lnode *)malloc(sizeof(struct lnode));
    good = cleanup.alloc_node = (p != NULL);

    // good? then allocate str
    if (good) {
            p->str = (char *)malloc(sizeof(char)*len);
            good = cleanup.alloc_str = (p->str != NULL);
    }

    // good? copy data
    if(good) {
            memcpy ( p->str, data, len );
    }

    // still good? insert in list
    if(good) {
            if(NULL == list) {
                    p->next = NULL;
                    list = p;
            } else {
                    q = list;
                    while(q->next != NULL && good) {
                            // duplicate found--not good
                            good = (strcmp(q->str,p->str) != 0);
                            q = q->next;
                    }
                    if (good) {
                            p->next = q->next;
                            q->next = p;
                    }
            }
    }

    // not-good? cleanup.
    if(!good) {
            if(cleanup.alloc_str)   free(p->str);
            if(cleanup.alloc_node)  free(p);
    }

    // good? return list or else return NULL
    return (good? list: NULL);

}

source: http://blog.staila.com/?p=114

We use Daynix CSteps library as another solution for the "goto problem" in init functions.
See here and here.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow