Question

Disclaimer: I am a Haskell programmer learning C. In Haskell, we have data declarations like

data No = NO

where NO does not have any interpretation as a number. If we had something equivalent in C, we could then do

union MaybeInt { enum No no; int just;};

Which could be used to do things like have an array which is initialized as No's.

int A[k];
for (int i = 0; i < k; i++)
    A[i] = NO;

This would be useful in doing memoization because often one has some recurisve algorithm which looks things up in an array and based on the value looked up, either makes a recursive call or not. For example: (for fibonacci numbers)

fibMem (int k){
    if (FIB[k] == NO)
      compute fibMem(k-1) + fib(k-2) and store the result in FIB[k]
    return FIB[k]
}

Now, of course, I could initialize FIB[i] to some absurd value like -100, and that works for this problem; however, given an arbitrary memoized routine, where I don't know the range of values, this kind of solution is not going to work.

The problem with using an enumeration type: The first thing I saw, that made me jump out of me seat saying "yes" was enum types. I thought why not do something like enum No {no}; Well, there is a problem with initializing the array used for memoization with nos. The problem is that no is defined to be 0 or some numeric constant of my choice if I like. This is unsatisfactory because, if the value stored in the array is supposed to be zero (or that constant of my choice), then when I do the check A[i] == no it may supposed to be that way! Thus, I will end up performing an un-needed recursion.

This brings us to question 1: How can I get a symbolic constant in C that is treated as a flag, that is incomparable to anything of a different type?

Now, the problem with unions. A union stores all its fields in one single address. Thus, for example, an update to maybeInt.just affects the value of maybeInt.no. For example,

union MaybeInt maybeInt;
maybeInt.just=9;
printf("%d",maybeInt.just);
printf("%d",maybeInt.no);

prints 99. What would be nice was if there was some sort of disjoint union type in C, so that if I used one of the values of the union, the other becomes unobtainable.

This brings us to the second and final question: how can one obtain a disjoint union type in C -- that is a type that has many possible variants, but only one at any given time. I would like something to be able to do something like:

disjoint T {type1 name1 , .... };

and have the if T.name2 is set, then a reference to T.name1 throws an error. Or even better any reference to T must go through some sort of case distinction.

If this cannot be done nicely, please explain why.

Was it helpful?

Solution

Discriminated unions are a very standard C idiom. You just have to keep the tag separate from the data:

struct Data
{
    enum DataType
    {
        NotSet,
        Integer,
        Infinity,
        Message
    } tag;
    union ValueType
    {
        int n;
        char const * msg;
    } data;
};

Now you just have to maintain tag discipline, i.e. only read from the value that's appropriate for the given tag, and update the tag after writing to a union member. For example:

void foo(struct Data const * x)
{
    switch (x->tag)
    {
    case NotSet:      // ...
    case Integer:     // use x->data.n
    case Infinity:    // ...
    case Message:     // use x->data.msg
    };

    x->data.msg = "Thank you!";
    x->tag = Message;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top