Question

(Disclaimer: I realize this is a massive wall of text, but I have done my best to boil things down to the essentials. If you are familiar with libtiff this isn't a very complicated question)

I asked this question on a libtiff mailing list already, but I thought that I may have a good chance here as well if there exists anyone who has worked with the library.

I am adding my own built-in tags to the library using the documentation here: http://libtiff.maptools.org/addingtags.html

So, I added an entry to the TIFFFieldInfo array defined at the top of tif_dirinfo.c like so:

{ TIFFTAG_CUSTOM_XXX, 4, 4, TIFF_SLONG, FIELD_XXX, 1, 0, "XXX" }, 

I then added a field to the TIFFDirectory structure defined in tif_dir.h:

typedef struct {
    /* ... */
    int32 td_xxx[4]; 
} TIFFDirectory;

Now I went ahead and modified _TIFFVSetField and _TIFFVGetField as instructed. This is where I hit a problem.

In mimicking the pattern already present in the library (see the implementation of TIFFTAG_YCBCRSUBSAMPLING, which is analogous to what I am doing), I added the following code to _TIFFVGetField:

/* existing, standard tag for reference */
case TIFFTAG_YCBCRSUBSAMPLING:
        *va_arg(ap, uint16*) = td->td_ycbcrsubsampling[0];
        *va_arg(ap, uint16*) = td->td_ycbcrsubsampling[1];
        break;
/* my new tag */
case TIFFTAG_CUSTOM_XXX: 
        *va_arg(ap, int32*) = td->td_xxx[0]; 
        *va_arg(ap, int32*) = td->td_xxx[1]; 
        *va_arg(ap, int32*) = td->td_xxx[2]; 
        *va_arg(ap, int32*) = td->td_xxx[3]; 
        break; 

As far as I can tell this is completely incorrect. The intention here is to populate the input in the variable argument list based off of an array of ints. The one saving grace is that the argument supplied in the va_list is always of the type int32 and the YcBr code uses two int16's. So, it works, but I can't copy that implementation.

_TIFFVGetField is eventually called from TIFFWriteNormalTag in tif_dirwrite.c. The relevant code is this:

case TIFF_LONG: 
    case TIFF_SLONG: 
    case TIFF_IFD: 
        if (fip->field_passcount) { 
            uint32* lp; 
            if (wc == (uint16) TIFF_VARIABLE2) { 
                TIFFGetField(tif, fip->field_tag, &wc2, &lp); 
                TDIRSetEntryCount(tif,dir, wc2); 
            } else {    /* Assume TIFF_VARIABLE */ 
                TIFFGetField(tif, fip->field_tag, &wc, &lp); 
                TDIRSetEntryCount(tif,dir, wc); 
            } 
            if (!TIFFWriteLongArray(tif, dir, lp)) 
                return 0; 
            } else { 
                if (wc == 1) { 
                    uint32 wp; 
                    /* XXX handle LONG->SHORT conversion */ 
                    TIFFGetField(tif, fip->field_tag, &wp); 
                    TDIRSetEntryOff(tif,dir, wp); 
                } else { 
                /* ---------------------------------------------------- */ 
                /* this is the code that is called in my scenario       */
                /* ---------------------------------------------------- */ 
                    uint32* lp; 
                    TIFFGetField(tif, fip->field_tag, &lp); 
                    if (!TIFFWriteLongArray(tif, dir, lp)) 
                        return 0; 
                } 
            } 
            break; 

So, an uninitialized pointer lp is declared and its address passed to TIFFGetField. This in turn sets up the va_list (with lp as the only argument) and calls TIFFVGetField, which calls _TIFFVGetField with the supplied va_list and the pointer to the uninitialized pointer.

There are two problems here.

First, this is how the library extracts the data (my code, but again, following the pattern already present)

*va_arg(ap, int32*) = td->td_xxx[0]; 

This seems incorrect. It is setting the original pointer to the value of an int. I reasoned that perhaps, in the example I was following (TIFFTAG_YCBCRSUBSAMPLING), these integers were actually addresses. So ok, but even if that's the case though there is another problem.

The library calls va_args N times, where N is the number of elements in the array. From what I see the variable argument list only ever contains a single argument (the address of a pointer). This is undefined behavior per the standard (important bit at the begining):

If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined.

The correct version would be

*va_arg(ap, int32**) = td_xxx; 

This sets the previously uninitialized pointer to the array, which is valid. I don't like that it points to the data itself instead of a copy, but whatever; at least it doesn't crash and gives me the correct result.

My concern here is that I am missing something subtle. This software is old and used by many, many people. As such, calling this a bug feels to me like blaming a crash on the compiler, which is almost always wrong.

However, I cannot reason out a way in which this is correct, specifically how the library writes to what va_arg returns when called more than one time.

Any help would be greatly appreciated. Thanks in advance.

Was it helpful?

Solution

So, the answer here ultimately ended up being that libtiff relies on UB for this. While it is technically UB, I couldn't find an implementation of va_arg that didn't do something like this:

( *(t *)((ap += _INTSIZEOF(t)) - _INTSIZEOF(t)) )

So, as long as t is less than the original argument size (as it is here), you can call va_arg more than once safely.

I ended up just setting the argument to the pointer which led to my data and it works. I don't like giving access directly to the header data itself, but without changing the library in a significant way it was my only option.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top