Question

I am writing a file system for one of my classes. This function is killing my performance by a LARGE margin and I can't figure out why. I've been staring at this code way too long and I am probably missing something very obvious. Does anyone see why this function is so slow?

int getFreeDataBlock(struct disk *d, unsigned int dataBlockNumber)
{
    if (d == NULL)
    {
        fprintf(stderr, "Invalid disk pointer to getFreeDataBlock()\n");
        errorCheck();
        return -1;
    }

    // Allocate a buffer
    char *buffer = (char *) malloc(d->blockSize * sizeof(char));
    if (buffer == NULL)
    {
        fprintf(stderr, "Out of memory.\n");
        errorCheck();
        return -1;
    }

    do {
        // Read a block from the disk
        diskread(d, buffer, dataBlockNumber);

        // Cast to appropriate struct
        struct listDataBlock *block = (struct listDataBlock *) buffer;

        unsigned int i;
        for (i = 0; i < DATABLOCK_FREE_SLOT_LENGTH; ++i)
        {
            // We are in the last datalisting block...and out of slots...break
            if (block->listOfFreeBlocks[i] == -2)
            {
                break;
            }

            if (block->listOfFreeBlocks[i] != -1)
            {
                int returnValue = block->listOfFreeBlocks[i];

                // MARK THIS AS USED NOW
                block->listOfFreeBlocks[i] = -1;
                diskwriteNoSync(d, buffer, dataBlockNumber);

                // No memory leaks
                free(buffer);
                return returnValue;
            }
        }

        // Ok, nothing in this data block, move to next
        dataBlockNumber = block->nextDataBlock;
    } while (dataBlockNumber != -1);

    // Nope, didn't find any...disk must be full
    free(buffer);
    fprintf(stderr, "DISK IS FULL\n");
    errorCheck();
    return -1;
}

As you can see from the gprof output, neither diskread() nor diskwriteNoSync() seems to be taking an extensive amount of time:

  %   cumulative   self              self     total          
 time   seconds   seconds    calls  ms/call  ms/call  name    
 99.45     12.25    12.25     2051     5.97     5.99  getFreeDataBlock
  0.24     12.28     0.03  2220903     0.00     0.00  diskread
  0.24     12.31     0.03                             threadFunc
  0.08     12.32     0.01     2048     0.00     6.00  writeHelper
  0.00     12.32     0.00     6154     0.00     0.00  diskwriteNoSync
  0.00     12.32     0.00     2053     0.00     0.00  validatePath

Or am I not understanding the output properly?

Thanks for any help.

Solution

The fact that you've been staring at this code and puzzling over the gprof output puts you in good company, because gprof and the concepts that are taught with it really only work on little academic-scale programs doing no I/O. Here's the method I use instead: pause the program at random, several times, while it is being slow, and look at the whole call stack each time; the lines that appear on a large fraction of those stacks are responsible for roughly that fraction of the time.
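Below is a minimal sketch of that idea in C. It is my own illustration, not code from the deleted post (the name sampleStack and the 100 ms interval are my inventions). It uses glibc's <execinfo.h> backtrace facility on a wall-clock timer, so time spent blocked in file I/O gets sampled too; link with -rdynamic so symbol names show up in the output.

#include <execinfo.h>
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

/* On every timer tick, dump the entire call stack to stderr. */
static void sampleStack(int sig)
{
    void *frames[64];
    int depth;

    (void) sig;                              /* unused */
    depth = backtrace(frames, 64);

    /* backtrace_symbols_fd() writes straight to a file descriptor and
       does not call malloc(), so it is usable inside a signal handler. */
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    write(STDERR_FILENO, "----\n", 5);
}

int main(void)
{
    struct sigaction sa;
    struct itimerval timer;

    sa.sa_handler = sampleStack;
    sa.sa_flags = SA_RESTART;                /* don't abort blocked reads with EINTR */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    /* Wall-clock timer (not CPU time), so samples land even while the
       program is waiting on the disk. */
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = 100000;      /* every 100 ms */
    timer.it_value = timer.it_interval;
    setitimer(ITIMER_REAL, &timer, NULL);

    /* ... run the slow workload here; whatever lines appear on the most
       dumped stacks are the ones costing the time ... */
    return 0;
}

The even lower-tech version needs no code at all: run the program under gdb, interrupt it with Ctrl-C a handful of times while it is being slow, and type bt each time.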

Here are some excerpts from a useful post that has since been deleted, debunking some MYTHS about profiling:

  • that program counter sampling is useful.
    It is only useful if you have an unnecessary hotspot bottleneck, such as a bubble sort of a big array of scalar values. As soon as you change it into, for example, a sort using string-compare, it is still a bottleneck, but program counter sampling will not see it, because now the hotspot is in the string-compare. On the other hand, if it were to sample the extended program counter (the call stack), the point at which the string-compare is called, the sort loop, would be clearly displayed. In fact, gprof was an attempt to remedy the limitations of PC-only sampling. (A concrete sketch of this follows the list.)

  • that samples need not be taken when blocked.
    The reasons for this myth are twofold: 1) that PC sampling is meaningless when the program is waiting, and 2) a preoccupation with the accuracy of timing. However, for (1), the program may very well be waiting for something that it asked for, such as file I/O, which you need to know about, and which stack samples reveal. (Obviously you want to exclude samples taken while waiting for user input.) For (2), if the program is waiting simply because of competition with other processes, that presumably happens in a fairly random way while it is running. So while the program may take longer, that will not have a large effect on the statistic that matters: the percentage of time that statements are on the stack.

  • that counting statement or function invocations is useful.
    Suppose you know a function has been called 1000 times. Can you tell from that what fraction of the time it costs? You would also need to know how long it takes per call, on average, multiply that by the count, and divide by the total time. The average invocation time could vary from nanoseconds to seconds, so the count alone doesn't tell you much. If there are stack samples, the cost of a routine, or of any statement, is just the fraction of samples it appears on. That fraction of time is what could in principle be saved overall if the routine or statement could be made to take no time, so it is what has the most direct relationship to performance. (Worked numbers for the gprof table above follow this list.)
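To make the first myth concrete, here is a small hypothetical sketch (the names sortNames and byName are mine, not from the post). The sort loop is the real cost either way, but once the keys are strings, PC-only sampling charges the time to strcmp() deep inside the C library, while stack samples keep pointing at the qsort() call site:

#include <stdlib.h>
#include <string.h>

/* With integer keys the PC samples land in the sort itself; with string
   keys they land inside strcmp(). The sort is still the thing to fix,
   and only the call stack keeps pointing at it. */
static int byName(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

void sortNames(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, byName);
}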
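And to put numbers on the third myth, look at the gprof table above (this is just my arithmetic on the figures shown there):

 diskread():          0.03 s self / 2,220,903 calls ≈ 13.5 ns per call ≈ 0.24% of 12.32 s
 getFreeDataBlock():  12.25 s self / 2,051 calls    ≈ 5.97 ms per call ≈ 99.45% of 12.32 s

The raw call count screams diskread(), which runs over a thousand times per getFreeDataBlock() call, but count times average cost shows it is nearly free; the scanning loop around it is where the twelve seconds live.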

There are more myths where those came from.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow