Question

I am new to pthreads in C and I am writing a simple program that find words in several files in parallel. However, whenever I input more than one file the output varies, suggesting that there is a race condition that I am not fixing in my code. Could you please help me fix it?

The following snippet is in main, making the pthreads.

    int i = 0;
char *word = "Pluto"; //Word to be found

Message messages[argc-1];
pthread_t threads[argc-1];
for(i; i < argc - 1; i++){
    messages[i].file = argv[i + 1];
    messages[i].word = word;
    messages[i].fp   = fopen(argv[i + 1], "r");
    int  iret = pthread_create( &threads[i], NULL, threadFindWord, (void*) &(messages[i]));
}for(i = 0; i < argc - 1; i++){
    pthread_join(threads[i],NULL);
}

The function that each thread calls:

Message *msg;
msg = (Message *) ptr;

int numFound = ffindWord(msg->fp, msg->word);

printf("File %s has %i occurences of the word %s\n", msg->file, numFound, msg->word);

fclose(msg->fp);
pthread_exit(NULL);

The following is the code for finding a word in a file)

int findWord(char * file, char * word){
 char * current = strtok(file, " ,.\n");
 int sum = 0;
 while (current != NULL){
    //printf("%s\n", current);
    if(strcmp(current, word) == 0)
        sum+=1;
    current = strtok(NULL, " ,.\n");
}
return sum;
}



int ffindWord(FILE *fp, char *word){

 fseek(fp, 0, SEEK_END);
 long pos = ftell(fp);
 fseek(fp, 0, SEEK_SET);
 char *bytes = malloc(pos);
 fread(bytes, pos, 1, fp);
 bytes[pos-1] = '\0';

 int sum = findWord(bytes, word);

 free(bytes);
 return sum;
 }

For clarification, the problem is that I get different results upon consecutive runs of the program. A call $programname file1 file2 Prints different results than a same call called right after. Note, however, that the program works when only one file is passed.

Any help is appreciated.

Was it helpful?

Solution

strtok keeps an internal pointer that is global... use strtok_r.

OTHER TIPS

This results in undefined behaviour as it goes beyond the end of the messages and threads arrays:

Message messages[argc-1];
pthread_t threads[argc-1];
for(i; i < argc; i++){

and may be the cause of the problems. It may work by chance when only 1 thread is executed.

Try changing to (or something similar):

int i;
Message messages[argc-1];
pthread_t threads[argc-1];
for(i = 1; i < argc; i++)
{
    messages[i - 1].file = argv[i];
    messages[i - 1].word = word;
    messages[i - 1].fp   = fopen(argv[i], "r");
    int iret = pthread_create(&threads[i - 1],
                               NULL,
                               threadFindWord,
                               (void*)&(messages[i - 1]));
}

for(i = 0; i < argc - 1; i++)
{
    pthread_join(threads[i],NULL);
} 

To avoid having the output from each thread mixed together in random ways, you need to buffer the output of each thread and then display them one at time.

For your case, the easiest way to do this would be to add a char *thread_output field (and thread_output_size field) to the Message structure, and then do something like this in your main thread:

for(i = 0; i < argc - 1; i++)
{
    pthread_join(threads[i],NULL);
    printf("%s", messages[i - 1].thread_output);
}

You'll probably also want to implement a function that makes sure the thread_output buffer is large enough and then uses vsnprintf() to add new text to the buffer, so you can use it just like you would've used printf().

For example:

void add_thread_output(int thread_number, const char *template, ...) {
    int old_length;
    int length;
    char *temp;

    va_start(ap, template);
    length = vsnprintf(NULL, 0, template, ap);
    va_end(ap);

    old_length = messages[thread_number - 1].thread_output_size;
    temp = realloc(messages[thread_number - 1].thread_output, old_length + length + 1);
    if(temp == NULL) {
        /* Set a flag or something */
    } else {
        va_start(ap, template);
        vsnprintf(&temp[old_length], length + 1, template, ap);
        va_end(ap);
        messages[thread_number - 1].thread_output_size += length;
        messages[thread_number - 1].thread_output = temp;
    }
}

Note: Any example code above is for the purpose of example only, and isn't tested or guaranteed to compile or work, and also isn't necessarily the most efficient way of doing it. E.g. allocating more space than you need (to avoid doing realloc() every time you add something to a thread's output buffer) may be a good idea.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top