Why won't certain C string library functions (e.g. strtok) accept a char * that hasn't been allocated with malloc?

StackOverflow https://stackoverflow.com/questions/23401833

  •  13-07-2023

Question

Recently I was working on a school project which involved writing an assembler in C, and I encountered a problem with passing a pointer to strtok. I got past the error in my code, but I want to understand why what I was doing didn't work.

Below is a simplified example of a function where the error was occurring.

void processText(FILE *f) { //takes a file opened for reading.
  char *token, *temp;
  int len;
  char buff[81]; //line buffer

  while (fgets(buff, 81, f) != NULL) { //read in each line one at a time
    len = strlen(buff);
    token = strtok(buff, "#"); //first assignment

    /*if there is a comment, strip and print it*/
    if (len != strlen(token)) {
      printf("comment: %s", strtok(NULL, "#"));
    }

    //len = strlen(token);
    //temp = malloc(len + 1);
    //strcpy(temp, token);
    //token = strtok(temp, " ");

    token = strtok(token, " "); //this segfaults...

    printf("first word: %s\n", token);
    //free(temp);
  }
}

Replacing the offending line with the commented code block above it (and freeing temp at the end) fixes the problem, but it requires me to make an extra copy of my data for apparently no reason, and it leaves me wondering why I can't use the existing data unless I refer to it a certain way.

As I understand it, the strtok function accepts parameters char * and const char * respectively. It seems to me that after the first assignment, token and buff should appear to strtok as char *'s pointing to the same location; that is, the value strtok receives as its first parameter is going to be a pointer to the location in memory that holds the first character of buff, regardless of which one (buff or token) I pass.

I (loosely) understand that char[] and char * are fundamentally different because arrays are allocated statically and pointers dynamically, but I don't understand why this should make any difference in this case, especially considering the system has no qualms about getting the strlen or strcpying from token.

I would love to understand what is happening here. Thanks in advance for your time!


Solution 3

This code works without any errors:

#include <stdio.h>
#include <string.h>
int main() {
    char *token, *temp;
    int len;
    char buff[81] = "test me # a comment"; 

    len = strlen(buff);
    token = strtok(buff, "#"); 

    if (len != strlen(token)) {
        printf("comment: %s\n", strtok(NULL, "#"));
    }

    token = strtok(token, " "); 

    printf("first word: %s\n", token);
    return 0;
}

It is possible that you have some other problem before the given segment. I don't see any reason to use malloc really.

OTHER TIPS

strtok is kind of a strange function in that it mutates the input buffer to insert nuls, and maintains internal state.

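You are generally better off avoiding strtok and using strchr instead. Calling strtok repeatedly on pointers into the same buffer is easy to get wrong, because every call with a non-null first argument throws away the position saved by the previous scan.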

Either way, as a general rule, library functions don't care whether a buffer came from malloc. If switching to malloc makes a crash go away, that's a clue you have some corruption or undefined behaviour going on, whether you use malloc or not. Running your program under Valgrind or a similar tool will make it more obvious where.

So, why specifically does it crash here?

If the line contains a # then the first call to strtok will replace the # with a nul. Either way it returns a pointer to the first non-# byte of the string.

Your second call to strtok prints the string between the first and second # which is probably not what you want.

One problem you may be having here is that strtok returns null on lines that contain no non-delimiter characters. For example an empty line or a line with only spaces will cause token to get set to null and then the printf will crash.


A better version (untested) would be something like this:

char *hash, *token, *arg;
hash = strchr(buff, '#');
if (hash) *hash = '\0';           // cut off the comment, if any
token = strtok(buff, " \t\n");    // '\n' included because fgets keeps it
if (!token)
    continue; // empty line
printf("operator: %s\n", token);
while ((arg = strtok(NULL, " \t\n")) != NULL)
    printf("arg: %s\n", arg);

Important things here:

  • check for null ;)
  • we don't use strtok to do something more easily done with strchr
  • for each string, the first strtok call is passed the buffer and returns the first word; every later call passes NULL to continue in the same buffer

But as I said, I would strongly consider avoiding it if the syntax is nontrivial.

To avoid segfaults, check that the value returned by strtok is not NULL before passing it to functions that expect a non-null pointer, such as printf and strlen.

Your code would work if every string contains a token with a # after it, and cause undefined behaviour otherwise.

You can't use strtok on a string literal because strtok writes to the buffer, and string literals are not writable. However, char buff[81]; is fine.

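BTW len != strlen(token) is true for every line that contains a # (provided token is not NULL) and false for every line that doesn't, so it tells you whether a # is present, not whether any comment text follows it. If this is not clear to you then re-read the documentation for strtok to see what it does.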

So really, the more general answer to "why is my C program crashing" is not to post it to StackOverflow and ask, or to change it until the problem's hidden, but rather:

  1. Check whether functions can return null and if so whether you handle it correctly - strtok can and you don't.
  2. Run it under Valgrind or any of dozens of other memory checkers that will flag the error as soon as it occurs, rather than it perhaps corrupting memory and crashing later.
  3. Gradually reduce the code and input data until you have the smallest case that shows the problem.
  4. Carefully read the manual for the functions you're calling and check how they handle these inputs and what their pre/post conditions are.
  5. Read some of the 460 other questions tagged strtok because it's almost certainly been asked before.
Licensed under: CC-BY-SA with attribution