Question

I'm using strtok() in C to parse a CSV string. First I tokenize it just to find out how many tokens there are, so I can allocate an array of the correct size. Then I tokenize it again, using the same variable I used the first time. On the second pass, though, strtok(NULL, ",") returns NULL even though there are still more tokens to parse. Can somebody tell me what I'm doing wrong?

char* tok;
int count = 0;
tok = strtok(buffer, ",");
while(tok != NULL) {
    count++;
    tok = strtok(NULL, ",");
}

//allocate array

tok = strtok(buffer, ",");
while(tok != NULL) {
    //do other stuff
    tok = strtok(NULL, ",");
}

So the second while loop always ends after the first token is found, even though there are more tokens. Does anybody know what I'm doing wrong?


Solution

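strtok() modifies the string it operates on, replacing delimiter characters with null characters ('\0'). After the first pass, buffer therefore looks like a string that ends right after the first token, which is why the second pass finds only one token. So if you want to tokenize the same text more than once, you'll have to make a copy first.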
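
A minimal sketch of the copy approach for the counting pass, assuming the caller just wants the token count and the original text left untouched. The helper name count_tokens is made up for illustration; it is not part of the asker's code or of any library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in `text` without modifying it,
 * by running strtok() on a private copy. Returns -1 if allocation fails. */
int count_tokens(const char *text, const char *delims)
{
    size_t len = strlen(text) + 1;      /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, len);

    int count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    free(copy);                         /* the copy was only needed for counting */
    return count;
}
```

With this, the first loop in the question could become count = count_tokens(buffer, ","), and the second strtok() pass over buffer would then see the untouched string.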

OTHER TIPS

There's not necessarily a need to make a copy - strtok() does modify the string it's tokenizing, but in most cases that simply means the string is already tokenized if you want to deal with the tokens again.

Here's your program modified a bit to process the tokens after your first pass:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int i;
    char buffer[] = "some, string with  ,  tokens";

    char* tok;
    int count = 0;
    tok = strtok(buffer, ",");
    while(tok != NULL) {
        count++;
        tok = strtok(NULL, ",");
    }


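    // walk through the tokenized buffer again,
    // starting past any leading delimiters (strtok() skipped them too)
    tok = buffer + strspn(buffer, ",");

    for (i = 0; i < count; ++i) {
        printf( "token %d: \"%s\"\n", i+1, tok);
        if (i + 1 < count) {  // after the last token there is nothing left to step to
            tok += strlen(tok) + 1;  // get the next token by skipping past the '\0'
            tok += strspn(tok, ","); //   then skipping any starting delimiters
        }
    }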

    return 0;
}

Note that this is unfortunately trickier than I first posted - the call to strspn() needs to be performed after skipping the '\0' placed by strtok() since strtok() will skip any leading delimiter characters for the token it returns (without replacing the delimiter character in the source).

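Use strsep() instead: it updates your pointer for you. With strtok() you have to keep passing NULL on subsequent calls, whereas with strsep() you always pass the address of your string pointer. The only catch is that strsep() advances that pointer, so if the string was allocated on the heap, keep a separate pointer to the beginning so you can free it later. Note that strsep() is a BSD extension (also available in glibc) rather than standard C, and that it also overwrites delimiters in place, so a second pass over the same text still needs a copy.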

char *strsep(char **stringp, const char *delim);

char *string = buffer;   /* must point at writable memory */
char *token;
token = strsep(&string, ",");

strtok is used in your normal intro to C course - use strsep, it's much better. :-) No getting confused on "oh shit - i have to pass in NULL still cuz strtok screwed up my positioning."
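
For what it's worth, here is a rough sketch of a full strsep() loop. The helper name split_fields and its parameters are made up for illustration, and it assumes a platform that provides strsep() (glibc or a BSD libc). One difference from strtok() worth knowing: strsep() returns an empty string for each pair of adjacent delimiters instead of skipping them, which is often what you want for CSV.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* asks glibc to declare strsep() under strict -std modes */
#endif
#include <string.h>

/* Hypothetical helper: splits `line` in place on commas and stores up to
 * `max` field pointers in `fields`. Returns the number of fields stored.
 * `line` is modified: each delimiter found is overwritten with '\0'. */
size_t split_fields(char *line, char *fields[], size_t max)
{
    size_t n = 0;
    char *cursor = line;    /* strsep() advances this; `line` keeps the start */
    char *field;

    while (n < max && (field = strsep(&cursor, ",")) != NULL)
        fields[n++] = field;

    return n;
}
```

Called on a writable buffer holding "a,,b", this should give three fields: "a", an empty string, and "b". The field pointers point into the original buffer, so they stay valid only as long as it does.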

Licensed under: CC-BY-SA with attribution