Question

Consider the following snippet that uses strtok to split the string "Madddy".

char* str = (char*) malloc(sizeof("Madddy"));
strcpy(str,"Madddy");

char* tmp = strtok(str,"d");

do
{
    std::cout<<tmp;
    tmp=strtok(NULL, "dddy");
}while(tmp!=NULL);

It works fine; the output is Ma. But after changing the strtok call inside the loop to the following,

tmp=strtok(NULL, "ay");

the output becomes Madd. So how exactly does strtok work? I ask because I expected strtok to treat every character in the delimiter string as a delimiter. In some cases it behaves that way, but in others it gives unexpected results. Could anyone help me understand this?


Solution

It seems you forgot that you called strtok the first time (outside the loop) with the delimiter "d".

strtok is working correctly. You should have a reference here.

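For the second example (strtok(NULL, "ay")):

First, you call strtok(str, "d"). It looks for the first "d" and splits the string there by overwriting that "d" with '\0'. It returns tmp = "Ma" and remembers internally that the rest of the string, "ddy", is still to be parsed.

Then, you call strtok(NULL, "ay"). It continues with the remembered "ddy" and looks for the first character that is either "a" or "y". There is no "a", but there is a "y", which is overwritten with '\0'. So this call returns tmp = "dd", and nothing is left to parse, which is why the next call returns NULL.

It prints "Madd" as you saw. In the first example, the remaining "ddy" consists only of characters from the delimiter set "dddy", so the second call returns NULL right away and only "Ma" is printed.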

Other tips

"Trying to understand strtok" Good luck!

Anyway, we're in 2011. Tokenise properly:

std::string str("abc:def");
char split_char = ':';
std::istringstream split(str);
std::vector<std::string> token;

for (std::string each; std::getline(split, each, split_char); token.push_back(each));

:D

Fred Flintstone probably used strtok(). It predates multi threaded environments and beats up (modifies) the source string.

When called with NULL for the first parameter, it continues parsing the last string. This feature was convenient, but a bit unusual even in its day.

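One detail in your code deserves attention, although it is not what causes the unexpected output:

char* str = (char*) malloc(sizeof("Madddy"));

is more robustly written as

char* str = (char*) malloc(strlen("Madddy") + 1);

For a string literal the two are equivalent, because sizeof counts the terminating '\0'. But if the argument were a char* rather than a literal, sizeof would yield the size of the pointer, not the length of the string.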

I asked a question, inspired by another question, about functions in the C standard library that cause security problems or are considered bad practice.

To quote the answer given to me from there:

A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.

Also, strtok() is used by making subsequent calls to it until the entire string is tokenized. Some library implementations store strtok()'s internal status in a global variable, which may cause some nasty surprises if strtok() is called from multiple threads at the same time.

As you've tagged your question C++, use something else! If you want to use C, I'd suggest implementing your own tokenizer that works in a safe fashion.

Since you changed your tag to be C and not C++, I rewrote your function to use printf so that you can see what is happening. Hoang is correct. You are seeing correct output, but I think you are printing everything on the same line, which made the output confusing. See Hoang's answer, which explains what is happening. Also, as others have noted, strtok destroys the input string, so you have to be careful about that, and it's not thread safe. But if you need a quick and dirty tokenizer, it works. I also changed the code to use strlen rather than sizeof, as suggested by Anders.

Here is your code modified to be more C-like:

char* str = (char*) malloc(strlen("Madddy") + 1);
strcpy(str,"Madddy");

char* tmp = strtok(str,"d");
printf ("first token: %s\n", tmp);

do
{
    tmp=strtok(NULL, "ay");
    if (tmp != NULL ) {
       printf ("next token: %s\n", tmp);
    }
} while(tmp != NULL);
Licensed under: CC-BY-SA with attribution