Question

In the following C program, strtok is used to split a string. The program gives the expected output, but I don't understand how it works.

On the first call we pass the string to tokenize and the delimiter, but on later iterations we pass only NULL. How and why does the function remember the string?

What if I want to tokenize different strings simultaneously?

#include "stdafx.h"
#include <cstdio>
#include <cstring>

int main(int argc, char* argv[])
{
    char arr[] = "This is string to split";

    // strtok returns a pointer into arr, so no separate allocation is needed
    char * subStr = strtok(arr, " ");

    while (subStr)
    {
        printf("%s\n", subStr);
        subStr = strtok(NULL, " ");
    }

    return 0;
}

Output:

This
is
string
to
split

Solution

The strtok function keeps internal state that records the position it last reached. Because it modifies the original string, overwriting the delimiter that ends each token with a null byte, all it needs to remember is where to resume. If you call strtok with a non-null string argument, the internal state is reset to the new string. So indeed, you cannot use it on multiple strings at once, only one after the other. (POSIX provides the reentrant variant strtok_r, which lets you pass your own state variable; Microsoft's C runtime has a similar strtok_s.)

Here's a sample implementation:

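#include <string.h>   // strchr

char * my_strtok(char * in, char delim)   // not quite the same signature
{
    _Thread_local static char * pos = NULL;   // remembered between calls

    if (in) { pos = in; }                     // new string: reset the state
    if (!pos) { return NULL; }                // nothing left to tokenize

    char * start = pos;                       // the token begins here
    char * p = strchr(pos, delim);            // NULL if not found
    if (p) { *p = '\0'; pos = p + 1; }        // end the token, resume after it
    else   { pos = NULL; }                    // this was the last token

    return start;
}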

(The real strtok searches for any delimiter of a given list, and also skips over empty fields.) The reentrant variant of this would replace the static variable pos with a function parameter.
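As a rough sketch of that reentrant variant, the function below takes the position as a third parameter instead of keeping it in a static variable. The name my_strtok_r and its single-character delimiter are assumptions made for illustration; this is not the signature of the POSIX strtok_r.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reentrant variant: the caller owns the position,
   so several strings can be tokenized at the same time. */
char *my_strtok_r(char *in, char delim, char **pos)
{
    assert(pos != NULL);                 /* caller must supply a state variable */

    if (in) { *pos = in; }               /* first call: start at the beginning */
    if (*pos == NULL) { return NULL; }   /* the last token was already returned */

    char *start = *pos;                  /* the token begins here */
    char *p = strchr(start, delim);      /* NULL if no further delimiter */
    if (p) { *p = '\0'; *pos = p + 1; }  /* end the token, resume after it */
    else   { *pos = NULL; }              /* this is the last token */

    return start;
}
```

Each string gets its own char * state variable, so calls for different strings can be interleaved freely. As with the version above, empty fields are returned as empty strings rather than skipped.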

Other answers

The strtok function uses a static variable to keep track of state from previous calls. For this reason, it's not thread safe (check out strtok_r instead) and you should not use it to simultaneously tokenize different strings on different threads.

Here is one way it might be implemented.
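A minimal sketch of one possible implementation, built on strspn and strcspn, might look like this. It is an illustration only, not the source of any particular C library, and sketch_strtok is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: sketch_strtok is a made-up name, not a library function. */
char *sketch_strtok(char *str, const char *delims)
{
    static char *next = NULL;            /* the state that survives between calls */

    assert(delims != NULL);
    if (str) { next = str; }             /* new string: reset the state */
    if (next == NULL) { return NULL; }   /* nothing left to tokenize */

    next += strspn(next, delims);        /* skip leading delimiters (empty fields) */
    if (*next == '\0') { next = NULL; return NULL; }

    char *token = next;
    next += strcspn(next, delims);       /* advance to the end of the token */
    if (*next != '\0') { *next++ = '\0'; }   /* end the token, step past the delimiter */
    else { next = NULL; }                /* reached the end of the string */

    return token;
}
```

The single static pointer is the whole "memory" of the function, which is why a second string passed in the middle of a scan silently replaces the first.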

"How": by using a static variable.

"Why": to continue, strtok only needs the position just past the null byte it last wrote. If you had to pass the original string again, it would need to skip over the tokens it already returned, which wastes CPU cycles.

How and why strtok remembers string?

The strtok() function keeps a static pointer to its current position in the string while parsing.

What if I want to use tokenize to different string simultaneously?

You can build your own:

#include <stdio.h>
#include <string.h>

char *scan(char **pp, char c)
{
    char *s = *pp, *p;

    p = strchr(*pp, c);
    if (p) *p++ = '\0';
    *pp = p;
    return s;
}

int main(void)
{
    char s[] = "This is string to split";
    char *p = s;

    while (p) {
        printf("%s\n", scan(&p, ' '));
    }
    return 0;
}
  • Note that scan() replaces every occurrence of the delimiter with \0 in the original string. Unlike strtok, it returns an empty string between two adjacent delimiters.
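If you would rather not build your own, strtok_r on POSIX systems covers the same need. The sketch below walks two strings in lockstep, each with its own state variable. The helper name print_token_pairs is hypothetical, and the code assumes a platform that declares strtok_r in <string.h>.

```c
#define _POSIX_C_SOURCE 200809L   /* expose strtok_r on POSIX systems */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: tokenize two strings at the same time and print
   the tokens pairwise. Both buffers are modified in place.
   Returns the number of pairs printed. */
int print_token_pairs(char *left, char *right, const char *delims)
{
    assert(left && right && delims);

    char *save_left, *save_right;   /* one state variable per string */
    char *a = strtok_r(left, delims, &save_left);
    char *b = strtok_r(right, delims, &save_right);
    int pairs = 0;

    while (a && b) {                /* stop when either string runs out */
        printf("%s / %s\n", a, b);
        ++pairs;
        a = strtok_r(NULL, delims, &save_left);
        b = strtok_r(NULL, delims, &save_right);
    }
    return pairs;
}
```

Interleaving plain strtok calls like this would not work, because the second call with a non-null string would discard the position in the first one.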
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow