Question

I just wrote a program that tokenizes a char array using pointers. The program only needed to work with a space as a delimiter character. I just turned it in and got full credit, but after turning it in I realized that this program only worked if the delimiter character was a space.

My question is, how could I make this program work with every delimiter character?

The function I've shown you below returns a pointer to the next word in the char array. This is what I believe I need to change if it is to work with all delimiter characters.

Thanks!

Code:

char* StringTokenizer::Next(void)
{
pNextWord = pStart;

if (*pStart == '\0') { return NULL; }

while (*pStart != delim)
{
    pStart++;
}

if (*pStart == '\0') { return NULL; }

*pStart = '\0';
pStart++;

return pNextWord;
}

The printing loop in main:

// this loop will display the tokens
while ( ( nextWord = tk.Next ( ) ) != NULL )
{
    cout << nextWord << endl;
}

Solution

The simplest way is to change your

while (*pStart != delim)

to something like

while (*pStart != '\0' && *pStart != ' ' && *pStart != '\n' && *pStart != '\t')

Or, you could make delim a string, and create a function that checks if a char is in the string:

bool isDelim(char c, const char *delim) {
   while (*delim) {
      if (*delim == c)
         return true;
      delim++;
   }
   return false;
}

while ( *pStart != '\0' && !isDelim(*pStart, " \n\t") )

Or, perhaps the best solution is to use one of the prebuilt functions for doing all this, such as strtok.
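
For what it's worth, here is a minimal sketch of the strtok route. The helper name splitWithStrtok is made up for illustration, and it assumes you are happy to tokenize a copy of the text, because strtok overwrites delimiters with '\0' in the buffer it is given (much like Next does).

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character found in `delims`.
std::vector<std::string> splitWithStrtok(const std::string& text, const char* delims)
{
    // strtok modifies its input, so work on a writable, null-terminated copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    // The first call passes the buffer; later calls pass NULL to continue.
    for (char* tok = std::strtok(&buffer[0], delims);
         tok != NULL;
         tok = std::strtok(NULL, delims))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Calling splitWithStrtok("one two\tthree\n", " \t\n") should give the three words. Keep in mind that strtok treats a run of delimiters as a single separator and keeps hidden static state, so two tokenizations cannot be interleaved.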

OTHER TIPS

Just change

while (*pStart != delim)

to this line

while (*pStart != '\0' && strchr(" \t\n", *pStart) == NULL)

The standard strchr function (declared in the string.h header) searches a C-string (its first argument) for a character (its second argument) and returns a pointer to the first occurrence of that character, or NULL if it does not occur. So strchr(" \t\n", *pStart) == NULL means that the current character (*pStart) is not found in the string " \t\n" and is therefore not a delimiter. Note that strchr treats the terminating '\0' as part of the string, so the explicit '\0' test is redundant in this particular loop, but it makes the intent obvious. (Change the delimiter string " \t\n" to suit your needs, of course.)

This is a short and simple way to test whether a given character belongs to a (usually small) set of characters of interest, and it uses a standard function.

By the way, you can do this not only with a C-string but with std::string too. Just declare a const std::string holding a value like " \t\n" and replace strchr with the find method of that delimiter string, comparing its result with std::string::npos instead of NULL.
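
A small sketch of the std::string variant, in case it helps. The names kDelims, isDelimChar and findTokenEnd are invented for this example, and the delimiter set is only an assumption.

```cpp
#include <string>

// Assumed delimiter set, for illustration only.
const std::string kDelims = " \t\n";

// True when c is one of the characters stored in kDelims.
bool isDelimChar(char c)
{
    return kDelims.find(c) != std::string::npos;
}

// Advances p to the first delimiter or to the terminating '\0'.
const char* findTokenEnd(const char* p)
{
    // Unlike strchr, find does not match the terminator,
    // so the '\0' test is required to stop at the end of the string.
    while (*p != '\0' && !isDelimChar(*p)) { ++p; }
    return p;
}
```

Inside Next, the scanning loop would then read something like while (*pStart != '\0' && !isDelimChar(*pStart)).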

Hmm...this doesn't look quite right:

if (*pStart = '\0')

The condition can never be true. I'm guessing you intended == instead of =? You also have a bit of a problem here:

while (*pStart != delim)

If the last word in the string isn't followed by a delimiter, this is going to run off the end of the string, which will cause serious problems.
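
To illustrate, here is one way Next could guard against that. This is only a sketch: the question does not show the class, so the constructor and members below are assumptions. It also skips leading delimiters, which is a design choice the original code does not make (the original would hand back empty tokens for consecutive delimiters).

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical class layout; the question only shows Next().
class StringTokenizer
{
public:
    // `text` must be writable: Next() overwrites delimiters with '\0'.
    StringTokenizer(char* text, const char* delimiters)
        : pStart(text), delims(delimiters) {}

    char* Next()
    {
        // Skip leading delimiters. The '\0' test is essential here because
        // strchr also "finds" the terminator in the delimiter string.
        while (*pStart != '\0' && std::strchr(delims, *pStart) != NULL) { ++pStart; }
        if (*pStart == '\0') { return NULL; }

        char* pNextWord = pStart;

        // Stop at a delimiter or at the end of the string.
        while (*pStart != '\0' && std::strchr(delims, *pStart) == NULL) { ++pStart; }

        // Only terminate and step past the token if a delimiter was hit;
        // the last word is already terminated by the string's own '\0'.
        if (*pStart != '\0')
        {
            *pStart = '\0';
            ++pStart;
        }
        return pNextWord;
    }

private:
    char* pStart;
    const char* delims;
};
```

With a buffer such as char buf[] = "alpha,beta  gamma" and delimiters ", ", repeated calls to Next should return the three words and then NULL, even though the last word has no trailing delimiter.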

Edit: Unless you really need to do this on your own, consider using a stringstream for the job. It already has all the right mechanisms in place and is quite heavily tested. It does add overhead, but that's quite acceptable in a lot of cases.
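
A rough sketch of the stringstream approach (the function names are made up for this example). operator>> splits on whitespace, while std::getline accepts one custom delimiter character.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits on any whitespace using operator>>.
std::vector<std::string> splitOnWhitespace(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) { words.push_back(word); }
    return words;
}

// Splits on a single custom delimiter using std::getline.
std::vector<std::string> splitOnChar(const std::string& text, char delim)
{
    std::istringstream in(text);
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, delim)) { fields.push_back(field); }
    return fields;
}
```

Neither function touches the original text, since the stream works on its own copy. Note that the getline version keeps empty fields between consecutive delimiters, whereas operator>> collapses runs of whitespace.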

Not compiled, but I'd do something like this.

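// All delimiters go here. N is derived from the initializer so the array has
// no zero-filled padding (a padding '\0' would match the string terminator).
const char delimList[] = {' ', ',', '.', ';', '|', '!', '$', '\n'};
const int N = sizeof(delimList) / sizeof(delimList[0]);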

char* StringTokenizer::Next(void)
{
    if (*pStart == '\0') { return NULL; }

    pNextWord = pStart;

    while (1){  
        for (int x = 0; x < N; x++){
            if (*pStart == delimList[x]){ //this is it.
                *pStart = '\0';
                pStart++;
                return pNextWord;
            }

        }
        if ('\0' == *pStart){ //last word.. maybe.
                return pNextWord;   
        }
        pStart++;
    }
}

// (!compiled).

I assume that we want to stick to C instead of C++. The functions strspn and strcspn are good for tokenizing by a set of delimiters. You can use strcspn to find where the next separator begins (i.e. where the current token ends) and then strspn to find where the separator ends (i.e. where the next token begins). Loop until you reach the end.
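
A minimal sketch of such a loop, written in C++ to match the rest of the thread but using only the C string functions for the scanning. The helper name splitWithSpan is hypothetical. strspn measures the leading run of delimiter characters, and strcspn measures the leading run of non-delimiter characters.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: collects tokens without modifying `text`.
std::vector<std::string> splitWithSpan(const char* text, const char* delims)
{
    std::vector<std::string> tokens;
    const char* p = text;
    while (true)
    {
        p += std::strspn(p, delims);               // skip the run of delimiters
        if (*p == '\0') { break; }                 // nothing left to tokenize
        std::size_t len = std::strcspn(p, delims); // length of the current token
        tokens.push_back(std::string(p, len));
        p += len;                                  // now at a delimiter or '\0'
    }
    return tokens;
}
```

Because the input is never written to, this also works on string literals and other read-only buffers, unlike the approaches that plant '\0' characters in place.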

Licensed under: CC-BY-SA with attribution