Question

I've written a simple string tokenizing program using pointers for a recent school project. However, I'm having trouble with my StringTokenizer::Next() method, which, when called, is supposed to return a pointer to the first letter of the next word in the char array. I get no compile-time errors, but I get a runtime error which states:

Unhandled exception at 0x012c240f in Project 5.exe: 0xC0000005: Access violation reading location 0x002b0000.

The program currently tokenizes the char array, but then stops and this error pops up. I have a feeling it has to do with the NULL checking I'm doing in my Next() method.

So how can I fix this?

Also, if you notice anything I could do more efficiently or with better practice, please let me know.

Thanks!!


StringTokenizer.h:

#pragma once

class StringTokenizer
{
public:
StringTokenizer(void);
StringTokenizer(char* const, char);
char* Next(void);
~StringTokenizer(void);
private:
char* pStart;
char* pNextWord;
char delim;
};

StringTokenizer.cpp:

#include "stringtokenizer.h"
#include <iostream>
using namespace std;

StringTokenizer::StringTokenizer(void)
{
pStart = NULL;
pNextWord = NULL;
delim = 'n';
}

StringTokenizer::StringTokenizer(char* const pArray, char d)
{
pStart = pArray;
delim = d;
}

char* StringTokenizer::Next(void)
{
pNextWord = pStart;
if (pStart == NULL) { return NULL; }

while (*pStart != delim) // access violation error here
{
    pStart++;
}

if (pStart == NULL) { return NULL; }

*pStart = '\0'; // sometimes the access violation error occurs here
pStart++;

return pNextWord;
}

StringTokenizer::~StringTokenizer(void)
{
delete pStart;
delete pNextWord;
}

Main.cpp:

// The PrintHeader function prints out my
// student info in header form
// Parameters - none
// Pre-conditions - none
// Post-conditions - none
// Returns - void
void PrintHeader();

int main ( )
{
const int CHAR_ARRAY_CAPACITY = 128;
const int CHAR_ARRAY_CAPCITY_MINUS_ONE = 127;

// create a place to hold the user's input
// and a char pointer to use with the next( ) function
char words[CHAR_ARRAY_CAPACITY];
char* nextWord;

PrintHeader();

cout << "\nString Tokenizer Project";
cout << "\nyour name\n\n";
cout << "Enter in a short string of words:";
cin.getline ( words, CHAR_ARRAY_CAPCITY_MINUS_ONE );

// create a tokenizer object, pass in the char array
// and a space character for the delimiter
StringTokenizer tk( words, ' ' );

// this loop will display the tokens
while ( ( nextWord = tk.Next ( ) ) != NULL )
{
    cout << nextWord << endl;
}


system("PAUSE");
return 0;
}


EDIT:

Okay, I've got the program working fine now, as long as the delimiter is a space. But if I pass it a '/' as the delimiter, the access violation error comes up again. Any ideas?

Function that works with spaces:

char* StringTokenizer::Next(void)
{
pNextWord = pStart;

if (*pStart == '\0') { return NULL; }

while (*pStart != delim)
{
    pStart++;
}

if (*pStart = '\0') { return NULL; }

*pStart = '\0';
pStart++;

return pNextWord;
}

Solution

This answer is provided based on the edited question and various comments/observations in other answers...

First, what are the possible states for pStart when Next() is called?

  1. pStart is NULL (default constructor or otherwise set to NULL)
  2. *pStart is '\0' (empty string at end of string)
  3. *pStart is delim (empty string at an adjacent delimiter)
  4. *pStart is anything else (non-empty-string token)

At this point we only need to worry about the first option. Therefore, I would use the original "if" check here:

if (pStart == NULL) { return NULL; }

Why don't we need to worry about cases 2 or 3 yet? You probably want to treat adjacent delimiters as having an empty-string token between them, including at the start and end of the string. (If not, adjust to taste.) The while loop will handle that for us, provided you also add the '\0' check (needed regardless):

while (*pStart != delim && *pStart != '\0')

After the while loop is where you need to be careful. What are the possible states now?

  1. *pStart is '\0' (token ends at end of string)
  2. *pStart is delim (token ends at next delimiter)

Note that pStart itself cannot be NULL here.

You need to return pNextWord (the current token) in both of these cases so you don't drop the last token (i.e., when *pStart is '\0'). The code handles case 2 correctly but not case 1: the original code dangerously incremented pStart past '\0', and the new code intends to return NULL, which drops the last token. Note also that its check, if (*pStart = '\0'), is an assignment rather than a comparison. In addition, it is important to reset pStart correctly in case 1, so that the next call to Next() returns NULL. I'll leave the exact code as an exercise for the reader, since it is homework after all ;)

It's a good exercise to outline the possible states of data throughout a function in order to determine the correct action for each state, similar to formally defining base cases vs. recursive cases for recursive functions.
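For readers who want to check their own attempt, here is one possible way to turn the state analysis above into code. It is only a sketch under stated assumptions: adjacent delimiters yield empty tokens, the tokenizer does not own the buffer, and CollectTokens is a hypothetical helper added purely to show usage (it is not part of the original assignment).

```cpp
#include <cstddef>
#include <string>
#include <vector>

class StringTokenizer
{
public:
    // Non-owning: the caller keeps ownership of the buffer,
    // so no destructor is needed.
    StringTokenizer(char* const pArray, char d)
        : pStart(pArray), pNextWord(NULL), delim(d) {}

    char* Next()
    {
        // State before the loop: nothing left to tokenize.
        if (pStart == NULL) { return NULL; }

        pNextWord = pStart;

        // Stop at the delimiter OR at the end of the string.
        while (*pStart != delim && *pStart != '\0') { pStart++; }

        if (*pStart == '\0')
        {
            // Case 1: last token. Mark the tokenizer as exhausted
            // so the following call returns NULL.
            pStart = NULL;
        }
        else
        {
            // Case 2: terminate this token and step past the delimiter.
            *pStart = '\0';
            pStart++;
        }
        return pNextWord;
    }

private:
    char* pStart;
    char* pNextWord;
    char delim;
};

// Hypothetical helper (assumption): gathers every token into a vector.
std::vector<std::string> CollectTokens(char* buffer, char delim)
{
    StringTokenizer tk(buffer, delim);
    std::vector<std::string> tokens;
    for (char* word = tk.Next(); word != NULL; word = tk.Next())
    {
        tokens.push_back(word);
    }
    return tokens;
}
```

With these assumptions, an input such as "a//b" with '/' as the delimiter produces three tokens, the middle one empty. If empty tokens are not wanted, the caller can skip them.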

Finally, I noticed you have delete calls on both pStart and pNextWord in your destructor. First, to delete arrays, you need to use delete [] ptr; (i.e., array delete). Second, you wouldn't delete both pStart and pNextWord, because pNextWord points into the same array as pStart. Third, by the end, pStart no longer points to the start of the memory, so you would need a separate member to store the original start for the delete [] call. Lastly, and most importantly, the array here is allocated on the stack and not the heap (i.e., using char var[], not char* var = new char[]), so it must not be deleted at all. You should simply use an empty destructor.

Another useful tip is to count the number of new and delete calls; every new should be matched by exactly one delete. In this case, you have zero new calls and two delete calls, indicating a serious issue. If it were the other way around, it would indicate a memory leak.
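To illustrate when a delete [] would be appropriate, here is a minimal sketch of a class that does allocate its own buffer. OwnedText and its members are hypothetical names invented for this example; the point is only that one new [] is paired with one delete [], and that the pointer passed to delete [] is the original start, kept separately from the moving cursor.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical class (assumption): owns a heap copy of the input.
class OwnedText
{
public:
    explicit OwnedText(const char* source)
        : pBuffer(new char[std::strlen(source) + 1]), // the one new []
          pCursor(NULL)
    {
        std::strcpy(pBuffer, source);
        pCursor = pBuffer;
    }

    // The one matching delete [], always on the original start.
    ~OwnedText() { delete [] pBuffer; }

    // Move the cursor forward, but never past the terminator.
    void Advance() { if (*pCursor != '\0') { pCursor++; } }

    const char* Cursor() const { return pCursor; }
    const char* Begin() const { return pBuffer; }

private:
    // Copying is disabled so two objects never delete the same buffer.
    OwnedText(const OwnedText&);
    OwnedText& operator=(const OwnedText&);

    char* pBuffer; // original start, used only for delete []
    char* pCursor; // moves through the buffer, never deleted
};
```

The tokenizer in the question needs none of this, because it only borrows the caller's stack array.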

OTHER TIPS

An access violation (or "segmentation fault" on some OSes) means you've attempted to read or write memory that your program is not allowed to access, typically because you never allocated it.

Consider the while loop in Next():

while (*pStart != delim) // access violation error here
{
    pStart++;
}

Let's say the string is "blah\0". Note that I've included the terminating null. Now, ask yourself: how does that loop know to stop when it reaches the end of the string?

More importantly: what happens with *pStart if the loop fails to stop at the end of the string?

Inside ::Next you need to check for the delim character, but you also need to check for the end of the buffer (which I'm guessing is indicated by a '\0').

while (*pStart != '\0' && *pStart != delim) // stops at the terminator or the delimiter
{
    pStart++;
}
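To see the bounded scan in isolation, it can be pulled out into a small function. This is only an illustrative sketch; FindTokenEnd is a hypothetical name, not something from the original code.

```cpp
#include <cstddef>

// Hypothetical helper (assumption): returns a pointer to the first
// delimiter at or after p, or to the terminating '\0' if there is none.
const char* FindTokenEnd(const char* p, char delim)
{
    // Checking '\0' first guarantees the scan never leaves the string.
    while (*p != '\0' && *p != delim)
    {
        p++;
    }
    return p;
}
```

For the string "blah" with a space as the delimiter, the returned pointer lands on the terminator instead of running off into memory the program does not own.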

I also think that these tests in ::Next

if (pStart == NULL) { return NULL; }

should be this instead:

if (*pStart == '\0') { return NULL; }

That is, you should be checking for a NUL character, not a null pointer. It's not clear whether you intend these tests to detect an uninitialized pStart pointer or the end of the buffer.
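The two checks answer different questions, and nothing stops you from doing both as long as the pointer is tested before it is dereferenced. A minimal sketch, where HasMoreInput is a hypothetical name used only for illustration:

```cpp
#include <cstddef>

// Hypothetical helper (assumption): true only when p points at a
// character that is not the terminating NUL.
bool HasMoreInput(const char* p)
{
    // p == NULL   : there is no buffer at all (null pointer).
    // *p == '\0'  : there is a buffer, but we are at its end (NUL character).
    // The pointer test must come first; && short-circuits, so *p is
    // never evaluated when p is NULL.
    return p != NULL && *p != '\0';
}
```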

An access violation usually means a bad pointer.

In this case, the most likely cause is running out of string before you find your delimiter.

Licensed under: CC-BY-SA with attribution