Question

I am implementing my own StringTokenizer class in PHP, because the strtok function can only handle one open tokenizer at a time.

With

Hello;this;is;a;text

it works perfectly. The output is:

**Hello**
**this**
**is**
**a**
**text**

But with

Hello;this;is;a;text;

it outputs:

**Hello**
**this**
**is**
**a**
**text**
****
****
<endless loop>

But I expect the following output:

**Hello**
**this**
**is**
**a**
**text**
****

See my code below and please correct me:

class StringTokenizer
{
    private $_str;
    private $_chToken;
    private $_iPosToken = 0;
    private $_bInit;

    public function __construct($str, $chToken)
    {
        if (empty($str) && empty($chToken))
        {
            throw new Exception('String and the token char variables cannot be empty.');
        }
        elseif(empty($chToken) && !empty($str))
        {
            throw new Exception('Missing parameter: Token char cannot be empty.');
        }
        elseif(!empty($chToken) && empty($str))
        {
            throw new Exception('Missing parameter: String cannot be empty.');
        }
        elseif(!empty($chToken) && !empty($str) && is_string($str) && strlen($chToken) >= 0)
        {
            $this->_str = $str;
            $this->_chToken = $chToken;
            $this->_bInit = true;
        }
        else
        {
            throw new Exception('TypeError: Illegal call to __construct from class StringTokenizer.');
        }
    }

    public function next()
    {
        if ($this->_iPosToken === false)
        {
            return false;
        }

        if ($this->_bInit === true && (strlen($this->_str) - 1) > $this->_iPosToken)
        {
            $iCh1stPos = strpos($this->_str, $this->_chToken, $this->_iPosToken) + 1;
            $this->_iPosToken = $iCh1stPos;
            $this->_bInit = false;
            return substr($this->_str, 0, $this->_iPosToken - 1);
        }
        elseif ($this->_bInit === false && (strlen($this->_str) - 1) > $this->_iPosToken)
        {
            $iCh1stPos = $this->_iPosToken;
            $iCh2ndPos = strpos($this->_str, $this->_chToken, $this->_iPosToken);
            if ($iCh2ndPos === false)
            {
                $this->_iPosToken = false;
                return substr($this->_str, $iCh1stPos);
            }
            else 
            {
                $this->_iPosToken = $iCh2ndPos + 1;
                return substr($this->_str, $iCh1stPos, $iCh2ndPos - $iCh1stPos);
            }
        }
    }

    public function hasNext()
    {
        return strpos($this->_str, $this->chToken, $this->_iPosToken) === false ? false : true;
    }
}
$strText = 'Hello;this;is;a;text';
$tokenizer = new StringTokenizer($strText, ';');
$tok = $tokenizer->Next();
while ($tok !== false)
{
    echo '**' . $tok . '**' . PHP_EOL;
    $tok = $tokenizer->next();
}
exit(0);

Solution

The problem is the condition in the third if of next(). For the input Hello;this;is;a;text; the string length is 21, and after the last semicolon has been matched _iPosToken is also 21. The condition (strlen($this->_str) - 1) > $this->_iPosToken is therefore false, so neither block executes once the last semicolon has been consumed.

A PHP function that finishes without reaching a return statement returns NULL, not FALSE.

Because NULL !== false is true, the while loop at the bottom of the code never terminates.
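
The NULL-versus-false point can be checked in isolation. This is a minimal sketch; noReturn() is a hypothetical function that is not part of the original class:

```php
<?php

// Hypothetical helper: falls off the end without a return statement,
// just like next() does when neither of its branches matches.
function noReturn()
{
}

$result = noReturn();

var_dump($result === null);  // bool(true)
var_dump($result !== false); // bool(true), so "while ($tok !== false)" keeps looping
```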

So you have two options here: change the condition in the third if to strlen($this->_str) >= $this->_iPosToken, or add a fourth branch that returns false, as shown below. Note that the second option ends the loop right after text, without the final empty token.

public function next()
{
    if ($this->_iPosToken === false)
    {
        return false;
    }

    if ($this->_bInit === true && (strlen($this->_str) - 1) > $this->_iPosToken)
    {
        // First call: return everything before the first delimiter.
        $iCh1stPos = strpos($this->_str, $this->_chToken, $this->_iPosToken) + 1;
        $this->_iPosToken = $iCh1stPos;
        $this->_bInit = false;
        return substr($this->_str, 0, $this->_iPosToken - 1);
    }
    elseif ($this->_bInit === false && (strlen($this->_str) - 1) > $this->_iPosToken)
    {
        $iCh1stPos = $this->_iPosToken;
        $iCh2ndPos = strpos($this->_str, $this->_chToken, $this->_iPosToken);
        if ($iCh2ndPos === false)
        {
            // No further delimiter: return the rest of the string.
            // This branch did not run for the trailing-semicolon input,
            // but it is still needed when the string does not end with a delimiter.
            $this->_iPosToken = false;
            return substr($this->_str, $iCh1stPos);
        }
        else
        {
            $this->_iPosToken = $iCh2ndPos + 1;
            return substr($this->_str, $iCh1stPos, $iCh2ndPos - $iCh1stPos);
        }
    }
    else
    {
        // The position has reached the end of the string: no tokens remain.
        return false;
    }
}
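
If the final empty token from the question's expected output is wanted, a single code path that always searches from the current position is one way to get it. The following is only a sketch under assumptions: SimpleTokenizer is a hypothetical class name (not the asker's class), and a non-empty delimiter is assumed.

```php
<?php

// Hypothetical class, named differently to avoid clashing with StringTokenizer.
class SimpleTokenizer
{
    private $str;
    private $delim;
    private $pos = 0; // start of the next token, or false once exhausted

    public function __construct($str, $delim)
    {
        if (!is_string($delim) || $delim === '') {
            // Assumption of this sketch: the delimiter must be a non-empty string.
            throw new InvalidArgumentException('Delimiter must be a non-empty string.');
        }
        $this->str = (string) $str;
        $this->delim = $delim;
    }

    public function next()
    {
        if ($this->pos === false) {
            return false; // explicit false, never an implicit NULL
        }

        $start = $this->pos;
        $end = strpos($this->str, $this->delim, $start);

        if ($end === false) {
            // Last token: the rest of the string, possibly empty.
            $this->pos = false;
            return (string) substr($this->str, $start);
        }

        $this->pos = $end + strlen($this->delim);
        return (string) substr($this->str, $start, $end - $start);
    }
}

$tokenizer = new SimpleTokenizer('Hello;this;is;a;text;', ';');
while (($tok = $tokenizer->next()) !== false) {
    echo '**' . $tok . '**' . PHP_EOL;
}
```

Each instance keeps its own position, so several tokenizers can be open at the same time, which strtok does not allow.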

OTHER TIPS

Why reinvent the wheel?

You can use the explode function and implement the Iterator pattern in the tokenizer; I think that is a good approach.

http://php.net/explode
http://br1.php.net/Iterator

Example

<?php

class StringTokenizer implements Iterator
{
  private $tokens = [];

  private $position = 0;

  public function __construct($string, $separator) 
  {
    $this->tokens = explode($separator, $string);
  }

  public function rewind()
  {
    $this->position = 0;
  }

  public function current()
  {
    return $this->tokens[$this->position];
  }

  public function next()
  {
    ++ $this->position;
  }

  public function key()
  {
    return $this->position;
  }

  public function valid()
  {
    return isset($this->tokens[$this->position]);
  }
}

And using it:

$tokenizer = new StringTokenizer('h;e;l;l;o;', ';');

while($tokenizer->valid()) {
  printf('**%s**', $tokenizer->current());

  $tokenizer->next();
}
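
Because explode() does all the splitting up front, several tokenizers can be open at once, which was the original complaint about strtok. A minimal sketch of that point, using SPL's ArrayIterator as a stand-in for the custom class above (a swapped-in shortcut, not the answer's implementation):

```php
<?php

// explode() keeps the empty string after a trailing delimiter: ['a', 'b', ''].
$outer = new ArrayIterator(explode(';', 'a;b;'));

$lines = [];
foreach ($outer as $o) {
    // A second, independent tokenizer is created and consumed
    // while the first one is still in the middle of its iteration.
    $inner = new ArrayIterator(explode(',', 'x,y'));
    foreach ($inner as $i) {
        $lines[] = '**' . $o . ':' . $i . '**';
    }
}

echo implode(PHP_EOL, $lines) . PHP_EOL;
```

The same foreach syntax works with any class that implements Iterator, so the manual valid()/current()/next() loop is optional.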
Licensed under: CC-BY-SA with attribution