Question

I am writing a lexer in C++ and I am reading from a file character by character; however, how do you do tokenization in this case? I can't use strtok since I have a character, not a string. Somehow I need to keep reading until I reach a delimiter?

Solution

The answer is Yes. You need to keep reading until you hit a delimiter.

Other tips

There are multiple solutions.

The simplest thing to do is exactly that: keep a buffer (std::string) of the characters you have already read until you reach a delimiter. At that point, you build a token from the accumulated characters in the buffer, clear the buffer, and push the delimiter (if necessary) into the buffer.
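A minimal sketch of that buffer-based approach, assuming for illustration that whitespace is the only delimiter and that the input file name is hypothetical:

    #include <cctype>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    // Accumulate characters into a buffer until a delimiter is seen,
    // then emit the buffered characters as one token.  Treating only
    // whitespace as a delimiter is an assumption for this sketch.
    std::vector<std::string> tokenize(std::ifstream& in)
    {
        std::vector<std::string> tokens;
        std::string buffer;
        char ch;
        while (in.get(ch)) {
            if (std::isspace(static_cast<unsigned char>(ch))) {
                if (!buffer.empty()) {       // end of the current token
                    tokens.push_back(buffer);
                    buffer.clear();
                }
                // whitespace is discarded; a real lexer might instead
                // push punctuation delimiters as tokens of their own
            } else {
                buffer += ch;                // still inside a token
            }
        }
        if (!buffer.empty())                 // flush the final token
            tokens.push_back(buffer);
        return tokens;
    }

    int main()
    {
        std::ifstream file("input.txt");     // hypothetical file name
        for (const std::string& tok : tokenize(file))
            std::cout << tok << '\n';
    }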

Another solution would be to read ahead of time: i.e., pick up the entire line with std::getline (for example), and then check what's on that line. In general, the end of line is a natural token delimiter.
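A rough sketch of the line-at-a-time variant; splitting each line on whitespace with std::istringstream is just one possible choice, not something the original answer prescribes:

    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    int main()
    {
        std::ifstream file("input.txt");      // hypothetical file name
        std::string line;
        while (std::getline(file, line)) {    // end of line delimits tokens
            std::istringstream scanner(line); // re-scan the buffered line
            std::string token;
            while (scanner >> token)          // here: split on whitespace
                std::cout << token << '\n';
        }
    }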

This works well... when delimiters are easy.

Unfortunately some languages, like C++, have awkward grammars. For example, in C++ >> can be either:

  • the operator >> (for right-shift and stream extraction)
  • the end of two nested templates (i.e., it could be rewritten as > >)

In those cases... well, just don't bother with the difference in the tokenizer; let your AST-building pass disambiguate, since it has more information.
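To illustrate that idea (the token kinds and helper below are invented for this sketch), the lexer can emit one token per '>' character and leave it to the parser, which knows whether it is inside a template argument list, to merge two adjacent tokens into a right-shift operator:

    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical token representation for this sketch only.
    enum class TokenKind { Greater, Other };

    struct Token {
        TokenKind kind;
        std::string text;
    };

    // The lexer stays simple: every '>' becomes its own Greater token.
    // A later parsing pass can combine two adjacent Greater tokens into
    // '>>' (right shift) or treat them as two closing angle brackets.
    std::vector<Token> lexAngles(const std::string& src)
    {
        std::vector<Token> tokens;
        for (char ch : src) {
            if (ch == '>')
                tokens.push_back({TokenKind::Greater, ">"});
            // ... every other character would be handled elsewhere
        }
        return tokens;
    }

    int main()
    {
        // "vector<vector<int>> v" ends its types with two adjacent '>'
        for (const Token& t : lexAngles("vector<vector<int>> v;"))
            std::cout << t.text << '\n';   // prints two separate '>' lines
    }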

Based on the information you provided: if you want to read up to a delimiter from a file, use the stream's getline(char*, int, char) member function.

getline() reads until it has stored n-1 characters or it reaches the delimiter, whichever comes first.

Example:

    #include <fstream>
    #include <iostream>
    using namespace std;

    int main()
    {
        fstream f;

        f.open("test.cpp", ios::in);
        char c[2];              // caller must supply the buffer
        f.getline(c, 2, ' ');   // reads up to 1 character or until a space
        cout << c;
    }
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow