Question

I'm writing a compiler in Python. I wrote the lexer by hand because I couldn't figure out how to handle indentation in PLY. My lexer produces its tokens with yield statements, like so:

def scan(self):
    ...
    for i in tokens:
        if i[0]:
            # identifier or keyword
            yield Token(self.line, i[0] if i[0] in keywords else "ident", i[0])
        elif i[1]:
            # float literal, possibly with an exponent such as "1.5e3"
            if "e" in i[1]:
                base, exp = i[1].split("e")
                val = float(base) * 10 ** int(exp)
            else:
                val = float(i[1])
            yield Token(self.line, "float", val)
        # ... other cases ...
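A side note on the float branch: Python's built-in float() already understands exponent notation, so splitting on "e" by hand is not strictly necessary. The manual version also multiplies two already-rounded floats, so for some inputs it can differ in the last digit from float(text). A small comparison, assuming the literal has already been matched as a string (both helper names are hypothetical):

```python
def parse_float_manual(text: str) -> float:
    # The question's approach: split mantissa and exponent by hand.
    if "e" in text:
        base, exp = text.split("e")
        return float(base) * 10 ** int(exp)
    return float(text)


def parse_float_builtin(text: str) -> float:
    # float() accepts exponent notation such as "1.5e3" or "2E-2" directly.
    return float(text)


for literal in ["3.25", "1.5e3", "2e-2"]:
    print(literal, parse_float_manual(literal), parse_float_builtin(literal))
```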

However, I realized that PLY's parser requires the lexer to provide a token() method, so I wrote one like this:

def token(self):
    return next(self.scan())

Scanning with scan() alone takes about 124 ms on average in my tests, but when I hand the lexer to the PLY parser, parsing still hasn't produced anything after several minutes. It seems my token() method is the problem.
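The slowdown is most likely not a speed problem at all: every call to self.scan() creates a brand-new generator that starts again from the first token, so next(self.scan()) keeps returning the same token and the parser never reaches the end of the input. A self-contained demonstration, using a hypothetical DemoLexer with a fixed word list in place of the real scanning:

```python
class DemoLexer:
    def __init__(self, words):
        self.words = words

    def scan(self):
        # Generator function: each call returns a NEW generator object.
        for w in self.words:
            yield w

    def token(self):
        # Bug: a fresh generator is created on every call,
        # so next() always yields the first item again.
        return next(self.scan())


lexer = DemoLexer(["a", "b", "c"])
print([lexer.token() for _ in range(5)])  # ['a', 'a', 'a', 'a', 'a']

gen = lexer.scan()
print(lexer.scan() is gen)  # False: a separate generator on each call

# Keeping one generator and resuming it walks through the tokens in order.
saved = lexer.scan()
print([next(saved) for _ in range(3)])  # ['a', 'b', 'c']
```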

I also tried renaming scan() to token() so that the generator itself would serve as the interface, but Python then raises something like:

AttributeError: 'generator' object has no attribute 'type'

So it appears that PLY needs a method that will return a single token at a time.
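That matches how the parser uses the lexer: as far as I know, PLY calls token() repeatedly, reads attributes such as .type from each result, and treats a None return as the end of input. Below is a simplified model of that contract, not PLY's actual source; the Token dataclass and the drain() helper are hypothetical, and the limit parameter only exists to turn "never finishes" into an error:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Token:
    # Hypothetical token class; the parser reads at least .type from it.
    type: str
    value: object


def drain(token: Callable[[], Optional[Token]], limit: int = 1000) -> List[str]:
    """Simplified model (not PLY's real code) of a parser's token loop:
    call token() repeatedly and treat a None result as end of input."""
    types: List[str] = []
    for _ in range(limit):
        tok = token()
        if tok is None:
            return types
        # A generator object has no .type, which is the AttributeError above.
        types.append(tok.type)
    raise RuntimeError(f"no end of input after {limit} tokens")


# Correct contract: one token per call, then None.
stream = iter([Token("ident", "x"), Token("float", 1.5)])
print(drain(lambda: next(stream, None)))  # ['ident', 'float']


# Wrong contract: token() is a generator function, so each call returns a
# generator object instead of a token.
def token_as_generator():
    yield Token("ident", "x")


try:
    drain(token_as_generator)
except AttributeError as exc:
    print(exc)  # 'generator' object has no attribute 'type'
```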

Is there any way to rewrite token() so that it returns the next item produced by scan() without being so slow?


Solution

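You need to save your generator somewhere. Each call to self.scan() creates a new generator that starts from the beginning, so next(self.scan()) always returns the first token and the parser never reaches the end of the input. Create the generator once and keep calling next() on it: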

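def start(self):
    # Create the generator once and keep a reference to it.
    self.lexer = self.scan()

def token(self):
    # Resume the saved generator. Return None once it is exhausted, since
    # PLY's parser treats None as end of input (plain next() would raise
    # StopIteration instead).
    return next(self.lexer, None)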
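A self-contained sketch of the whole fix, under a few assumptions: Token stands in for the question's class, KEYWORDS and the regex TOKEN_RE are hypothetical stand-ins for the real scanning, and the start method is named input() because PLY lexers conventionally expose input(data) and token(). The generator is created once in input(), and token() uses next(..., None) so that exhaustion becomes the None end-of-input signal rather than a StopIteration escaping into the parser.

```python
import re
from typing import Iterator, Optional


class Token:
    # Hypothetical stand-in for the question's Token(line, type, value).
    def __init__(self, line: int, type: str, value: object) -> None:
        self.lineno = line
        self.type = type
        self.value = value

    def __repr__(self) -> str:
        return f"Token({self.type!r}, {self.value!r})"


KEYWORDS = {"if", "else", "while"}  # hypothetical keyword set
# Hypothetical pattern: group 1 = identifier/keyword, group 2 = float literal.
# Characters matching neither alternative are silently skipped by finditer().
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+(?:\.\d*)?(?:[eE][-+]?\d+)?))")


class Lexer:
    def __init__(self) -> None:
        self.line = 1  # line tracking omitted in this sketch
        self.data = ""
        self._tokens: Optional[Iterator[Token]] = None

    def input(self, data: str) -> None:
        # Create the generator ONCE and keep it, so every token() call resumes it.
        self.data = data
        self._tokens = self.scan()

    def scan(self) -> Iterator[Token]:
        for m in TOKEN_RE.finditer(self.data):
            word, number = m.group(1), m.group(2)
            if word:
                yield Token(self.line, word if word in KEYWORDS else "ident", word)
            elif number:
                yield Token(self.line, "float", float(number))

    def token(self) -> Optional[Token]:
        # Resume the saved generator; return None (end of input) when exhausted.
        if self._tokens is None:
            return None
        return next(self._tokens, None)


lexer = Lexer()
lexer.input("if x 2.5e1")
print([lexer.token() for _ in range(5)])
# [Token('if', 'if'), Token('ident', 'x'), Token('float', 25.0), None, None]
```

To use it with PLY you would pass an instance as the lexer, for example parser.parse(source, lexer=Lexer()) where parser is the object returned by yacc.yacc(); as far as I know, parse() calls lexer.input(source) itself when given the source text and then pulls tokens through token().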

Disclaimer: I don't know anything about PLY.

Licensed under: CC-BY-SA with attribution