Question

If I have a keyword, how can I get it to, once it encounters a keyword, to just grab the rest of the line and return it as a string? Once it encounters an end of line, return everything on that line.

Here is the line I'm looking at:

  description here is the rest of my text to collect

Thus, when the lexer encounters description, I would like "here is the rest of my text to collect" returned as a string

I have the following defined, but it seems to be throwing an error:

states = (
     ('bcdescription', 'exclusive'),
)

def t_bcdescription(t):
    r'description '
    t.lexer.code_start = t.lexer.lexpos
    t.lexer.level = 1
    t.lexer.begin('bcdescription')

def t_bcdescription_close(t):
    r'\n'
    t.value = t.lexer.lexdata[t.lexer.code_start:t.lexer.lexpos+1]
    t.type="BCDESCRIPTION"
    t.lexer.lineno += t.valiue.count('\n')
    t.lexer.begin('INITIAL')
    return t

This is part of the error being returned:

  File "/Users/me/Coding/wm/wm_parser/ply/lex.py", line 393, in token
raise LexError("Illegal character '%s' at index %d" % (lexdata[lexpos],lexpos), lexdata[lexpos:])
ply.lex.LexError: Illegal character ' ' at index 40

Finally, if I wanted this functionality for more than one token, how could I accomplish that?

Thanks for your time

Was it helpful?

Solution

There is no big problem with your code,in fact,i just copy your code and run it,it works well

import ply.lex as lex 

states = ( 
     ('bcdescription', 'exclusive'),
)

tokens = ("BCDESCRIPTION",)

def t_bcdescription(t):
    r'\bdescription\b'
    t.lexer.code_start = t.lexer.lexpos
    t.lexer.level = 1 
    t.lexer.begin('bcdescription')

def t_bcdescription_close(t):
    r'\n'
    t.value = t.lexer.lexdata[t.lexer.code_start:t.lexer.lexpos+1]
    t.type="BCDESCRIPTION"
    t.lexer.lineno += t.value.count('\n')
    t.lexer.begin('INITIAL')
    return t

def t_bcdescription_content(t):
    r'[^\n]+'

lexer = lex.lex()
data = 'description here is the rest of my text to collect\n'
lexer.input(data)

while True:
    tok = lexer.token()
    if not tok: break      
    print tok 

and result is :

LexToken(BCDESCRIPTION,' here is the rest of my text to collect\n',1,50)

So maybe your can check other parts of your code

and if I wanted this functionality for more than one token, then you can simply capture words and when there comes a word appears in those tokens, start to capture the rest of content by the code above.

OTHER TIPS

It is not obvious why you need to use a lexer/parser for this without further information.

>>> x = 'description here is the rest of my text to collect'
>>> a, b = x.split(' ', 1)
>>> a
'description'
>>> b
'here is the rest of my text to collect'
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top