Question

I am developing a pyparsing grammar that must insert new tokens into the output. These tokens do not come from the original input.

Ex.:

Input:

'/* foo bar*/' 

Output:

['comment', '/* foo bar*/']

How can I add elements to the parser output if these elements are not in the original expression?


Solution 2

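Reading the pyparsing API, I found a function with a suggestive name: replaceWith. Combining it with addParseAction solved the problem. Empty() always matches without consuming any input, so attaching replaceWith('comment') to it emits the extra token in front of the comment.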

The following code is the solution to the problem:

from pyparsing import *

crazyVariable = Empty().addParseAction(replaceWith('comment')) + cStyleComment

print(crazyVariable.parseString('/* foo bar*/' ))

The output:

['comment', '/* foo bar*/']
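
A parse action attached directly to the comment expression can also prepend the label. The following is a minimal sketch, not part of the original answer: the helper name addLabel is made up for illustration, and cStyleComment is copied first so that the shared pyparsing expression is left unmodified.

```python
from pyparsing import cStyleComment

def addLabel(tokens):
    # hypothetical helper: return a new token list with 'comment' in front
    return ['comment'] + tokens.asList()

# copy() avoids attaching the parse action to the module-level cStyleComment
labeledComment = cStyleComment.copy().addParseAction(addLabel)

result = labeledComment.parseString('/* foo bar*/').asList()
print(result)
```

Whatever a parse action returns replaces the tokens matched by its expression, which is why returning a longer list is enough here.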

OTHER TIPS

An alternative way to achieve the same result, and perhaps one with more expressive power, is to use named expressions. For example:

from pyparsing import *

grammar = cStyleComment("comment")
s = '/* foo bar*/' 

sol = grammar.parseString(s)
print(sol.asDict())

>>> {'comment': '/* foo bar*/'}

You'll notice that the result is not a list as you intended, but this gives you dict-like access to the results once they get more complicated. Let's see that in action:

code    = Word(alphanums+'(){},.<>"; ') 
grammar = OneOrMore(code("code") | cStyleComment("comment"))
s = 'cout << "foobar"; /* foo bar*/' 

sol = grammar.parseString(s)
print("code:", sol["code"])
print("comment:", sol["comment"])

>>> code: cout << "foobar"; 
>>> comment: /* foo bar*/
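
One caveat with results names inside a repetition: by default a name keeps only the last match. The sketch below is an illustration rather than part of the original answer, and the grammar and input are made up. It uses the trailing '*' shorthand, which is equivalent to setResultsName(name, listAllMatches=True), to collect every match under the same name.

```python
from pyparsing import OneOrMore, Word, alphas, cStyleComment

word = Word(alphas)

# "name*" keeps all matches for that name instead of only the last one
grammar = OneOrMore(word("word*") | cStyleComment("comment*"))

result = grammar.parseString('foo /* a */ bar /* b */')
words = result["word"].asList()
comments = result["comment"].asList()
print(words)
print(comments)
```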

This alternative solution is not exactly an answer to the question in the title, but it does answer the more general problem that the question aims to solve:

How to build a syntax tree using node objects instantiated from classes:

# -*- coding: utf-8 -*-

from pyparsing import *



def uncommentCStyleComment(t):
    '''Remove the leading /* and trailing */ from a comment.'''
    return t[0][2:-2]


'''
Classes that replace functions as arguments to setParseAction or addParseAction.
Each class is used to build one node of the syntax tree.
The t argument of the constructor is the list of child nodes of that node.
'''




class Foo(object):
    def __init__(self, t):
        self.value = t[0]               # t = ['foo']

    def __str__(self):
        return self.value               # return 'foo'

class Bar(object):
    def __init__(self, t):
        # per-instance list; a class-level list would be shared by every Bar
        self.members = list(t)          # t = list of foos and comments

    def __str__(self):
        _str = 'Bar:\n'
        for member in self.members:
            _str = _str + '\t' + str(member) + '\n'
        return _str

class Comment(object):
    def __init__(self, t):
        self.value = t[0]               # t = [' Some comment '], delimiters already stripped

    def __str__(self):
        return '/*' + str(self.value) + '*/'    # return '/* Some comment */'



# return an object of type Foo instead of a token
foo     = Combine('foo')                    .setParseAction(Foo)
# strip the comment delimiters, then return an object of type Comment instead of a token
comment = cStyleComment                     .setParseAction(uncommentCStyleComment, Comment) 
# return an object of type Bar instead of a token
bar     = OneOrMore(comment | foo)('ast')   .setParseAction(Bar)

# parse the input string
tokens = bar.parseString('foo\n/* data bar*/\nfoo\nfoo' )

# print the object named ast in the parser output
print( tokens['ast'] )

It's a very elegant way of building the output, with no need for post-processing.
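
To illustrate what "no post-processing" can look like in practice, here is a compact sketch under stated assumptions: the Node, FooNode and CommentNode classes are hypothetical names introduced only for this example, and the input string is made up. Because each parse action is a class, the parser output already contains typed node objects that can be dispatched on directly.

```python
from pyparsing import Literal, OneOrMore, cStyleComment

class Node(object):
    """Hypothetical base class: stores the matched tokens as children."""
    def __init__(self, tokens):
        self.children = list(tokens)    # per-instance list, never shared

class FooNode(Node):
    pass

class CommentNode(Node):
    pass

# each class is called with the matched tokens and its instance replaces them
foo = Literal('foo').setParseAction(FooNode)
comment = cStyleComment.copy().setParseAction(CommentNode)
items = OneOrMore(comment | foo)

nodes = items.parseString('foo /* note */ foo').asList()
kinds = [type(node).__name__ for node in nodes]
print(kinds)
```

Later stages can then use isinstance checks or methods on the node classes instead of re-inspecting raw strings.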

Licensed under: CC-BY-SA with attribution