Question

I'm trying to parse big text files with python.

These files have a syntax like this:

<option1> {
<variable1>=<value1>; //<comment> 
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment> 
}

<option2> {
<variable1>=<value1>; //<comment> 
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment> 
}

...
...

<optionN> {
<variable1>=<value1>; //<comment> 
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment> 
}

And I want to get for instance <optionK>[<variableT>] value.

Is there an optimal way to do this by using a fileparser?

Was it helpful?

Solution

Consider your above example (ugly solution) you can use http://docs.python.org/2/library/htmlparser.html as follow:

test = """
<option1> {
<variable1>=<value1>; //<comment>
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment>
}

<option2> {
<variable1>=<value1>; //<comment>
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment>
}

...
...

<optionN> {
<variable1>=<value1>; //<comment>
<variable2>=<value2>;
..
<variableN>=<valueN>; //<comment>
}

"""

from HTMLParser import HTMLParser

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    option = ""
    key = ""
    value = ""
    r = {}
    def handle_starttag(self, tag, attrs):
        self.currentTag = tag
        print "Encountered a start tag:", tag
        if "option" in tag:
            #self.r = {}
            self.option = tag
            self.r[self.option] = {}
        elif "{" in self.currentData or "=" not in self.currentData and "//" not in self.currentData:
            self.key = tag
            self.r[self.option][self.key] = ""
        elif "=" in self.currentData:
            self.value = tag
            self.r[self.option][self.key] = self.value
            #print self.r
    def handle_endtag(self, tag):
        self.currentData = None
        print "Encountered an end tag :", tag
    def handle_data(self, data):
        self.currentData = data
        print "Encountered some data  :", data
        #find a condition to yield result here "}" ? 

# instantiate the parser and fed it some HTML
parser = MyHTMLParser()  
parser.feed(test) 
print parser.r
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top