Recursion is the key here. Try something along these lines:
    def parse(it):
        result = []
        while True:
            try:
                tk = next(it)
            except StopIteration:
                break
            if tk == '}':
                break
            val = next(it)
            if val == '{':
                result.append((tk, parse(it)))
            else:
                result.append((tk, val))
        return result
Example usage:
    import pprint

    data = """
    Continent
    {
        Name Europe
        Country
        {
            Name UK
            Dog
            {
                Name Fiffi
                Colour Gray
            }
            Dog
            {
                Name Smut
                Colour Black
            }
        }
    }
    """

    r = parse(iter(data.split()))
    pprint.pprint(r)
... which produces (Python 2.6):
    [('Continent',
      [('Name', 'Europe'),
       ('Country',
        [('Name', 'UK'),
         ('Dog', [('Name', 'Fiffi'), ('Colour', 'Gray')]),
         ('Dog', [('Name', 'Smut'), ('Colour', 'Black')])])])]
Please take this only as a starting point, and feel free to improve the code as needed (depending on your data, a dictionary might be a better choice). In addition, the sample code does not properly handle ill-formed data (notably extra or missing closing braces). I urge you to aim for full test coverage ;)
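To make those two remarks concrete, here is a rough sketch of what a stricter variant might look like. The names parse_strict and to_dict are mine (hypothetical, not part of any library), and the choices made here are assumptions: unbalanced braces and keys without a value raise ValueError, and repeated keys such as Dog are collected into a list. Adapt to your own data.

```python
def parse_strict(tokens):
    """Like parse() above, but raise ValueError on ill-formed input."""
    it = iter(tokens)

    def block(depth):
        result = []
        for tk in it:
            if tk == '}':
                if depth == 0:
                    raise ValueError("unexpected '}'")
                return result
            if tk == '{':
                raise ValueError("'{' without a preceding key")
            try:
                val = next(it)
            except StopIteration:
                raise ValueError("key %r has no value" % tk)
            if val == '}':
                raise ValueError("key %r has no value" % tk)
            # A '{' after the key opens a nested block; anything else is a plain value.
            result.append((tk, block(depth + 1) if val == '{' else val))
        # Tokens ran out: only acceptable at the top level.
        if depth > 0:
            raise ValueError("missing '}'")
        return result

    return block(0)


def to_dict(pairs):
    """Fold (key, value) pairs into nested dicts; repeated keys become lists."""
    out = {}
    for key, val in pairs:
        if isinstance(val, list):
            val = to_dict(val)
        if key in out:
            if not isinstance(out[key], list):
                out[key] = [out[key]]
            out[key].append(val)
        else:
            out[key] = val
    return out


tokens = "Country { Name UK Dog { Name Fiffi } Dog { Name Smut } }".split()
tree = to_dict(parse_strict(tokens))
# tree == {'Country': {'Name': 'UK',
#                      'Dog': [{'Name': 'Fiffi'}, {'Name': 'Smut'}]}}
```

Note that a key appearing only once maps directly to its value, while a repeated key maps to a list, so callers may need to check which case they have.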
EDIT: After discovering pyparsing, I tried the following, which appears to work (much) better and can be (more) easily tailored to special needs:
    import pprint
    from pyparsing import Word, Literal, Forward, Group, ZeroOrMore, alphas

    def syntax():
        lbr = Literal('{').suppress()
        rbr = Literal('}').suppress()
        key = Word(alphas)
        atom = Word(alphas)
        expr = Forward()
        pair = atom | (lbr + ZeroOrMore(expr) + rbr)
        expr << Group(key + pair)
        return expr

    expr = syntax()
    result = expr.parseString(data).asList()
    pprint.pprint(result)
Producing:
    [['Continent',
      ['Name', 'Europe'],
      ['Country',
       ['Name', 'UK'],
       ['Dog', ['Name', 'Fiffi'], ['Colour', 'Gray']],
       ['Dog', ['Name', 'Smut'], ['Colour', 'Black']]]]]