Question

Let's say I'd like to re-invent CoffeeScript :) Or Python. Or Stylus, or YAML :) I need a tool that will turn my indentation-based syntax into an abstract syntax tree. Unfortunately, Google knows nothing about [indentation-based syntax to AST]. Do you guys know of any tool like this? To be more specific, this is what I have:

===source===
Lorem ipsum:
    dolor sit amet:
        consectetuer adipiscing elit
    sed diam nonummy
nibh euismod tincidunt:
    ut laoreet dolore

...and what I need:

===result===
[
    {
        directive: "Lorem ipsum", 
        content: [
            {
                directive: "dolor sit amet", 
                content: [
                    {directive: "consectetuer adipiscing elit", content: []}
                ]
            },
            {directive: "sed diam nonummy", content: []}
         ]
     }, {
        directive: "nibh euismod tincidunt",
        content: [
            {directive:"ut laoreet dolore", content: []}
        ]
     }
]

It would be great if you could recommend a tool like this. It would be awesome if the tool were written in Python/JavaScript and displayed the result as JSON. It would also be cool if you could give me some advice on how to create this tool-of-a-dream myself :) Thanks!

Solution

It's simple enough to write this yourself using recursion. Here is a version that builds nested lists -- I'll leave the dict version as an exercise for you.

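import json
import re
import sys

def DentArthurDent(fp, dents=0, nextline=None):
    '''Read from fp until EOF or a dedent.
       Return the nested list and the next unconsumed line.'''

    tree = []
    while True:
        # Use the line handed back by a deeper call before reading a new one
        line, nextline = nextline or fp.readline(), None
        if not line:
            return tree, ''          # EOF
        if not line.strip():
            continue                 # skip blank lines
        parts = re.match(r'(^ *)(.*)', line).group(1, 2)
        dent = len(parts[0])
        if dent == dents:
            tree.append(parts[1])    # same level: sibling
        elif dent > dents:
            # Deeper: parse the child block, starting with this line
            child_tree, nextline = DentArthurDent(fp, dent, line)
            tree.append(child_tree)
        else:
            return tree, line        # shallower: hand the line back to the caller


tree, _ = DentArthurDent(sys.stdin)
print(json.dumps(tree, indent=4))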

This input:

line 1
line 2
  line 3
    line 4
    line 5
  line 6

yields this output:

[
    "line 1", 
    "line 2", 
    [
        "line 3", 
        [
            "line 4", 
            "line 5"
        ], 
        "line 6"
    ]
]
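
For the dict version left as an exercise above, here is one possible sketch. It is not the same recursive function: it swaps in an explicit stack, which makes it easy to attach each node to its parent. The name parse_indented and the SAMPLE constant are made up for this illustration, and the code assumes indentation uses spaces only, blank lines can be ignored, and a trailing colon is just a block marker that should be dropped from the directive text.

```python
import json

# The sample source from the question.
SAMPLE = """\
Lorem ipsum:
    dolor sit amet:
        consectetuer adipiscing elit
    sed diam nonummy
nibh euismod tincidunt:
    ut laoreet dolore
"""

def parse_indented(text):
    """Turn indentation-structured text into nested {directive, content} nodes."""
    root = []
    # Stack of (indent, children_list); the -1 sentinel keeps the root at the bottom.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines carry no structure
        # Assumption: indentation is made of spaces only.
        indent = len(raw) - len(raw.lstrip(' '))
        # Pop until the top of the stack is strictly shallower than this line;
        # that entry is the enclosing block.
        while stack[-1][0] >= indent:
            stack.pop()
        # Assumption: a trailing colon only marks a block opener, so drop it.
        node = {"directive": raw.strip().rstrip(':'), "content": []}
        stack[-1][1].append(node)
        # This line may itself open a block, so it becomes a candidate parent.
        stack.append((indent, node["content"]))
    return root

print(json.dumps(parse_indented(SAMPLE), indent=4))
```

Run on the source from the question, this gives the same nesting as the requested result, except that json.dumps quotes the keys. Because every line is pushed as a potential parent, a line counts as a child of the nearest preceding line with smaller indentation, whether or not that line ends with a colon.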
Licensed under: CC-BY-SA with attribution