parse reStructuredText README.rst to get description section

Question 1

I think you could use this part of Docutils:

"Parsing the Document

The Parser analyzes the input document and creates a node tree representation. In this case we are using the reStructuredText parser (docutils/parsers/rst/__init__.py). To see what that node tree looks like, we call quicktest.py (which can be found in the tools/ directory of the Docutils distribution) with our example file (test.txt) as first parameter (Windows users might need to type python quicktest.py test.txt):

$ quicktest.py test.txt
<document source="test.txt">
    <paragraph>
        My
        <emphasis>
            favorite
         language is
        <reference>
            Python
        .
    <target>

Let us now examine the node tree:

The top-level node is document. It has a source attribute whose value is test.txt. There are two children: a paragraph node and a target node. The paragraph in turn has children: a text node ("My "), an emphasis node, a text node (" language is "), a reference node, and again a text node (".").

These node types (document, paragraph, emphasis, etc.) are all defined in docutils/nodes.py. The node types are internally arranged as a class hierarchy (for example, both emphasis and reference have the common superclass Inline). To get an overview of the node class hierarchy, use epydoc (type epydoc nodes.py) and look at the class hierarchy tree." --http://docutils.sourceforge.net/docs/dev/hacking.html

Use that node tree to find just the nodes you need out of the whole document :) and then write out only the relevant nodes.
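As a rough sketch of that idea (not taken from the Docutils docs): docutils.core.publish_doctree parses a string into a node tree, pformat() shows it in the indented form quoted above, and a small recursive walk over node.children picks out nodes of one type. The SAMPLE text and the helper names parse and collect are made up for illustration.

```python
# Hypothetical sample input, modelled on the test.txt described in the quote.
SAMPLE = """\
My *favorite* language is Python_.

.. _Python: https://www.python.org/
"""


def parse(text):
    """Parse reStructuredText and return the Docutils document node."""
    # Imported lazily so this module still loads where Docutils is missing.
    import docutils.core
    return docutils.core.publish_doctree(text)


def collect(node, node_class):
    """Return node and all its descendants that are instances of node_class."""
    found = [node] if isinstance(node, node_class) else []
    # Text nodes have an empty children tuple, so the recursion stops there.
    for child in node.children:
        found.extend(collect(child, node_class))
    return found
```

Usage, assuming Docutils is installed: document = parse(SAMPLE), then print(document.pformat()) to see the tree, and [n.astext() for n in collect(document, docutils.nodes.emphasis)] to get the text of every emphasis node.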

Question 2

I ended up with this, which is not perfect but does the job:

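import os

def readme():
    """Return the part of README.rst that precedes the table of contents."""
    path = os.path.join(os.path.dirname(__file__), 'README.rst')
    try:
        with open(path) as f:
            text = f.read()
    except (IOError, OSError):
        return ''
    try:
        # A bare "import docutils" does not load these submodules
        import docutils.core
        import docutils.nodes
    except ImportError:
        # Without Docutils, fall back to the raw README text
        return text
    document = docutils.core.publish_doctree(text)
    description = ''
    for node in document.children:
        # Stop at the table of contents (a topic node with class "contents")
        if str(node).startswith('<topic classes="contents"'):
            break
        # Skip comments and the document title
        if isinstance(node, (docutils.nodes.comment, docutils.nodes.title)):
            continue
        description += node.astext() + '\n'
    # Drop non-ASCII characters, then decode so that text is returned
    return description.encode('ascii', 'ignore').decode('ascii').strip()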

I would imagine one can do much better and more sophisticated parsing by walking the rST document tree.
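For what it's worth, here is a minimal sketch of what that could look like, checking node classes instead of matching on the string form of each node. It assumes the README has a single top-level title followed by a ".. contents::" directive; the function name description_from_rst and the choice of which node types to skip are my own, not anything Docutils prescribes.

```python
def description_from_rst(text):
    """Return the text of the top-level nodes before the table of contents."""
    # Imported lazily so callers can catch ImportError and fall back.
    import docutils.core
    from docutils import nodes

    document = docutils.core.publish_doctree(text)
    parts = []
    for node in document.children:
        # The contents directive produces a topic node with class "contents".
        if isinstance(node, nodes.topic) and 'contents' in node['classes']:
            break
        # Skip nodes that carry no description text (an assumed selection).
        if isinstance(node, (nodes.comment, nodes.title,
                             nodes.target, nodes.system_message)):
            continue
        parts.append(node.astext())
    return '\n\n'.join(parts).strip()
```

If the README has no contents directive, the loop simply runs to the end and everything except the skipped node types is returned.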