I'm trying to extract the beginning and ending line numbers of all docstrings in a Python module. Is there a sensible way of doing this without regex?

Solution

The best way to do this is with the ast module. In particular, ast.get_docstring almost does what you want; it returns the content of the docstring rather than the node, but you can use the same algorithm to find the docstring node and its location:

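import ast

root = ast.parse('''
def foo():
    """the foo function"""
    pass
''')
for node in ast.walk(root):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Module)):
        # A docstring is a bare string expression that comes first in the body.
        # ast.Constant replaces the deprecated ast.Str on Python 3.8+.
        if (node.body and isinstance(node.body[0], ast.Expr) and
                isinstance(node.body[0].value, ast.Constant) and
                isinstance(node.body[0].value.value, str)):
            doc = node.body[0].value
            # ast.Module has no lineno, so fall back to None for it.
            print(getattr(node, 'lineno', None), doc.lineno, doc.value)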

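Although undocumented, on Python versions before 3.8 the lineno attribute of a multi-line string node gives the string's last line, which is the ending line of the docstring. The lineno of the parent node is then either the first line of the docstring or the line before it. There doesn't seem to be an easy way to tell a docstring that starts on the same line as the class or def keyword from one that starts on the following line, especially once line continuation (\) characters are involved. From Python 3.8 on, lineno is the first line of a node and end_lineno is its last, so both numbers can be read directly from the docstring node.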
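
As a sketch of how this looks on Python 3.8 and later, where AST nodes expose end_lineno alongside lineno, the following collects the first and last line of every docstring. The helper name docstring_spans and the SAMPLE source are made up for illustration and are not part of the ast module:

```python
import ast


def docstring_spans(source):
    """Return (owner, first_line, last_line) for each docstring in source.

    Hypothetical helper for illustration; assumes Python 3.8+ for end_lineno.
    """
    spans = []
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string constant that is the first statement.
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            # Modules have no name attribute, so label them explicitly.
            name = getattr(node, "name", "<module>")
            spans.append((name, first.value.lineno, first.value.end_lineno))
    return spans


SAMPLE = '''"""Module docstring."""

def foo():
    """The foo function.

    More detail here.
    """
    pass
'''

for name, start, end in docstring_spans(SAMPLE):
    print(name, start, end)
```

For the sample above this prints `<module> 1 1` and `foo 4 7`. Reading both numbers from the string node itself avoids guessing whether the docstring starts on the def line or on the line after it.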

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow