Question

I want to parse YAML documents like the following:

meta-info-1: val1
meta-info-2: val2

---

Plain text/markdown content!
jhaha

If I load_all this with PyYAML, I get the following

>>> list(yaml.load_all(open('index.yml')))
[{'meta-info-1': 'val1', 'meta-info-2': 'val2'}, 'Plain text/markdown content! jhaha']

What I am trying to achieve here is that the yaml file should contain two documents, and the second one is supposed to be interpreted as a single string document, more specifically any large body of text with markdown formatting. I don't want it to be parsed as YAML syntax.

In the above example, PyYAML returns the second document as a single string. But if the second document has a : character in place of the ! for instance, I get a syntax error. This is because PyYAML is parsing the stuff in that document.

Is there a way I can tell PyYAML that the second document is just a raw string and should not be parsed?

Edit: A few excellent answers there. While using quotes or the literal syntax solves the stated problem, I'd like users to be able to write the plain text without any extra cruft: just the three -'s (or .'s), followed by a large body of plain text, which might itself include quotes. So, I'd like to know if I can tell PyYAML to parse only the first document and give me the second one raw.

Edit 2: So, adapting agf's idea, but without the try/except, since the second document could happen to be valid YAML syntax:

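import yaml
# Split at the first separator only; a later '---' stays in the body.
config_content, body_content = open(filename).read().split('\n---', 1)
config = yaml.safe_load(config_content)  # parse just the metadata
body = body_content  # the second document stays a raw string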

Thanks agf.


Solution

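If you don't control the format of the original document, you can split it yourself and keep the raw text of any part that fails to parse:

import yaml

raw = open(filename).read()
docs = []
for raw_doc in raw.split('\n---'):
    try:
        docs.append(yaml.safe_load(raw_doc))
    except yaml.YAMLError:
        # PyYAML reports parse failures as yaml.YAMLError, not SyntaxError
        docs.append(raw_doc)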
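If the metadata always comes first, the split can also be done without attempting to parse the body at all. The following is only a sketch under that assumption: the helper name split_front_matter is hypothetical, it treats the first line consisting solely of --- as the separator, and it uses only the standard library, leaving the YAML parsing of the header to the caller.

```python
import re

# A line containing only '---' (optionally followed by spaces or tabs)
# separates the metadata from the body; only the first such line counts.
_SEPARATOR = re.compile(r"^---[ \t]*$", re.MULTILINE)


def split_front_matter(text):
    """Return (header, body); body is None when there is no separator."""
    parts = _SEPARATOR.split(text, maxsplit=1)
    if len(parts) == 1:
        return text, None
    header, body = parts
    # Drop only the newline that terminates the separator line itself.
    if body.startswith("\n"):
        body = body[1:]
    return header, body


sample = (
    "meta-info-1: val1\n"
    "meta-info-2: val2\n"
    "\n"
    "---\n"
    "\n"
    "Plain text: with a colon!\n"
    "---\n"
    "still body\n"
)
header, body = split_front_matter(sample)
```

The header string can then be handed to yaml.safe_load, while body is used as-is. Because the split stops at the first separator, a later --- inside the text (a Markdown horizontal rule, for instance) remains part of the body.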

From the PyYAML docs,

Double-quoted is the most powerful style and the only style that can express any scalar value. Double-quoted scalars allow escaping. Using escaping sequences \x** and \u****, you may express any ASCII or Unicode character.

So it sounds like there is no way to represent an arbitrary scalar unless it is double-quoted.

OTHER TIPS

If all you want is to escape the colon character in YAML, enclose the text containing it in single or double quotes. You can also try the literal style (|) for your second document, so that it is treated as a single scalar.
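If the authors should not have to type the literal-style marker themselves, the program could add it before parsing. This is only a sketch: the helper name as_literal_document is hypothetical, and it assumes the body starts with a non-blank, unindented line so that the indentation of the block scalar can be detected from its first line.

```python
import textwrap


def as_literal_document(body):
    """Wrap plain text as a YAML document holding one literal block scalar."""
    # '--- |' starts a new document whose content is a literal scalar;
    # every non-blank line is indented by two spaces so it belongs to it.
    return "--- |\n" + textwrap.indent(body, "  ")


wrapped = as_literal_document('Plain text: with a colon\nand "quotes" too\n')
print(wrapped)
```

Appended to the metadata and passed to yaml.safe_load_all, a document wrapped this way should come back as one string with the colon and quotes intact, though that is worth checking against the PyYAML version in use.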

Licensed under: CC-BY-SA with attribution