Question

So I have a yaml file with lots of trivia questions and a list of answers. However, whenever I try to load this file and dump the contents in python with pyyaml, it dumps them backwards. I'm not sure if it's my yaml file or if I'm doing something wrong with the library.

Let's say that one of my question/answer pairs looks like this in the yaml file -

{"question": "What is the name of this sequence of numbers: 1, 1, 2, 3, 5, 8, 13, ...", 
 "answer": ["The Fibonacci Sequence", "The Padovan Sequence", "The Morris Sequence"]}

When I use yaml.dump() on that python dictionary, it dumps this -

answer: [fibonacci, padovan, morris]\nquestion: 'what sequence is this: 1, 1, 2, 3, 5, 8, 13, ...'\n"

I was expecting this -

- question: "What is the name of this sequence of numbers: 1, 1, 2, 3, 5, 8, 13, ..."
  answer: ["The Fibonacci Sequence", "The Padovan Sequence", "The Morris Sequence"]

Am I doing something wrong here?


Solution

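YAML mappings are unordered by specification, and Python dictionaries did not guarantee any order before Python 3.7. On top of that, PyYAML sorts mapping keys alphabetically by default when dumping, which is why answer comes out before question.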

However, if order is important, YAML defines an ordered map, !!omap, which PyYAML by default parses into a list of tuples, e.g.:

>>> import yaml
>>> yaml.safe_load('''!!omap
... - a: foo
... - b: bar''')
[('a', 'foo'), ('b', 'bar')]

This answer gives some details about how to load an !!omap into a Python OrderedDict.
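As a minimal sketch of that idea (not necessarily what the linked answer does): since the loader returns a list of (key, value) tuples for an !!omap, the result can be passed directly to OrderedDict. The document text and the names doc, pairs and qa below are made up for illustration.

```python
from collections import OrderedDict

import yaml

# An !!omap document is a sequence of single-pair mappings.
doc = """!!omap
- question: What is your name?
- answer: My name is Bob.
"""

# safe_load returns the pairs as a list of tuples, in document order.
pairs = yaml.safe_load(doc)

# OrderedDict accepts an iterable of (key, value) pairs and keeps their order.
qa = OrderedDict(pairs)

print(list(qa.keys()))
```

This only covers loading; dumping an OrderedDict back out as an !!omap would need a representer of its own.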

OTHER TIPS

I have a somewhat different answer here. dbaupp's answer is correct if the order of elements is important to you for reasons other than readability. If the only reason you want question to show up before answer is to make the file more human-readable, then you don't need to use !!omap, and can instead use custom representers to get the order you want.

First of all, your problem with the dumper dumping without the - in front is because you're only dumping a single mapping, instead of a list of them. Put your dict inside a list and this will be fixed. So we start with:

d = [{"question": "What is the name of this sequence of numbers: 1, 1, 2, 3, 5, 8, 13, ...", 
 "answer": ["The Fibonacci Sequence", "The Padovan Sequence", "The Morris Sequence"]}]

We want the output in a particular order, so we specify that order and convert each dict to an OrderedDict sorted accordingly:

from collections import OrderedDict
order = ['question', 'answer']
do = [ OrderedDict( sorted( z.items(), key=lambda x: order.index(x[0]) ) ) for z in d ]

Next, we need to tell PyYAML what to do with an OrderedDict. In this case we don't want an !!omap, just a mapping emitted in a particular order. For reasons unclear to me, if you give dumper.represent_mapping a dict, or anything with an items attribute, it sorts the items before dumping, but if you give it the output of items() (e.g., a list of (key, value) tuples), it doesn't. Thus we can use

import yaml

def order_rep(dumper, data):
    # Passing items() instead of the dict itself keeps PyYAML from sorting the keys.
    return dumper.represent_mapping( u'tag:yaml.org,2002:map', data.items(), flow_style=False )
yaml.add_representer( OrderedDict, order_rep )

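And then the output of print(yaml.dump(do)) ends up as follows (PyYAML 5.1 and later emit the answer list in block style unless you pass default_flow_style=None):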

- question: 'What is the name of this sequence of numbers: 1, 1, 2, 3, 5, 8, 13, ...'
  answer: [The Fibonacci Sequence, The Padovan Sequence, The Morris Sequence]

There are a number of different ways this could be done. Using OrderedDict isn't actually necessary at all; you just need the question/answer pairs to be of some class that you can write a representer for.

And again, do realize that this is only for human readability and aesthetic purposes. The order here will not be of any YAML significance, as it would if you were using !!omap. It just seemed like this was primarily important to you for readability.
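As an aside, and as a different technique from the custom representer above: on newer setups the same readable ordering can probably be had with no representer at all. The sketch below assumes PyYAML 5.1 or later (where dump accepts sort_keys) and Python 3.7 or later (where plain dicts keep insertion order). The shortened question/answer pair is made up for illustration.

```python
import yaml

# Assumption: PyYAML >= 5.1 (sort_keys parameter) and Python >= 3.7
# (plain dicts preserve insertion order).
d = [{"question": "How are you?", "answer": "I am fine."}]

# sort_keys=False asks the dumper to keep the dict's own key order
# instead of sorting the keys alphabetically.
text = yaml.dump(d, sort_keys=False, default_flow_style=False)

print(text)
```

As with the representer approach, the order is purely cosmetic: a YAML consumer is still free to treat the mapping as unordered.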

If a particular key order is preferred in the dump, the code below can be used:

import yaml

class MyDict(dict):
   def to_omap(self):
      return [('question', self['question']), ('answer', self['answer'])]

def represent_omap(dumper, data):
   return dumper.represent_mapping(u'tag:yaml.org,2002:map', data.to_omap())

yaml.add_representer(MyDict, represent_omap)

questions = [
   MyDict({'answer': 'My name is Bob.', 'question': 'What is your name?'}),
   MyDict({'question': 'How are you?', 'answer': 'I am fine.'}),
]
print(yaml.dump(questions, default_flow_style=False))

The output is:

- question: What is your name?
  answer: My name is Bob.
- question: How are you?
  answer: I am fine.

If it's loading them as a dictionary, don't rely on their order. YAML mappings are unordered by definition, and Python dictionaries only guarantee insertion order from Python 3.7 onward.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow