I am creating a script which need to parse the yaml output that the puppet outputs.

When I does a request agains example https://puppet:8140/production/catalog/my.testserver.no I will get some yaml back that looks something like:

--- &id001 !ruby/object:Puppet::Resource::Catalog
  aliases: {}
  applying: false
  classes: 
    - s_baseconfig
    ...
  edges: 
    - &id111 !ruby/object:Puppet::Relationship
      source: &id047 !ruby/object:Puppet::Resource
        catalog: *id001
        exported: 

and so on... The problem is when I do an yaml.load(yamlstream), I will get an error like:

yaml.constructor.ConstructorError: could not determine a constructor for the tag '!ruby/object:Puppet::Resource::Catalog'
 in "<string>", line 1, column 5:
   --- &id001 !ruby/object:Puppet::Reso ... 
       ^

As far as I know, this &id001 part is supported in yaml.

Is there any way around this? Can I tell the yaml parser to ignore them? I only need a couple of lines from the yaml stream, maybe regex is my friend here? Anyone done any yaml cleanup regexes before?

You can get the yaml output with curl like:

curl --cert /var/lib/puppet/ssl/certs/$(hostname).pem --key /var/lib/puppet/ssl/private_keys/$(hostname).pem --cacert /var/lib/puppet/ssl/certs/ca.pem -H 'Accept: yaml' https://puppet:8140/production/catalog/$(hostname)

I also found some info about this in the puppet mailinglist @ http://www.mail-archive.com/puppet-users@googlegroups.com/msg24143.html. But I cant get it to work correctly...

有帮助吗?

解决方案

I have emailed Kirill Simonov, the creator of PyYAML, to get help to parse Puppet YAML file.

He gladly helped with the following code. This code is for parsing Puppet log, but I'm sure you can modify it to parse other Puppet YAML file.

The idea is to create the correct loader for the Ruby object, then PyYAML can read the data after that.

Here goes:

#!/usr/bin/env python

import yaml

def construct_ruby_object(loader, suffix, node):
    return loader.construct_yaml_map(node)

def construct_ruby_sym(loader, node):
    return loader.construct_yaml_str(node)

yaml.add_multi_constructor(u"!ruby/object:", construct_ruby_object)
yaml.add_constructor(u"!ruby/sym", construct_ruby_sym)


stream = file('201203130939.yaml','r')
mydata = yaml.load(stream)
print mydata

其他提示

I believe the crux of the matter is the fact that puppet is using yaml "tags" for ruby-fu, and that's confusing the default python loader. In particular, PyYAML has no idea how to construct a ruby/object:Puppet::Resource::Catalog, which makes sense, since that's a ruby object.

Here's a link showing some various uses of yaml tags: http://www.yaml.org/spec/1.2/spec.html#id2761292

I've gotten past this in a brute-force approach by simply doing something like:

cat the_yaml | sed 's#\!ruby/object.*$##gm' > cleaner.yaml

but now I'm stuck on an issue where the *resource_table* block is confusing PyYAML with its complex keys (the use of '? ' to indicate the start of a complex key, specifically).

If you find a nice way around that, please let me know... but given how tied at the hip puppet is to ruby, it may just be easier to do you script directly in ruby.

I only needed the classes section. So I ended up creating this little python function to strip it out...

Hope its usefull for someone :)

#!/usr/bin/env python

import re

def getSingleYamlClass(className, yamlList):
    printGroup = False
    groupIndent = 0
    firstInGroup = False
    output = ''

    for line in yamlList:
        # Count how many spaces in the beginning of our line
        spaceCount = len(re.findall(r'^[ ]*', line)[0])
        cleanLine = line.strip()

        if cleanLine == className:
            printGroup = True
            groupIndent = spaceCount
            firstInGroup = True

        if printGroup and (spaceCount > groupIndent) or firstInGroup:
            # Strip away the X amount of spaces for this group, so we get valid yaml
            output += re.sub(r'^[ ]{%s}' % groupIndent, '', line) + '\n'
            firstInGroup = False # Reset this
        else:
            # End of our group, reset
            groupIndent = 0
            printGroup = False

    return output

getSingleYamlClass('classes:', open('puppet.yaml').readlines())

Simple YAML parser:

with open("file","r") as file:
    for line in file:
        re= yaml.load('\n'.join(line.split('?')[1:-1]).replace('?','\n').replace('""','\'').replace('"','\''))
        # print '\n'.join(line.split('?')[1:-1])
        # print '\n'.join(line.split('?')[1:-1]).replace('?','\n').replace('""','\'').replace('"','\'')
        print line
        print re
许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top