Question

Having a snippet like this:

import yaml
class User(object):
    def __init__(self, name, surname):
       self.name= name
       self.surname= surname

user = User('spam', 'eggs')
serialized_user = yaml.dump(user)
#Network
deserialized_user = yaml.load(serialized_user)
print "name: %s, sname: %s" % (deserialized_user.name, deserialized_user.surname)

Yaml docs says that it is not safe to call yaml.load with any data received from an untrusted source; so, what should I modify to my snippet\class to use safe_load method?
Is it possible?

Was it helpful?

Solution

It appears that safe_load, by definition, does not let you deserialize your own classes. If you want it to be safe, I'd do something like this:

import yaml
class User(object):
    def __init__(self, name, surname):
       self.name= name
       self.surname= surname

    def yaml(self):
       return yaml.dump(self.__dict__)

    @staticmethod
    def load(data):
       values = yaml.safe_load(data)
       return User(values["name"], values["surname"])

user = User('spam', 'eggs')
serialized_user = user.yaml()
print "serialized_user:  %s" % serialized_user.strip()

#Network
deserialized_user = User.load(serialized_user)
print "name: %s, sname: %s" % (deserialized_user.name, deserialized_user.surname)

The advantage here is that you have absolute control over how your class is (de)serialized. That means that you won't get random executable code over the network and run it. The disadvantage is that you have absolute control over how your class is (de)serialized. That means you have to do a lot more work. ;-)

OTHER TIPS

Another way exists. From the PyYaml docs:

A python object can be marked as safe and thus be recognized by yaml.safe_load. To do this, derive it from yaml.YAMLObject [...] and explicitly set its class property yaml_loader to yaml.SafeLoader.

You also have to set the yaml_tag property to make it work.

YAMLObject does some metaclass magic to make the object loadable. Note that if you do this, the objects will only be loadable by the safe loader, not with regular yaml.load().

Working example:

import yaml

class User(yaml.YAMLObject):
    yaml_loader = yaml.SafeLoader
    yaml_tag = u'!User'

    def __init__(self, name, surname):
       self.name= name
       self.surname= surname

user = User('spam', 'eggs')
serialized_user = yaml.dump(user)

#Network

deserialized_user = yaml.safe_load(serialized_user)
print "name: %s, sname: %s" % (deserialized_user.name, deserialized_user.surname)

The advantage of this one is that it's prety easy to do; the disadvantages are that it only works with safe_load and clutters your class with serialization-related attributes and metaclass.

If you have many tags and don't want to create objects for all of them, or in case you don't care about the actual type returned, only about dotted access, you catch all undefined tags with the following code:

import yaml

class Blob(object):
    def update(self, kw):
        for k in kw:
            setattr(self, k, kw[k])

from yaml.constructor import SafeConstructor

def my_construct_undefined(self, node):
    data = Blob()
    yield data
    value = self.construct_mapping(node)
    data.update(value)

SafeConstructor.add_constructor(None, my_construct_undefined)


class User(object):
    def __init__(self, name, surname):
        self.name= name
        self.surname= surname

user = User('spam', 'eggs')
serialized_user = yaml.dump(user)
#Network
deserialized_user = yaml.safe_load(serialized_user)
print "name: %s, sname: %s" % (deserialized_user.name, deserialized_user.surname)

In case you wonder why the my_construct_undefined has a yield in the middle: that allows for instantiating the object separately from creation of its children. Once the object exist it can be referred to in case it has an anchor and of the children (or their children) a reference. The actual mechanisme to create the object first creates it, then does a next(x) on it to finalize it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top