Question

I am trying to allow pandas DataFrame objects to be defined in a YAML file. I believe this should be possible because DataFrame objects are pickleable.

My stripped down YAML file is as follows, saved as 'config.yaml':

!!python/object/new:pandas.DataFrame [[{'dimension1_id':58,'metric1':10},{'dimension1_id':50,'metric':10}]]

And I am using the following to load the data into my Python script:

import yaml

with open('config.yaml') as f:
    y = yaml.load(f)
print(y)

The output (reduced) is as follows:

File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2085, in __getattr__
if name in self.columns:
File "properties.pyx", line 55, in pandas.lib.AxisProperty.__get__ (pandas\lib.c:29240)
RuntimeError: maximum recursion depth exceeded while calling a Python object

I'm using the PyYAML documentation as my only source of information on this.

Can anyone guess why pandas is getting into an infinite loop?

EDIT: It seems that DataFrame objects are not serializable by default, and the extra legwork looks like more trouble than it is worth. Here is the YAML file that yaml_serializer creates from just a simple DataFrame object:

!!python/object/new:pandas.core.frame.DataFrame
state: !!python/object/new:pandas.core.internals.BlockManager
  state:
  - - !!python/object/apply:numpy.core.multiarray._reconstruct
      args:
      - &id001 !!python/name:pandas.core.index.Index ''
      - [0]
      - b
      state:
      - - 1
        - [!!python/long '2']
        - &id002 !dtype 'object'
        - false
        - [dfsd, id]
      - [null]
    - !!python/object/apply:numpy.core.multiarray._reconstruct
      args:
      - !!python/name:pandas.core.index.Int64Index ''
      - [0]
      - b
      state:
      - - 1
        - [!!python/long '2']
        - !dtype 'int64'
        - false
        - "\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0"
      - [null]
  - - - [!!python/long '23', !!python/long '123']
      - [!!python/long '7', !!python/long '123']
  - - !!python/object/apply:numpy.core.multiarray._reconstruct
      args:
      - *id001
      - [0]
      - b
      state:
      - - 1
        - [!!python/long '2']
        - *id002
        - false
        - [dfsd, id]
      - [null]

Solution

I don't think DataFrames are pickleable "out of the box": to_pickle does some pandas-specific wrangling that other modules would miss. Others around here know more about this.

But I have had some success saving Series to YAML with this little module. Doing it with DataFrames should also be possible, since they can be treated as dicts of Series.
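To illustrate the "dict of Series" idea, here is a minimal sketch that is not the module mentioned above. It assumes the frame has a default integer index and columns holding plain values (ints, floats, strings). The helper names frame_to_yaml and frame_from_yaml are made up for this example. Each column is written as a plain YAML list, so no Python-specific tags are needed and yaml.safe_load is enough to read it back.

```python
import pandas as pd
import yaml


def frame_to_yaml(df):
    """Dump a DataFrame as a plain YAML mapping of column name -> list of values."""
    # Series.tolist() converts NumPy scalars to built-in Python types,
    # which safe_dump can represent without Python-specific tags.
    # Assumption: the index is the default one, so it is not stored.
    data = {str(col): df[col].tolist() for col in df.columns}
    return yaml.safe_dump(data)


def frame_from_yaml(text):
    """Rebuild a DataFrame from YAML written by frame_to_yaml."""
    return pd.DataFrame(yaml.safe_load(text))


df = pd.DataFrame({"dimension1_id": [58, 50], "metric1": [10, 10]})
text = frame_to_yaml(df)
restored = frame_from_yaml(text)

# A DataFrame can also be defined by hand in YAML as a list of records,
# which may be enough for a config file like the one in the question.
config = """
- {dimension1_id: 58, metric1: 10}
- {dimension1_id: 50, metric1: 10}
"""
from_config = pd.DataFrame(yaml.safe_load(config))
```

This trades generality for readability: the index and exact dtypes are not preserved, but the file stays easy to edit by hand.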

Licensed under: CC-BY-SA with attribution