Hachoir - Retrieving data from a group

Question 1

This is less straightforward for a WMV file. I turned the metadata for such a video into a defaultdict of dictionaries, which makes the image width easier to get:

from collections import defaultdict
from pprint import pprint

from hachoir_metadata import metadata
from hachoir_core.cmd_line import unicodeFilename
from hachoir_parser import createParser

# using this example http://archive.org/details/WorkToFishtestwmv
filename = './test_wmv.wmv' 
filename, realname = unicodeFilename(filename), filename
parser = createParser(filename)

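# See which keys you can extract. Note: _Metadata__data is a name-mangled
# private attribute, so this may break between hachoir versions.
for k, v in metadata.extractMetadata(parser)._Metadata__data.iteritems():
    if v.values:
        print v.key, v.values[0].value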

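# Turn the plaintext export into a nested dict: group header -> tag -> value
metalist = metadata.extractMetadata(parser).exportPlaintext()
meta = defaultdict(defaultdict)
for item in metalist:
    if item.endswith(':'):
        # A line such as "Video stream #1:" starts a new group
        k = item[:-1]
    else:
        # Split on the first ': ' only, in case the value contains ': '
        tag, value = item.split(': ', 1)
        tag = tag[2:]  # drop the leading "- "
        meta[k][tag] = value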

print meta['Video stream #1']['Image width'] # 320 pixels
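
The grouping loop above can be tried without a media file. The following is a minimal sketch, assuming the export is a list of strings in which a line ending in ':' opens a group and every other line looks like '- tag: value'. The function name parse_plaintext and the sample lines are hypothetical, not real hachoir output:

```python
from collections import defaultdict


def parse_plaintext(lines):
    """Group '- tag: value' lines under the most recent 'Header:' line."""
    meta = defaultdict(dict)
    group = None
    for line in lines:
        if line.endswith(':'):
            # A header line starts a new group.
            group = line[:-1]
        elif group is not None and ': ' in line:
            # Split on the first ': ' only, so values containing ': ' survive.
            tag, value = line.split(': ', 1)
            if tag.startswith('- '):
                tag = tag[2:]
            meta[group][tag] = value
    return meta


# Hypothetical sample lines, shaped like the export; not real output.
sample = [
    'Common:',
    '- Comment: Bit rate: 128 Kbit/sec',
    'Video stream #1:',
    '- Image width: 320 pixels',
    '- Image height: 240 pixels',
]
meta = parse_plaintext(sample)
print(meta['Video stream #1']['Image width'])
```

Limiting the split to the first ': ' keeps a value such as 'Bit rate: 128 Kbit/sec' in one piece instead of raising a ValueError during unpacking.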

Question 2

To get width x height from the first top-level metadata group that has size information, without accessing private attributes or parsing the text output, you could use file_metadata.iterGroups():

#!/usr/bin/env python
import sys
from itertools import chain

# $ pip install hachoir-{core,parser,metadata}
from hachoir_core.cmd_line import unicodeFilename
from hachoir_metadata import extractMetadata
from hachoir_parser import createParser

file_metadata = extractMetadata(createParser(unicodeFilename(sys.argv[1])))
it = chain([file_metadata], file_metadata.iterGroups())
print("%sx%s" % next((metadata.get('width'), metadata.get('height'))
                     for metadata in it
                     if metadata.has('width') and metadata.has('height')))
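
Note that next() without a default raises StopIteration when no group matches. The sketch below shows the same "first group with both keys" search with a None default. FakeGroup is a hypothetical stand-in that only mimics has() and get(), so the snippet runs without hachoir installed; it is not the library's class:

```python
class FakeGroup(object):
    """Hypothetical stand-in exposing only has() and get()."""

    def __init__(self, **values):
        self._values = values

    def has(self, key):
        return key in self._values

    def get(self, key):
        return self._values[key]


def first_size(groups):
    """Return (width, height) from the first group that has both, else None."""
    return next(((g.get('width'), g.get('height'))
                 for g in groups
                 if g.has('width') and g.has('height')), None)


groups = [
    FakeGroup(duration=12),            # no size information
    FakeGroup(width=320),              # width only, so it is skipped
    FakeGroup(width=320, height=240),  # first group with both keys
]
size = first_size(groups)
print("%sx%s" % size if size else "no size information found")
```

With real metadata, the same function could be given chain([file_metadata], file_metadata.iterGroups()) as its argument.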

To convert metadata into a dictionary (non-recursively; iterate over the groups yourself if you need them):

def metadata_as_dict(metadata):
    # One value is stored as-is; several values are stored as a list
    return {item.key: ([v.value for v in item.values]
                       if len(item.values) > 1
                       else item.values[0].value)
            for item in metadata if item.values}
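
To see what shape the resulting dictionary takes, here is a small sketch that feeds the same logic hypothetical stand-in objects. Item and Value are namedtuples invented for illustration; they only mimic the key, values and value attributes used above:

```python
from collections import namedtuple

# Hypothetical stand-ins for the item and value objects; not hachoir classes.
Value = namedtuple('Value', 'value')
Item = namedtuple('Item', 'key values')


def metadata_as_dict(metadata):
    """Map each key to its single value, or to a list if it has several."""
    return {item.key: ([v.value for v in item.values]
                       if len(item.values) > 1
                       else item.values[0].value)
            for item in metadata if item.values}


items = [
    Item('width', [Value(320)]),                # one value: stored as-is
    Item('comment', [Value('a'), Value('b')]),  # several values: stored as a list
    Item('author', []),                         # no values: skipped
]
result = metadata_as_dict(items)
print(result)
```

Keys without values are left out entirely, so a later lookup should use result.get(key) if the key may be absent.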