The key here is recursion if you do not know the nesting of the types in advance. Here is an example (text reformatted for readability):
#!/usr/bin/env python
import collections
a = collections.OrderedDict([(u'p', [u"""
"The Exam Room" is a new series in
which everyday medical questions are answered by physicians and
professors from the Yale School of Medicine.""",
u"""In our second episode: Dr. Stephen Strittmatter,
Vincent Coates Professor of Neurology and director of
the Adler Memory Clinic in Neurology, explains when
memory loss can become a problem and what you can do to
boost your brain power.""",
collections.OrderedDict([(u'em',
u'Produced & Hosted by Noah Golden')])])])
Now flatten the object, which might be a mapping or a list. Three options are implemented: if the value found is a string, we just append it to our collector. If it is a list or a Mapping, we call flatten again. Any other type raises a ValueError. Note that you can specify some allowed tags with the allowed kwarg:
def flatten(obj, allowed=(u'p', u'em')):
    collector = []

    def process(v, collector=collector):
        if isinstance(v, (list, collections.Mapping)):
            collector += flatten(v, allowed=allowed)
        elif isinstance(v, basestring):
            collector.append(v)
        else:
            raise ValueError('Cannot handle type: {t}'.format(t=v.__class__))

    if isinstance(obj, list):
        for v in obj:
            process(v)

    if isinstance(obj, collections.Mapping):
        for k, v in obj.iteritems():
            if k in allowed:
                process(v)

    return collector

if __name__ == '__main__':
    print(flatten(a))
The result with your example would be a three-element list, which looks something like this:
[u'"The Exam Room" is a new series ...',
u'In our second episode: ...',
u'Produced & Hosted by Noah Golden']
Now if you want a single string, just join the now flattened list:
print(''.join(flatten(a)))
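The code above targets Python 2: basestring and dict.iteritems do not exist in Python 3, and collections.Mapping now lives in collections.abc. If you are on Python 3, the same recursion could be sketched roughly as follows. This is a minimal sketch rather than a drop-in replacement; the sample data is a shortened, hypothetical stand-in for the real text, and the second call shows how the allowed kwarg filters out tags.

```python
from collections import OrderedDict
from collections.abc import Mapping


def flatten(obj, allowed=('p', 'em')):
    """Recursively collect every string nested inside lists and mappings."""
    collector = []

    def process(value):
        if isinstance(value, (list, Mapping)):
            # Containers are flattened recursively and merged into the result.
            collector.extend(flatten(value, allowed=allowed))
        elif isinstance(value, str):
            collector.append(value)
        else:
            raise ValueError('Cannot handle type: {t}'.format(t=type(value)))

    if isinstance(obj, list):
        for value in obj:
            process(value)
    elif isinstance(obj, Mapping):
        for key, value in obj.items():
            # Only descend into tags that were explicitly allowed.
            if key in allowed:
                process(value)
    return collector


# Hypothetical, shortened stand-in for the sample data used above.
sample = OrderedDict([
    ('p', ['first paragraph',
           'second paragraph',
           OrderedDict([('em', 'emphasised credit')])]),
])

print(flatten(sample))
print(flatten(sample, allowed=('p',)))  # the 'em' text is skipped
```

With allowed=('p',) the nested em mapping is still visited, but its key is not in the allowed tuple, so its text is left out of the result.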