Get contents as attribute-value pairs using BeautifulSoup or XPath

Question 1

Example solution using lxml.html and XPath:

Select all h5 elements, and for each h5 element:
1. select its following sibling elements: following-sibling::*
2. that are not h5 themselves: [not(self::h5)]
3. and that have exactly as many preceding h5 siblings as the position of the current h5: [count(preceding-sibling::h5) = 1], then 2, then 3, and so on.

(The for loop uses enumerate() starting at 1 to supply that position.)
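
To see why the counting predicate isolates one group per header, here is a minimal sketch that models the div's children as a flat list of tag names and applies the same two conditions in plain Python. The names tags, preceding_h5_count and group_for are hypothetical helpers for illustration only; they are not part of lxml.

```python
# Tag names of the div's children, in document order (mirrors the sample HTML).
tags = ["h5", "span", "span", "h5", "p", "h5", "p", "h5", "span"]

def preceding_h5_count(index):
    """Number of h5 entries before position `index`."""
    return tags[:index].count("h5")

def group_for(i):
    """Positions of non-h5 entries that have exactly i h5 entries before them."""
    return [k for k, tag in enumerate(tags)
            if tag != "h5" and preceding_h5_count(k) == i]

# One group per header, numbered from 1 like the enumerate() loop.
for i in range(1, tags.count("h5") + 1):
    print(i, [tags[k] for k in group_for(i)])
```

Any entry with exactly i h5 entries before it necessarily comes after the i-th h5 and before the next one, which is why the count alone is enough to bound each group.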

Sample code, with simple prints of the text content of the elements (using lxml.html's .text_content() on elements):

import lxml.html
html = """<div id="animalcontainer" class="container last fixed-height">

                <h5>
                  Husbandary Management
                </h5>
                <span>
                  Animal: Cow
                </span>
                <span>
                  Farmer: Mr smith
                </span>
                <h5>
                  Milch Category
                </h5>
                <p>
                  Milk supply
                </p>
                <h5>
                  Services
                </h5>
                <p>
                  cow milk, ghee
                </p>
                <h5>
                  animal colors
                </h5>
                <span>
                  green,red
                </span>


              </div>"""
doc = lxml.html.fromstring(html)
headers = doc.xpath('//div/h5')
for i, header in enumerate(headers, start=1):
    print("--------------------------------")
    print(header.text_content().strip())
    # siblings after this h5 that are not h5 and have exactly i h5 elements before them
    for following in header.xpath("""following-sibling::*
                                     [not(self::h5)]
                                     [count(preceding-sibling::h5) = %d]""" % i):
        print("\t", following.text_content().strip())

This outputs:

--------------------------------
Husbandary Management
    Animal: Cow
    Farmer: Mr smith
--------------------------------
Milch Category
    Milk supply
--------------------------------
Services
    cow milk, ghee
--------------------------------
animal colors
    green,red

Question 2

I finally did it using BeautifulSoup. It can probably be done more efficiently, since the following solution walks the siblings again for every h5:

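# addinfo is the parsed container element from the question (not defined in this snippet)
h5s = addinfo.findAll('h5')
datad = {}
for h5el in h5s:
    txtcontents = []
    for con in h5el.nextSiblingGenerator():
        # bare text nodes (whitespace between tags) have no usable tag name
        name = getattr(con, 'name', None)
        if name == 'h5':
            break  # reached the next header, so this group is complete
        if name is None:
            continue  # skip text nodes
        txtcontents.append(con.contents)
    datad["\n".join(h5el.contents)] = txtcontents
print(datad)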
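
One way to avoid walking the siblings again for every header is a single pass over the container's children, starting a new group whenever an h5 is seen. The sketch below is not the BeautifulSoup approach above: it swaps in the standard library's xml.etree.ElementTree so that it runs without extra packages, which assumes the markup is well-formed XML. The names group_by_heading, heading_tag and snippet are hypothetical, and the snippet is a shortened version of the sample HTML.

```python
import xml.etree.ElementTree as ET

def group_by_heading(container, heading_tag="h5"):
    """Walk the container's children once, collecting text under each heading."""
    groups = {}
    current = None  # list belonging to the most recent heading, if any
    for child in container:
        text = "".join(child.itertext()).strip()
        if child.tag == heading_tag:
            current = groups.setdefault(text, [])
        elif current is not None:
            current.append(text)
    return groups

# Shortened, well-formed version of the sample markup (assumption for the demo).
snippet = """<div id="animalcontainer">
  <h5>Husbandary Management</h5>
  <span>Animal: Cow</span>
  <span>Farmer: Mr smith</span>
  <h5>Milch Category</h5>
  <p>Milk supply</p>
</div>"""

result = group_by_heading(ET.fromstring(snippet))
for heading, values in result.items():
    print(heading, values)
```

Children that appear before the first h5 are ignored here. Real-world HTML is often not well-formed, in which case a lenient parser such as lxml.html or BeautifulSoup would be needed to build the tree, and the loop would have to be adapted to that parser's node types.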