lxml and xpath in python: get pairs of h3 and email from html document in a list with possible missing e-mail

StackOverflow https://stackoverflow.com/questions/21805546

Question

I'm quite new to this, so I don't really know if this is possible:

This webpage has titles under h3, easy to get with lxml:

titles=doc.xpath("//div/h3/a/text()")

Under those, I have the emails:

emails=doc.xpath("//div/p[text()='Email: ']/a/text()")

And I can merge them into a list with '|':

both=doc.xpath("//div/h3/a/text()|//div/p[text()='Email: ']/a/text()")

The problem is that some results don't have an e-mail, so I get a bad list: some titles are followed not by an email but by another title, without even an empty list item in between. I can work around this with some post-processing, but I wonder if it's possible to return a 'not-found' when the email is missing, so I get workable pairs: title-email, title-not-found, and so on.

I tried a recipe I found here using:

emails=doc.xpath("concat(//div/p[text()='Email: ']/a/text(),substring('not-found',1 div not(//div/p[text()='Email: ']/a/text())))")

But this works only as a standalone query for the emails; if I mix it with '|' I get an XPathEvalError: Invalid type error.

For the record, this is what I tried:

emails=doc.xpath("//div/h3/a/text()|concat(//div/p[text()='Email: ']/a/text(),substring('not-found',1 div not(//div/p[text()='Email: ']/a/text())))")

I'm quite new to lxml and XPath, so maybe I'm missing an easy way to do this.


Solution

If you are not stuck with lxml, you can give BeautifulSoup a try; I find it easier to use. I looked into that page but couldn't parse it cleanly because it has an XML declaration just before the HTML header, like:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="ES" xml:lang="ES" >
...
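If you want to strip that declaration programmatically rather than by hand, a minimal sketch (assuming the page is fetched as bytes, as in the script below) could be:

import re

# Drop a leading XML declaration such as <?xml version="1.0" encoding="iso-8859-1"?>
html = re.sub(rb'^\s*<\?xml[^>]*\?>\s*', b'', html)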

I had to remove that first line (the XML declaration) to test it. That said, here is the example with BeautifulSoup:

from urllib.request import urlopen
from bs4 import BeautifulSoup
from itertools import dropwhile
import re

# Fetch the page and parse it as HTML
html = urlopen('http://www.datosempresa.com/Categoria/peluqueria?pagina=4').read()
soup = BeautifulSoup(html, 'html')

# Each result lives in a <div class="resultados">
for div in soup.find_all('div', attrs={'class': 'resultados'}):
    # The title is the string of the next <h3> tag
    title = div.find_next('h3').string
    # Drop every string that appears before one matching "email:" (case-insensitive)
    email = list(dropwhile(lambda x: not re.match(r'(?i)email:', x), div.strings))
    # If nothing matched, the list is empty; otherwise the address is the second element
    print('{} - {}'.format(title, email[1] if email else 'Not found'))

It searches for all <div> elements whose class attribute is resultados, extracts all strings from each div's children, and drops every string found before one that matches email: (ignoring case). If the resulting list is empty, it just prints Not found; otherwise the email is the second element in the list, so it extracts that.
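For completeness, roughly the same pairing can be done with lxml by iterating per result block and running the queries relative to each block. This is only a sketch: it assumes the class attribute is exactly resultados (taken from the snippet above) and that the h3 and the Email: paragraph from the question sit inside those divs.

import lxml.html

# The HTML parser ignores the XHTML namespace, so plain XPath works
doc = lxml.html.parse('http://www.datosempresa.com/Categoria/peluqueria?pagina=4').getroot()

for div in doc.xpath("//div[@class='resultados']"):
    # Querying relative to each block keeps every title paired with its own e-mail
    title = div.xpath(".//h3/a/text()")
    email = div.xpath(".//p[text()='Email: ']/a/text()")
    print('{} - {}'.format(title[0] if title else 'not-found',
                           email[0] if email else 'not-found'))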

Run it like:

python3 script.py

That yields:

MANUELA RIVERO - oscarvp30@hotmail.com
SALON DE BELLEZA LIDIA - Not found
TRUKO & HAIR DESIGN - Not found
PACO PERFUMERIAS - pacoperfumerias@gmail.com
ESTHER CENDAGORTAGALARZA ESTILISTA - peluqueriaesthercendagortagalarza@hotmail.es
ADARIS - adaris@hotmail.es
N&K NAILS - info@nknails.com
PELUQUERIA NELA - wrunela@hotmail.es
PELUQUERIA NELA - wrunela@hotmail.es
PELUQUERIA HUMBERTO STAR - humbertostar@yahoo.es
COLLADOS PELUQUEROS - contacta@colladospeluqueros.com
ZEN NATURE ESTéTICA - contacta@colladospeluqueros.com
LA CASA DE MAR - Not found
DELGADO PERRUQUERS - Not found
(...output cut to save space...)