Remove leading/trailing /
s from the regular expression. No need to escape /
. Specify flags=re.IGNORECASE
instead of trailing i
.
matches = re.findall('<(.*?)>([^<]+)</\\1>', data, flags=re.IGNORECASE)
print(dict(matches))
Using raw string, no need to escape \
.
matches = re.findall(r'<(.*?)>([^<]+)</\1>', data, flags=re.IGNORECASE)
print(dict(matches))
Both codes print:
{'status': 'Active', 'companyname': 'TEST', ...}
non-regex alternative: lxml
Used lxml.html
instead of lxml.etree
because data
is incomplete.
import lxml.html
print({x.tag:x.text for x in lxml.html.fromstring(data)})