I would use XPath in Python to parse the values and insert them into a list: first declare an empty list, my_list = [], and then append each parsed value with my_list.append(parsed_value).
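Here is a minimal sketch of that approach. It assumes the response is already in a string (response_text is a hypothetical name). It also uses the limited XPath subset built into Python's xml.etree.ElementTree rather than a full XPath engine such as lxml:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the web response from the question.
response_text = """<html>
<field>123</field>
<field>456</field>
</html>"""

root = ET.fromstring(response_text)

# ElementTree supports a subset of XPath; './/field' matches every
# <field> element at any depth below the root.
my_list = []
for el in root.findall(".//field"):
    my_list.append(el.text)

print(my_list)  # ['123', '456']
```

The values come back as strings; wrap el.text in int() if you need numbers.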
How do I parse XML results into an array?
Question
I'm new to Python and need some help. The web hasn't been very helpful. Simply put, I have a web response that looks like this:
<html>
<field>123</field>
<field>456</field>
</html>
What I'm trying to do is take all of the contents from the field elements into an array that I can index. The end result would look like this:
myArray[0] = 123
myArray[1] = 456
and so on...
What I'm going to end up doing with this is running a random number generator to randomly pick one of the elements in this array and retrieve its value.
Is this possible? I can't seem to find a straight answer on the web, so I feel like I might be asking for the wrong thing.
Solution 3
Other suggestions
If you're doing something as simple as this, you might want to look at the ElementTree module built into Python. You don't need to install anything extra; it's all included in Python.
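import xml.etree.ElementTree as ET
filename = 'data.txt'  # file containing the response shown in the question
tree = ET.parse(filename)
root = tree.getroot()  # the <html> element
myArray = []
# findall('field') returns the direct <field> children of root
for x in root.findall('field'):
    myArray.append(x.text)
print(myArray)  # ['123', '456']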
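The question's data arrives as a web response rather than a file. The sketch below assumes the response body is already in a string (response_text is a hypothetical name). It parses the string with ET.fromstring and then picks a random element, as the question asks, using the standard random module:

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the web response body from the question.
response_text = """<html>
<field>123</field>
<field>456</field>
</html>"""

# fromstring() parses a string directly and returns the root element,
# so the response does not need to be saved to a file first.
root = ET.fromstring(response_text)
myArray = [x.text for x in root.findall("field")]

# random.choice picks one element of the list uniformly at random.
picked = random.choice(myArray)
print(picked)

# The same idea with an explicit random index into the list.
index = random.randrange(len(myArray))
print(index, myArray[index])
```

random.choice is usually the simpler option. randrange is useful when you also want to know which position was chosen.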
By far the easiest way to extract information from HTML is BeautifulSoup. Here's a snippet to get the list you want:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html_text, "html.parser")
>>> fields = [int(el.text) for el in soup.find_all("field")]
>>> fields
[123, 456]
Since you're new to Python:
- We import the BeautifulSoup class from the bs4 module (which you'll need to install; see the link above).
- We create a BeautifulSoup instance called soup from html_text.
- We create a list called fields, using a list comprehension:
  - convert the text of el into an integer
  - for each el
  - which we get by finding all field elements in soup.
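The list comprehension is shorthand for an ordinary loop. The sketch below shows both forms producing the same list. It uses a hypothetical list of strings in place of the el.text values, so it runs without bs4 installed:

```python
# Hypothetical stand-ins for the el.text values that find_all("field") yields.
texts = ["123", "456"]

# List comprehension form, as in the snippet above.
fields = [int(t) for t in texts]

# The same result built with an explicit loop.
fields_loop = []
for t in texts:
    fields_loop.append(int(t))

print(fields)                 # [123, 456]
print(fields == fields_loop)  # True
```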
Look at the standard modules! http://docs.python.org/2/library/htmlparser.html#examples
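The linked page documents Python 2's HTMLParser module; in Python 3 the same class lives in html.parser. Here is a minimal Python 3 sketch that collects the text inside every field element. The class name FieldParser and the variable response_text are hypothetical:

```python
from html.parser import HTMLParser


class FieldParser(HTMLParser):
    """Collects the text inside every <field> element."""

    def __init__(self):
        super().__init__()
        self.in_field = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "field":
            self.in_field = True

    def handle_endtag(self, tag):
        if tag == "field":
            self.in_field = False

    def handle_data(self, data):
        # Keep only non-blank text that appears inside a <field> element.
        if self.in_field and data.strip():
            self.values.append(data.strip())


response_text = """<html>
<field>123</field>
<field>456</field>
</html>"""

parser = FieldParser()
parser.feed(response_text)
parser.close()
print(parser.values)  # ['123', '456']
```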
If you only need this for the case in the question, try the following. It replaces every tag with ' ' (a space), and str.split then splits the resulting text on runs of whitespace.
import re
def get_data(str_data):
    # replace every tag with a space, then split on whitespace
    return re.sub(r'<.*?>', ' ', str_data).split()
str_data = """<html>
<field>123</field>
<field>456</field>
</html>"""
print(get_data(str_data))  # prints ['123', '456']
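The tag-stripping approach returns every piece of text in the document, not just the field values. If the response may contain other elements, a narrower pattern that captures only the contents of field elements is safer. This variation is not part of the original answer. Like any regex applied to markup, it assumes the field tags have no attributes and are not nested:

```python
import re

# Example input with an extra, non-field element added for illustration.
str_data = """<html>
<field>123</field>
<field>456</field>
<title>not a field</title>
</html>"""

# Capture only the text between <field> and </field>; the non-greedy .*?
# stops at the first closing tag, so each field is captured separately.
values = re.findall(r"<field>(.*?)</field>", str_data)
print(values)  # ['123', '456']
```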
Sorry for my English.