Question

I'm new to Python and need some help. The web hasn't been very helpful. Simply put, I have a web response that looks like this:

<html>
  <field>123</field>
  <field>456</field>
</html>

What I'm trying to do is put the contents of all the field elements into an array that I can index. The end result would look like this:

myArray[0] = 123
myArray[1] = 456

and so on...

What I'm going to end up doing with this is running a random number generator to randomly pick one of the elements in this array and retrieve its value.

Is this possible? I can't seem to find a straight answer on the web, so I feel like I might be asking for the wrong thing.


Solution 3

I would use XPath to parse the values and insert them into a list: first declare an empty list, my_list = [], and then append each value with my_list.append(parsed_value).
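The answer doesn't name a specific XPath library. As a minimal sketch, the standard library's xml.etree.ElementTree understands a limited subset of XPath, which is enough for this document (a full XPath engine such as lxml would be a separate install). This assumes the response is well-formed XML and is already in a string; the name html_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# The response body from the question, as a string (hypothetical variable name).
html_text = """<html>
  <field>123</field>
  <field>456</field>
</html>"""

root = ET.fromstring(html_text)

# Declare an empty list first, then append each parsed value.
my_list = []
# './/field' is an XPath-style path: every <field> at any depth below the root.
for element in root.iterfind(".//field"):
    parsed_value = element.text
    my_list.append(parsed_value)

print(my_list)  # ['123', '456']
```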

Other suggestions

If you're doing something as simple as this, you might want to look at the ElementTree module built into Python. You don't need to install anything extra; it's all included in Python.

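import xml.etree.ElementTree as ET

filename = 'data.txt'  # file containing the response shown in the question
tree = ET.parse(filename)  # parse the whole file into an element tree
root = tree.getroot()  # the root element, <html> here
myArray = []

# findall('field') matches <field> elements that are direct children of root
for x in root.findall('field'):
    myArray.append(x.text)

print(myArray)  # ['123', '456']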
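The question's data arrives as a web response rather than a file, so ET.fromstring, which parses a string and returns the root element, may be more convenient than ET.parse. A sketch, assuming the response text is already in a string named html_text (a hypothetical name) and is well-formed XML, that also converts the values to integers and does the random pick described in the question using the standard random module:

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical variable holding the web response text.
html_text = """<html>
  <field>123</field>
  <field>456</field>
</html>"""

# fromstring parses a string and returns the root element directly.
root = ET.fromstring(html_text)
# Convert each field's text to an int so that myArray[0] == 123.
myArray = [int(x.text) for x in root.findall("field")]

# Pick one element at random...
picked = random.choice(myArray)
# ...or pick a random index and look the value up.
index = random.randrange(len(myArray))
value = myArray[index]
print(myArray, picked, value)
```

random.choice is the shorter option when only the value is needed; random.randrange is useful if you also want to know which position was chosen.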

By far the easiest way to extract information from HTML is BeautifulSoup. Here's a snippet to get the list you want:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html_text, "html.parser")
>>> fields = [int(el.text) for el in soup.find_all("field")]
>>> fields
[123, 456]

Since you're new to Python:

  1. We import the BeautifulSoup class from the bs4 module (which you'll need to install separately, e.g. with pip install beautifulsoup4).
  2. We create a BeautifulSoup instance called soup from html_text.
  3. We create a list called fields, using a list comprehension:
    • convert the text of el into an integer
    • for each el
    • which we get by finding all field elements in soup
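For readers new to list comprehensions, the three bullets above correspond to an ordinary for loop. This sketch replaces the BeautifulSoup elements with a plain list of strings (hypothetical stand-in data for each el.text) so that it runs without installing bs4:

```python
# Stand-in for the .text of each <field> element (hypothetical sample values).
texts = ["123", "456"]

# The list comprehension from the answer, applied to the stand-in texts...
fields = [int(text) for text in texts]

# ...is equivalent to this explicit loop.
fields_loop = []
for text in texts:
    fields_loop.append(int(text))

print(fields, fields_loop)  # [123, 456] [123, 456]
```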

Look at the standard library modules! http://docs.python.org/2/library/htmlparser.html#examples (in Python 3 this module is called html.parser)
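The linked page documents Python 2's HTMLParser module; in Python 3 the same class lives in html.parser. A minimal sketch of a subclass that collects the text of each field element, assuming fields are not nested and contain plain text (the class name FieldParser and the variable html_text are hypothetical):

```python
from html.parser import HTMLParser


class FieldParser(HTMLParser):
    """Collect the text inside every <field>...</field> element."""

    def __init__(self):
        super().__init__()
        self.in_field = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "field":
            self.in_field = True

    def handle_endtag(self, tag):
        if tag == "field":
            self.in_field = False

    def handle_data(self, data):
        # Only keep non-blank text that appears while inside a <field> element.
        if self.in_field and data.strip():
            self.values.append(data.strip())


html_text = """<html>
  <field>123</field>
  <field>456</field>
</html>"""

parser = FieldParser()
parser.feed(html_text)
parser.close()
print(parser.values)  # ['123', '456']
```

Unlike ElementTree, HTMLParser does not require the input to be well-formed XML, which can matter for real web pages.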

If you only need this for the case in the question, try this: it replaces every tag with ' ' (a space), and str.split then splits the resulting text on runs of whitespace.

import re

def get_data(str_data):
    # Replace every tag (<...>) with a space, then split on runs of whitespace.
    return re.sub(r'<.*?>', ' ', str_data).split()

str_data = """<html>
  <field>123</field>
  <field>456</field>
</html>"""

print(get_data(str_data))  # prints ['123', '456']
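Because the substitution keeps every piece of text in the document, extra markup such as a <title> element would also end up in the result. If the input may contain other elements, one variation (a swapped-in technique, not the answer's) is to capture only the text between field tags with re.findall. The function name get_field_values is hypothetical, and this sketch still assumes the field tags have no attributes and nothing nested inside them:

```python
import re

def get_field_values(str_data):
    # Capture only the text between <field> and </field>; other text is ignored.
    return re.findall(r"<field>(.*?)</field>", str_data)

str_data = """<html>
  <title>ignored</title>
  <field>123</field>
  <field>456</field>
</html>"""

print(get_field_values(str_data))  # ['123', '456']
```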

sorry for my English

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow