Searching for expected content by means of lxml
This assumes, you already have content of the page containing the data you need. The code shows fetching it by http request, if it requires rendering within browser, see later part of my answer how to get get it.
If you want to get all values in attribute data-hiringurl
, try XPath //@data-hiringurl
from lxml import html
import requests
url = "http://wearemadeinny.com/find-a-job/"
page = requests.get(url)
tree = html.fromstring(page.text) # corrected, used to be `lxml.html.fromstring`
xp = "//@data-hiringurl"
job_urls = tree.xpath(xp)
print print job_urls
But I am not sure, if the url you have provided contain such data. I did not find it there.
Getting content of page rendered by JavaScript
If the page gets the content you are interested in rendered dynamically on the client, you need to provide the browser context and let it render there. Using selenium
can do the work:
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> url = "http://wearemadeinny.com/find-a-job/"
>>> browser.get(url)
>>> page = browser.page_source
>>> print page
Now you have in page
variable content of the page and you may proceed with lxml
as described above.
Note: I do not guarantee, you will get the expected content in the page, I only know, it comes in rendered form. But if you need to proceed by clicking on some of the elements on the page, filling in some text, pressing buttons, all that can be done by browser
instance shown above - just read doc.