Question

Here is the form. The same exact form appears twice in the source.

<form method="POST" action="/login/?session=sess">
<input type="text" id="usern" name="username" value="" placeholder="Username"/>
<input type="password" id="passw" name="password" placeholder="Password"/>
<input type="hidden" name="ses_token" value="token"/>
<input id="login" type="submit" name="login" value="Log In"/>
</form>

I am getting the "action" attribute with this Python code:

import lxml.html
tree = lxml.html.fromstring(pagesource)
print(tree.xpath('//form/@action'))  # every action attribute in the page
input()  # wait for Enter so the console window stays open

Since there are two forms, it prints both action values:

['/login/?session=sess', '/login/?session=sess']

How can I get it to print just one? I only need one, since the two forms are identical.

I also have a second question: how can I get the value of the token? I mean this line:

 <input type="hidden" name="ses_token" value="token"/>

I tried similar code:

import lxml.html
tree = lxml.html.fromstring(pagesource)
print(tree.xpath('//input/@value'))  # every value attribute on every input
input()  # wait for Enter so the console window stays open

However, since there is more than one attribute named value, it prints:

['', 'token', 'Log In', '', 'token', 'Log In'] # or something close to that

How can I get just the token, and only one copy of it?

Is there a better way to do this?


Solution

Use find() instead of xpath(): find() returns only the first matching element, whereas xpath() returns a list of every match.
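
To make the difference concrete, here is a minimal sketch (Python 3, assuming lxml is installed) that runs the lookups side by side on a trimmed-down page. The FORM and page strings are hypothetical stand-ins for pagesource. If you would rather stay with XPath, indexing the result list with [0], or wrapping the path in the XPath string() function, also keeps only the first match.

```python
import lxml.html

# Hypothetical stand-in for pagesource: two identical, trimmed-down forms.
FORM = (
    '<form method="POST" action="/login/?session=sess">'
    '<input type="hidden" name="ses_token" value="token"/>'
    '</form>'
)
page = FORM + FORM

tree = lxml.html.fromstring(page)

# xpath() returns a list with one entry per matching attribute.
all_actions = tree.xpath('//form/@action')

# findall() likewise returns every matching element.
all_forms = tree.findall('.//form')

# find() returns only the first matching element (or None if nothing matches).
first_form = tree.find('.//form')

print(len(all_actions), len(all_forms))  # 2 2
print(first_form.get('action'))          # /login/?session=sess

# XPath alternatives that keep only the first result:
print(tree.xpath('//form/@action')[0])                          # /login/?session=sess
print(tree.xpath('string(//input[@name="ses_token"]/@value)'))  # token
```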

Here's an example based on the code you've provided:

import lxml.html


pagesource = """<form method="POST" action="/login/?session=sess">
<input type="text" id="usern" name="username" value="" placeholder="Username"/>
<input type="password" id="passw" name="password" placeholder="Password"/>
<input type="hidden" name="ses_token" value="token"/>
<input id="login" type="submit" name="login" value="Log In"/>
</form>
<form method="POST" action="/login/?session=sess">
<input type="text" id="usern" name="username" value="" placeholder="Username"/>
<input type="password" id="passw" name="password" placeholder="Password"/>
<input type="hidden" name="ses_token" value="token"/>
<input id="login" type="submit" name="login" value="Log In"/>
</form>
"""

tree = lxml.html.fromstring(pagesource)
form = tree.find('.//form')  # first <form> only

# .action and .value read the action and value attributes
print("Action:", form.action)
print("Token:", form.find('.//input[@name="ses_token"]').value)

Prints:

Action: /login/?session=sess
Token: token
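
lxml.html also has form-specific helpers that avoid writing paths by hand: tree.forms is a list of every form in the document, and each form's fields property maps input names to their values. The sketch below assumes Python 3 with lxml installed; page is again a hypothetical cut-down copy of the question's HTML. It also shows one way to keep a single copy of each value if you start from the full xpath() list: dict.fromkeys() drops duplicates while preserving their order.

```python
import lxml.html

# Hypothetical cut-down copy of the question's page (two identical forms).
FORM = """<form method="POST" action="/login/?session=sess">
<input type="text" name="username" value="" placeholder="Username"/>
<input type="hidden" name="ses_token" value="token"/>
<input type="submit" name="login" value="Log In"/>
</form>"""
page = FORM + "\n" + FORM

tree = lxml.html.fromstring(page)

# .forms lists the FormElement objects in document order; take the first.
form = tree.forms[0]

# .action is the form's action attribute (joined with base_url if one was set).
action = form.action

# .fields is a dict-like view mapping input names to their current values.
token = form.fields["ses_token"]

print("Action:", action)  # Action: /login/?session=sess
print("Token:", token)    # Token: token

# Alternative: collect every match with XPath, then drop duplicates in order.
unique_actions = list(dict.fromkeys(tree.xpath("//form/@action")))
print(unique_actions)
```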

Hope that helps.

Licensed under: CC-BY-SA with attribution