Question
First, I want to thank everyone in advance. I realize this is likely a fairly basic question, but after many hours with no results I've decided to reach out and ask for some help.
I am working on a small script that will eventually be part of a much larger, multi-part program (hopefully, lol). Basically, it queries hidemyass.com for a proxy list (based on user input) and then saves that list to a temp file to be pinged and verified in the next step.
Seems simple enough, right?
Now my issue...
When I make my request and view the source of the HTML response, the proxy IP addresses have been split up in a weird way.
e.g.

.QJZ-{display:none}
.dA6C-{display:inline}
.h0UB-{display:none}
.HOns-{display:inline}

</style><div style="display:none">1</div><span></span><span style="display:none">99</span><span class="QJZ-">99</span><div style="display:none">99</div>201<span style="display: inline">.</span><span class="QJZ-">9</span><div style="display:none">9</div><div style="display:none">10</div><span style="display:none">80</span><span class="QJZ-">80</span><span style="display:none">140</span><span class="QJZ-">140</span><span style="display:none">149</span><span class="h0UB">149</span><div style="display:none">149</div><span style="display:none">161</span><span class="h0UB">161</span><span></span><span style="display:none">190</span>210<div style="display:none">217</div><span class="h0UB">234</span><span class="243">.</span><span class="h0UB">6
My question is, how the hell can I get my code to read that as an IP address? (It's not the full HTML; I cut it off to shorten my question, since it's huge already.)
Thanks again,
L8nit3tr0ubl3
EDIT: I forgot to mention that I'm working with Python and have very little JavaScript/HTML experience (I'm assuming the split is done with JavaScript).
Solution
The site is deliberately obfuscating the addresses to prevent you from doing exactly this.
(as I described on my blog)
You could ask them for an API, or you could use a CSS engine to work out which elements are actually displayed and keep only the text of those.
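As a rough illustration of the "work out which elements are displayed" idea, here is a minimal sketch using only the Python standard library. It is not a real CSS engine: it assumes the only hiding mechanisms are single-class rules with display:none and inline style="display:none" attributes, as in the snippet from the question. The names hidden_classes, VisibleText and visible_text are made up for this example, and the sample markup is invented in the style of the question, not taken from the site.

```python
import re
from html.parser import HTMLParser

HIDDEN = re.compile(r"display\s*:\s*none")
RULE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # never closed


def hidden_classes(css):
    """Return the class names whose rule sets display:none."""
    return {name for name, body in RULE.findall(css) if HIDDEN.search(body)}


class VisibleText(HTMLParser):
    """Collect text that is not inside any hidden element."""

    def __init__(self, hidden):
        super().__init__()
        self.hidden = hidden
        self.stack = []   # one flag per open element: is it hidden?
        self.parts = []   # visible text pieces, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        style = attrs.get("style") or ""
        classes = (attrs.get("class") or "").split()
        self.stack.append(
            bool(HIDDEN.search(style)) or any(c in self.hidden for c in classes)
        )

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text is visible only if no enclosing element is hidden.
        if not any(self.stack):
            self.parts.append(data)


def visible_text(css, fragment):
    parser = VisibleText(hidden_classes(css))
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts).strip()


# Invented sample in the style of the question's markup.
SAMPLE_CSS = ".QJZ-{display:none}\n.dA6C{display:inline}"
SAMPLE_HTML = (
    '<span class="QJZ-">99</span><div style="display:none">99</div>201'
    '<span style="display: inline">.</span><span class="dA6C">17</span>'
    '<span></span>.<span style="display:none">5</span>8.9'
)
print(visible_text(SAMPLE_CSS, SAMPLE_HTML))  # prints 201.17.8.9
```

Using a parser instead of regular expressions means nested or reordered tags are handled, but the sketch will still break if the site hides elements some other way (external stylesheets, combined selectors, JavaScript).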
OTHER TIPS
#!/usr/bin/python
# -*- coding: utf-8 -*-
from re import sub

from requests import get

# Fetch the page as text and isolate the rows of the proxy table.
html = get('http://www.hidemyass.com/proxy-list/').text
html = html.split('<table id="listtable"')[1].split('</table')[0]
html = html.split('<tr')[2:]  # drop everything up to and including the header row

for tr in html:
    # Each row carries its own <style> block; collect the (four-character)
    # class names that the row hides with display:none.
    css = tr.split('<style>\n')[1].split('\n<')[0].split('\n')
    classesBad = [rule[1:5] for rule in css if 'display:none' in rule]

    # Keep the text of a classed span only if its class is not hidden.
    checkClass = lambda x: x.group(2) if x.group(1) not in classesBad else ''

    ip = tr.split('</style>')[1].split('</span></td>')[0]
    # Drop elements hidden inline, unwrap visible ones, then filter by class.
    ip = sub('<(?:span|div) style="display:none">.+?</(?:span|div)>', '', ip)
    ip = sub('<span style="display: inline">(.+?)</span>', r'\1', ip)
    ip = sub('<span class="(.+?)">(.+?)</span>', checkClass, ip)
    ip = ip.replace('<span></span>', '')

    # Port and protocol sit in later cells of the same row.
    port = tr.split('<td>\n')[1].split('<')[0]
    protocol = tr.split(' \n <td>')[1].split('<')[0].lower()
    print('%s://%s:%s/' % (protocol, ip, port))