Question

First, I want to thank everyone in advance. I realize this is likely a fairly basic question, but after many hours without results I've decided to reach out and ask for some help.

I am working on a small script that will eventually be part of a much larger, multi-part program (hopefully, lol). Basically, it queries hidemyass.com for a proxy list (based on user input) and then saves that list to a temp file to be pinged and verified in the next step. Seems simple enough, right?

Now my issue...

When I make my request and view the source of the HTML response, the proxy IP addresses have been split up in a weird way.

E.g.:

    <br>
    .QJZ-{display:none}<br>
    .dA6C-{display:inline}<br>
    .h0UB-{display:none}<br>
    .HOns-{display:inline}<br>
    <br>        
    </style><div style="display:none">1</div><span></span><span style="display:none">99</span><span class="QJZ-">99</span><div style="display:none">99</div>201<span style="display: inline">.</span><span class="QJZ-">9</span><div style="display:none">9</div><div style="display:none">10</div><span style="display:none">80</span><span class="QJZ-">80</span><span style="display:none">140</span><span class="QJZ-">140</span><span style="display:none">149</span><span class="h0UB">149</span><div style="display:none">149</div><span style="display:none">161</span><span class="h0UB">161</span><span></span><span style="display:none">190</span>210<div style="display:none">217</div><span class="h0UB">234</span><span class="243">.</span><span class="h0UB">6

My question is: how the hell can I get my code to read that as an IP address? (It's not the full HTML; I cut it off to shorten my question, since it's huge already.)

Thanks again,
L8nit3tr0ubl3

EDIT: I forgot to mention that I'm working with Python and have very little JavaScript/HTML experience (I'm assuming the split is done with JavaScript).

Solution

They are specifically trying to prevent you from doing this (as I described on my blog).

You could ask them for an API, or you could try to use a CSS engine to figure out which elements will be displayed.
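
As a rough illustration of the second suggestion, here is a minimal sketch that imitates only the small part of a CSS engine this markup needs: it reads which classes are set to `display:none`, then walks the fragment and keeps the text that is not inside a hidden `span` or `div`. The names `hidden_classes`, `VisibleText` and `visible_text` are made up for this example, and the sample CSS and fragment are invented to resemble the obfuscation in the question, not copied from the site. It assumes simple `.class{display:none}` rules and inline `style` attributes only.

```python
import re
from html.parser import HTMLParser


def hidden_classes(css):
    """Return the class names whose rule is display:none."""
    return set(re.findall(r"\.([\w-]+)\s*\{\s*display\s*:\s*none\s*\}", css))


class VisibleText(HTMLParser):
    """Collect text that is not inside a hidden <span> or <div>."""

    TAGS = ("span", "div")

    def __init__(self, hidden):
        super().__init__()
        self.hidden = hidden  # class names that map to display:none
        self.stack = []       # one "is hidden" flag per open span/div
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.TAGS:
            return
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "")
        classes = (attrs.get("class") or "").split()
        self.stack.append(
            "display:none" in style or any(c in self.hidden for c in classes)
        )

    def handle_endtag(self, tag):
        if tag in self.TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep the text only if no enclosing element is hidden.
        if not any(self.stack):
            self.parts.append(data.strip())


def visible_text(css, fragment):
    parser = VisibleText(hidden_classes(css))
    parser.feed(fragment)
    parser.close()  # flush any trailing text
    return "".join(parser.parts)


# Invented sample in the style of the question's markup.
css = ".a1{display:none}\n.b2{display:inline}"
fragment = (
    '<span style="display:none">99</span><span class="a1">12</span>201'
    '<span style="display: inline">.</span><div style="display:none">7</div>'
    '<span class="b2">80</span><span></span>.<span class="a1">3</span>14'
    '<span class="77">.</span>6'
)
print(visible_text(css, fragment))  # prints 201.80.14.6
```

A real page can hide elements in other ways (other selectors, external stylesheets, scripts), so this is a starting point rather than a general solution.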

OTHER TIPS

    #!/usr/bin/env python3
    # Strip the CSS obfuscation from each row of the hidemyass.com proxy table.
    from re import sub

    from requests import get

    # .text is a decoded str, so the str-based split() calls below work.
    html = get('http://www.hidemyass.com/proxy-list/').text
    html = html.split('<table id="listtable"')[1].split('</table')[0]
    # Drop the text before the first <tr and the header row.
    rows = html.split('<tr')[2:]

    for tr in rows:
        # Each row carries its own <style> block, one rule per line.
        css = tr.split('<style>\n')[1].split('\n<')[0].split('\n')

        # Class names are four characters long, e.g. ".QJZ-{display:none}".
        classes_bad = [rule[1:5] for rule in css if 'display:none' in rule]

        ip = tr.split('</style>')[1].split('</span></td>')[0]
        # Remove elements hidden by an inline style.
        ip = sub('<(?:span|div) style="display:none">.+?</(?:span|div)>', '', ip)
        # Unwrap spans that are explicitly inline.
        ip = sub('<span style="display: inline">(.+?)</span>', r'\1', ip)
        # Keep the text of a classed span unless its class is hidden.
        ip = sub('<span class="(.+?)">(.+?)</span>',
                 lambda m: m.group(2) if m.group(1) not in classes_bad else '',
                 ip)
        ip = ip.replace('<span></span>', '')

        port = tr.split('<td>\n')[1].split('<')[0]

        protocol = tr.split(' \n             <td>')[1].split('<')[0].lower()

        print('%s://%s:%s/' % (protocol, ip, port))

Licensed under: CC-BY-SA with attribution