Question

I need help with AutoIt StringRegExp.

I want to scrape the proxies from an HTML file like this:

<td>
<center>
<a class="proxyList" href="http://whois.sc/77.79.9.229" target="_blank">77.79.9.229:80</a>
<a class="proxyList" href="http://whois.sc/77.79.9.225" target="_blank">77.79.9.225:80</a>
<a class="proxyList" href="http://whois.sc/89.202.194.17"
target="_blank">89.202.194.17:8080</a>
<a class="proxyList" href="http://whois.sc/46.20.35.78" target="_blank">46.20.35.78:8080</a>
</td>

This is my horrible AutoIt code (it doesn't work :/ ):

ClipPut(_IEBodyReadHTML($oIE))

 $ips = ClipGet()

 $array = StringRegExp($ips,' <center> * </td>', 3)

 $file = FileOpen("proxies.txt", 1)

 FileWrite($file, $array)

 FileClose($file)

This is what I want:

proxies.txt:
77.79.9.229:80
77.79.9.225:80
89.202.194.17:8080
46.20.35.78:8080

Thanks for help :)


Solution

I don't know AutoIt's regex flavor, but if it's similar to PCRE, your regex means:

 <center>  : a space followed by the literal <center>
 *         : a space repeated 0 or more times, followed by a space
</td>      : the literal </td>

I'm not quite sure that is what you want ;-)

To capture each IP and port, you should use something like:

(\d{1,3}(?:\.\d{1,3}){3}:\d{1,5})
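
For what it's worth, here is a minimal sketch of how that pattern might be plugged into the script from the question. It assumes the page HTML is already in a string (the hard-coded $sHtml below is a stand-in for whatever _IEBodyReadHTML($oIE) returns), and the variable names are made up for illustration. Note that FileWrite will not write an array in one call, so the matches are written one per line in a loop:

```autoit
; Stand-in for the page source; in the real script this would be
; the string returned by _IEBodyReadHTML($oIE) (assumption).
Local $sHtml = '<a class="proxyList" href="http://whois.sc/77.79.9.229" target="_blank">77.79.9.229:80</a>' & @CRLF & _
        '<a class="proxyList" href="http://whois.sc/46.20.35.78" target="_blank">46.20.35.78:8080</a>'

; Flag 3 returns an array with every match of the capturing group.
; The bare IPs inside href are skipped because no ":port" follows them.
Local $aProxies = StringRegExp($sHtml, '(\d{1,3}(?:\.\d{1,3}){3}:\d{1,5})', 3)
If @error Then
    MsgBox(16, "Proxies", "No proxies found")
    Exit
EndIf

; Mode 1 = append, as in the question.
Local $hFile = FileOpen("proxies.txt", 1)
If $hFile = -1 Then Exit

; Write one ip:port per line.
For $i = 0 To UBound($aProxies) - 1
    FileWriteLine($hFile, $aProxies[$i])
Next
FileClose($hFile)
```

The clipboard round trip (ClipPut followed by ClipGet) should not be needed for this; the HTML string can be passed to StringRegExp directly.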
Licensed under: CC-BY-SA with attribution