Question

I'm not good with regex, but in this case I have no choice but to use it.

I'm trying to extract a list of port numbers and their respective IP addresses from a table on a webpage. Because the page is generated dynamically with AJAX and PHP, none of the table elements has an id, a class, or any other unique attribute. I have already removed every \t, \r and \n with str_replace, so the whole content contains only words and spaces.

Here are some examples of ports and IP addresses:

Port - Fa0/0, Gi1/0/2.100, Ethernet01, GigaEther-01 (may contain upper- and lower-case letters, digits, dots, dashes and slashes; no spaces, and no more than 16 characters)

IP address - 123.123.123.123, 1.1.12.12, 123.12.1.1 (no different from ordinary IP addresses)

Fortunately, every port and IP address is immediately preceded by either a port image or an IP image, like this:

...<img border='0' src='images/port.png' width='18' heigh='18'>Fa0/0</td>... OR
...<img border='0' src='images/ip.png' width='18' heigh='18'>1.1.1.1</td>...

I believe there are no spaces between the port/IP and the img/td tags, so I can use the tags as a pattern for extraction. I used the following patterns:

Port -

$pattern = "/<img border\='0' src='images\/port\.png' width\='18' height\='18'>([a-zA-Z0-9\/ _-]{1,15})<\/td>/";

IP addr -

$pattern = "<img border\='0' src\='images\/ip\.png' width\='18' height\='18'>\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b <\/td>/";

each followed by preg_match_all($pattern, $content, $matches);

But both of them return nothing, so I then tried the following patterns:

Port -

$pattern = "/<img border\='0' src='images\/port\.png' width\='18' height\='18'>(.*)<\/td>/";

IP addr -

$pattern = "<img border\='0' src\='images\/ip\.png' width\='18' height\='18'>(.*)<\/td>/";

...

But these patterns return something like

<img border\='0' src='images\/port\.png' width\='18' height\='18'>Fa0/0
<\/td>....(Followed by a bunch of unwanted text and code)
......<\/td>

This is because (.*) treats anything between the <img....> and a </td> as a valid match.
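The over-matching described above can be reproduced on a small string. The following is a minimal sketch, assuming a hypothetical two-cell fragment modelled on the snippets in the question (the variable names and the sample content are illustrative, not taken from the real page). It compares the greedy (.*) with a lazy (.*?) and with a negated character class that cannot cross a tag.

```php
<?php
// Hypothetical sample, modelled on the snippets in the question.
$content = "<td><img border='0' src='images/port.png' width='18' height='18'>Fa0/0</td>"
         . "<td>other text</td>";

// Greedy: (.*) runs on to the LAST </td> in the string.
preg_match('#<img[^>]+>(.*)</td>#', $content, $greedy);

// Lazy: (.*?) stops at the FIRST </td> after the image.
preg_match('#<img[^>]+>(.*?)</td>#', $content, $lazy);

// Negated class: the capture cannot contain "<", so it cannot cross a tag.
preg_match('#<img[^>]+>([^<]+)</td>#', $content, $bounded);

echo $greedy[1], "\n";   // Fa0/0</td><td>other text
echo $lazy[1], "\n";     // Fa0/0
echo $bounded[1], "\n";  // Fa0/0
```

Either of the last two forms keeps the capture inside a single cell; the negated class is usually the stricter choice because it also rejects cells that contain nested markup.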

I also tried a regex for the IP address alone: $pattern = "/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/";

It returns only IP addresses (like 111.22.3.119), but unfortunately some of the link URLs in the page contain IP addresses as well, which I don't want.

Then I tried $pattern = "/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}<\/td>\b/"; but it returns nothing.

I'd appreciate any help with this, thanks.

* Edit 1 *

I tried $pattern = "/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b<\/td>/"; and it works, though I don't know why. I'm still trying to figure out the Port regex.
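A likely reason for the difference between the last two attempts is the position of \b. A word boundary only exists between a word character (letter, digit, underscore) and a non-word character. Between the last digit and "<" there is one; after the ">" of </td> there is none unless a word character follows directly. The sketch below illustrates this on a hypothetical fragment; the sample content and variable names are assumptions for illustration only.

```php
<?php
// Hypothetical cell content, modelled on the question's example.
$content = "<td><img src='images/ip.png'>1.1.1.1</td><td>next</td>";

// Shared dotted-quad part of both patterns.
$ip = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';

// \b BEFORE </td>: a boundary exists between the digit "1" and "<".
$before = preg_match_all('#\b' . $ip . '\b</td>#', $content, $m1);

// \b AFTER </td>: ">" is followed by "<", two non-word characters,
// so there is no boundary at that position and the match fails.
$after = preg_match_all('#\b' . $ip . '</td>\b#', $content, $m2);

echo $before, "\n"; // 1
echo $after, "\n";  // 0
```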


Solution

$pattern1 = '#<img[^>]+>([a-z][\w./-]{1,16})</td>#i';
$pattern2 = '#<img[^>]+>([\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3})</td>#';
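As a rough usage sketch, the two patterns above can be applied with preg_match_all, reading the captured values from index 1 of the result. The page fragment here is hypothetical, assembled from the examples in the question, and the variable names $ports and $ips are illustrative.

```php
<?php
// Hypothetical page fragment built from the examples in the question.
$img = "<img border='0' src='images/%s.png' width='18' height='18'>";
$content = "<tr><td>" . sprintf($img, 'port') . "Gi1/0/2.100</td>"
         . "<td>" . sprintf($img, 'ip') . "123.12.1.1</td></tr>"
         . "<tr><td>" . sprintf($img, 'port') . "Fa0/0</td>"
         . "<td>" . sprintf($img, 'ip') . "1.1.12.12</td></tr>"
         . "<a href='http://10.0.0.1/status'>link</a>";

// Patterns copied unchanged from the answer.
$pattern1 = '#<img[^>]+>([a-z][\w./-]{1,16})</td>#i';
$pattern2 = '#<img[^>]+>([\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3})</td>#';

preg_match_all($pattern1, $content, $ports);
preg_match_all($pattern2, $content, $ips);

// Index 1 holds the text captured by the parentheses.
print_r($ports[1]); // Gi1/0/2.100, Fa0/0
print_r($ips[1]);   // 123.12.1.1, 1.1.12.12
```

The address inside the link URL is skipped because it is not directly preceded by an img tag and followed by </td>. Two details of $pattern1 are worth noting: it does not check which image precedes the value, relying instead on the leading letter to exclude IP cells, and [a-z] followed by {1,16} accepts 2 to 17 characters, so {0,15} would be the tighter bound if the 16-character limit must be enforced.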
Licensed under: CC-BY-SA with attribution