Question

I've opened a .php page from a website with a bunch of hyperlinks on it. I want to copy their URLs into a .txt file. Of course, I could do that manually, but there are too many of them, so I'd like to do it automatically somehow.

Previously I would have done it this way: look at the page source, that is, its HTML code, and parse it with a small script written specifically for that. But this is a .php page, and I guess the links are piped in from a database on the server rather than sitting in the source code. In any case, they are not in the page's HTML code.

I wonder if this is still possible. I believe it should be - all the links are displayed on my screen, they are all clickable and working, so there should be some way of capturing them.


Solution

What I understand is that you want to do this from the browser itself. In that case, open Chrome's developer tools (press F12), go to the Console tab, paste the following code, and press Enter. Then copy the list of links from the console output into a .txt file.

// Grab every <a> element on the page and print its href attribute
var tags = document.getElementsByTagName("a");
for (var i = 0; i < tags.length; i++) {
    console.log(tags[i].getAttribute("href"));
}
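
If copying the output by hand is still tedious, a small variation of the same snippet can put everything on the clipboard in one go. This is just a sketch, and it relies on copy(), which is a Chrome DevTools console utility rather than standard JavaScript:

// Collect all hrefs into an array and copy them as one newline-separated string
var tags = document.getElementsByTagName("a");
var hrefs = [];
for (var i = 0; i < tags.length; i++) {
    hrefs.push(tags[i].getAttribute("href"));
}
copy(hrefs.join("\n")); // DevTools-only helper; paste the result into your .txt file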

OTHER TIPS

Here is what you need to do.

Use PHP's cURL library to get the page as a string, or better yet, use file_get_contents:

http://au1.php.net/file_get_contents

$homepage = file_get_contents('http://www.example.com/');
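
If you do go the cURL route instead, a minimal sketch (same placeholder URL as above, untested here) would look like this:

// cURL alternative to file_get_contents
$ch = curl_init('http://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response body as a string
$homepage = curl_exec($ch);
curl_close($ch);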

Use the DOMDocument class to build an HTML document. http://au1.php.net/domdocument

$doc = new DOMDocument();
$doc->loadHTML($homepage);

From here you can get all the <a> tags in the HTML and read their href attributes, by calling $elements = $doc->getElementsByTagName("a");

Then just iterate over the elements, pulling the href out of each one.

// untested code
foreach ($elements as $el) {
    $link = $el->getAttribute("href");
    echo $link . "\n";
}

You can then reuse the script on any page; just change the URL you pass to file_get_contents (or the cURL request, if you used that instead).
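
Putting those pieces together, here is a minimal sketch of the whole thing. The libxml_use_internal_errors() call is my addition to silence the parser warnings that real-world HTML usually triggers; the URL is still the placeholder from above:

<?php
// Fetch the page, parse it, and print every href, one per line
$homepage = file_get_contents('http://www.example.com/');

libxml_use_internal_errors(true); // suppress warnings from messy real-world markup

$doc = new DOMDocument();
$doc->loadHTML($homepage);

$elements = $doc->getElementsByTagName("a");
foreach ($elements as $el) {
    echo $el->getAttribute("href") . "\n";
}

Run it from the command line and redirect the output to get your text file, for example php links.php > links.txt (the file name is just an example).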

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow