Question

I've opened a .php page from a website with a bunch of hyperlinks on it. I want to copy their URLs into a .txt file. Of course, I could do that manually, but there are too many of them, so I'd like to do it automatically somehow.

Previously I would have done it this way: look at the page source, that is, its HTML code, and parse it with a small script written specifically for that. But this is a .php page, and I guess the links are piped in from a database on the server rather than sitting in the source code. In any case, they are not in the page's HTML code.

I wonder if this is still possible. I believe it should be - all the links are displayed on my screen, they are all clickable and working, so there should be some way of capturing them.


Solution

What I understand is that you want to do this from the browser itself. In that case, open Chrome's developer tools (press F12), go to the Console tab, paste the following code, and press Enter. Then copy the list of links from the console output into a .txt file.

// Grab every <a> element on the page and print its href attribute
var tags = document.getElementsByTagName("a");
for (var i = 0; i < tags.length; i++) {
    console.log(tags[i].getAttribute("href"));
}
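
If copying the output by hand is still tedious, a small variation of the same snippet can put everything on the clipboard in one go. This is just a sketch, and it relies on copy(), which is a Chrome DevTools console utility rather than standard JavaScript:

// Collect all hrefs into an array and copy them as one newline-separated string
var tags = document.getElementsByTagName("a");
var hrefs = [];
for (var i = 0; i < tags.length; i++) {
    hrefs.push(tags[i].getAttribute("href"));
}
copy(hrefs.join("\n")); // DevTools-only helper; paste the result into your .txt file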

OTHER TIPS

Here is what you need to do.

Use PHP's cURL library to get the page as a string, or better yet, use file_get_contents:

http://au1.php.net/file_get_contents

$homepage = file_get_contents('http://www.example.com/');
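
If you do go the cURL route instead, a minimal sketch (same placeholder URL as above, untested here) would look like this:

// cURL alternative to file_get_contents
$ch = curl_init('http://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response body as a string
$homepage = curl_exec($ch);
curl_close($ch);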

Use the DOMDocument class to build an HTML document. http://au1.php.net/domdocument

$doc = new DOMDocument();
$doc->loadHTML($homepage);

From here you can get all the <a> tags in the HTML and read their href attributes, by calling $elements = $doc->getElementsByTagName("a");

Then just iterate over the elements, pulling the href out of each one.

// untested code
foreach ($elements as $el) {
    $link = $el->getAttribute("href");
    echo $link . "\n";
}

You can then reuse the script on any page; just change the URL you pass to file_get_contents (or the cURL request, if you used that instead).
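
Putting those pieces together, here is a minimal sketch of the whole thing. The libxml_use_internal_errors() call is my addition to silence the parser warnings that real-world HTML usually triggers; the URL is still the placeholder from above:

<?php
// Fetch the page, parse it, and print every href, one per line
$homepage = file_get_contents('http://www.example.com/');

libxml_use_internal_errors(true); // suppress warnings from messy real-world markup

$doc = new DOMDocument();
$doc->loadHTML($homepage);

$elements = $doc->getElementsByTagName("a");
foreach ($elements as $el) {
    echo $el->getAttribute("href") . "\n";
}

Run it from the command line and redirect the output to get your text file, for example php links.php > links.txt (the file name is just an example).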

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow