Question

I'm writing a web crawler.

Getting a link from HTML is the easy part, but getting a link that is only produced by JavaScript is hard for me.

Can I get the result of the JavaScript so that I know where a link points?

For example, how can I retrieve the link to google.com from the JavaScript code below, using Python?

<!DOCTYPE html>
<html lang="en">
    <head></head>
    <body>
        <a href="#" id="goog">to google</a>
    </body>
    <script>
        document.getElementById('goog').onclick = function() {
            window.location = "http://google.com";
        };

    </script>
</html>

Solution

You would need to install Node.js and run a separate piece of code that executes the page's JavaScript in context and emits the resulting HTML. This is possible with jsdom, but the key is extracting the JavaScript code from the HTML page and setting up the context correctly.
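The extraction step mentioned above can be done on the Python side before anything is handed to Node.js. Here is a minimal sketch that uses only the standard library's `html.parser`. The names `ScriptExtractor` and `extract_inline_scripts` are made up for illustration, and it only collects inline scripts; anything loaded via `<script src=...>` would have to be downloaded separately.

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collect the text of inline <script> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []         # one string per inline <script> block
        self._in_script = False   # True while inside an inline <script>
        self._chunks = []         # text pieces of the current script

    def handle_starttag(self, tag, attrs):
        # Skip external scripts: they have no inline body to collect.
        if tag == "script" and "src" not in dict(attrs):
            self._in_script = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append("".join(self._chunks))
            self._in_script = False


def extract_inline_scripts(html):
    """Return the source text of every inline <script> in the page."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.scripts
```

The returned strings are just source text. Running them, and setting up a `document` and `window` for them to run against, is still the job of jsdom or a similar tool.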

Other tips

Python doesn't offer a built-in way to execute JavaScript. Doing that yourself would be a large task, and it may not even be what you want, because you won't know how to trigger all of the relevant JavaScript (for example, a handler that only runs on a click).

For the code you showed, you could simply run a regex over the whole page to pull out URL-like strings, but that is ad hoc and error-prone.
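As a rough sketch of that regex approach: the pattern and the name `find_url_like_strings` below are assumptions made for illustration, not a robust URL grammar. It only catches absolute `http`/`https` URLs that appear literally in the text, so URLs built by string concatenation or relative paths will be missed.

```python
import re

# Deliberately simple pattern: "http://" or "https://" followed by anything
# that is not whitespace, a quote, an angle bracket, or a closing paren.
URL_PATTERN = re.compile(r"""https?://[^\s'"<>)]+""")


def find_url_like_strings(source):
    """Return every URL-like substring found in the page or script text."""
    return URL_PATTERN.findall(source)
```

For the page in the question, passing the full HTML text to `find_url_like_strings` should pick up `http://google.com` from the `onclick` handler, but it gives no information about which element the URL belongs to.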
