Question

I'm writing a web crawler.

Getting a link from HTML is the easy part, but finding a link that is only produced by JavaScript is hard for me.

Can I get the result of running the JavaScript, so that I know where a link points?

For example, how can I retrieve the link to google.com from the JavaScript code below, using Python?

<!DOCTYPE html>
<html lang="en">
    <head></head>
    <body>
        <a href="#" id="goog">to google</a>
    </body>
    <script>
        document.getElementById('goog').onclick = function() {
            window.location = "http://google.com";
        };

    </script>
</html>

Solution

You would need to install Node.js and run a separate program that executes the page's JavaScript in context and emits the resulting HTML. This is possible with jsdom, but the key is extracting the JavaScript from the HTML page and setting up the context correctly.
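The extraction step can be done on the Python side before anything is handed to Node.js. Below is a minimal sketch using only the standard library's `html.parser`; the names `ScriptExtractor` and `extract_inline_scripts` are made up for this illustration, and it only handles inline scripts (external `src=` files would have to be downloaded separately).

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collect the text of every inline <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []          # one string per <script> element seen
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


def extract_inline_scripts(html):
    """Return the non-empty inline script bodies found in an HTML string."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return [s for s in parser.scripts if s.strip()]
```

The returned strings could then be written to a file or piped to a separate Node.js process for execution; that part is not shown here.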

Other tips

Python's standard library doesn't offer a way to execute JavaScript. Executing it yourself would be a large task, and it may not even be what you want, because you won't know which of the page's JavaScript to run (for example, which event handlers to trigger).

For the code you showed, you could simply run a regex over the whole page to pull out URL-like strings, but that is ad hoc and error-prone.
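As a rough sketch of the regex approach, the snippet below looks for quoted `http`/`https` string literals. The pattern and the name `find_url_literals` are assumptions for this illustration, not a general solution: it will miss URLs that are built by concatenation, relative paths, and anything computed at runtime.

```python
import re

# Deliberately loose pattern: an http(s) URL wrapped in single or double quotes.
URL_IN_QUOTES = re.compile(r"""["'](https?://[^"'\s]+)["']""")


def find_url_literals(js_source):
    """Return quoted http(s) URLs in the order they appear in the source text."""
    return URL_IN_QUOTES.findall(js_source)


sample_js = '''
document.getElementById('goog').onclick = function() {
    window.location = "http://google.com";
};
'''
print(find_url_literals(sample_js))  # ['http://google.com']
```

Because it never runs the script, this cannot tell you which element a URL belongs to or whether the code path is ever reached; it only surfaces candidate links for the crawler to try.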

Licensed under: CC-BY-SA with attribution