Question

I want to create a simple web crawler in Java. I am trying to use this code:

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

WebDriver driver = new HtmlUnitDriver();
driver.get("https://codereview.qt-project.org/#change,70");
String pageSource = driver.getPageSource();
System.out.println(pageSource);

This is the page source I got:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html><head><META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Gerrit Code Review</title><meta content="locale=en_US" name="gwt:property">
<script language="javascript" type="text/javascript">var gerrit_hostpagedata={"config":
{"useContributorAgreements":true,"useContactInfo":false,"allowRegisterNewEmail":false, 

But the page content is generated by JavaScript, so it is missing from the source above. I want to obtain an HTML snapshot of the rendered page.


Solution

Create a JavaScript-enabled driver:

WebDriver driver = new HtmlUnitDriver(true);

Results:

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <title>
      codereview.qt-project Code Review
    </title>
    <meta content="locale=en_US" name="gwt:property"/>
    <script language="javascript" type="text/javascript">
//<![CDATA[
var gerrit_hostpagedata={"config":{"useContributorAgreements":true,"useContactInfo":false,"allowRegisterNewEmail":false,"authType":"HTTP","downloadSchemes":["DEFAULT_DOWNLOADS"],"sshdAddress":"*:29418","wildProject":{"name":"All-Projects"},"approvalTypes":{"approvalTypes":[{"category":{"categoryId":{"id":"CRVW"},"name":"Code Review","abbreviatedName":"R","position":1,"functionName":"MaxWithBlock","copyMinScore":true,"labelName":"Code-Review"},"values":[{"key":{"categoryId":{"id":"CRVW"},"value":-2},"name":"This shall not be merged"},{"key":{"categoryId":{"id":"CRVW"},"value":-1},"name":"I would prefer this is not merged as is"},{"key":{"categoryId":{"id":"CRVW"},"value":0},"name":"No score"},{"key":{"categoryId":{"id":"CRVW"},"value":1},"name":"Looks good to me, but someone else must approve"},{"key":{"categoryId":{"id":"CRVW"},"value":2},"name":"Looks good to me, approved"}],"maxNegative":-2,"maxPositive":2},{"category":{"categoryId":{"id":"SRVW"},"name":"Sanity Review","abbreviatedName":"S","position":2,"functionName":"MaxWithBlock","copyMinScore":false,"labelName":"Sanity-Review"},"values":[{"key":{"categoryId":{"id":"SRVW"},"value":-2},"name":"Major sanity problems found"},{"key":{"categoryId":{"id":"SRVW"},"value":-1},"name":"Sanity problems found"},{"key":{"categoryId":{"id":"SRVW"},"value":0},"name":"No sanity review "},{"key":{"categoryId":{"id":"SRVW"},"value":1},"name":"Sanity review passed"}],"maxNegative":-2,"maxPositive":1}]},"editableAccountFields":["REGISTER_NEW_EMAIL","USER_NAME","FULL_NAME"],"commentLinks":[{"find":"[Tt]ask-number:\\s+([\\w\\-]+)","replace":"\u003ca href\u003d\"http://bugreports.qt-project.org/browse/$1\"\u003e$\u0026\u003c/a\u003e"}],"documentationAvailable":false}};gerrit_hostpagedata.theme={"backgroundColor":"#FCFEEF","topMenuColor":"#44A51C","textColor":"#000000","trimColor":"#B6DCA6","selectionColor":"#FFFFCC"};
//]]>
    </script>
    <style type="text/css">

#gerrit_topmenu {
    color: #ffffff;
}

#gerrit_topmenu .gwt-Label {
    color: #ffffff;
}

#gerrit_topmenu .gwt-TabBarItem-selected .gwt-Label {
    color: #000000;
}

#gerrit_topmenu a, #gerrit_topmenu a:visited, #gerrit_topmenu a:hover {
    color: #ffffff;
}

#qt-footer-links {
    background-color: #44A51C;
}

#qt-footer-links ul {
    width: 100%;
    margin: 0;
    text-align: center;
    padding: .1em 0 .3em 0;
}

#qt-footer-links li {
    display: inline;
    padding: .1em 1em;
}

#qt-footer-links a, #qt-footer-links a:visited, #qt-footer-links a:hover {
    font-family: Arial;
    color: white;
    font-size: 11px;
    font-weight: bold;
    text-decoration: none;
}



    </style>
    <link href="favicon.ico" rel="icon" type="image/gif"/>
    <link href="gerrit/gwt/chrome/30B802F72484AED7E67C91FE77CD50BD.cache.css" rel="stylesheet"/>
    <link href="undefined" rel="stylesheet"/>
  </head>
  <body>
    <div id="gerrit_topmenu" class="GCLMTUVDNF">
      <table class="GCLMTUVDIK">
        <colgroup>
          <col/>
          <col/>
          <col/>
        </colgroup>
        <tbody>
          <tr>
            <td class="GCLMTUVDMK">
              <table cellspacing="0" cellpadding="0" class="GCLMTUVDJK">
                <tbody>
                  <tr>
                    <td align="left" style="vertical-align: top;">
                      <table cellspacing="0" cellpadding="0" class="gwt-TabBar" role="tablist" style="width: 100%;">
                        <tbody>
                          <tr>
                            <td align="left" style="vertical-align: bottom;" height="100%" class="gwt-TabBarFirst-wrapper">
                              <div class="gwt-TabBarFirst" style="white-space: normal; height: 100%;">
                                 
                              </div>
                            </td>
                            <td align="left" style="vertical-align: bottom;" class="gwt-TabBarItem-wrapper gwt-TabBarItem-wrapper-selected">
                              <div tabindex="0" class="gwt-TabBarItem gwt-TabBarItem-selected" role="tab">
                                <div class="gwt-Label" style="white-space: nowrap;">
                                  All
                                </div>
                              </div>
                            </td>
                            <td align="left" style="vertical-align: bottom;" width="100%" class="gwt-TabBarRest-wrapper">
                              <div class="gwt-TabBarRest" style="white-space: normal; height: 100%;">
                                 
                              </div>
                            </td>
                          </tr>
                        </tbody>
                      </table>
                    </td>
                  </tr>
                  <tr>
                    <td align="left" style="vertical-align: top;" height="100%">
                      <div class="gwt-TabPanelBottom" role="tabpanel">
                        <div style="width: 100%; height: 100%; padding: 0px; margin: 0px;">
                          <div class="GCLMTUVDMG" role="menubar" style="width: 100%; height: 100%;">
                            <a class="GCLMTUVDPG GCLMTUVDNG" href="#q,status:open,n,z" role="menuitem">
                              Open
                            </a>
                            <a class="GCLMTUVDPG GCLMTUVDNG" href="#q,status:staged,n,z" role="menuitem">
                              Staged
                            </a>
                            <a class="GCLMTUVDPG GCLMTUVDNG" href="#q,status:integrating,n,z" role="menuitem">
                              Integrating
                            </a>
                            <a class="GCLMTUVDPG GCLMTUVDNG" href="#q,status:merged,n,z" role="menuitem">
                              Merged
                            </a>
                            <a class="GCLMTUVDPG GCLMTUVDNG" href="#q,status:deferred,n,z" role="menuitem">
                              Deferred
                            </a>
                            <a class="GCLMTUVDPG" href="#q,status:abandoned,n,z" role="menuitem">
                              Abandoned
                            </a>
                          </div>
                        </div>
                      </div>
                    </td>
                  </tr>
                </tbody>
              </table>
            </td>
            <td class="GCLMTUVDLK">
              <div>
              </div>
            </td>
            <td class="GCLMTUVDMK">
              <div class="GCLMTUVDKK">
                <div class="GCLMTUVDMG" role="menubar">
                  <a class="GCLMTUVDPG" href="javascript:;" role="menuitem">
                    Sign In
                  </a>
                </div>
                <div class="GCLMTUVDJJ">
                  <input type="text" class="gwt-TextBox GCLMTUVDHG" value="Change #, SHA-1, tr:id, owner:email or reviewer:email"/>
                  <button type="button" class="gwt-Button">
                    Search
                  </button>
                </div>
              </div>
            </td>
          </tr>
        </tbody>
      </table>
      <div class="GCLMTUVDGJ">
        <span class="GCLMTUVDEJ GCLMTUVDFJ" style="">
          Loading ...
        </span>
      </div>
    </div>
    <div id="gerrit_header">
      <div>
        <img src="static/logo_open_gov.png" style="margin: 18px 0 0 10px;"/>
        <img src="static/logo_qt.png" style="float: right; margin: 18px 28px 0 0;"/>
      </div>
    </div>
    <div id="gerrit_body" class="GCLMTUVDMF">
      <div>
        <div style="display: none;">
          <div class="GCLMTUVDHJ GCLMTUVDLB">
            <div class="GCLMTUVDIJ">
              <span class="gwt-InlineLabel">
              </span>
            </div>
            <div>
              <table cellspacing="0" cellpadding="0">
                <tbody>
                  <tr>
                    <td align="left" style="vertical-align: top;">
                      <table class="GCLMTUVDFG GCLMTUVDKB">
                        <colgroup>
                          <col/>
                          <col/>
                        </colgroup>
                        <tbody>
                          <tr>
                            <td class="header GCLMTUVDNK">
                              Change-Id: 
                            </td>
                            <td class="GCLMTUVDNK GCLMTUVDBC">
                               
                            </td>
                          </tr>
                          <tr>
                            <td class="header">
                              Owner
                            </td>
                            <td>
                               
                            </td>
                          </tr>
                          <tr>
                            <td class="header">
                              Project
                            </td>
                            <td>
                               
                            </td>
                          </tr>
                          <tr>
                            <td class="header">
                              Branch
                            </td>
                            <td>
                               
                            </td>
                          </tr>
                          <tr>
                            <td class="header">
                              Topic
                            </td>
                            <td>
                               
                            </td>
                          </tr>
                          <tr>
                            <td class="header">
                              Uploaded
                            </td>
                            <td>
                               
                            </td>
                          </tr>
                          <tr>
                            <td class="header">
                              Updated
                            </td>
                            <td>
                               
                            </td>
                          </tr>
                          <tr>
                            <td class="header GCLMTUVDDB">
                              Status
                            </td>
                            <td>
                               
                            </td>
                          </tr>
                          <tr>
                            <td class="GCLMTUVDHI">
                               
                            </td>
                            <td class="GCLMTUVDHI">
                               
                            </td>
                          </tr>
                        </tbody>
                      </table>
                    </td>
                    <td align="left" style="vertical-align: top;">
                      <div class="GCLMTUVDMB">
                      </div>
                    </td>
                  </tr>
                </tbody>
              </table>
              <div class="GCLMTUVDO">
                <table class="GCLMTUVDGG">
                  <colgroup>
                    <col/>
                    <col/>
                    <col/>
                    <col/>
                    <col/>
                  </colgroup>
                  <tbody>
                    <tr>
                      <td class="header">
                        Reviewer
                      </td>
                      <td class="header">
                         
                      </td>
                      <td class="header">
                        Code Review
                      </td>
                      <td class="header">
                        Sanity Review
                      </td>
                      <td class="header GCLMTUVDDJ">
                         
                      </td>
                    </tr>
                  </tbody>
                </table>
                <ul class="GCLMTUVDCH">
                </ul>
                <div class="GCLMTUVDK" style="display: none;">
                  <div>
                    <input type="text" class="gwt-SuggestBox GCLMTUVDHG" value="Name or Email"/>
                    <button type="button" class="gwt-Button">
                      Add Reviewer
                    </button>
                  </div>
                </div>
              </div>
              <table cellspacing="0" cellpadding="0" class="gwt-DisclosurePanel gwt-DisclosurePanel-closed">
                <tbody>
                  <tr>
                    <td align="left" style="vertical-align: top;">
                      <a href="javascript:void(0);" style="display: block;" class="header">
                        <table>
                          <tbody>
                            <tr>
                              <td align="center" style="width: 16px;">
                                <img onload="this.__gwtLastUnhandledEvent=&quot;load&quot;;" src="https://codereview.qt-project.org/gerrit/clear.cache.gif" style="width: 16px; height: 16px; background: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAfklEQVR42mNgoDZITk4WosiAtLS0M6mpqb1Amp9cAy4B8X8gfpWenp5MiQEwfB6IbSgxAIaXArEcJQaA8Ddg+NQVFhZykmsADG8MDQ1lJseA5wQDFocBP0FRm5WVxUNOGGwEJi4VcmLhKtC5HuSkg8NA5+bjDCRCAG8UDUoAAIw8kVdwMG+3AAAAAElFTkSuQmCC) no-repeat 0px 0px" border="0" class="gwt-Image"/>
                              </td>
                              <td>
                                Included in
                              </td>
                            </tr>
                          </tbody>
                        </table>
                      </a>
                    </td>
                  </tr>
                  <tr>
                    <td align="left" style="vertical-align: top;">
                      <div style="padding: 0px; overflow: hidden; display: none;">
                        <table class="content">
                          <colgroup>
                            <col/>
                          </colgroup>
                          <tbody>
                            <tr>
                              <td>
                                 
                              </td>
                            </tr>
                          </tbody>
                        </table>
                      </div>
                    </td>
                  </tr>
                </tbody>
              </table>
              <table cellspacing="0" cellpadding="0" class="gwt-DisclosurePanel gwt-DisclosurePanel-closed">
                <tbody>
                  <tr>
                    <td align="left" style="vertical-align: top;">
                      <a href="javascript:void(0);" style="display: block;" class="header">
                        <table>
                          <tbody>
                            <tr>
                              <td align="center" style="width: 16px;">
                                <img onload="this.__gwtLastUnhandledEvent=&quot;load&quot;;" src="https://codereview.qt-project.org/gerrit/clear.cache.gif" style="width: 16px; height: 16px; background: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAfklEQVR42mNgoDZITk4WosiAtLS0M6mpqb1Amp9cAy4B8X8gfpWenp5MiQEwfB6IbSgxAIaXArEcJQaA8Ddg+NQVFhZykmsADG8MDQ1lJseA5wQDFocBP0FRm5WVxUNOGGwEJi4VcmLhKtC5HuSkg8NA5+bjDCRCAG8UDUoAAIw8kVdwMG+3AAAAAElFTkSuQmCC) no-repeat 0px 0px" border="0" class="gwt-Image"/>
                              </td>
                              <td>
                                Dependencies
                              </td>
                            </tr>
                          </tbody>
                        </table>
                      </a>
                    </td>
                  </tr>
                  <tr>
                    <td align="left" style="vertical-align: top;">
                      <div style="padding: 0px; overflow: hidden; display: none;">
                        <table class="GCLMTUVDOB content" style="width: auto;">
                          <colgroup>
                            <col/>
                          </colgroup>
                          <tbody>
                            <tr>
                              <td class="GCLMTUVDDG"/>
                              <td class="GCLMTUVDDG"/>
                              <td class="GCLMTUVDFB GCLMTUVDKD">
                                ID
                              </td>
                              <td class="GCLMTUVDKD">
                                Subject
                              </td>
                              <td class="GCLMTUVDKD">
                                Owner
                              </td>
                              <td class="GCLMTUVDKD">
                                Project
                              </td>
                              <td class="GCLMTUVDKD">
                                Branch
                              </td>
                              <td class="GCLMTUVDKD">
                                Updated
                              </td>
                            </tr>
                            <tr>
                              <td colspan="8" class="GCLMTUVDKJ">
                                Depends On
                              </td>
                            </tr>
                            <tr>
                              <td colspan="8" class="GCLMTUVDOE">
                                (None)
                              </td>
                            </tr>
                            <tr>
                              <td colspan="8" class="GCLMTUVDKJ">
                                Needed By
                              </td>
                            </tr>
                            <tr>
                              <td colspan="8" class="GCLMTUVDOE">
                                (None)
                              </td>
                            </tr>
                          </tbody>
                        </table>
                      </div>
                    </td>
                  </tr>
                </tbody>
              </table>
              <table class="GCLMTUVDLJ">
                <colgroup>
                  <col/>
                  <col/>
                </colgroup>
                <tbody>
                  <tr>
                    <td>
                      Old Version History:
                    </td>
                    <td>
                      <select class="gwt-ListBox">
                        <option value="Base" selected="selected">
                          Base
                        </option>
                      </select>
                    </td>
                  </tr>
                </tbody>
              </table>
              <div>
              </div>
              <div class="GCLMTUVDJB">
              </div>
            </div>
          </div>
        </div>
      </div>
    </div>
    <div style="clear: both; margin-top: 15px; padding-top: 2px; margin-bottom: 15px;">
      <div id="gerrit_footer">
        <div>
          <div id="qt-footer-links">
            <ul>
              <li>
                <a href="http://qt.digia.com/">
                  qt.digia.com
                </a>
              </li>
              <li>
                <a href="http://qt-project.org/doc/">
                  Qt Documentation
                </a>
              </li>
              <li>
                <a href="http://qt-project.org/">
                  Qt-Project
                </a>
              </li>
              <li>
                <a href="http://planet.qt-project.org/">
                  Planet Qt
                </a>
              </li>
              <li>
                <a href="http://qt.gitorious.org/">
                  Qt Repositories - Gitorious
                </a>
              </li>
              <li>
                <a href="http://bugreports.qt-project.org/">
                  Qt Bug Tracker - JIRA
                </a>
              </li>
            </ul>
          </div>
        </div>
      </div>
      <div id="gerrit_btmmenu" style="clear: both;">
        <div class="GCLMTUVDIG">
          Press '?' to view keyboard shortcuts
        </div>
        <div class="GCLMTUVDAL">
          Powered by 
          <a href="http://code.google.com/p/gerrit/" target="_blank">
            Gerrit Code Review
          </a>
           (V2.2.1-NQT-012) | 
          <a href="http://code.google.com/p/gerrit/issues/list" target="_blank">
            Report Bug
          </a>
        </div>
      </div>
    </div>
    <iframe id="__gwt_historyFrame" src="javascript:''" style="position:absolute;width:0;height:0;border:0" tabindex="-1">
    </iframe>
    <script language="javascript" type="text/javascript">
//<![CDATA[
<!--
function gerrit(){var s,l,t,w=window,d=document,n='gerrit',f=d.createElement('iframe');function m(){if(s&&l){var b,i=d.createElement('img');i.src=n+'/clear.cache.gif';b=i.src;b=b.substring(0,b.lastIndexOf('/')+1);gerrit=null;f.contentWindow.gwtOnLoad(undefined,n,b);}}gerrit.onScriptLoad=function(){s=1;m();};gerrit.r=function(){l=1;m();};f.src="javascript:''";f.id=n;f.style.cssText='position:absolute;width:0;height:0;border:none';f.tabIndex=-1;d.body.appendChild(f);f.contentWindow.location.replace(n+'/7209E38C5F54FA2918411884E5DCDFEC.cache.html');d.write('<script defer="defer">gerrit.r()</'+'script>');}gerrit();
//-->
//]]>
    </script>
    <iframe src="javascript:''" id="gerrit" style="position:absolute;width:0;height:0;border:none" tabindex="-1">
    </iframe>
    <script defer="defer">
//<![CDATA[
gerrit.r()
//]]>
    </script>
  </body>
</html>
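Once driver.getPageSource() returns the rendered snapshot, a simple crawler usually needs to pull out the links it should visit next. The following is a minimal, stdlib-only sketch of that step. The class name LinkExtractor and the method extractLinks are hypothetical, and it uses a regular expression instead of a real HTML parser, which is a simplifying assumption that will not handle every kind of markup.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Matches href="..." or href='...'; a rough approximation of HTML parsing
    // (assumption: attribute values do not contain quote characters).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns distinct href values in document order, skipping empty and javascript: links. */
    public static List<String> extractLinks(String html) {
        Set<String> links = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            if (!href.isEmpty() && !href.toLowerCase(Locale.ROOT).startsWith("javascript:")) {
                links.add(href);
            }
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        // A few anchors copied from the rendered snapshot above.
        String snippet =
                "<a href=\"#q,status:open,n,z\">Open</a>"
              + "<a href=\"javascript:;\">Sign In</a>"
              + "<a href=\"http://qt-project.org/\">Qt-Project</a>"
              + "<a href=\"#q,status:open,n,z\">Open</a>";
        System.out.println(extractLinks(snippet));
    }
}
```

Run on the snippet in main, it prints the two distinct non-JavaScript links. Relative values such as #q,status:open,n,z would still need to be resolved against the page URL (for example with java.net.URI.resolve) before being queued for crawling.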
Licensed under: CC-BY-SA with attribution