How do I get the web page contents from a WebView?

https://stackoverflow.com/questions/2376471

24-09-2019

Question

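In Android, I have a WebView that is displaying a page.

How do I get the page source without requesting the page again?

It seems WebView should have some kind of getPageSource() method that returns a string, but alas it does not.

If I enable JavaScript, what is the appropriate JavaScript to put in this call to get the contents?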

webview.loadUrl("javascript:(function() { " +  
    "document.getElementsByTagName('body')[0].style.color = 'red'; " +  
    "})()");

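Solution

I know this is a late answer, but I found this question because I had the same problem. I think I found the answer in this post on lexandera.com. The code below is basically a cut-and-paste from the site. It seems to do the trick.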

final Context myApp = this;

/* An instance of this class will be registered as a JavaScript interface */
class MyJavaScriptInterface
{
    @JavascriptInterface
    @SuppressWarnings("unused")
    public void processHTML(String html)
    {
        // process the html as needed by the app
    }
}

final WebView browser = (WebView)findViewById(R.id.browser);
/* JavaScript must be enabled if you want it to work, obviously */
browser.getSettings().setJavaScriptEnabled(true);

/* Register a new JavaScript interface called HTMLOUT */
browser.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");

/* WebViewClient must be set BEFORE calling loadUrl! */
browser.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url)
    {
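        /* This call injects JavaScript into the page which just finished loading.
           The innerHTML of the <html> element is wrapped in <html> tags again so
           the string passed to processHTML is a complete document. */
        browser.loadUrl("javascript:window.HTMLOUT.processHTML('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");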
    }
});

/* load a web page */
browser.loadUrl("http://lexandera.com/files/jsexamples/gethtml.html");
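
The processHTML body above is left empty. According to the Android documentation for addJavascriptInterface, the bridge object is called on a private background thread of the WebView rather than on the UI thread, so the HTML usually has to be handed off to other code. A minimal sketch of one way to do that follows; HtmlSink and awaitHtml are hypothetical names that are not part of the original answer, and the class is written in plain Java so it can be tried off-device.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical receiver for the page HTML; not part of the original answer. */
final class HtmlSink {
    // Holds at most one pending HTML string.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);

    // On Android this method would also need the @JavascriptInterface
    // annotation (android.webkit.JavascriptInterface) to be callable from JS.
    public void processHTML(String html) {
        // Non-blocking: if an earlier string has not been taken yet, this one is dropped.
        queue.offer(html);
    }

    /** Waits up to timeoutMillis for the HTML and returns null on timeout. */
    public String awaitHtml(long timeoutMillis) throws InterruptedException {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

An instance would be registered with browser.addJavascriptInterface(new HtmlSink(), "HTMLOUT") in place of MyJavaScriptInterface. awaitHtml blocks, so it should be called from a worker thread, not from the UI thread.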

Other tips

Per issue 12987, Blundell's answer crashes (at least on my 2.3 VM). Instead, I intercept a call to console.log with a special prefix:

// intercept calls to console.log
web.setWebChromeClient(new WebChromeClient() {
    public boolean onConsoleMessage(ConsoleMessage cmsg)
    {
        // check secret prefix
        if (cmsg.message().startsWith("MAGIC"))
        {
            String msg = cmsg.message().substring(5); // strip off prefix

            /* process HTML */

            return true;
        }

        return false;
    }
});

// inject the JavaScript on page load
web.setWebViewClient(new WebViewClient() {
    public void onPageFinished(WebView view, String address)
    {
        // have the page spill its guts, with a secret prefix
        view.loadUrl("javascript:console.log('MAGIC'+document.getElementsByTagName('html')[0].innerHTML);");
    }
});

web.loadUrl("http://www.google.com");
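
In the snippet above the prefix "MAGIC" and the offset 5 passed to substring have to be kept in sync by hand. A small sketch that ties the two together is shown below; ConsolePayload and extract are hypothetical names introduced only for illustration, and the class is plain Java so it can be tested without a device.

```java
/** Hypothetical helper: separates prefixed HTML payloads from ordinary console messages. */
final class ConsolePayload {
    // Must match the prefix used in the injected console.log call.
    static final String PREFIX = "MAGIC";

    /** Returns the text after PREFIX, or null when the message is an ordinary log line. */
    static String extract(String message) {
        if (message == null || !message.startsWith(PREFIX)) {
            return null;
        }
        // Use the prefix length instead of a hard-coded offset.
        return message.substring(PREFIX.length());
    }
}
```

Inside onConsoleMessage this could be used as String html = ConsolePayload.extract(cmsg.message()); followed by a null check to decide whether to return true or false.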

This is an answer based on jluckyiv's, but I think it is better and simpler to change the JavaScript as follows:

browser.loadUrl("javascript:HTMLOUT.processHTML(document.documentElement.outerHTML);");

Have you considered fetching the HTML separately and then loading it into the WebView?

String fetchContent(WebView view, String url) throws IOException {
    HttpClient httpClient = new DefaultHttpClient();
    HttpGet get = new HttpGet(url);
    HttpResponse response = httpClient.execute(get);
    StatusLine statusLine = response.getStatusLine();
    int statusCode = statusLine.getStatusCode();
    HttpEntity entity = response.getEntity();
    String html = EntityUtils.toString(entity); // assume html for simplicity
    view.loadDataWithBaseURL(url, html, "text/html", "utf-8", url); // todo: get mime, charset from entity
    if (statusCode != 200) {
        // handle fail
    }
    return html;
}
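
The Apache HttpClient classes used above were deprecated in API level 22 and removed from the Android SDK in API level 23, so on newer targets the fetch step needs a different client. The following is a minimal sketch of that step using java.net.HttpURLConnection, assuming a UTF-8 response; HtmlFetcher, readBody and fetchHtml are hypothetical names, and the WebView call is left out so the class stays plain Java.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that downloads a page as a string. */
final class HtmlFetcher {
    /** Reads the whole stream and decodes it as UTF-8 (an assumption for simplicity). */
    static String readBody(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Fetches the page at address, failing on any status other than 200. */
    static String fetchHtml(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return readBody(in);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

The returned string can then be passed to view.loadDataWithBaseURL(url, html, "text/html", "utf-8", url) as in the answer above. On Android the fetch has to run off the main thread, otherwise a NetworkOnMainThreadException is thrown.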

I managed to get this working using the code from @jluckyiv's answer, but I had to add the @JavascriptInterface annotation to the processHTML method in MyJavaScriptInterface.

class MyJavaScriptInterface
{
    @SuppressWarnings("unused")
    @JavascriptInterface
    public void processHTML(String html)
    {
        // process the html as needed by the app
    }
}

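You also need to annotate the method with @JavascriptInterface if your targetSdkVersion is >= 17, because SDK 17 introduced a new security requirement: every method that JavaScript is allowed to call must be annotated with @JavascriptInterface. Otherwise you will see an error like: Uncaught TypeError: Object [object Object] has no method 'processHTML' at null:1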
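
To illustrate why an unannotated method looks like it does not exist, here is a conceptual sketch of annotation-based filtering in plain Java. It is not how WebView is implemented; the @Exposed annotation is a hypothetical stand-in for android.webkit.JavascriptInterface so the example runs off-device, and BridgeVisibility and exposedMethods are likewise made-up names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BridgeVisibility {
    /** Hypothetical stand-in for android.webkit.JavascriptInterface. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Exposed {}

    static class MyJavaScriptInterface {
        @Exposed
        public void processHTML(String html) {
            // process the html as needed by the app
        }

        // Public but not annotated: a bridge that filters on the annotation skips it.
        public void notAnnotated(String html) {
        }
    }

    /** Returns the sorted names of public methods that carry the marker annotation. */
    static List<String> exposedMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getMethods()) {
            if (m.isAnnotationPresent(Exposed.class)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Only processHTML passes the filter here, which mirrors the "has no method" error that JavaScript reports when the annotation is missing on a real device.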

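If you are on KitKat or above, you can use the Chrome remote debugging tools to see all the requests and responses going in and out of your WebView, as well as the HTML source of the page being viewed.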

https://developer.chrome.com/devtools/docs/remote-debugging

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow