How to fetch HTML in Java

https://stackoverflow.com/questions/31462

09-06-2019

Question

Without using any external library, what is the simplest way to fetch a website's HTML content into a String?

Solution

I'm currently using this:

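import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

String content = null;
try {
  URLConnection connection = new URL("http://www.google.com").openConnection();
  // Decode as UTF-8 (an assumption; the page may declare another charset).
  try (Scanner scanner = new Scanner(connection.getInputStream(), "UTF-8")) {
    // "\\Z" matches the end of input, so next() returns the stream as one token.
    scanner.useDelimiter("\\Z");
    // hasNext() guards against an empty response body.
    content = scanner.hasNext() ? scanner.next() : "";
  }
} catch (Exception ex) {
  ex.printStackTrace();
}
System.out.println(content);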

But I'm not sure whether there is a better way.
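One possible alternative, assuming Java 11 or later is available, is the standard-library java.net.http.HttpClient, which needs no external dependency. The sketch below is illustrative only: the class name HttpClientFetch, the helper names buildRequest and fetch, and the 10-second timeout are assumptions, not part of the original question.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class HttpClientFetch {

    // Builds a plain GET request; the 10-second timeout is an arbitrary choice.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Sends the request and returns the response body as a String.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Pass a URL on the command line, e.g. http://www.google.com
        if (args.length > 0) {
            System.out.println(fetch(args[0]));
        }
    }
}
```

BodyHandlers.ofString() picks the charset from the Content-Type header and falls back to UTF-8, so no delimiter trick is needed.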

Other tips

This has worked well for me:

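// Requires java.net.URL, java.io.* and java.nio.charset.StandardCharsets.
URL url = new URL(theURL);
StringBuilder buffer = new StringBuilder();
// Decode bytes as UTF-8 (an assumption) rather than casting each byte to char.
try (Reader reader = new BufferedReader(
        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
    int ptr;
    while ((ptr = reader.read()) != -1) {
        buffer.append((char) ptr);
    }
}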

I don't know whether the other solution(s) provided are any more efficient or not.
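On the efficiency question, a common approach is to read the stream in chunks rather than one value per call. The following is a minimal sketch of that idea, not a benchmark-backed recommendation; the class name BufferedFetch, the 8192-character chunk size, and the UTF-8 charset are all assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class BufferedFetch {

    // Drains a Reader into a String, up to 8192 chars per read call.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[8192];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            sb.append(chunk, 0, n);
        }
        return sb.toString();
    }

    // Assumes the page is UTF-8; a real page may declare another charset.
    static String fetch(String url) throws IOException {
        try (InputStream in = new URL(url).openStream();
             Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return readAll(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass a URL on the command line to print the page.
        if (args.length > 0) {
            System.out.println(fetch(args[0]));
        }
    }
}
```

Keeping readAll separate from the network call means it can be exercised with any Reader, such as a StringReader.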

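I just left this post in your other thread, though what you have above might work as well. I don't think either would be any easier than the other. The Apache packages can be accessed by just using import org.apache.commons.HttpClient at the top of your code.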

Edit: Forgot the link ;)

Whilst not vanilla Java, I'll offer a simpler solution. Use Groovy ;-)

String siteContent = new URL("http://www.google.com").text

It's not a library but a tool named curl. It is generally installed on most servers, or you can easily install it on Ubuntu with

sudo apt install curl

Then fetch any HTML page and store it in a local file, as in this example

curl https://www.facebook.com/ > fb.html

You will get the home page's HTML. You can open it in your browser.
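For readers who want the same save-to-file step without leaving Java, Files.copy can write a stream straight to disk. This is only a sketch under stated assumptions: the class name SavePage and the helpers save and download are hypothetical, and no redirect or error handling is included.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class SavePage {

    // Copies the stream to the target file, overwriting it if it exists,
    // and returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens the URL and saves its raw bytes to the target file.
    static long download(String url, Path target) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            return save(in, target);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SavePage <url> <output-file>
        if (args.length == 2) {
            System.out.println(download(args[0], Paths.get(args[1])) + " bytes written");
        }
    }
}
```

Because the bytes are copied unchanged, no charset has to be assumed here.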

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow