JSoup: Extracting one word from within a class tag

https://stackoverflow.com/questions/8371548

27-10-2019

Question

I've been using JSoup for the last few weeks to successfully scrape data from a web page; however, I've come to a dead end in trying to figure out a way to extract just a single word from within a class tag, instead of the whole text.

Here is the Java code I'm using:

// store all the search results in the elmAllSearchResults element
Element elmAllSearchResults = doc.getElementById("SearchResults"); 
// select every element with the "desc" class inside elmAllSearchResults
Elements elmSize = elmAllSearchResults.getElementsByClass("desc");

This extracts multiple lines similar to these:

<font class="desc">Date 11-04; 09:21, Size 8100.00 MB, User <a class="desc" href="/member/aUser/" title="Browse">
<font class="desc">Date 12-04; 09:21, Size 62 MB, User <a class="desc" href="/member/bUser/" title="Browse">

Now all I want is to extract the size (8100.00 MB and 62 MB in this case) from each string of text. Because the size is not wrapped in a tag of its own, I can't find a way to isolate it.

Is it possible?

Thank You.

Answer

Jsoup only takes you as far as individual HTML elements. To parse their text content, which is essentially a String, you need to use String methods such as substring(), indexOf(), replaceAll(), etc.

For example, if you can guarantee that the desired information always sits between ", Size " and ", User", then you can take the substring between those two markers:

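String before = ", Size ";
String after = ", User";

// elements is the Elements collection returned by getElementsByClass("desc")
for (Element element : elements) {
    String text = element.text();
    int start = text.indexOf(before);
    int end = text.indexOf(after, start);
    if (start < 0 || end < 0) {
        continue; // markers missing (e.g. the nested <a class="desc">), skip
    }
    String size = text.substring(start + before.length(), end);
    // ...
}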
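
If the surrounding markers are less reliable than the "Size <number> <unit>" shape itself, a regular expression is an alternative to the indexOf()/substring() approach above. The following is only a sketch: the class name SizeExtractor, the method extractSize and the pattern are illustrative assumptions, and the sample strings stand in for what element.text() would presumably return for the HTML in the question. It uses only java.util.regex so it runs without Jsoup on the classpath.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: pulls the size out of plain text.
class SizeExtractor {
    // Assumed shape: "Size", whitespace, a number with an optional decimal
    // part, optional whitespace, then a unit made of letters (e.g. "MB").
    private static final Pattern SIZE =
            Pattern.compile("Size\\s+(\\d+(?:\\.\\d+)?\\s*[A-Za-z]+)");

    // Returns the first size found in the text, or empty if there is none.
    static Optional<String> extractSize(String text) {
        Matcher matcher = SIZE.matcher(text);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-ins for element.text() on each <font class="desc"> element.
        String[] samples = {
            "Date 11-04; 09:21, Size 8100.00 MB, User",
            "Date 12-04; 09:21, Size 62 MB, User"
        };
        for (String sample : samples) {
            System.out.println(extractSize(sample).orElse("(no size found)"));
        }
    }
}
```

Returning an Optional means elements whose text contains no size, such as the nested <a class="desc"> links, can be skipped instead of causing an exception.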

License: CC-BY-SA with attribution

Not affiliated with StackOverflow