Question

This is the HTML code:

<!DOCTYPE html>
<html>
<title>Instructor's Page</title>

<body>

<h1>Instructor's Page</h1>


<div class="check1">    <div id="check2">
<span id="check3" class="check4"> <strong class="check5"><link href="http://schema.org/t"/>Instructor-1 name</strong>
</span>
</div>

<div class="check1">    <div id="check2">
<span id="check3" class="check4"> <strong class="check6">Instructor-2 name</strong>
</span>

</body>
</html>

I am very new to Jsoup. How do I extract the instructors' names from the given HTML page?

Currently, I only know how to print the title.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;


public class Crawl {
    public static void main(String[] args) {
        try {
            // Parse the local HTML file, decoding it as UTF-8
            File input = new File("t.html");
            Document doc1 = Jsoup.parse(input, "UTF-8");

            // Get the page title
            String title1 = doc1.title();
            System.out.println("title : " + title1);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Solution

Use the select() method to pick out the elements you want from the HTML page. It takes a CSS-style selector string that describes which elements to match, such as a specific tag with a certain id or class.

// Creates a collection of Element objects for all span tags
Elements spans = doc.select("span");

// Returns a collection of the first cell (td) of each table row
Elements firstCells = doc.select("td:eq(0)");
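To illustrate the "tag with a certain id or class" case, here is a minimal sketch run against a trimmed copy of the question's markup (the link element is left out). The class name SelectorDemo and the helper textsOf are hypothetical, introduced only for illustration; Jsoup.parse(String), select() and text() are standard Jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original answer
public class SelectorDemo {
    // Trimmed copy of the instructor markup from the question
    static final String HTML =
            "<div class=\"check1\"><div id=\"check2\">"
          + "<span id=\"check3\" class=\"check4\"><strong class=\"check5\">Instructor-1 name</strong></span>"
          + "</div></div>"
          + "<div class=\"check1\"><div id=\"check2\">"
          + "<span id=\"check3\" class=\"check4\"><strong class=\"check6\">Instructor-2 name</strong></span>"
          + "</div></div>";

    // Hypothetical helper: returns the normalized text of every element matching the selector
    static List<String> textsOf(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element el : doc.select(cssQuery)) {
            texts.add(el.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // tag + class: only the strong element with class check6
        System.out.println(textsOf(HTML, "strong.check6"));
        // descendant selector: every strong inside a span with class check4
        System.out.println(textsOf(HTML, "span.check4 strong"));
        // tag + id: matches both spans, because the page reuses id="check3"
        System.out.println(textsOf(HTML, "span#check3"));
    }
}
```

Since both spans share the same id, an id selector cannot tell the two instructors apart here; the distinct classes check5 and check6 on the strong elements can.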

Use selectors like these to match what you are looking for. Your browser's developer tools (for example, "Inspect element") can help you identify the tags, ids and classes in the HTML source.

As for your original question on how to select the instructor names, see below.


If the structure of the HTML is always the same and you are certain that each instructor's name will be inside a span tag, then you can simply read the text of each span:

    Elements names = doc.select("span");
    for (Element e : names) {
        System.out.println("Name is: " + e.text());
    }

This will print:

Name is: Instructor-1 name
Name is: Instructor-2 name
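Putting the pieces together, here is a sketch that extends the question's program with the span loop above, still assuming the page is stored in t.html next to the program. The class name CrawlInstructors and the helper instructorNames are hypothetical; moving the loop into its own method simply makes it reusable and testable without the file.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; combines the question's code with the answer's loop
public class CrawlInstructors {
    // Hypothetical helper: collects the text of every span, as in the loop above
    static List<String> instructorNames(Document doc) {
        List<String> names = new ArrayList<>();
        for (Element span : doc.select("span")) {
            names.add(span.text());
        }
        return names;
    }

    public static void main(String[] args) {
        try {
            // Same input file and encoding as in the question
            Document doc1 = Jsoup.parse(new File("t.html"), "UTF-8");
            System.out.println("title : " + doc1.title());
            for (String name : instructorNames(doc1)) {
                System.out.println("Name is: " + name);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

If the real page contains other span elements, a narrower selector such as "span.check4" would avoid picking them up.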
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow