Extracting data from table tag

https://stackoverflow.com/questions/21985044

15-10-2022

Question

I want to extract the data present in td tags using jsoup.

In the HTML below, "BAGALKOT" is the name of a city and "KERUDI HOSPITAL & RESEARCH CENTRE" is the name of a hospital.

City names and hospital names appear many times on the page in this same table structure, and I want to extract all of them using jsoup.

Can anyone help me with some Java code for this?

<h2>Karnataka Hospitals List</h2>

    <tr bgcolor="#E4E4E4" height="40">
        <td height="40" align="center" class="whiteheading"><strong>Sl. No</strong></td>
        <td align="center" class="whiteheading"><strong class="whiteheading">City</strong></td>
        <td align="center" class="whiteheading"><strong>Hospital / Nursing Home</strong></td>
        <td align="center" class="whiteheading"><strong>Address</strong></td>
        <td align="center" class="whiteheading"><strong>State</strong></td>
    </tr>
    <tr height="60">
        <td width="64" align="left" bgcolor="#FFFFFF">1</td>
        <td class="copyrights" width="119" bgcolor="#FFFFFF">BAGALKOT</td>
        <td class="copyrights" width="211" bgcolor="#FFFFFF">KERUDI HOSPITAL    &amp; RESEARCH CENTRE</td>
        <td class="copyrights" bgcolor="#FFFFFF">EXTENSION,    HOSPITAL ROAD,BAGALKOT, KARNATAKA-587101.</td>
        <td class="copyrights" width="88" bgcolor="#FFFFFF">KARNATAKA</td>
    </tr>

Solution

You can extract the data either by selecting on your CSS class names or by selecting on the tag name alone. Selecting by class:

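    // doc is the org.jsoup.nodes.Document already parsed from the page
    Elements headings = doc.select("td[class=whiteheading]"); // header cells
    Elements data = doc.select("td[class=copyrights]");       // data cells

    // Print the column headings on one line
    for (Element el : headings) {
        System.out.print(el.text() + "\t\t\t");
    }

    System.out.println();
    // Print the text of every data cell, tab-separated
    for (Element el : data) {
        System.out.print(el.text() + "\t");
    }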

This prints:

Sl. No          City            Hospital / Nursing Home         Address         State           
BAGALKOT    KERUDI HOSPITAL & RESEARCH CENTRE   EXTENSION, HOSPITAL ROAD,BAGALKOT, KARNATAKA-587101.    KARNATAKA

The code above collects the text of every td cell for the headings and for the data, and prints it to the console. The only problem is the serial number: its cell has no CSS class, so the class selector does not match it. The other option is to select on the tag name only and filter the cells afterwards:

    Elements data = doc.select("td");

    for (Element el : data) {
        System.out.print(el.text() + "\t");
    }
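The filtering step is not shown above, so here is a minimal sketch of one way to do it: walk the table row by row, skip the header row, and pick the city and hospital cells by position. The class name HospitalTableExtractor and the method extractCityAndHospital are hypothetical names chosen for this illustration, and the code assumes the column order from the question (serial number, city, hospital, address, state). It also assumes the rows sit inside a table element, as they would on the real page; jsoup's HTML parser discards tr and td tags that appear outside a table.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class HospitalTableExtractor {

    // Returns one {city, hospital} pair per data row of the table.
    static List<String[]> extractCityAndHospital(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> result = new ArrayList<>();

        for (Element row : doc.select("tr")) {
            Elements cells = row.select("td");
            // Skip rows that are too short, and the header row,
            // whose cells carry the "whiteheading" class.
            if (cells.size() < 3 || cells.first().hasClass("whiteheading")) {
                continue;
            }
            // Assumed column order: 0 = Sl. No, 1 = City, 2 = Hospital
            result.add(new String[] { cells.get(1).text(), cells.get(2).text() });
        }
        return result;
    }

    public static void main(String[] args) {
        // Trimmed-down copy of the markup from the question, wrapped in <table>.
        String html = "<table>"
                + "<tr><td class=\"whiteheading\"><strong>Sl. No</strong></td>"
                + "<td class=\"whiteheading\"><strong>City</strong></td>"
                + "<td class=\"whiteheading\"><strong>Hospital / Nursing Home</strong></td></tr>"
                + "<tr><td>1</td>"
                + "<td class=\"copyrights\">BAGALKOT</td>"
                + "<td class=\"copyrights\">KERUDI HOSPITAL    &amp; RESEARCH CENTRE</td></tr>"
                + "</table>";

        for (String[] pair : extractCityAndHospital(html)) {
            // prints "BAGALKOT | KERUDI HOSPITAL & RESEARCH CENTRE"
            System.out.println(pair[0] + " | " + pair[1]);
        }
    }
}
```

Going row by row keeps each city paired with its hospital, which the flat list of all td cells does not. For a live page you would build the Document with Jsoup.connect(url).get() instead of Jsoup.parse(html).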

Licensed under: CC-BY-SA with attribution
