Is using HtmlCleaner and Jsoup together a good idea?

https://stackoverflow.com/questions/19055550

29-06-2022
|

Domanda

So I'm trying to get some data from a certain website. When the app is first started, it downloads a html file of a certain website and cleans it.

private class cleanHtml extends AsyncTask<Void, Void, Void>{

    @Override
    protected Void doInBackground(Void... arg0) {
        try {
            HtmlCleaner cleaner = new HtmlCleaner();
            String url = "https://www.easistent.com/urniki/263/razredi/16515";
            TagNode node = cleaner.clean(new URL(url));
            CleanerProperties props = cleaner.getProperties();
            String fileName = Environment.getExternalStorageDirectory().getPath() + "/Android/data/com.whizzapps.stpsurniki/cleaned.html";
            new PrettyXmlSerializer(props).writeToFile(node, fileName, "utf-8");
            Log.i("TAG", "AsyncTask done!");
        } catch (MalformedURLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return null;
    }
}

Now I know I could parse html using HtmlCleaner using XPath, but I have no knowledge at all in XPath. I'm pretty sure it would be easier to parse it with Jsoup after the file is cleaned. Is this okay?

Soluzione

It shouldn't be a problem all you need is a valid html. you can use this:

 String html = getHtml();
 Document doc = Jsoup.parse(html);
 Elements elms = doc.select("cssSelector");
 Elements elms1 = doc.getElementsByClass("class");

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow