Question

I'm using Jsoup to parse a clipboard value into HTML, but it is not working for subscript and superscript. For example:

Superscript
Hello World (HTML: <b>Hello <sup>World</sup></b>)

Subscript
Hello World (HTML: <b>Hello <sub>World</sub></b>)

Code

result = rtfToHtml(new StringReader(streamToString((InputStream) contents.getTransferData(dfRTF))));

The result for the above example is:

<html>
  <head>
    <style>
      <!--
        p.default {
          size:3;
          family:sansserif;
          foreground:#000000;
          bold:normal;
          italic:;
        }
      -->
    </style>
  </head>
  <body>
    <p class=default>
      <span style="color: #000000; font-size: 14pt; font-family: ArialMT">
        <b>Hello </b>
      </span>
      <span style="color: #000000; font-size: 11pt; font-family: ArialMT">
        <b>World</b>
      </span>
    </p>
  </body>
</html>

Any idea how I can handle superscript and subscript using Jsoup? Any advice or references are highly appreciated.

EDIT

        Clipboard clipboard = Toolkit.getDefaultToolkit().getSystemClipboard();
        Transferable contents = clipboard.getContents(null);
        DataFlavor dfRTF = new DataFlavor("text/rtf", "Rich Formatted Text");
        DataFlavor dfTxt = DataFlavor.stringFlavor;

        boolean hasTransferableRTFText = (contents != null)
                && contents.isDataFlavorSupported(dfRTF);
        boolean hasTransferableTxtText = (contents != null)
                && contents.isDataFlavorSupported(dfTxt);
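        if (hasTransferableRTFText) {
            try {
                // Convert the RTF clipboard data to HTML, then parse it with Jsoup.
                result = rtfToHtml(new StringReader(streamToString((InputStream) contents.getTransferData(dfRTF))));
                Document doc = Jsoup.parse(result);
            } catch (UnsupportedFlavorException | IOException e) {
                e.printStackTrace();
            }
        }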

EDIT

public static String rtfToHtml(Reader rtf) throws IOException { // From http://www.codeproject.com/Tips/136483/Java-How-to-convert-RTF-into-HTML
        JEditorPane p = new JEditorPane();
        p.setContentType("text/rtf");
        EditorKit kitRtf = p.getEditorKitForContentType("text/rtf");
        try {
            kitRtf.read(rtf, p.getDocument(), 0);
            kitRtf = null;
            EditorKit kitHtml = p.getEditorKitForContentType("text/html");
            Writer writer = new StringWriter();
            kitHtml.write(writer, p.getDocument(), 0, p.getDocument().getLength());
            return writer.toString();
        } catch (BadLocationException e) {
            e.printStackTrace();
        }
        return null;
    }

Solution

Your problem is not related to Jsoup, but to your rtfToHtml function.

Your function does not generate the <sub> and <sup> tags you expect. Jsoup cannot do anything at this step: the expected tags are not in the HTML it receives, so there is nothing for it to parse.
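
One way to confirm this before involving Jsoup is to inspect the converter's output directly. The following is a minimal sketch using only the JDK. The class name SupSubCheck and the method hasSupOrSub are hypothetical names introduced here for illustration, and the regular expression is only a rough check for the tags, not an HTML parser.

```java
import java.util.regex.Pattern;

class SupSubCheck {
    // Matches an opening <sub> or <sup> tag, with or without attributes, ignoring case.
    private static final Pattern SUP_OR_SUB =
            Pattern.compile("<su[bp](\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: reports whether the HTML string contains a <sub> or <sup> tag.
    static boolean hasSupOrSub(String html) {
        return html != null && SUP_OR_SUB.matcher(html).find();
    }

    public static void main(String[] args) {
        // The HTML the question expects, and a shortened form of what rtfToHtml returned.
        String expected = "<b>Hello <sup>World</sup></b>";
        String converted = "<p class=default><span style=\"font-size: 11pt\"><b>World</b></span></p>";
        System.out.println(hasSupOrSub(expected));  // true
        System.out.println(hasSupOrSub(converted)); // false
    }
}
```

If the check returns false for the string produced by rtfToHtml, the information was already lost before Jsoup.parse was called.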

EDIT: (and Solution)

You should skip the rtfToHtml step when it is not necessary. If the clipboard already contains the data in HTML format, requesting it as RTF and then converting it back to HTML loses formatting information along the way.

You can get the clipboard contents directly in HTML format to avoid unnecessary conversions:

DataFlavor dfHTML = new DataFlavor("text/html; charset=Unicode");
boolean hasTransferableHTMLText = (contents != null) && contents.isDataFlavorSupported(dfHTML);
if (hasTransferableHTMLText)
{
    InputStream is = (InputStream)contents.getTransferData(dfHTML);
    String htmldata = org.apache.commons.io.IOUtils.toString(is, "Unicode");  

    Document doc = Jsoup.parse(htmldata);
    System.out.println(doc.html());
    //...
}

Tested with copy-to-clipboard from Chrome and Firefox. Both keep the <sub> and <sup> tags you expect.
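
Since not every application puts HTML on the clipboard, it may help to prefer HTML and fall back to RTF or plain text only when needed. The sketch below shows one possible ordering. The names ClipboardFlavorChooser, Source and choose are hypothetical, and a StringSelection stands in for real clipboard contents so that the example runs without a display. The two DataFlavor definitions are the ones used in the question and in this answer.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.StringSelection;
import java.awt.datatransfer.Transferable;

class ClipboardFlavorChooser {
    // Hypothetical labels for the format the caller should read.
    enum Source { HTML, RTF, PLAIN_TEXT, NONE }

    // Picks the richest supported format: HTML first, then RTF, then plain text.
    static Source choose(Transferable contents) {
        if (contents == null) {
            return Source.NONE;
        }
        DataFlavor dfHTML;
        try {
            dfHTML = new DataFlavor("text/html; charset=Unicode");
        } catch (ClassNotFoundException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
        DataFlavor dfRTF = new DataFlavor("text/rtf", "Rich Formatted Text");

        if (contents.isDataFlavorSupported(dfHTML)) {
            return Source.HTML;       // parse directly with Jsoup, no conversion
        }
        if (contents.isDataFlavorSupported(dfRTF)) {
            return Source.RTF;        // needs rtfToHtml, which may drop formatting
        }
        if (contents.isDataFlavorSupported(DataFlavor.stringFlavor)) {
            return Source.PLAIN_TEXT; // no markup available at all
        }
        return Source.NONE;
    }

    public static void main(String[] args) {
        // A plain-text-only Transferable, used here instead of the system clipboard.
        Transferable plain = new StringSelection("Hello World");
        System.out.println(choose(plain));
    }
}
```

In a real application the Transferable would come from clipboard.getContents(null), as in the question.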

EDIT2:

IOUtils refers to org.apache.commons.io.IOUtils
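
If you would rather not add Commons IO just for this call, a JDK-only replacement could look like the sketch below. The class name StreamText is hypothetical, and this streamToString is an illustrative version with an explicit charset parameter, not the helper from the question (whose source is not shown). It uses the same charset name, "Unicode", that is passed to IOUtils.toString above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class StreamText {
    // Reads the whole stream into a String, decoding it with the given charset.
    static String streamToString(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset unicode = Charset.forName("Unicode");
        String html = "<b>Hello <sup>World</sup></b>";
        // Simulates the clipboard stream with an in-memory byte array.
        InputStream in = new ByteArrayInputStream(html.getBytes(unicode));
        System.out.println(streamToString(in, unicode));
    }
}
```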

OTHER TIPS

Use Jsoup's selectors to get the desired elements. This link will help you.

Document doc = Jsoup.connect("some url").get();
Elements sub = doc.select("sub");
Licensed under: CC-BY-SA with attribution