Clob to String: BufferedReader vs getSubString()?

https://stackoverflow.com/questions/22874722

28-06-2023

Question

I am writing a Java program to read clob columns from a database containing XML data and do some processing.

I need to run regular expressions against this XML content, so converting each CLOB to a String is a convenient approach for me, and I am sure the CLOB length will never exceed what a String can hold for my purposes. Currently, across approx. 4000 entries in the DB, the shortest CLOB is 2000 chars and the longest is 18000 chars.

My question is: with a maximum length of this order (say 20k chars), which of the following two approaches is recommended, and why?

Approach 1: Method getSubString() on Clob, simpler to use:

// cXML is the Clob object   
String sXML = cXML.getSubString(1, (int)cXML.length());

Approach 2: Use BufferedReader and StringBuilder in a method (Better performance?):

private String ClobToString(Clob data)
throws Exception
{
    StringBuilder sb = new StringBuilder();
    StringBuilder exDtl = new StringBuilder();
    try {
        Reader reader = data.getCharacterStream();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while(null != (line = br.readLine())) {
            sb.append(line + "\n");
        }
        br.close();
    } catch (SQLException e) {
        exDtl.append(e.getMessage());
    } catch (IOException e) {
        exDtl.append(e.getMessage());
    }
    finally
    {
        if (exDtl.length() > 0)
        {
            throw new Exception(exDtl.toString());
        }
        return sb.toString();
    }
} // End Method ClobToString

I have read on some websites/forums that Approach 2 is better in terms of performance, so I am trying to understand:

At what threshold of Clob length does it become advisable to use Approach 2 instead of Approach 1?
Is neither of these approaches recommended, and if so, what would be a third approach?
(I would like to avoid 3rd party libraries like StreamFlyer.)

Solution

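The implementation of approach 2 is flawed: readLine() strips the line terminators and the loop appends "\n" instead, so "\r\n" sequences are changed and a trailing newline is added even if the original had none. In addition, the return statement inside the finally block silently discards any exception other than the two that are caught. Since the data is not processed in a streaming fashion anyway, use approach 1. Approach 1 will call the driver's own equivalent of approach 2, which is probably more accurate and could look something like this: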

public static String getString(Clob data) throws SQLException {

    StringBuilder sb = new StringBuilder();
    char[] cbuf = new char[8192];
    int l = 0;
    try (Reader r = data.getCharacterStream()) {
        while ((l = r.read(cbuf)) > -1) {
            sb.append(cbuf, 0, l);
        }
    } catch (IOException ioe) {
        throw new SQLException("Unable to read character stream from Clob.", ioe);
    }
    return sb.toString();
}
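
If you want to convince yourself that the two routes return the same text, a quick check without a database could look like the sketch below. It assumes javax.sql.rowset.serial.SerialClob as an in-memory stand-in for whatever Clob your driver returns; the class and method names (ClobToStringCheck, viaGetSubString, viaReader) are made up for illustration. The empty-value guard is a defensive assumption: SerialClob, for one, rejects position 1 on an empty value.

```java
import java.io.IOException;
import java.io.Reader;
import java.sql.Clob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialClob;

// Hypothetical helper class, for illustration only.
class ClobToStringCheck {

    // Approach 1: ask the Clob for the whole value in one call.
    static String viaGetSubString(Clob data) throws SQLException {
        long length = data.length();
        if (length > Integer.MAX_VALUE) {
            throw new SQLException("Clob too large for a single String: " + length);
        }
        if (length == 0) {
            return ""; // avoid asking for position 1 of an empty value
        }
        return data.getSubString(1, (int) length); // positions are 1-based
    }

    // Chunked read, same idea as getString(Clob) above.
    static String viaReader(Clob data) throws SQLException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        try (Reader r = data.getCharacterStream()) {
            int n;
            while ((n = r.read(buffer)) > -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException ioe) {
            throw new SQLException("Unable to read character stream from Clob.", ioe);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws SQLException {
        String xml = "<a>\r\n  <b>text</b>\r\n</a>";
        Clob clob = new SerialClob(xml.toCharArray()); // in-memory stand-in, no DB needed

        System.out.println(viaGetSubString(clob).equals(xml));
        System.out.println(viaReader(clob).equals(xml));
    }
}
```

Both methods should hand back the original text unchanged, line terminators included. A real driver's Clob may behave differently at the edges, so treat this as a sanity check rather than a guarantee.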

At these sizes (up to about 20k chars), I don't think the choice between the two will make any difference in performance.
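
To see why the line-based loop from approach 2 is not an exact copy, here is a small sketch that runs the same loop next to a chunked read. It uses a plain StringReader in place of the Clob's character stream, so it needs no database; the class and method names (LineReadingPitfall, readByLine, readByChunk) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class, for illustration only.
class LineReadingPitfall {

    // Same loop as approach 2: readLine() drops the terminator, "\n" is appended instead.
    static String readByLine(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Chunked read: copies every character exactly as it arrives.
    static String readByChunk(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        try (Reader r = source) {
            int n;
            while ((n = r.read(buffer)) > -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "<a>\r\n<b/>\r\n</a>"; // Windows line endings, no trailing newline

        String byLine = readByLine(new StringReader(original));
        String byChunk = readByChunk(new StringReader(original));

        System.out.println(byLine.equals(original));  // prints false
        System.out.println(byChunk.equals(original)); // prints true
    }
}
```

The line-based version turns every "\r\n" into "\n" and adds a newline at the end. That may be harmless for your regex, but it is not the text that was stored.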

Licensed under: CC-BY-SA with attribution
