Question

My understanding is that this is a common scenario, but Java doesn't have a built-in solution, and I've been searching on and off for more than a day now. I have tried the CircularCharBuffer from the Ostermiller library, but its reader blocks while waiting for new input, so I couldn't get readLine() to detect the end of the content (it would just hang).

So could someone tell me how I could do the conversion? For what it's worth, I'm converting multiple (potentially many) PDF files to raw text using the PDFBox library. The PDFBox API writes the content to a Writer, after which I need to get at the content for further processing (so BufferedReader/BufferedWriter is not actually essential, but I need some kind of Reader/Writer). I know that this is possible using StringReader/StringWriter, but I'm not sure that this is efficient, and I lose the readLine() method.


Solution

This is a bit like asking how to convert a pig into an elephant ... :-)

OK, there are two ways to address this problem (using the Java libraries):

  • You can capture the data written to a buffered writer so that it can then be read using a buffered reader. Basically, you do this by:

    1. using your BufferedWriter to write to a StringWriter or CharArrayWriter,

    2. closing it,

    3. extracting the resulting content from the StringWriter / CharArrayWriter as a String,

    4. wrapping the String in a StringReader, and

    5. wrapping the StringReader in a BufferedReader.

  • You can create a PipedReader / PipedWriter pair and wrap them with BufferedReader and BufferedWriter respectively.
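The first approach can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the class name `CaptureThenRead` and the `produce` method are hypothetical, with `produce` standing in for whatever code (PDFBox in the question) writes text to a Writer.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

class CaptureThenRead {

    // Hypothetical producer: stands in for any code that writes text to a Writer.
    static void produce(Writer out) throws IOException {
        out.write("first line\nsecond line\n");
    }

    static List<String> captureAndReadLines() throws IOException {
        StringWriter sink = new StringWriter();

        // Steps 1 and 2: write through a BufferedWriter, then close it so that
        // everything is flushed into the StringWriter.
        try (BufferedWriter writer = new BufferedWriter(sink)) {
            produce(writer);
        }

        // Steps 3 to 5: extract the String, wrap it in a StringReader,
        // and wrap that in a BufferedReader to get readLine() back.
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(sink.toString()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : captureAndReadLines()) {
            System.out.println(line);
        }
    }
}
```

Because the reader is backed by a completed String, readLine() returns null as soon as the content is exhausted, so nothing can hang.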

The two approaches both have disadvantages:

  • The first one requires you to complete the writing before constructing the read side. That means you need space to hold the entire stream content in memory, and you can't do producer-side and consumer-side processing in parallel.

  • The second one requires you to produce and consume in separate threads ... or risk having the pipeline block permanently.
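A sketch of the piped approach, with the producer running in its own thread, might look like this. The class name `PipedHandOff`, the method `pipeLines`, and the 4096-character pipe size are assumptions made for illustration; the two-argument PipedReader constructor requires Java 1.6 or later.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class PipedHandOff {

    static List<String> pipeLines(List<String> input) throws IOException, InterruptedException {
        PipedWriter pipeOut = new PipedWriter();
        // Connect the reader to the writer; the second argument is the pipe's buffer size.
        PipedReader pipeIn = new PipedReader(pipeOut, 4096);

        // Producer side: runs in its own thread so that a full pipe cannot
        // block the thread that is supposed to drain it.
        Thread producer = new Thread(() -> {
            try (BufferedWriter writer = new BufferedWriter(pipeOut)) {
                for (String line : input) {
                    writer.write(line);
                    writer.newLine();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Leaving the try block closes the writer, which signals end of stream.
        });
        producer.start();

        // Consumer side: readLine() returns null once the writer has been closed
        // and the pipe has been drained.
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(pipeIn)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        producer.join();
        return lines;
    }
}
```

Only one pipe buffer's worth of data is held in memory at a time, at the cost of managing a second thread.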


Conceptually speaking, the Ostermiller library is really a reimplementation of PipedReader / PipedWriter. (And some of the advantages of his reimplementation were made moot by Java 1.6 ... which allows you to specify the pipe's buffer size. Mark support is interesting, but I can imagine some problems, depending on how you used it.)

You might also be able to find a PipedReader / PipedWriter replacement that uses a flexible buffer that grows and contracts as required. (At least ... this is conceptually possible.)

OTHER TIPS

The CircularCharBuffer from the Ostermiller library has two methods, getWriter() and getReader(), which give you a writer and a reader over the same buffer, so that whatever is written to the writer can be read back from the reader. The reason the reader was hanging at the final readLine() was that I wasn't calling close() on the writer after I had finished writing to it. So the final readLine() was waiting for new content from the writer that was never going to arrive.
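The same close-to-signal-end-of-stream behaviour can be illustrated with the JDK's own PipedReader / PipedWriter, used here as a stand-in rather than the Ostermiller API. The class name `CloseSignalsEof` and the method `countLines` are hypothetical, and the sketch assumes the text is short enough to fit in the default pipe buffer, which is the only reason writing and reading in a single thread is safe here.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

class CloseSignalsEof {

    // Assumes text is small enough to fit in the pipe's default buffer;
    // larger content would need a separate producer thread.
    static int countLines(String text) throws IOException {
        PipedWriter pipeOut = new PipedWriter();
        try (BufferedReader reader = new BufferedReader(new PipedReader(pipeOut))) {
            BufferedWriter writer = new BufferedWriter(pipeOut);
            writer.write(text);
            // Without this close(), the last readLine() below would block,
            // waiting for content that never arrives.
            writer.close();

            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }
}
```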

The Ostermiller library can be found here.

Licensed under: CC-BY-SA with attribution