+1 for the 'choking on fruit worms' pun - I nearly choked on my coffee reading that :)
If you really can't get that CSV fixed, then you could just supply your own Tokenizer (Super CSV is very flexible like that!).
You'd normally write your own readColumns() implementation, but it's quicker to extend the default Tokenizer and override the readLine() method to intercept the String (and fix the unescaped quotes) before it's tokenized.
I've made an assumption here that any quotes not next to a delimiter or at the start/end of the line should be escaped. It's far from perfect, but it works for your sample input. You can implement this however you like - it was too early in the morning for me to use a regex :)
This way you don't have to modify Super CSV at all (it just plugs in), so you get all of the other features like cell processors and bean mapping as well.
    package org.supercsv;

    import java.io.IOException;
    import java.io.Reader;

    import org.supercsv.io.Tokenizer;
    import org.supercsv.prefs.CsvPreference;

    public class FruitWormTokenizer extends Tokenizer {

        public FruitWormTokenizer(Reader reader, CsvPreference preferences) {
            super(reader, preferences);
        }

        @Override
        protected String readLine() throws IOException {
            final String line = super.readLine();
            if (line == null) {
                return null;
            }

            final char quote = (char) getPreferences().getQuoteChar();
            final char delimiter = (char) getPreferences().getDelimiterChar();

            // escape all quotes not next to a delimiter (or start/end of line)
            // (iterating backwards so inserts don't shift the indexes still to visit)
            final StringBuilder b = new StringBuilder(line);
            for (int i = b.length() - 1; i >= 0; i--) {
                if (quote == b.charAt(i)) {
                    final boolean validCharBefore = i - 1 < 0
                            || b.charAt(i - 1) == delimiter;
                    final boolean validCharAfter = i + 1 == b.length()
                            || b.charAt(i + 1) == delimiter;
                    if (!(validCharBefore || validCharAfter)) {
                        // escape that quote!
                        b.insert(i, quote);
                    }
                }
            }
            return b.toString();
        }
    }
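You can just supply this Tokenizer to the constructor of your reader (CsvListReader, CsvBeanReader, etc.) in place of the plain Reader, e.g. new CsvListReader(new FruitWormTokenizer(reader, preferences), preferences).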
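If you want to see what the escaping rule does to a single line without wiring up Super CSV, here's a rough standalone sketch of the same loop. The class name QuoteEscapeSketch, the method escapeStrayQuotes and the sample line are all made up for illustration (they're not part of Super CSV), and the quote and delimiter are passed in directly instead of being read from CsvPreference.

```java
final class QuoteEscapeSketch {

    // Doubles every quote that is neither at the start/end of the line
    // nor directly next to a delimiter (same assumption as the tokenizer above).
    static String escapeStrayQuotes(String line, char quote, char delimiter) {
        StringBuilder b = new StringBuilder(line);
        // Walk backwards so inserts don't shift the indexes still to be visited.
        for (int i = b.length() - 1; i >= 0; i--) {
            if (b.charAt(i) != quote) {
                continue;
            }
            boolean validBefore = i == 0 || b.charAt(i - 1) == delimiter;
            boolean validAfter = i + 1 == b.length() || b.charAt(i + 1) == delimiter;
            if (!(validBefore || validAfter)) {
                b.insert(i, quote); // escape the stray quote by doubling it
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input with one unescaped quote inside the second field.
        System.out.println(escapeStrayQuotes("\"apple\",\"12\" worm\",3", '"', ','));
        // prints: "apple","12"" worm",3
    }
}
```

Keep in mind the limitation of the assumption: a stray quote that happens to sit right next to a delimiter inside a quoted field is treated as a legitimate one and left alone, so lines like that will still trip up the tokenizer.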