Apparently there are two basic requirements:
- Append a column to an existing CSV file
- Allow concurrent operation
To achieve Requirement #1, the original file has to be read and rewritten as a new file, including the new column, irrespective of its location (i.e., in a `StringBuffer` or elsewhere).
The best (and only generic) way of reading a CSV file would be via a mature and field-proven library, such as OpenCSV, which is lightweight and commercially friendly, given its Apache 2.0 license. Otherwise, one has to either make many simplifications (e.g., always assume single-line CSV records), or reinvent the wheel by implementing a new CSV parser.
In either case, a simple algorithm is needed, e.g.:
- Initialize a CSV reader or parser object from the library used (or from whatever custom solution is used), supplying the existing CSV file and the necessary parameters (e.g., field separator).
- Read the input file record-by-record, via the reader or parser, as a `String[]` or `List<String>` structure.
- Manipulate the structure returned for every record to add or delete any extra fields (columns), in memory.
- Add blank fields (i.e., just extra separators, 1 per field), if desired or needed.
- Use a CSV writer from the library (or manually implement a writer) to write the new record to the output file.
- Append a newline character at the end of each record written to the output file.
- Repeat for all the records in the original CSV file.
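The steps above can be sketched in plain Java as follows. This is only a minimal illustration that takes the "many simplifications" route: it assumes single-line records and no quoted fields containing the separator, so it is not a substitute for a real CSV library. The class and method names (`CsvColumnAppender`, `parseRecord`, `formatRecord`, `appendColumn`) are hypothetical; with a library such as OpenCSV, its reader and writer would take the place of `parseRecord` and `formatRecord`.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class CsvColumnAppender {

    // Naive parser. ASSUMPTION: one record per line, no quoted or escaped fields.
    static List<String> parseRecord(String line, char separator) {
        String regex = Pattern.quote(String.valueOf(separator));
        // A limit of -1 keeps trailing empty fields such as "a,b,".
        return new ArrayList<>(Arrays.asList(line.split(regex, -1)));
    }

    // Naive writer: joins the fields with the separator, without quoting.
    static String formatRecord(List<String> fields, char separator) {
        return String.join(String.valueOf(separator), fields);
    }

    // Copies every record from 'in' to 'out', adding one field at the end.
    // The first record is treated as the header row and receives 'header';
    // all other records receive 'value' (pass "" for a blank field).
    static void appendColumn(Reader in, Writer out, char separator,
                             String header, String value) {
        try (BufferedReader reader = new BufferedReader(in);
             BufferedWriter writer = new BufferedWriter(out)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                List<String> fields = parseRecord(line, separator);
                fields.add(first ? header : value);
                first = false;
                writer.write(formatRecord(fields, separator));
                writer.write('\n'); // newline after each record
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the method takes a `Reader` and a `Writer`, the same loop works whether the data sits in a file, a `StringBuffer`, or elsewhere; only one record is held in memory at a time.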
This approach is also scalable: records are processed one at a time, so the whole file never has to be held in memory.
For Requirement #2, there are many ways of supporting concurrency, and in this scenario it is more efficient to do it in a tailored manner (i.e., "manually" in the application), as opposed to relying on a thread-safe data structure like `StringBuffer`.
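One possible way of handling concurrency "manually" is to guard the whole read-and-rewrite cycle with an explicit lock, so that only one thread rewrites the file at a time. The sketch below is one assumption about how that could look, not the only option: the class name `LockedCsvFile`, the fixed comma separator, and the temporary-file-then-move step are all illustrative choices, and the same naive single-line, unquoted CSV simplification applies. The lock only coordinates threads in the same JVM that share the same `LockedCsvFile` instance.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper that serializes column appends on one CSV file.
class LockedCsvFile {
    private static final String SEPARATOR = ","; // assumed separator
    private final Path file;
    private final ReentrantLock lock = new ReentrantLock();

    LockedCsvFile(Path file) {
        this.file = file;
    }

    // Rewrites the file with one extra column. The first line is treated as
    // the header row; every other line receives 'value'.
    void appendColumn(String header, String value) {
        lock.lock(); // the whole read-modify-write cycle is one critical section
        try {
            // Write the new version next to the original, then swap it in.
            Path temp = Files.createTempFile(
                    file.toAbsolutePath().getParent(), "csv-", ".tmp");
            try (BufferedReader reader =
                         Files.newBufferedReader(file, StandardCharsets.UTF_8);
                 BufferedWriter writer =
                         Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
                String line;
                boolean first = true;
                while ((line = reader.readLine()) != null) {
                    writer.write(line + SEPARATOR + (first ? header : value));
                    writer.write('\n');
                    first = false;
                }
            }
            Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            lock.unlock();
        }
    }
}
```

A thread-safe buffer such as `StringBuffer` would only protect individual append calls; the lock here protects the complete rewrite, which is the unit that must not interleave with another rewrite.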