So the real issue here is that to apply the correct cell processors, you need to know what data is in each column. With a valid CSV file (the same number of columns on each line) that's not a problem, but if you're dealing with a variable-column CSV file, it's tricky.
If, as in the example, only one column is optional, then you just need to count the number of columns read and use the appropriate array of cell processors. It doesn't matter where that optional column is, because the layout is still predictable: a short row can only mean that one column is missing.
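A minimal sketch of the "count the columns, pick the processor array" approach, assuming only `middleName` is optional. To stay self-contained it doesn't use any CSV library: the row is assumed to already be split into a `List<String>`, and cell processors are modelled by a hypothetical `Processor` interface (just a `Function<String, Object>`), not any real library's API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class OptionalColumnReader {

    // Hypothetical stand-in for a cell processor: converts one raw cell into a value.
    interface Processor extends Function<String, Object> {}

    static final Processor REQUIRED = s -> {
        if (s == null || s.isEmpty()) throw new IllegalArgumentException("missing required value");
        return s;
    };
    static final Processor OPTIONAL = s -> (s == null || s.isEmpty()) ? null : s;

    // Layout of a full row: firstName,middleName,lastName,city
    static final String[] FULL_NAMES = {"firstName", "middleName", "lastName", "city"};
    static final Processor[] FULL_PROCESSORS = {REQUIRED, OPTIONAL, REQUIRED, REQUIRED};

    // Layout when the single optional column (middleName) is absent
    static final String[] SHORT_NAMES = {"firstName", "lastName", "city"};
    static final Processor[] SHORT_PROCESSORS = {REQUIRED, REQUIRED, REQUIRED};

    // With only one optional column, the column count alone identifies the layout.
    static Map<String, Object> process(List<String> row) {
        String[] names;
        Processor[] processors;
        if (row.size() == FULL_NAMES.length) {
            names = FULL_NAMES;
            processors = FULL_PROCESSORS;
        } else if (row.size() == SHORT_NAMES.length) {
            names = SHORT_NAMES;
            processors = SHORT_PROCESSORS;
        } else {
            throw new IllegalArgumentException("unexpected column count: " + row.size());
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (String name : FULL_NAMES) result.put(name, null); // absent columns stay null
        for (int i = 0; i < row.size(); i++) {
            result.put(names[i], processors[i].apply(row.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {firstName=Philip, middleName=J., lastName=Fry, city=New York}
        System.out.println(process(List.of("Philip", "J.", "Fry", "New York")));
        // prints {firstName=Philip, middleName=null, lastName=Fry, city=New York}
        System.out.println(process(List.of("Philip", "Fry", "New York")));
    }
}
```

Moving `middleName` to another position would only change the two name/processor arrays; the selection logic stays the same.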
If, however, more than one column is optional, you're in trouble. For example, suppose middleName and city are optional in the following CSV file:
firstName,middleName,lastName,city
Philip,Fry,New York
That data row can be read as:
firstName="Philip", middleName="Fry", lastName="New York", city=null
or
firstName="Philip", middleName=null, lastName="Fry", city="New York"
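To make the ambiguity concrete, here is a small sketch that enumerates every way a row's values could map onto the header, assuming the values keep their relative order and only the columns in a given optional set may be missing. The class and method names are hypothetical; for the example above it finds exactly the two readings listed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LayoutAmbiguity {

    // Returns every possible assignment of the row's values to the header columns.
    static List<Map<String, String>> interpretations(List<String> header, Set<String> optional, List<String> row) {
        List<Map<String, String>> out = new ArrayList<>();
        assign(header, optional, row, 0, 0, new LinkedHashMap<>(), out);
        return out;
    }

    private static void assign(List<String> header, Set<String> optional, List<String> row,
                               int col, int val, Map<String, String> current, List<Map<String, String>> out) {
        if (col == header.size()) {
            if (val == row.size()) out.add(new LinkedHashMap<>(current)); // every value was used
            return;
        }
        String name = header.get(col);
        // Option 1: this column is present and takes the next value
        if (val < row.size()) {
            current.put(name, row.get(val));
            assign(header, optional, row, col + 1, val + 1, current, out);
        }
        // Option 2: this column is optional and missing from the row
        if (optional.contains(name)) {
            current.put(name, null);
            assign(header, optional, row, col + 1, val, current, out);
        }
        current.remove(name);
    }

    public static void main(String[] args) {
        List<String> header = List.of("firstName", "middleName", "lastName", "city");
        Set<String> optional = Set.of("middleName", "city");
        List<String> row = List.of("Philip", "Fry", "New York");
        // prints:
        // {firstName=Philip, middleName=Fry, lastName=New York, city=null}
        // {firstName=Philip, middleName=null, lastName=Fry, city=New York}
        for (Map<String, String> m : interpretations(header, optional, row)) System.out.println(m);
    }
}
```

With one optional column there is at most one way to fill a short row; with two or more, a row can match several layouts, and nothing in the column count tells you which one is right.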
It's no longer predictable. You may be able to inspect the data in a column to determine what that column should represent (e.g. a date contains /'s), but that's not very robust, and even then you may have to read a few lines to figure it out.
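A rough sketch of that kind of content sniffing, assuming a hypothetical file with a date column in `dd/MM/yyyy` format (the format, the `Kind` categories and the class name are all assumptions for illustration). It guesses a kind for each position from several sample rows and reports `MIXED` where the samples disagree.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

public class ColumnSniffer {

    enum Kind { DATE, NUMBER, TEXT, MIXED }

    // Assumed date format; files using other formats would be misclassified.
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    static Kind guess(String value) {
        if (value.contains("/")) {
            try {
                LocalDate.parse(value, DATE);
                return Kind.DATE;
            } catch (DateTimeParseException e) {
                // contains '/' but isn't a date in the assumed format; fall through
            }
        }
        if (value.matches("-?\\d+(\\.\\d+)?")) return Kind.NUMBER;
        return Kind.TEXT;
    }

    // Guesses a kind per column position from sample rows; disagreeing positions become MIXED.
    static List<Kind> sniff(List<List<String>> sampleRows) {
        List<Kind> kinds = new ArrayList<>();
        for (List<String> row : sampleRows) {
            for (int i = 0; i < row.size(); i++) {
                Kind k = guess(row.get(i));
                if (i == kinds.size()) kinds.add(k);
                else if (kinds.get(i) != k) kinds.set(i, Kind.MIXED);
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("Philip", "14/08/1974", "New York"),
                List.of("Amy", "05/03/1980", "Mars"));
        System.out.println(sniff(rows)); // prints [TEXT, DATE, TEXT]
    }
}
```

This also shows the limitation: in the `Philip,Fry,New York` row every value sniffs as `TEXT`, so the heuristic can't tell a middle name from a last name or a last name from a city.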