Question

I am writing a Java program that fetches each key's count from a database and writes the count for each key to a CSV file. I have done this with FileWriter; the pseudocode looks like this:

while (keys.hasNext()) {
    writer.append(keys.next().getCount());
    writer.append(',');
}

// where keys is an iterator over the list of keys

The headers are appended in the same way. I have now come across open source libraries such as OpenCSV and Apache Commons CSV for writing CSV files.

So I am wondering whether it is better to use one of these libraries or to keep writing the CSV file as shown above. Can someone please tell me which way is better in terms of readability and efficiency?

Solution 2

It's pretty much up to you. Here's the OpenCSV equivalent of your code:

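 CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"));
 ...
 // Collect one count per key, then write them all as a single row
 List<String> row = new ArrayList<>();
 while (keys.hasNext()) {
     row.add(String.valueOf(keys.next().getCount()));
 }
 writer.writeNext(row.toArray(new String[0]));
 writer.close();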

Is that more or less readable than yours? That's subjective and up to you. I can tell you that yours is not inefficient.

It's worth noting that your code will write "," at the end of each line. The library will not. Your code could be changed like this:

boolean more = keys.hasNext();
while (more) {
   writer.append(keys.next().getCount());
   more = keys.hasNext();
   if(more) {
      writer.append(',');
   }
}
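
As an alternative to tracking the separator by hand, the JDK's StringJoiner only places the delimiter between elements. This is a minimal sketch, assuming the counts are already available as a list of integers; the class name JoinCounts and the method toCsvLine are made up for illustration.

```java
import java.util.List;
import java.util.StringJoiner;

public class JoinCounts {
    // Builds one CSV line from the counts. StringJoiner inserts the comma
    // only between elements, so there is no trailing separator.
    static String toCsvLine(List<Integer> counts) {
        StringJoiner line = new StringJoiner(",");
        for (int count : counts) {
            line.add(String.valueOf(count));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(List.of(3, 7, 42))); // prints 3,7,42
    }
}
```

The resulting string can then be passed to writer.append(...) followed by a line break.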

CSV seems simple, and usually is, until you start encountering more complex situations, such as quoted fields containing commas or escaped quotes:

 A field,"another field","a field, containing a comma","A \"field\""

If your program encounters a situation like this, it will break, and you'd need to enhance your CSV algorithms to handle it. If you were using a library, you could have some reasonable expectation that it would handle quotes and quoted commas from the outset. It's up to you how likely you think that situation is.
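
To give a rough idea of what "enhancing your CSV algorithms" would involve on the writing side, here is a minimal sketch of a field-escaping helper. It assumes the RFC 4180 convention of doubling embedded quotes, which differs from the backslash style in the example above; the class name CsvEscape and the method escape are hypothetical.

```java
public class CsvEscape {
    // Quotes a field only when it contains a comma, a double quote, or a
    // line break, and doubles any embedded double quotes (RFC 4180 style).
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("A field"));                     // A field
        System.out.println(escape("a field, containing a comma")); // "a field, containing a comma"
        System.out.println(escape("A \"field\""));                 // "A ""field"""
    }
}
```

Even this small helper leaves out things a library typically covers, such as configurable separators and reading the data back in.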

Writing CSV code is usually simple, but there are pitfalls and it's always good to have less code to maintain.

Using a library has its own overheads -- managing the dependencies and so on.

You probably don't need a library for the simple stuff you're doing now. You might consider using one if your own code evolves to get more complicated, or if you start to need features like exporting beans to CSV, or handling CSV containing quoted commas.

OTHER TIPS

There is an engineering principle - "If it works - don't touch it".

Of course, using a mature open source library will often benefit you in terms of code stability and flexibility. But you will have to spend time learning the library, and adopting it may require some refactoring of your code.

In your case, what you would gain is greater control over field separators and encodings.

Using an open source library involves a few considerations:

Pros:

  • A widely used open source library has usually been through the scrutiny of its community, so it is likely to be a well-tested and reasonably efficient option.
  • Saves a lot of boilerplate code and gives you a head start.
  • The library offers more features than you need right now, which helps when extending the application in the future.
  • Mature open source libraries are generally optimized for performance, which saves effort on your side.

Cons:

  • Another dependency is added in your application.
  • A small learning curve is involved in using the library. I would personally discount this, as ready-made usage examples are available.
  • Slightly overkill if the use case is very trivial.

The CSV file format is not simply a matter of separating your column names or values with commas. If your data contains a comma (,) or a double quote ("), it needs to be escaped properly.

For example, suppose you have two columns, name and address, and the values you need to write are name: aarish and address: "MyHome",Chicago, MI

Then if you write that to CSV like this:

name,address
aarish,"MyHome",Chicago, MI

The values will be parsed as four different fields.
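
A quick way to see the four fields is to split the unescaped line on commas. This is only a sketch using a naive split, not a real CSV parser; the class name NaiveSplit and the method naiveFields are hypothetical.

```java
public class NaiveSplit {
    // Splits a line on every comma, ignoring quoting entirely.
    static String[] naiveFields(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "aarish,\"MyHome\",Chicago, MI";
        String[] fields = naiveFields(line);
        System.out.println(fields.length); // prints 4
        for (String field : fields) {
            System.out.println("[" + field + "]");
        }
    }
}
```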

If you use one of the libraries, it will produce CSV output like this:

"name","address"
"aarish","""MyHome"",Chicago, MI"

This file will be read correctly by other parsers and opened properly in other editors.

So I would suggest using a library if your data can contain characters such as a comma (,) or a double quote ("). If it never contains such characters, you can go with your simpler approach.

Licensed under: CC-BY-SA with attribution