Question

I'm writing some Java code that's supposed to parse CSV files with different column types and values. A basic file looks something like this, but without the header/column row. To make things simpler when processing the file, I want to be able to look up each cell's value by its column name. I don't want to use a CSV parser at the moment.

    Column1 | Column2 | Column3 |...
    --------+---------+---------+---
    val10   | val20   | val30   |
    val11   | val21   | val31   |
    val12   | val22   | val32   |
    ...     | ...     | ...     |

I thought about using an ArrayList of column names (in order), since an enum doesn't convert back to an integer the way it does in C++. This way I could do something like:

    ArrayList<String> columnNames = new ArrayList<String>();
    columnNames.add("Column1");
    columnNames.add("Column2");
    columnNames.add("Column3");

    // read each line from the file ...
    String[] row = line.trim().split(",");
    String col2 = row[ columnNames.indexOf("Column2") ];

I'm fairly new to Java. Is there a better / smarter way to do this? Thanks.


Solution

Your code works. However, there are two points you may want to rethink if you are looking for a "better" way:

  1. The indexOf(Object) method of List is not fast: it costs O(n). If you maintain a Map<String, Integer> from column name to index and get the index from the column name, lookups should be faster than in your current implementation. Apart from that, in Java an enum can carry values of different types, and you can even let your enum implement interfaces.

  2. You should do some exception handling. What if a line in your file is missing one or more columns? Your current code will throw an ArrayIndexOutOfBoundsException. However, I hope this is already handled in your real code.
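
The two points above could be combined roughly like this. It is only a sketch: ColumnLookup and cell are made-up names, and it assumes the column order is known up front and that a plain split on "," is good enough for your data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
class ColumnLookup {
    // column name -> zero-based position in a row, built once
    private final Map<String, Integer> indexByName = new HashMap<>();

    ColumnLookup(String... columnNames) {
        for (int i = 0; i < columnNames.length; i++) {
            indexByName.put(columnNames[i], i);
        }
    }

    // Returns the cell under the given column; fails with a readable
    // message if the column is unknown or the row is too short.
    String cell(String[] row, String columnName) {
        Integer index = indexByName.get(columnName);
        if (index == null) {
            throw new IllegalArgumentException("Unknown column: " + columnName);
        }
        if (index >= row.length) {
            throw new IllegalArgumentException(
                "Row has only " + row.length + " cells, no value for " + columnName);
        }
        return row[index];
    }

    public static void main(String[] args) {
        ColumnLookup lookup = new ColumnLookup("Column1", "Column2", "Column3");
        String[] row = "val10,val20,val30".trim().split(",");
        System.out.println(lookup.cell(row, "Column2")); // prints val20
    }
}
```

One thing to keep in mind: String.split(",") drops trailing empty strings, so a line like "a,b," yields only two cells. Passing a negative limit, split(",", -1), keeps them.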

OTHER TIPS

The easiest way to solve this is to use the collections library and create a List of Maps where the keys in the map are the column names, like this:

List<Map<String,String>> records = someCodeForReadingDataFromFile();

Where you split each line into an array and then create a map of the values:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

List<Map<String,String>> someCodeForReadingDataFromFile() {
  // ArrayList, because rows are later fetched by index with get(i)
  List<Map<String,String>> rowsList = new ArrayList<Map<String,String>>();
  final String[] columnNames = {"Column1", "Column2", "Column3"};

  // add some loop to read one line at a time from the file
  ...
  String[] cells = line.trim().split(",");
  Map<String, String> rowMap = new HashMap<String, String>();
  // stop at the shorter of the two arrays so a short line does not throw
  for(int columnIndex = 0; columnIndex < columnNames.length && columnIndex < cells.length; columnIndex++) {
     rowMap.put(columnNames[columnIndex], cells[columnIndex]);
  }
  rowsList.add(rowMap);
  // repeat this until you reach EOF
  return rowsList;
}

Then you can access every cell in the CSV file by its row index and column name:

String valueOne = records.get(0).get("Column1"); // will set the value to "val10"

If the column names are fixed, you can still make an enum such as this:

public enum Columns {
 Column1, Column2;
}

And then use the name() method inherited from the Enum class to get the map key:

String valueOne = records.get(0).get(Columns.Column1.name());
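
If you go the enum route, another option is to key each row map by the enum constant itself instead of by its name, so a misspelled column becomes a compile error. A minimal sketch, assuming the enum constants are declared in the same order as the columns in the file; EnumRowExample, Col and toRow are illustrative names only:

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical example class, names are not from the original answer.
class EnumRowExample {
    // Assumption: declaration order matches the column order in the file.
    enum Col { COLUMN1, COLUMN2, COLUMN3 }

    // Turns one comma-separated line into a map keyed by column constant.
    // Columns missing from a short line are simply left out of the map.
    static Map<Col, String> toRow(String line) {
        String[] cells = line.trim().split(",");
        Map<Col, String> row = new EnumMap<>(Col.class);
        for (Col col : Col.values()) {
            if (col.ordinal() < cells.length) {
                row.put(col, cells[col.ordinal()].trim());
            }
        }
        return row;
    }
}
```

With this, EnumRowExample.toRow("val10,val20,val30").get(EnumRowExample.Col.COLUMN1) gives "val10", and a lookup for a column that the line did not contain returns null rather than throwing.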

However, if you decide to use a library for simplifying this process, I can really recommend the Smooks library or even Apache Commons CSV (really lightweight!).

One of your assertions is only partly accurate. You state that "enum doesn't convert back to integers as in C++", which is true in the sense that a Java enum constant is not implicitly an int. However, enums in Java are actually more flexible than that! They are objects, which can carry any number of values or properties, not just a number. Consider this (untested) code:

public enum ColumnEnum {
    // array indices are zero-based, so the first column is 0
    COL1(0),
    COL2(1),
    COL3(2);

    private final int index;
    ColumnEnum(int index) {
        this.index = index;
    }
    // must return int to be usable as an array index
    public int index()   { return index; }
}

Now you can refer to the parts of the array like this:

// read each line from the file ...
String[] row = line.trim().split(",");
String col2 = row[ ColumnEnum.COL2.index() ];
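
As a side note, every Java enum constant also has a built-in ordinal(), which is its zero-based position in the declaration. If the constants are declared in column order, the explicit index field is optional. A small sketch under that assumption (Column and from are made-up names):

```java
// Hypothetical enum; constants must stay in the same order as the file's columns.
enum Column {
    COLUMN1, COLUMN2, COLUMN3;

    // Picks this column's cell out of an already split row.
    String from(String[] row) {
        return row[ordinal()];
    }
}
```

Usage would look like Column.COLUMN2.from(row). The trade-off is that reordering or inserting constants silently changes the ordinals, which is why an explicit index field can be the safer choice if the enum is likely to change.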
Licensed under: CC-BY-SA with attribution