Question

As input I get an array of strings from the user. I need to split these lines so that they form a table with an equal number of cells per row. The cells should contain numbers. I want to determine the best guess for the cell separator character and present it to the user as a default value, which they can then change if the guess was bad.

I assume the separator is one of the following: tab, semicolon, space or comma. The comma is the tricky case, since it is also used as the decimal separator in German and other locales. The input may contain rows such as "1.0,2.0,3.0" or "1,0;2,0;3,0".

My primitive solution so far is this:

private char getSeparator(String[] rows) {
    String firstRow = rows[0];
    char[] possibleSeparators = new char[] {'\t', ';', ' ', ','};
    // Default to the semicolon if no candidate occurs in the first row.
    char separator = possibleSeparators[1];
    for (int i = 0; i < possibleSeparators.length; i++) {
        // Take the first candidate that occurs in the first row.
        if (firstRow.indexOf(possibleSeparators[i]) >= 0) {
            separator = possibleSeparators[i];
            break;
        }
    }
    return separator;
}

Is there a better heuristic to get the best match for a cell separator?

Performance does not matter!

Solution 2

Iterate over every row and, for each candidate separator, count how many items the row would be split into.

Use the separator that splits the largest number of rows into the same number of items.
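The idea above could be sketched roughly as follows. This is only one possible reading of it, not a definitive implementation: the class name `ConsistentSplitGuesser`, the `fallback` parameter, and the rule that ties go to the earlier candidate are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not from the original post.
final class ConsistentSplitGuesser {
    // Candidates in order of preference; on a tie the earlier one wins (an assumption).
    private static final char[] CANDIDATES = {'\t', ';', ' ', ','};

    /** Returns the candidate that splits the most rows into the same number of cells. */
    static char guess(String[] rows, char fallback) {
        char best = fallback;
        int bestScore = 0;
        for (char candidate : CANDIDATES) {
            int score = consistentRows(rows, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    /** Size of the largest group of rows that share one cell count (rows that do not split are ignored). */
    private static int consistentRows(String[] rows, char separator) {
        String regex = Pattern.quote(String.valueOf(separator));
        Map<Integer, Integer> rowsPerCellCount = new HashMap<>();
        int largestGroup = 0;
        for (String row : rows) {
            // A limit of -1 keeps trailing empty cells, so "1;2;" counts as three cells.
            int cells = row.split(regex, -1).length;
            if (cells < 2) {
                continue; // the separator does not occur in this row
            }
            int groupSize = rowsPerCellCount.merge(cells, 1, Integer::sum);
            largestGroup = Math.max(largestGroup, groupSize);
        }
        return largestGroup;
    }

    public static void main(String[] args) {
        String[] rows = {"1,0;2,0;3,0", "4,5;6,7;8,9"};
        System.out.println(guess(rows, ';'));
    }
}
```

Note that for rows such as "1,0;2,0;3,0" the comma splits every row just as consistently as the semicolon does, so this sketch relies on the candidate order to prefer the semicolon. Counting alone cannot resolve that ambiguity.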

But honestly, this check, while clever and interesting, is likely unnecessary. The user knows their data. I'd pick a default and let them select a different one if needed. You could even persist their selection so they needn't select the same default over and over.

OTHER TIPS

I propose a more sophisticated algorithm:

  • Read the first 10 rows
  • For each row and each possible separator, count the number of occurrences
  • Pick the separator that appears the same number of times in each row (and at least once)
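The steps above might look like this in Java. It is a minimal sketch under stated assumptions: the class name `OccurrenceCountGuesser`, the `fallback` parameter, and the candidate order (tab, semicolon, space, comma, taken from the question) are illustrative choices, and the first candidate that qualifies is returned.

```java
// Hypothetical helper; the names are illustrative and not from the original post.
final class OccurrenceCountGuesser {
    private static final char[] CANDIDATES = {'\t', ';', ' ', ','};
    private static final int SAMPLE_SIZE = 10; // only the first 10 rows are inspected

    /** Returns the first candidate that occurs equally often (and at least once) in every sampled row. */
    static char guess(String[] rows, char fallback) {
        int sample = Math.min(rows.length, SAMPLE_SIZE);
        if (sample == 0) {
            return fallback;
        }
        for (char candidate : CANDIDATES) {
            int expected = count(rows[0], candidate);
            if (expected == 0) {
                continue; // must appear at least once
            }
            boolean consistent = true;
            for (int i = 1; i < sample && consistent; i++) {
                consistent = count(rows[i], candidate) == expected;
            }
            if (consistent) {
                return candidate;
            }
        }
        return fallback;
    }

    /** Counts how often the character c occurs in the row. */
    private static int count(String row, char c) {
        int n = 0;
        for (int i = 0; i < row.length(); i++) {
            if (row.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] rows = {"1.0,2.0,3.0", "4.0,5.0,6.0"};
        System.out.println(guess(rows, ';'));
    }
}
```

This variant is stricter than a majority vote: a single malformed row among the sampled ones disqualifies a candidate, in which case the fallback is returned.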

I think you should use a regular expression (regex) in Java, which will help you achieve your goal.
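The answer does not say how a regular expression would be used. One possible reading, sketched here purely as an assumption, is to accept a candidate separator only if every resulting cell looks like a number. The class name `NumericCellValidator` and the number pattern (optional sign, digits, optional fraction after '.' or ',') are hypothetical and would need adjusting to the real input format.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names and the number format are assumptions for illustration.
final class NumericCellValidator {
    // Optional sign, digits, then an optional fraction introduced by '.' or ','.
    private static final Pattern NUMBER = Pattern.compile("[+-]?\\d+([.,]\\d+)?");

    /** Returns true if splitting every row on the separator yields only number-like cells. */
    static boolean allCellsNumeric(String[] rows, char separator) {
        // Pattern.quote makes the separator a literal, whatever character it is.
        String regex = Pattern.quote(String.valueOf(separator));
        for (String row : rows) {
            // A limit of -1 keeps trailing empty cells, which then fail the number check.
            for (String cell : row.split(regex, -1)) {
                if (!NUMBER.matcher(cell.trim()).matches()) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] rows = {"1,0;2,0;3,0"};
        System.out.println(allCellsNumeric(rows, ';'));
        System.out.println(allCellsNumeric(rows, ','));
    }
}
```

A check like this helps with the decimal-comma case from the question: splitting "1,0;2,0;3,0" on the comma produces cells such as "0;2", which are not numbers, so the comma can be ruled out. A row like "1,0,2,0" remains ambiguous either way.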

Licensed under: CC-BY-SA with attribution