What's a good library for parsing fixed-length records in Groovy?
09-02-2021
Question
I want a library that takes a file plus a config describing each column's length, name, and possibly type, and returns a map of column values for each row.
This isn't a difficult thing to do on my own, but I would be surprised if there wasn't already a great solution. I've tried searching for one, but have had no luck.
Solution
You can always use FlatFileItemReader from Spring Batch, which (with a fixed-length tokenizer) returns a FieldSet, a structure similar to a JDBC ResultSet.
But that might be overkill and add complexity. In Groovy I find it easy to read and write code like this:
file = '''\
JOHN      DOE       123
JANE      ROE       456
'''
names = []
// Slice each line by fixed offsets: 0-9 first name, 10-19 last name, 20-22 age.
file.eachLine { names << [
    first: it[0..9].trim(),
    last: it[10..19].trim(),
    age: it[20..22].toInteger()
]}
assert names[0].first == 'JOHN'
assert names[1].age == 456
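To get closer to the config-driven parser the question asks for, the hard-coded offsets above can be moved into a list of column specs. This is only a minimal sketch: `columns`, `parseLine`, and the `convert` key are names made up for illustration, not part of any library, and it assumes every line is at least as long as the sum of the widths.

```groovy
// Hypothetical column spec: name, width, and an optional converter closure.
def columns = [
    [name: 'first', width: 10],
    [name: 'last',  width: 10],
    [name: 'age',   width: 3, convert: { it.toInteger() }]
]

// Slice one line into a map, walking the columns from left to right.
def parseLine = { String line ->
    int offset = 0
    columns.collectEntries { col ->
        def raw = line.substring(offset, offset + col.width).trim()
        offset += col.width
        // Apply the converter when one is configured, otherwise keep the String.
        [(col.name): col.convert ? col.convert.call(raw) : raw]
    }
}

def data = '''\
JOHN      DOE       123
JANE      ROE       456
'''

def rows = data.readLines().collect(parseLine)

assert rows[0] == [first: 'JOHN', last: 'DOE', age: 123]
assert rows[1].age == 456
```

Keeping the widths in one place means a new column is a one-line change, at the cost of a little indirection compared with the inline ranges.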
OTHER TIPS
I don't know of anything specifically for Groovy. I've done something similar with regular expressions; here's a quick and dirty parser based on this approach:
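def input = "JOHN      DOE       123       \n" +
    "JANE      ROE       456       \n"
// Column name -> width; the widths must add up to the record length.
def fieldDefs = [firstName: 10, lastName: 10, someValue: 10]
// Builds ^(.{10})(.{10})(.{10})$ with one capture group per field.
def pattern = "^" + fieldDefs.collect { k, v -> "(.{$v})" }.join('') + "\$"
rows = []
input.eachLine { line ->
    def m = line =~ pattern
    if (m) {
        def names = fieldDefs.keySet() as List
        // m[0] is [wholeMatch, group1, group2, ...]; drop the whole match.
        def values = m[0][1..-1].collect { it.trim() }
        rows << [names, values].transpose().collectEntries{it}
    }
}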
I just tested this, comparing the regex method with the String getAt method. getAt seems to be about 2x faster than the regex over roughly 10,000 lines:
// Build about 10,000 fixed-width records of 30 characters each.
def input = "JOHN      DOE       123       \n" * 10000
def fieldDefs = [firstName: 10, lastName: 10, someValue: 10]
// Wall-clock timing of a single run, in milliseconds (no JVM warm-up).
def benchmark = { closure ->
    def start = System.currentTimeMillis()
    closure.call()
    System.currentTimeMillis() - start
}
def pattern = "^" + fieldDefs.collect { k, v -> "(.{$v})" }.join('') + "\$"
def rows
def duration = benchmark {
    rows = []
    input.eachLine { line ->
        String firstName = line.getAt(0..9).trim()
        String lastName = line.getAt(10..19).trim()
        String someValue = line.getAt(20..29).trim()
        rows << [firstName: firstName, lastName: lastName, someValue: someValue]
    }
}
println "execution of string method took ${duration} ms"
duration = benchmark {
    rows = []
    input.eachLine { line ->
        def m = line =~ pattern
        if (m) {
            def names = fieldDefs.keySet() as List
            def values = m[0][1..-1].collect { it.trim() }
            rows << [names, values].transpose().collectEntries{it}
        }
    }
}
println "execution of regex method took ${duration} ms"
execution of string method took 245 ms
execution of regex method took 505 ms
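One caveat about the regex figure: `line =~ pattern` with a String on the right-hand side compiles the regular expression again for every line. The sketch below compiles it once with Groovy's `~` operator and reuses the resulting `java.util.regex.Pattern`. Whether this closes the gap with getAt was not measured here; `compiled`, `names`, and `parse` are illustrative names only.

```groovy
def fieldDefs = [firstName: 10, lastName: 10, someValue: 10]

// Compile once, outside the per-line loop; ~ turns a String into a Pattern.
def compiled = ~('^' + fieldDefs.collect { k, v -> "(.{$v})" }.join('') + '$')
def names = fieldDefs.keySet() as List

def parse = { String text ->
    def parsed = []
    text.eachLine { line ->
        def m = compiled.matcher(line)
        if (m.matches()) {
            // Groups are 1-based; group 0 is the whole match.
            def values = (1..names.size()).collect { m.group(it).trim() }
            parsed << [names, values].transpose().collectEntries { it }
        }
    }
    parsed
}

def input = 'JOHN      DOE       123       \n' * 3
def rows = parse(input)

assert rows.size() == 3
assert rows[0] == [firstName: 'JOHN', lastName: 'DOE', someValue: '123']
```

As in the original regex parser, lines that do not match the expected 30-character layout are skipped silently.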