Question

Recently I have been working with a lot of text files (CSV) with 10-60k lines, something like this:

id1,id2  
id3,id1  
id81,id13  
...

Most of the time, I need to extract this information in the form of an array:

id1,id2,id3,id1,id81,id13

Or, at times, as an array of unique elements:

id1,id2,id3,id81

The result is then used by my code (Java) to do something.

Most of the time I write a Java function that does the whole task for me: reading the file, applying the logic, and returning the list of IDs.

Is there a better and quicker way to achieve this, maybe via the command line?

Update:

If I were asked to build an app that reads a file and does something with it, I would surely write that logic in Java. In my case, though, I have to go through a lot of text files that I get from the data warehouse, extract the relevant info from them, and then run it through my Java-based app.

This is only for experimenting with and evaluating my app.


Solution

I copied your input into a file, test.csv:

$ cat test.csv 
id1,id2  
id3,id1  
id81,id13  

Now, with the 'tr' and 'paste' utilities, you can do:

$ tr -d ' ' < test.csv | paste -sd, -

and you have:

id1,id2,id3,id1,id81,id13
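
For the unique-elements case from the question, a similar pipeline can work. The following is only a sketch, assuming that ids never contain commas or spaces and that first-seen order should be preserved; unique_ids is a hypothetical helper name, not an existing tool.

```shell
# unique_ids: hypothetical helper that reads CSV lines on stdin and
# prints each distinct id once, comma-separated, in first-seen order.
unique_ids() {
    # 1. remove stray spaces
    # 2. put every id on its own line
    # 3. keep only the first occurrence of each id (NF skips blank lines)
    # 4. join the remaining lines back together with commas
    tr -d ' ' |
        tr ',' '\n' |
        awk 'NF && !seen[$0]++' |
        paste -sd, -
}

# Sample input from the question, including the trailing spaces.
printf 'id1,id2  \nid3,id1  \nid81,id13  \n' | unique_ids
```

With a file, the call would be unique_ids < test.csv. For the sample input this keeps id13 as well, since it appears in the data. If the order does not matter, sort -u could replace the awk step.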

OTHER TIPS

Unless your Java code is doing something silly, it will be in the same speed ballpark as anything else.

There's nothing magic about command-line tools that will make them faster than your code.

Licensed under: CC-BY-SA with attribution