Question

I have to deal with text files in a motley selection of formats. Here's an example (columns A and B are tab-delimited):

A   B
a   Name1=Val1, Name2=Val2, Name3=Val3
b   Name1=Val4, Name3=Val5
c   Name1=Val6, Name2=Val7, Name3=Val8

The files may or may not have headers, may mix delimiting schemes, may have columns of name/value pairs as above, and so on.
I often have an ad-hoc need to extract data from such files in various ways. For example, from the above data I might want the value associated with Name2 wherever it is present, i.e.:

A   B
a   Val2
c   Val7

What tools/techniques are there for performing such manipulations as one-line commands, using the above as an example but extensible to other cases?

Solution

I don't like sed too much, but it works for such things:

var="Name2"; sed -n "1p;s/^\([^[:space:]]*\)[[:space:]].*$var=\([^ ,]*\).*/\1 \2/p" < filename

Gives you:

 A B
 a Val2
 c Val7

OTHER TIPS

You have all the standard Unix command-line tools, for example grep, cut, sed and awk, at your disposal. You can also use Perl or Ruby for more complex things.
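
As a small illustration of combining these tools, here is a sketch that uses grep and cut to list the column A keys of the rows that contain a given name. The function name rows_with is made up for this example. It assumes tab-delimited columns as in the question, and that the name contains no regex special characters.

```shell
# rows_with NAME [FILE]
# Hypothetical helper: prints column A for every row that has a NAME=... pair.
# Assumes tab-delimited columns and a NAME without regex metacharacters.
rows_with() {
  name=$1; shift
  # Match NAME= at the start of a line or right after whitespace or a comma,
  # so that e.g. "XName2=" is not mistaken for "Name2=".
  grep -E "(^|[[:space:],])$name=" "$@" | cut -f1
}
```

Usage would be something like rows_with Name2 filename, which for the sample data should print a and c on separate lines. With more than one file grep prefixes each match with the file name, so this is only meant for a single file or stdin.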

From what I've seen I'd start with Awk for this sort of thing and then if you need something more complex, I'd progress to Python.
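
For the example in the question, an awk version might look roughly like this. It is only a sketch: the wrapper function extract_value is a name invented here, and it assumes the first line is a header, the columns are tab-delimited, and the pairs are separated by a comma plus optional spaces.

```shell
# extract_value NAME [FILE...]
# Hypothetical wrapper: prints the header line, then column A and the value
# of NAME for every row where NAME is present.
extract_value() {
  name=$1; shift
  awk -F'\t' -v name="$name" '
    NR == 1 { print; next }              # assume line 1 is a header, pass it through
    {
      n = split($2, pairs, /, */)        # break column B into Name=Val pieces
      for (i = 1; i <= n; i++) {
        eq = index(pairs[i], "=")        # position of the first "="
        if (eq && substr(pairs[i], 1, eq - 1) == name)
          print $1 "\t" substr(pairs[i], eq + 1)
      }
    }
  ' "$@"
}
```

Called as extract_value Name2 filename, this compares the name exactly rather than by regex, so a name such as Name2 does not accidentally match Name20 or XName2.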

I would use sed:

   # print section of file between two regular expressions (inclusive)
   sed -n '/Iowa/,/Montana/p'             # case sensitive

Since you have Cygwin, I'd go with Perl. It's the easiest to learn (check out the O'Reilly book: Learning Perl) and widely applicable.

I would use Perl. Write a small module (or more than one) for dealing with the different formats. You could then run Perl one-liners using that library. Here is an example of what it would look like:

perl -e 'use Parser;' -e 'parser("in.input")->get("Name2");'

Don't quote me on the syntax, but that's the general idea. Abstract the task at hand so that you can think in terms of what you need to do, not how you need to do it. Ruby would be another option; it tends to have a cleaner syntax, but either language would work.
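
The same "small library" idea can be tried without leaving the shell. The sketch below is an assumption about how one might do it, not an existing tool: kv_long is a made-up function that you could keep in a file and source. It flattens the name/value column into one key/name/value line per pair, so the usual tools can query it afterwards. It assumes tab-delimited columns and pairs separated by a comma plus optional spaces.

```shell
# kv_long [FILE...]
# Hypothetical helper: turns rows like "a<TAB>Name1=Val1, Name2=Val2" into
#   a<TAB>Name1<TAB>Val1
#   a<TAB>Name2<TAB>Val2
# Pieces without a "name=" part (such as a header's column B) are skipped.
kv_long() {
  awk -F'\t' '
    {
      n = split($2, pairs, /, */)
      for (i = 1; i <= n; i++) {
        eq = index(pairs[i], "=")
        if (eq > 1)                      # need at least one character before "="
          print $1 "\t" substr(pairs[i], 1, eq - 1) "\t" substr(pairs[i], eq + 1)
      }
    }
  ' "$@"
}
```

An ad-hoc query then becomes a short pipeline, for example kv_long in.input | awk -F'\t' '$2 == "Name2" { print $1 "\t" $3 }'. Other formats would each get their own small function that emits the same three-column layout.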

Licensed under: CC-BY-SA with attribution