Question

I have a file 'records.txt' which contains over 200,000 records.

Each record is on a separate line and has multiple fields separated by a delimiter '|'.

Each row should have 35 fields, but the problem is that one of these rows has a different number of fields, i.e. a number of '|' characters other than 35.

Can someone please suggest a way in Unix to identify that row? (For example, by getting the count of '|' characters in each row of the file.)


Solution

Try this, which prints the line number and contents of every row that does not have exactly 35 fields:

awk -F '|' 'NF != 35 {print NR, $0}' records.txt
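If you are not sure that 35 is the right number to test against (a row with 35 fields has only 34 '|' separators, unless every line ends with a trailing '|'), a tally of field counts shows which count is the odd one out. This is only a sketch: the function name field_histogram is made up for illustration, and it assumes the file is passed as the first argument.

```shell
# field_histogram FILE: hypothetical helper, not part of any standard tool.
# Prints each distinct field count preceded by the number of rows having it.
field_histogram() {
    awk -F '|' '{ print NF }' "$1" | sort -n | uniq -c
}

# Example call (assumes records.txt exists in the current directory):
#   field_histogram records.txt
```

The count that occurs only once belongs to the bad row, and you can then plug the common count into the NF test.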

OTHER TIPS

This small perl script should do it:

perl -ne '$t = $_; $t =~ s/[^|]//g; print unless length($t) == 35;' records.txt

This works by removing all the characters except the |, then counting what is left.
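A variation on the same counting idea, in case you also want the line number and the actual count: awk's gsub returns the number of replacements it made, so replacing every '|' with itself gives the count without changing the line. This is a sketch only; the helper name pipe_counts and its second argument (the expected count) are assumptions for illustration.

```shell
# pipe_counts FILE EXPECTED: hypothetical helper for illustration.
# Prints "line_number: count" for every line whose number of '|'
# characters differs from EXPECTED.
pipe_counts() {
    awk -v want="$2" '{
        n = gsub(/\|/, "&")      # gsub returns how many "|" were matched
        if (n != want) print NR ": " n
    }' "$1"
}

# Example call (assumes records.txt exists in the current directory):
#   pipe_counts records.txt 35
```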

Greg's way with bash stuff, for the bash friends out there :)

while IFS= read -r n; do [ "$(printf '%s' "$n" | tr -cd '|' | wc -c)" -ne 35 ] && printf '%s\n' "$n"; done < records.txt
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow