Question

I'm trying to write some RHEL security hardening automation scripts, and I've got a CSV file whose contents I'm trying to turn into readable output. Here's what I have so far...

#!/bin/bash

# loop through the file
while read line; do
        # get all of the content
        vulnid=`echo $line | cut -d',' -f1`
        ruleid=`echo $line | cut -d',' -f2`
        stigid=`echo $line | cut -d',' -f3`
        title=`echo $line | cut -d',' -f4`
        discussion=`echo $line | cut -d',' -f5`
        check=`echo $line | cut -d',' -f6`
        fix=`echo $line | cut -d',' -f7`

        # Format the content

        echo "########################################################"
        echo "# Vulnerability ID: $vulnid"
        echo "# Rule ID: $ruleid"
        echo "# STIG ID: $stigid"
        echo "#"
        echo "# Rule: $title"
        echo "#"
        echo "# Discussion:"
        echo "# $discussion"
        echo "# Check:"
        echo "# $check"
        echo "# Fix:"
        echo "# $fix"
        echo "########################################################"
        echo "# Start Check"
        echo
        echo "# Start Remediation"
        echo
        echo "########################################################"

done < STIG.csv

The problem that I'm having with this is that my text in the CSV contains commas. This is actually perfectly fine and in accordance with the IETF standard (https://www.rfc-editor.org/rfc/rfc4180#page-2 Section 2.4). However, as you can imagine, the cut command doesn't look ahead to see if there's a trailing space after the comma (as you would normally have in English). This is causing all of my fields to get messed up, and I can't figure out how to get this all working right.
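To see the failure concretely, here's a minimal sketch (the sample line is made up, but it has the same shape as the STIG CSV):

```shell
# A made-up line: field 2 is quoted and contains commas,
# exactly as RFC 4180 allows.
line='V-38447,"The system must verify, at boot, all packages.",fix'

# cut splits on every comma, so the quoted field is chopped in half:
echo "$line" | cut -d',' -f2
# → "The system must verify
```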

Now, I have a feeling that there's some magical regex I can use like ',![:blank:]', but I'll be damned if I know how to utilize it. I'm used to using cut just because it's quick and dirty, but perhaps someone has a better suggestion using maybe awk or sed. This is mainly to generate the bulk structure of my program which repeats itself and is a TON of commentary.

An added note: this has to run on a clean install of RHEL 6. I would happily write this in Ruby, Python, whatever. However, most of those are extra packages which would have to be installed, and the machines this script will be deployed to have no internet access and no extra packages. Python 2.6 is on CentOS 6 by default, and I think on RHEL 6 as well. Otherwise, trust me, I'd be writing this whole thing in Ruby.

Here's a sample of the CSV:

V-38447,SV-50247r1_rule,RHEL-06-000519,The system package management tool must verify contents of all files associated with packages.,The hash on important files like system executables should match the information given by the RPM database. Executables with erroneous hashes could be a sign of nefarious activity on the system.,"The following command will list which files on the system have file hashes different from what is expected by the RPM database. # rpm -Va | grep '$1 ~ /..5/ && $2 != 'c''If there is output, this is a finding.","The RPM package management system can check the hashes of installed software packages, including many that are important to system security. Run the following command to list which files on the system have hashes that differ from what is expected by the RPM database: # rpm -Va | grep '^..5'A 'c' in the second column indicates that a file is a configuration file, which may appropriately be expected to change. If the file that has changed was not expected to then refresh from distribution media or online repositories. rpm -Uvh [affected_package]OR yum reinstall [affected_package]"

Also, if anyone is curious, this whole project is out on GitHub.

Solution

All the comments on your question are good ones. There is no support for CSV built into bash, so if you don't want to use a language like Python, Ruby, Erlang or even Perl, you have to roll your own.

Note that while awk can use commas as a field separator, it also does not properly support CSV with commas embedded in quoted fields. You can hack together a solution with a pattern, as Håkon suggested.

But you don't need to do this in awk; you can do it just as well in bash alone and avoid calling an external tool. How about something like this?
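For reference, the two parameter expansions the field-splitting function relies on behave like this (illustrative values):

```shell
line='a,"b,c",d'

# ${var%%pattern} removes the longest matching suffix:
echo "${line%%,*}"   # → a
# ${var#pattern} removes the shortest matching prefix:
echo "${line#*,}"    # → "b,c",d
```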

#!/bin/bash

nextfield () {
  case "$line" in
    \"*\",*)
      # quoted field with more fields after it
      value="${line%%\",*}\""
      line="${line#*\",}"
      ;;
    \"*)
      # quoted field at the end of the line: take it whole,
      # otherwise a stray quote gets appended
      value="$line"
      line=""
      ;;
    *)
      value="${line%%,*}"
      line="${line#*,}"
      ;;
  esac
}

# loop through the file
while IFS= read -r line; do

  # get the content
  nextfield; vulnid="$value"
  nextfield; ruleid="$value"
  nextfield; stigid="$value"
  nextfield; title="$value"
  nextfield; discussion="$value"
  nextfield; check="$value"
  nextfield; fix="$value"

  # format the content
  printf "########################################################\n"
  printf "# Vulnerability ID: %s\n" "$vulnid"
  printf "# Rule ID: %s\n# STIG ID: %s\n#\n" "$ruleid" "$stigid"
  printf "# Rule: %s\n" "$title"
  printf "#\n# Discussion:\n"
  fmt -w68 <<<"$discussion" | sed 's/^/#   /'
  printf "# Check:\n"
  fmt -w68 <<<"$check" | sed 's/^/#   /'
  printf "# Fix:\n"
  fmt -w68 <<<"$fix" | sed 's/^/#   /'
  printf "########################################################\n"
  printf "# Start Check\n\n"
  printf "# Start Remediation\n\n"
  printf "########################################################\n"

done < STIG.csv

The speed advantage will be tremendous if you're doing a lot of these.

Note the improved formatting, courtesy of fmt. This kind of kills the speed benefit of avoiding the call to an external program, but it sure makes your output easier to read. :)
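Here's what that `fmt`-plus-`sed` formatting does to one of the long fields (the text is taken from the sample CSV; the pipeline mirrors the one in the script above):

```shell
# Wrap a long field at 68 columns, then prefix each line as a comment.
text='The hash on important files like system executables should match the information given by the RPM database.'
printf '%s\n' "$text" | fmt -w68 | sed 's/^/#   /'
```

Each output line starts with `#   ` and fits within roughly 72 columns, which is what makes the generated commentary readable.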

Other suggestions

In GNU Awk version 4, you could try:

gawk -f a.awk STIG.csv

where a.awk is:

BEGIN {
    FPAT = "([^,]*)|(\"[^\"]+\")"
}

{
    for (i=1; i<=NF; i++) 
        print "$"i"=|"$i"|"
    print "# Rule: "$4
}

Output:

$ cat STIG.csv
vulnid,ruleid,stigid,"This is a title, hello","A discussion, ,,",check,fix

$ gawk -f a.awk STIG.csv
$1=|vulnid|
$2=|ruleid|
$3=|stigid|
$4=|"This is a title, hello"|
$5=|"A discussion, ,,"|
$6=|check|
$7=|fix|
# Rule: "This is a title, hello"

+1 to John Y's comment. Here's a Ruby example:

ruby -rcsv -e 'CSV.foreach("STIG.csv") do |row|
  (vulnid, ruleid, stigid, title, disc, check, fix) = row
  puts "#" * 40
  puts "# Vulnerability ID: #{vulnid}"
  puts "# Rule ID: #{ruleid}"
  puts "# STIG ID: #{stigid}"
  puts "#"
  puts "# Discussion:"
  puts "# #{disc}"
  puts "# Check:"
  puts "# #{check}"
  puts "# Fix:"
  puts "# #{fix}"
  puts "#" * 40
end'

If you want to wrap the long lines, do something like this:

  puts fix.gsub(/(.{1,78})(?:\s+|\Z)/) {|s| "# " + s + "\n"}

Your biggest problem is the possibility of a field containing line breaks. In this spirit, the advice to use a language that supports CSV is the best solution.

However, if your only problem is commas (and you know there won't be any line breaks in your fields), you can solve it easily in bash, by temporarily replacing comma-space sequences with an unused character combination of your choice, and replacing it back before outputting:

#!/bin/bash

while IFS=',' read vulnid ruleid stigid title discussion check fix; do
    echo "# Vulnerability ID: $vulnid"
    ...
    echo "# Discussion:"
    echo "# $discussion"
    ...
done <<<"$(sed 's/, /COMMASPACE/g' <STIG.csv)" | sed 's/COMMASPACE/, /g'
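A quick sanity check of the idea on one made-up line:

```shell
line='V-1,SV-1,ID-1,"A title, with a comma",disc,check,fix'

# Hide every ", " before splitting, restore it afterwards.
printf '%s\n' "$line" | sed 's/, /COMMASPACE/g' |
while IFS=',' read -r vulnid ruleid stigid title rest; do
    printf '# Rule: %s\n' "$title"
done | sed 's/COMMASPACE/, /g'
# → # Rule: "A title, with a comma"
```

Note this only works because English text puts a space after a comma; a quoted field containing a comma with no following space would still be split.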

Below is a somewhat improved version of my answer at Count number of column in a pipe delimited file, tailored to this particular problem. A real CSV parser would be best, but the awk hack below works as long as no field is split across multiple lines (which can happen when a field begins with a quote and the closing quote is on a later line). It also assumes the file it receives is already well-formed. The only problem with it is that it outputs OFS after the last field, meaning an empty field is at the end. This shouldn't be an issue in your particular case.

Just add the following before your while loop above and change the value of OFS as you see fit, making sure to change the delimiter for cut to match. OFS defaults to |, but you can override it if you wish using the -v option that awk allows as shown:

outfile="$(mktemp 2>/dev/null || printf '%s' "/tmp/STIG.$$")"

outdelim='|'

awk -F',' -v OFS="$outdelim" -f /dev/stdin STIG.csv >"$outfile" <<'EOF'
# WARNING: outputs OFS after the last field, meaning an empty field is at the end.
# The here-document delimiter is quoted so the shell does not expand $i,
# and the program reaches awk through -f /dev/stdin.
BEGIN { if (OFS == "") OFS = "|" }

{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^".*[^"]$/)
            for (; i <= NF && ($i !~ /"$/); i++) {
                printf("%s%s", $i, FS);
            }
        printf("%s%s", $i, OFS);
    }
    print "";   # terminate each record with a newline
}
EOF

# loop through the file
while IFS= read -r line; do
    # get all of the content
    vulnid="$(echo "$line" | cut -d"$outdelim" -f1)"
    .
    .
    .
done < "$outfile"

rm -f "$outfile"
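As a one-line sanity check of the rejoining loop (the awk body inlined, with the record-terminating `print ""` and an illustrative input):

```shell
# Rejoin the pieces of a quoted field, then emit all fields
# separated by '|' (with the trailing OFS the warning mentions).
printf '%s\n' 'a,"b, c",d' |
awk -F',' -v OFS='|' '{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^".*[^"]$/)
            for (; i <= NF && ($i !~ /"$/); i++)
                printf("%s%s", $i, FS)
        printf("%s%s", $i, OFS)
    }
    print ""
}'
# → a|"b, c"|d|
```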
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow