Question

I'm trying to write some RHEL security hardening automation scripts, and I've got a CSV file whose contents I'm trying to turn into readable output. Here's what I have so far...

#!/bin/bash

# loop through the file
while read line; do
        # get all of the content
        vulnid=`echo $line | cut -d',' -f1`
        ruleid=`echo $line | cut -d',' -f2`
        stigid=`echo $line | cut -d',' -f3`
        title=`echo $line | cut -d',' -f4`
        discussion=`echo $line | cut -d',' -f5`
        check=`echo $line | cut -d',' -f6`
        fix=`echo $line | cut -d',' -f7`

        # Format the content

        echo "########################################################"
        echo "# Vulnerability ID: $vulnid"
        echo "# Rule ID: $ruleid"
        echo "# STIG ID: $stigid"
        echo "#"
        echo "# Rule: $title"
        echo "#"
        echo "# Discussion:"
        echo "# $discussion"
        echo "# Check:"
        echo "# $check"
        echo "# Fix:"
        echo "# $fix"
        echo "########################################################"
        echo "# Start Check"
        echo
        echo "# Start Remediation"
        echo
        echo "########################################################"

done < STIG.csv

The problem that I'm having with this is that my text in the CSV contains commas. This is actually perfectly fine and in accordance with the IETF standard (https://www.rfc-editor.org/rfc/rfc4180#page-2 Section 2.4). However, as you can imagine, the cut command doesn't look ahead to see if there's a trailing space after the comma (as you would normally have in English). This is causing all of my fields to get messed up, and I can't figure out how to get this all working right.
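To see the failure concretely, here's a minimal sketch (the sample line is made up, but it has the same shape as the STIG CSV):

```shell
# A made-up line: field 2 is quoted and contains commas,
# exactly as RFC 4180 allows.
line='V-38447,"The system must verify, at boot, all packages.",fix'

# cut splits on every comma, so the quoted field is chopped in half:
echo "$line" | cut -d',' -f2
# → "The system must verify
```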

Now, I have a feeling that there's some magical regex I can use like ',![:blank:]', but I'll be damned if I know how to utilize it. I'm used to using cut just because it's quick and dirty, but perhaps someone has a better suggestion using maybe awk or sed. This is mainly to generate the bulk structure of my program which repeats itself and is a TON of commentary.

An added note: this has to run on a clean install of RHEL 6. I would happily write this in Ruby, Python, whatever. However, most of those are extra packages which would have to be installed, and the machines this script will be deployed to have no internet access and no extra packages. Python 2.6 is on CentOS 6 by default, and I think on RHEL 6 as well. Otherwise, trust me, I'd be writing this whole thing in Ruby.

Here's a sample of the CSV:

V-38447,SV-50247r1_rule,RHEL-06-000519,The system package management tool must verify contents of all files associated with packages.,The hash on important files like system executables should match the information given by the RPM database. Executables with erroneous hashes could be a sign of nefarious activity on the system.,"The following command will list which files on the system have file hashes different from what is expected by the RPM database. # rpm -Va | grep '$1 ~ /..5/ && $2 != 'c''If there is output, this is a finding.","The RPM package management system can check the hashes of installed software packages, including many that are important to system security. Run the following command to list which files on the system have hashes that differ from what is expected by the RPM database: # rpm -Va | grep '^..5'A 'c' in the second column indicates that a file is a configuration file, which may appropriately be expected to change. If the file that has changed was not expected to then refresh from distribution media or online repositories. rpm -Uvh [affected_package]OR yum reinstall [affected_package]"

Also, if anyone is curious, this whole project is out on GitHub.

Solution

All the comments on your question are good ones. There is no support for CSV built into bash, so if you don't want to use a language like Python, Ruby, Erlang or even Perl, you have to roll your own.

Note that while awk can use commas as a field separator, it also does not properly support CSV with commas embedded in quoted fields. You can hack together a solution with a pattern, as Håkon suggested.

But you don't need to do this in awk; you can do it just as well in bash alone and avoid calling an external tool. How about something like this?
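For reference, the two parameter expansions the field-splitting function relies on behave like this (illustrative values):

```shell
line='a,"b,c",d'

# ${var%%pattern} removes the longest matching suffix:
echo "${line%%,*}"   # → a
# ${var#pattern} removes the shortest matching prefix:
echo "${line#*,}"    # → "b,c",d
```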

#!/bin/bash

nextfield () {
  case "$line" in
    \"*\",*)
      # quoted field with more fields after it
      value="${line%%\",*}\""
      line="${line#*\",}"
      ;;
    \"*)
      # quoted field at the end of the line: take it whole,
      # otherwise a stray quote gets appended
      value="$line"
      line=""
      ;;
    *)
      value="${line%%,*}"
      line="${line#*,}"
      ;;
  esac
}

# loop through the file
while IFS= read -r line; do

  # get the content
  nextfield; vulnid="$value"
  nextfield; ruleid="$value"
  nextfield; stigid="$value"
  nextfield; title="$value"
  nextfield; discussion="$value"
  nextfield; check="$value"
  nextfield; fix="$value"

  # format the content
  printf "########################################################\n"
  printf "# Vulnerability ID: %s\n" "$vulnid"
  printf "# Rule ID: %s\n# STIG ID: %s\n#\n" "$ruleid" "$stigid"
  printf "# Rule: %s\n" "$title"
  printf "#\n# Discussion:\n"
  fmt -w68 <<<"$discussion" | sed 's/^/#   /'
  printf "# Check:\n"
  fmt -w68 <<<"$check" | sed 's/^/#   /'
  printf "# Fix:\n"
  fmt -w68 <<<"$fix" | sed 's/^/#   /'
  printf "########################################################\n"
  printf "# Start Check\n\n"
  printf "# Start Remediation\n\n"
  printf "########################################################\n"

done < STIG.csv

The speed advantage will be tremendous if you're doing a lot of these.

Note the improved formatting, courtesy of fmt. This kind of kills the speed benefit of avoiding the call to an external program, but it sure makes your output easier to read. :)
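Here's what that `fmt`-plus-`sed` formatting does to one of the long fields (the text is taken from the sample CSV; the pipeline mirrors the one in the script above):

```shell
# Wrap a long field at 68 columns, then prefix each line as a comment.
text='The hash on important files like system executables should match the information given by the RPM database.'
printf '%s\n' "$text" | fmt -w68 | sed 's/^/#   /'
```

Each output line starts with `#   ` and fits within roughly 72 columns, which is what makes the generated commentary readable.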

Other suggestions

In GNU Awk version 4, you could try:

gawk -f a.awk STIG.csv

where a.awk is:

BEGIN {
    FPAT = "([^,]*)|(\"[^\"]+\")"
}

{
    for (i=1; i<=NF; i++) 
        print "$"i"=|"$i"|"
    print "# Rule: "$4
}

Output:

$ cat STIG.csv
vulnid,ruleid,stigid,"This is a title, hello","A discussion, ,,",check,fix

$ gawk -f a.awk STIG.csv
$1=|vulnid|
$2=|ruleid|
$3=|stigid|
$4=|"This is a title, hello"|
$5=|"A discussion, ,,"|
$6=|check|
$7=|fix|
# Rule: "This is a title, hello"

+1 to John Y's comment. Here's a Ruby example:

ruby -rcsv -e 'CSV.foreach("STIG.csv") do |row|
  (vulnid, ruleid, stigid, title, disc, check, fix) = row
  puts "#" * 40
  puts "# Vulnerability ID: #{vulnid}"
  puts "# Rule ID: #{ruleid}"
  puts "# STIG ID: #{stigid}"
  puts "#"
  puts "# Discussion:"
  puts "# #{disc}"
  puts "# Check:"
  puts "# #{check}"
  puts "# Fix:"
  puts "# #{fix}"
  puts "#" * 40
end'

If you want to wrap the long lines, do something like this:

  puts fix.gsub(/(.{1,78})(?:\s+|\Z)/) {|s| "# " + s + "\n"}

Your biggest problem is the possibility of a field containing line breaks. In this spirit, the advice to use a language that supports CSV is the best solution.

However, if your only problem is commas (and you know there won't be any line breaks in your fields), you can solve it easily in bash, by temporarily replacing comma-space sequences with an unused character combination of your choice, and replacing it back before outputting:

#!/bin/bash

while IFS=',' read vulnid ruleid stigid title discussion check fix; do
    echo "# Vulnerability ID: $vulnid"
    ...
    echo "# Discussion:"
    echo "# $discussion"
    ...
done <<<"$(sed 's/, /COMMASPACE/g' <STIG.csv)" | sed 's/COMMASPACE/, /g'
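A quick sanity check of the idea on one made-up line:

```shell
line='V-1,SV-1,ID-1,"A title, with a comma",disc,check,fix'

# Hide every ", " before splitting, restore it afterwards.
printf '%s\n' "$line" | sed 's/, /COMMASPACE/g' |
while IFS=',' read -r vulnid ruleid stigid title rest; do
    printf '# Rule: %s\n' "$title"
done | sed 's/COMMASPACE/, /g'
# → # Rule: "A title, with a comma"
```

Note this only works because English text puts a space after a comma; a quoted field containing a comma with no following space would still be split.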

Below is a somewhat improved version of my answer at Count number of column in a pipe delimited file, tailored to this particular problem. A real CSV parser would be best, but the awk hack below works as long as no field is split across multiple lines (which can happen when a field begins with a quote and the closing quote is on a later line). It also assumes the file it receives is already well-formed. The only problem with it is that it outputs OFS after the last field, meaning an empty field is at the end. This shouldn't be an issue in your particular case.

Just add the following before your while loop above and change the value of OFS as you see fit, making sure to change the delimiter for cut to match. OFS defaults to |, but you can override it if you wish using the -v option that awk allows as shown:

outfile="$(mktemp 2>/dev/null || printf '%s' "/tmp/STIG.$$")"

outdelim='|'

awk -F',' -v OFS="$outdelim" -f /dev/stdin STIG.csv >"$outfile" <<'EOF'
# WARNING: outputs OFS after the last field, meaning an empty field is at the end.
# The here-document delimiter is quoted so the shell does not expand $i,
# and the program reaches awk through -f /dev/stdin.
BEGIN { if (OFS == "") OFS = "|" }

{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^".*[^"]$/)
            for (; i <= NF && ($i !~ /"$/); i++) {
                printf("%s%s", $i, FS);
            }
        printf("%s%s", $i, OFS);
    }
    print "";   # terminate each record with a newline
}
EOF

# loop through the file
while IFS= read -r line; do
    # get all of the content
    vulnid="$(echo "$line" | cut -d"$outdelim" -f1)"
    .
    .
    .
done < "$outfile"

rm -f "$outfile"
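As a one-line sanity check of the rejoining loop (the awk body inlined, with the record-terminating `print ""` and an illustrative input):

```shell
# Rejoin the pieces of a quoted field, then emit all fields
# separated by '|' (with the trailing OFS the warning mentions).
printf '%s\n' 'a,"b, c",d' |
awk -F',' -v OFS='|' '{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^".*[^"]$/)
            for (; i <= NF && ($i !~ /"$/); i++)
                printf("%s%s", $i, FS)
        printf("%s%s", $i, OFS)
    }
    print ""
}'
# → a|"b, c"|d|
```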
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow