bash script: find content in file between specific lines and run command on content, replace old content with the output of command

https://stackoverflow.com/questions/22182351

05-06-2023

Question

I'm a real newb at scripting; so far I've only written very simple scripts with some vars, ifs, and basic grep and awk commands.

Q: I have a few thousand files (emails) with cleartext and (sometimes) several independent sections of GPG encrypted text, something like this:

several lines of
cleartext stuff (more specifically: email headers)

-----BEGIN PGP MESSAGE-----
RTDHNRFSGNRTDHNRFSGNRTDHNRFSGN
RTDHNRFSGNRTDHNRFSGNRTDHNRFSGN
-----END PGP MESSAGE-----

some more lines
of cleartext

-----BEGIN PGP MESSAGE-----
WPGLUFPJUWPGLUFPJUWPGLUFPJU
WPGLUFPJUWPGLUFPJUWPGLUFPJU
-----END PGP MESSAGE-----

I'm trying to write a (preferably) bash script that goes through all files in a folder, finds each instance of GPG encrypted text, decrypts it, replaces the encrypted text with the decrypted text, and saves the file. When the script is done, the hypothetical file above should look like this:

several lines of
cleartext stuff (more specifically: email headers)

decrypted message #1

some more lines
of cleartext

decrypted message #2

When I run GPG directly on one of these files, it skips all the cleartext and outputs only the first decrypted message.

So I think I need something like a while loop that independently finds every block starting with "-----BEGIN PGP MESSAGE-----" and ending with "-----END PGP MESSAGE-----", runs the GPG command on it, replaces that block with the output of the GPG command, and then continues to the next block of encrypted text.

So far I just have these few lines, but they obviously don't do what I want. I don't want to have to run the script on each individual file, and I'd rather not use a temp file. I guess there's a much better way to do all of this.

#!/bin/bash

TEMPFILE="${1}.tmp"

## grep only the relevant gpg lines to decrypt.
## this will output ALL encrypted instances to $TEMPFILE
sed -n '/^-----BEGIN PGP MESSAGE/,/^-----END PGP MESSAGE/p' "$1" > "$TEMPFILE"

## decrypt. this will only give me the decrypted output
## of the first encrypted instance in $TEMPFILE.
## and I don't know how to shove this into the proper place in the original file.
gpg --batch -d --no-tty --output "${1}.dc.eml" "$TEMPFILE"

## remove $TEMPFILE
rm "$TEMPFILE"

Here is some made-up pseudocode that hopefully explains better what I want to do:

for all files in folder; do
    while i can find an instance of "-----BEGIN PGP" to "-----END PGP"; do
        command: gpg decrypt > $tempvar
        command: replace the instance of "-----BEGIN PGP" to "-----END PGP" with $tempvar
    end while
end for

This is probably pretty simple to achieve (I hope) but I've been at this decryption dilemma for days now and I can't properly figure out how to do it. Any help or hints towards the right direction will be of great help to me.

EDIT: final code, thanks to glenn jackman:

for file in *; do
    in_pgp_section=false
    pgp_text=""

    while IFS= read -r line; do
        if [[ $line == *BEGIN\ PGP\ MESSAGE* ]]; then
            in_pgp_section=true
        fi

        if ! $in_pgp_section; then
            printf '%s\n' "$line"   # keep the newline, or all cleartext lines run together
            continue
        fi

        pgp_text+="$line"$'\n'

        if [[ $line == *END\ PGP\ MESSAGE* ]]; then
            printf "%s" "$pgp_text" | gpg --batch -d --no-tty --use-agent
            in_pgp_section=false
            pgp_text=""
        fi
    done < "$file" > "$file.decrypted"
done

Answer

untested

for file in *; do
    in_pgp_section=false
    pgp_text=""

    while IFS= read -r line; do   # IFS= and -r keep whitespace and backslashes intact
        if [[ $line == "-----BEGIN PGP MESSAGE-----" ]]; then
            in_pgp_section=true
        fi

        if ! $in_pgp_section; then
            printf '%s\n' "$line"   # safer than echo for lines such as "-n" or "-e"
            continue
        fi

        pgp_text+="$line"$'\n'

        if [[ $line == "-----END PGP MESSAGE-----" ]]; then
            printf "%s" "$pgp_text" | gpg -d
            in_pgp_section=false
            pgp_text=""
        fi
    done < "$file" > "$file.decrypting"

    ln "$file" "$file.encrypted"  &&
    mv "$file.decrypting" "$file"
done

This should decrypt all the PGP sections in every file in the current directory, and keep a copy of each original file with a ".encrypted" extension.
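The backup step at the end of the loop can be tried in isolation on a throwaway file. The following is only a sketch of that `ln` plus `mv` pattern; `swap_in_place` is a hypothetical helper name that does not appear in the answer.

```shell
#!/bin/bash
# Sketch of the backup-and-swap step used at the end of the loop.
# swap_in_place is a hypothetical helper name, not part of the answer.
swap_in_place() {
    local file=$1 new=$2
    # Hard-link the original under a second name; no data is copied.
    # If "$file.encrypted" already exists, ln fails and mv is skipped,
    # so an earlier backup is never overwritten.
    ln "$file" "$file.encrypted" &&
    # Rename the new version over the original name. The old content
    # stays reachable through the .encrypted link.
    mv "$new" "$file"
}
```

Because `ln` and `mv` are joined with `&&`, the original is only replaced once the backup link exists.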

Other answers

This is not the answer, but a step in the right direction:

awk '/^-----BEGIN PGP MESSAGE-----$/{store=1;txt="";}
     {if(store==0){print}else{txt=txt"\n"$0}}
     /^-----END PGP MESSAGE-----$/{store=0;print tolower(txt)}' t.txt

/^-----BEGIN PGP MESSAGE-----$/{store=1;txt="";} when the line matches, we initialise the variable txt and set the flag store to 1.

{if(store==0){print}else{txt=txt"\n"$0}} for each line, if the flag is 0, we print the line; otherwise, we store (append) the line in txt.

/^-----END PGP MESSAGE-----$/{store=0;print tolower(txt)} when the line matches, we unset the flag and do the interesting part (I just print in lowercase…). That is your job now. You will probably need to call system("gpg") and use some pipes. Good luck!
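One possible way to finish this, as a sketch rather than a tested solution: instead of `system()`, awk can pipe the collected block straight into an external command with `printf ... | cmd` followed by `close(cmd)`. The names `decrypt_cmd`, `DECRYPT_CMD` and `filter_pgp_blocks` are hypothetical. The default command is a stand-in that merely lowercases the block (like `tolower` above), so the flow can be tried without any keys. It also assumes an awk that provides `fflush()`, which gawk, mawk and BSD awk do.

```shell
#!/bin/bash
# Sketch: replace every PGP block in a file with the output of a command.
# decrypt_cmd / DECRYPT_CMD are hypothetical names. The default is a
# harmless stand-in (it lowercases the block); for real use it would be
# something like: gpg --batch -d --no-tty
decrypt_cmd=${DECRYPT_CMD:-"tr 'A-Z' 'a-z'"}

filter_pgp_blocks() {
    awk -v cmd="$decrypt_cmd" '
        # Start of a block: raise the flag and reset the buffer.
        /^-----BEGIN PGP MESSAGE-----$/ { store = 1; txt = "" }
        # Inside a block: collect the line instead of printing it.
        store  { txt = txt $0 "\n" }
        # Outside a block: pass the line through unchanged.
        !store { print }
        # End of a block: hand the buffer to the command.
        /^-----END PGP MESSAGE-----$/ {
            store = 0
            fflush()                # emit buffered cleartext first, to keep the order
            printf "%s", txt | cmd  # feed the whole block to the command
            close(cmd)              # wait for it to finish before reading on
        }
    ' "$1"
}
```

With `decrypt_cmd` set to a gpg invocation, `filter_pgp_blocks mail.eml > mail.eml.decrypted` would write the cleartext lines unchanged and whatever the command outputs in place of each block.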

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow