Replace string with substring in lowercase using sed / awk / tr / perl?

https://stackoverflow.com/questions/13073727

14-07-2021
|

Pregunta

I have a plaintext file containing multiple instances of the pattern $$DATABASE_*$$ and the asterisk could be any string of characters. I'd like to replace the entire instance with whatever is in the asterisk portion, but lowercase.

Here is a test file:

$$DATABASE_GIBSON$$

test me $$DATABASE_GIBSON$$ test me

$$DATABASE_GIBSON$$ test $$DATABASE_GIBSON$$ test

$$DATABASE_GIBSON$$ $$DATABASE_GIBSON$$$$DATABASE_GIBSON$$

Here is the desired output:

gibson

test me gibson test me

gibson test gibson test

gibson gibsongibson

How do I do this with sed/awk/tr/perl?

Solución

Here's the perl version I ended up using.

perl -p -i.bak -e 's/\$\$DATABASE_(.*?)\$\$/lc($1)/eg' inputFile

Otros consejos

Unfortunately there's no easy, foolproof way with awk, but here's one approach:

$ cat tst.awk
{
   gsub(/[$][$]/,"\n")

   head = ""
   tail = $0

   while ( match(tail, "\nDATABASE_[^\n]+\n") ) {
      head = head substr(tail,1,RSTART-1)
      trgt = substr(tail,RSTART,RLENGTH)
      tail = substr(tail,RSTART+RLENGTH)

      gsub(/\n(DATABASE_)?/,"",trgt)

      head = head tolower(trgt)

   }

   $0 = head tail

   gsub("\n","$$")

   print
}

$ cat file
The quick brown $$DATABASE_FOX$$ jumped over the lazy $$DATABASE_DOG$$s back.
The grey $$DATABASE_SQUIRREL$$ ate $$DATABASE_NUT$$s under a $$DATABASE_TREE$$.
Put a dollar $$DATABASE_DOL$LAR$$ in the $$ string.

$ awk -f tst.awk file
The quick brown fox jumped over the lazy dogs back.
The grey squirrel ate nuts under a tree.
Put a dollar dol$lar in the $$ string.

Note the trick of converting $$ to a newline char so we can negate that char in the match(RE), without that (i.e. if we used ".+" instead of "[^\n]+") then due to greedy RE matching if the same pattern appeared twice on one input line the matching string would extend from the start of the first pattern to the end of the second pattern.

This one works with complicated examples.

perl -ple 's/\$\$DATABASE_(.*?)\$\$/lc($1)/eg' filename.txt

And for simpler examples :

echo '$$DATABASE_GIBSON$$' | sed 's@$$DATABASE_\(.*\)\$\$@\L\1@'

in sed, \L means lower case (\E to stop if needed)

Using awk alone:

> echo '$$DATABASE_AWESOME$$' | awk '{sub(/.*_/,"");sub(/\$\$$/,"");print tolower($0);}'
awesome

Note that I'm in FreeBSD, so this is not GNU awk.

But this can be done using bash alone:

[ghoti@pc ~]$ foo='$$DATABASE_AWESOME$$'
[ghoti@pc ~]$ foo=${foo##*_}
[ghoti@pc ~]$ foo=${foo%\$\$}
[ghoti@pc ~]$ foo=${foo,,}
[ghoti@pc ~]$ echo $foo
awesome

Of the above substitutions, all except the last one (${foo,,}) will work in standard Bourne shell. If you don't have bash, you can instead do use tr for this step:

$ echo $foo
AWESOME
$ foo=$(echo "$foo" | tr '[:upper:]' '[:lower:]')
$ echo $foo
awesome
$

UPDATE:

Per comments, it seems that what the OP really wants is to strip the substring out of any text in which it is included -- that is, our solutions need to account for the possibility of leading or trailing spaces, before or after the string he provided in his question.

> echo 'foo $$DATABASE_KITTENS$$ bar' | sed -nE '/\$\$[^$]+\$\$/{;s/.*\$\$DATABASE_//;s/\$\$.*//;p;}' | tr '[:upper:]' '[:lower:]'
kittens

And if you happen to have pcregrep on your path (from the devel/pcre FreeBSD port), you can use that instead, with lookaheads:

> echo 'foo $$DATABASE_KITTENS$$ bar' | pcregrep -o '(?!\$\$DATABASE_)[A-Z]+(?=\$\$)' | tr '[:upper:]' '[:lower:]'
kittens

(For Linux users reading this: this is equivalent to using grep -P.)

And in pure bash:

$ shopt -s extglob
$ foo='foo $$DATABASE_KITTENS$$ bar'
$ foo=${foo##*(?)\$\$DATABASE_}
$ foo=${foo%%\$\$*(?)}
$ foo=${foo,,}
$ echo $foo
kittens

Note that NONE of these three updated solutions will handle situations where multiple tagged database names exist in the same line of input. That's not stated as a requirement in the question either, but I'm just sayin'....

You can do this in a pretty foolproof way with the supercool command cut :)

echo '$$DATABASE_AWESOME$$' | cut -d'$' -f3 | cut -d_ -f2 | tr 'A-Z' 'a-z'

This might work for you (GNU sed):

sed 's/$\$/\n/g;s/\nDATABASE_\([^\n]*\)\n/\L\1/g;s/\n/$$/g' file

Here is the shortest (GNU) awk solution I could come up with that does everything requested by the OP:

awk -vRS='[$][$]DATABASE_([^$]+[$])+[$]' '{ORS=tolower(substr(RT,12,length(RT)-13))}1'

Even if the string indicated with the asterix (*) contained one or more single Dollar signs ($) and/or linebreaks this soultion should still work.

awk '{gsub(/\$\$DATABASE_GIBSON\$\$/,"gibson")}1' file
gibson

test me gibson test me

gibson test gibson test

gibson gibsongibson

echo $$DATABASE_WOOLY$$ | awk '{print tolower($0)}'

awk will take what ever input, in this case the first agurment, and use the tolower function and return the results.

For your bash script you can do something like this and use the variable DBLOWER

DBLOWER=$(echo $$DATABASE_WOOLY$$ | awk '{print tolower($0)}');

Licenciado bajo: CC-BY-SA con atribución

No afiliado a StackOverflow