Question

I need to extract part of a string in a shell script. The original string is fairly complicated, so I really need a regular expression to select the right part of it; just removing a prefix and suffix won't work. The regular expression also needs to check the context of the part I want to extract: for example, I need a regular expression like a\([^b]*\)b to extract 123 from 12a123b23.

The shell script needs to be portable, so I cannot make use of the Bash constructs [[ and BASH_REMATCH.

I want the script to be robust, so when the regular expression does not match, the script should notice, e.g. through a non-zero exit code from the command used.

What is a good way to do this?


I've tried various tools, but none of them fully solved the problem:

  • expr match "$original" ".*$regex.*" works except for the error case: with this command, I don't know how to detect that the regex did not match. Also, expr seems to derive its exit code from the extracted string, so when I happened to extract 00, expr exited with 1. I would therefore have to ignore the exit code entirely with expr match "$original" ".*$regex.*" || true

  • echo "$original" | sed "s/.*$regex.*/\\1/" also works except for the error case. To handle that case, I'd need to check whether the output is identical to the original string, which is also rather inelegant.
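For the expr route in the first bullet, here is a possible sketch (the helper name expr_extract is made up for illustration). It uses the POSIX form expr STRING : REGEX, since expr match is a GNU extension, treats only exit statuses above 1 as real errors, and takes an empty result to mean "no match". That last rule is still ambiguous when the capture group can legitimately match the empty string.

```shell
#!/bin/sh
# expr_extract REGEX STRING (hypothetical helper name)
# REGEX is a POSIX BRE containing one \(...\) group; prints what the group matched.
# Returns 1 on no match, 2 on an expr error (e.g. an invalid regex).
expr_extract() {
    # "expr STRING : REGEX" anchors REGEX at the start of STRING only, so a
    # leading ".*" is needed but a trailing one is not. The "x" prefix on both
    # sides stops expr from mistaking a STRING such as "length" for an operator.
    result=$(expr "x$2" : "x.*$1")
    status=$?
    # Exit status 1 only means "the result is null or zero" (e.g. "00"),
    # so only statuses above 1 are treated as errors.
    [ "$status" -le 1 ] || return 2
    # An empty result is taken to mean that REGEX did not match.
    [ -n "$result" ] || return 1
    printf '%s\n' "$result"
}

expr_extract 'a\([^b]*\)b' 12a123b23 || echo "no match" >&2
```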

So, isn't there a better way to do this?


Solution

You could use the -n option of sed to suppress the automatic printing of every input line, and add the p flag to the substitute command so that a line is printed only when the substitution succeeds, like this:

printf '%s\n' "$original" | sed -n -e "s/.*$regex.*/\1/p"

If the regular expression matches, the matched group is printed as before. If it does not match, however, nothing is printed, so you only need to test for an empty result.
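Note that sed itself still exits with 0 when nothing matches, so the empty-output test is what carries the signal. Below is a minimal sketch that turns it into an exit code, assuming single-line input and a BRE that contains no / (it is pasted into the s/// command); extract is a hypothetical helper name. Prefixing a sentinel x to the captured group also lets it tell "no match" apart from "matched, but the group is empty".

```shell
#!/bin/sh
# extract REGEX STRING (hypothetical helper name)
# Prints capture group 1 of REGEX (a POSIX BRE without "/") matched against STRING.
# Returns 1 if REGEX does not match. Assumes STRING is a single line.
extract() {
    # The sentinel "x" in front of \1 makes every successful match print at
    # least one character, even when the captured group is empty.
    # (POSIX sh has no "local", so "result" is a global variable.)
    result=$(printf '%s\n' "$2" | sed -n "s/.*$1.*/x\1/p")
    [ -n "$result" ] || return 1
    # Drop the sentinel and print the captured text.
    printf '%s\n' "${result#x}"
}

extract 'a\([^b]*\)b' 12a123b23 || echo "no match" >&2
```

Unlike expr, this returns 0 for a match such as 00, and an empty group still counts as a match.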

Other suggestions

How about grep -o? Its only possible problem is portability (-o is not part of POSIX grep); otherwise it satisfies all the requirements:

➜  echo "hello and other things" | grep -o hello
hello
➜  echo $?
0
➜  echo "hello and other things" | grep -o nothello
➜  echo $?
1

One of the best things is that, since it's grep, you can choose the regex flavor: BRE (the default), ERE (-E) or Perl-compatible (-P, which not every grep supports).
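One caveat: grep -o prints the whole match, context included, not just the part inside a \(...\) group, so the context still has to be trimmed afterwards. A sketch for the question's example, assuming a grep that supports -o (GNU or BSD), a single occurrence per input string, and the hypothetical helper name grep_extract:

```shell
#!/bin/sh
# grep_extract STRING (hypothetical helper name)
# Prints the text between "a" and "b" in STRING; returns 1 if there is no match.
grep_extract() {
    # grep's exit status (0 = match, 1 = no match) becomes the status of the
    # assignment, so a failed match returns 1 here.
    match=$(printf '%s\n' "$1" | grep -o 'a[^b]*b') || return 1
    match=${match#a}            # strip the leading context "a"
    printf '%s\n' "${match%b}"  # strip the trailing context "b"
}

grep_extract 12a123b23 || echo "no match" >&2
```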

If egrep is available (it almost always is; grep -E is the POSIX spelling of the same thing):

egrep 'YourPattern' YourFile

or

egrep "${YourPattern}" YourFile

If only grep is available:

grep -e 'YourPattern' YourFile

Check the command's exit status with the classic [ $? -eq 0 ]. Also take a bad YourFile into account: grep exits with a status greater than 1, not 1, when YourFile cannot be read.

Then, once the failure test has passed, extract the content itself with sed or awk, which are portable:

Content="$(sed -n -e "s/.*\(${YourPattern}\).*/\1/p" YourFile | head -n 1)"
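Putting the two steps together, a possible sketch (extract_from_file is a hypothetical name; PATTERN is assumed to be a BRE without /). One thing to watch: the leading .* is greedy, so with a variable-length pattern such as [0-9][0-9]* it can swallow the start of the match and leave only its last character in the group.

```shell
#!/bin/sh
# extract_from_file PATTERN FILE (hypothetical helper name)
# Prints the text matched by PATTERN on the first matching line of FILE.
extract_from_file() {
    # Step 1: grep's exit status: 0 = match, 1 = no match,
    # >1 = error (for example, FILE is missing or unreadable).
    grep -e "$1" "$2" >/dev/null
    status=$?
    [ "$status" -eq 0 ] || return "$status"
    # Step 2: extract the matched text; head keeps only the first matching line.
    sed -n -e "s/.*\($1\).*/\1/p" "$2" | head -n 1
}
```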
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow