Question

I would like to remove everything after the 2nd occurrence of a particular pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple way to achieve this: sed, awk, or plain Unix commands like cut?

My input would be

After-u-math-how-however

Output should be

After-u

Everything from the 2nd - onward should be stripped out. The solution should also handle inputs with fewer occurrences of the pattern: with zero or one occurrence the input should be left unchanged, and from the 2nd occurrence onward everything should be removed.

So if the input is as follows

After

Output should be

After

No correct solution

OTHER TIPS

Something like this would do it.

echo "After-u-math-how-however" | cut -f1,2 -d'-'

This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
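As a quick check of the zero- and one-delimiter cases from the question, here is a minimal sketch that wraps the same cut call in a small function (the name first_two_fields is made up for illustration). cut passes lines that contain no delimiter through unchanged unless -s is given, so After stays After.

```shell
# Hypothetical helper: keep only the first two dash-separated fields.
first_two_fields() {
  cut -d'-' -f1,2
}

echo "After-u-math-how-however" | first_two_fields   # prints: After-u
echo "After-u" | first_two_fields                    # prints: After-u
echo "After" | first_two_fields                      # prints: After
```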

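This might work for you (GNU sed; the 2g flag combination replaces the 2nd match and every later one, so each -field from the second dash onward is deleted):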

sed 's/-[^-]*//2g' file

You could use the following regex to select what you want:

^[^-]*-\?[^-]*

For example:

echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"

Results:

After-u
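Reading the regex piece by piece: ^ anchors at the start of the line, [^-]* takes the first run of non-dash characters, -\? is an optional dash (\? is a GNU extension in basic regular expressions), and the final [^-]* takes the second run. The sketch below is an assumed equivalent in extended syntax, where ? needs no backslash; the function name is hypothetical, and -o itself is a common extension rather than POSIX.

```shell
# Hypothetical helper: same pattern written as an ERE (-E).
# -o prints only the matched part of each line.
keep_two_grep() {
  grep -oE '^[^-]*-?[^-]*'
}

echo "After-u-math-how-however" | keep_two_grep   # prints: After-u
echo "After" | keep_two_grep                      # prints: After
```

Because the dash is optional, an input with no dash still matches in full, which covers the zero-occurrence case from the question.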

@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one but since you asked about sed and awk:

With GNU sed for -r

$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u

With GNU awk for gensub():

$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1",1)}1'
After-u

This can also be done with non-GNU sed by using \( \) for grouping and [^-][^-]* in place of [^-]+, and with non-GNU awk by using match() and substr() if necessary.
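A minimal sketch of what those portable variants might look like, assuming only POSIX sed and awk. The function names are made up, and the patterns are anchored with ^ and allow an empty first field, which is a small departure from the GNU versions above.

```shell
# Hypothetical helper: POSIX sed with \( \) grouping, no -r needed.
keep_two_sed() {
  sed 's/^\([^-]*-[^-]*\).*/\1/'
}

# Hypothetical helper: POSIX awk. match() sets RSTART and RLENGTH,
# and substr() keeps just the matched part. Lines without a dash
# do not match and are printed unchanged by the trailing 1.
keep_two_awk() {
  awk '{ if (match($0, /^[^-]*-[^-]*/)) $0 = substr($0, RSTART, RLENGTH) } 1'
}

echo "After-u-math-how-however" | keep_two_sed   # prints: After-u
echo "After" | keep_two_sed                      # prints: After
echo "After-u-math-how-however" | keep_two_awk   # prints: After-u
echo "After" | keep_two_awk                      # prints: After
```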

awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
  • Split the line into fields on the field separator - (set with the option -F -), which is also available as the special variable FS inside the awk program.
  • Always print the 1st field (print $1), followed by:
    • If there is more than 1 field (NF>1): FS (i.e., -) and the 2nd field ($2).
    • Otherwise: "", i.e., effectively only the 1st field is printed (which may itself be empty if the input is empty).

This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:

$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u

awk '$0 = $2 ? $1 FS $2 : $1' FS=-

Result

After-u
After
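One caveat with this one-liner, offered as a sketch rather than a correction: it relies on awk truthiness twice. The test $2 ? ... treats an empty or zero second field as absent, and the bare assignment doubles as the pattern, so a line whose result is empty or 0 is not printed at all. Testing NF and printing explicitly avoids both. The function names below are made up for illustration.

```shell
# The one-liner as given: the truthiness of $2 picks the branch, and
# the value assigned to $0 decides whether the line is printed.
by_truthiness() { awk '$0 = $2 ? $1 FS $2 : $1' FS=-; }

# Variant that tests the field count and always prints.
by_field_count() { awk -F - '{ print (NF > 1 ? $1 FS $2 : $1) }'; }

echo "a-0-b" | by_truthiness    # prints: a   (the 0 field is dropped)
echo "a-0-b" | by_field_count   # prints: a-0
echo "0" | by_truthiness        # prints nothing (the result 0 is false)
echo "0" | by_field_count       # prints: 0
```

For inputs like the ones in the question, where no field is empty or zero, both forms give the same output.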

This will do it in awk:

echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=2 && i<=NF; i++) printf "-%s",$i; print ""}'
Licensed under: CC-BY-SA with attribution