Using the first HTML sample:
grep '<strong>First ascent:</strong>' | sed 's/.*by \([^>]*\)<.*/\1/'
Output:
Sir Edmund Hillary and Tenzing Norgay
Achille Compagnoni and Lino Lacedelli
George Band and Joe Brown
Kurt Diemberger, Peter Diener, Nawang Dorje, Nima Dorje, Ernst Forrer and Albin Schelbert
Hermann Buhl
Maurice Herzog and Louis Lachenal
Andrew Kauffman and Peter Schoening
Hermann Buhl, Kurt Diemberger, Marcus Schmuck and Fritz Wintersteller
It finds all lines with the 'First ascent' label and grabs everything between "by" and the <br /> tag.
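To see what the sed capture group does, here is a small sketch run on a single line. The line is a made-up stand-in for the HTML sample, and the variable names are only for illustration. It assumes the markup has the form label, date, "by", names, then <br />.

```shell
# Hypothetical line modelled on the sample; the real markup may differ.
line='<strong>First ascent:</strong> May 29, 1953 by Sir Edmund Hillary and Tenzing Norgay<br />'

# ".*by " consumes everything up to and including the last "by ".
# "\([^>]*\)<" captures the following text, stopping at the "<" of the next tag.
# The trailing ".*" swallows the rest of the line so only \1 remains.
first_ascent=$(printf '%s\n' "$line" | sed 's/.*by \([^>]*\)<.*/\1/')

printf '%s\n' "$first_ascent"
```

Because the leading .* is greedy, a name that itself contained "by " would be cut short, so this is best treated as a quick one-off rather than a robust parser.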
Edit:
The original answer doesn't filter by the name of the mountain. In addition, matching on <strong>First ascent:</strong> is too specific for the page (sometimes there is a space after the colon). The following should work:
grep -i "$1" -A3 | grep 'First ascent:' | sed 's/.*by \([^>]*\)<.*/\1/'
Explanation:
grep -i "$1" -A3 selects the line with the mountain name. The -i option makes the search case-insensitive, and -A3 also prints the 3 lines after each match, which include the line listing the climbers. The quotes around "$1" are needed for mountain names that contain spaces.
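As a rough sketch of how the pipeline might be used, it can be wrapped in a shell function that reads the HTML on standard input. The function name first_ascent and the sample fragment are assumptions made for this example. The real page may place the climbers more or fewer than 3 lines below the name, in which case -A3 would need adjusting. Note that -A is a GNU/BSD grep extension rather than POSIX.

```shell
#!/bin/sh
# first_ascent NAME: read HTML on stdin and print the first-ascent climbers
# for the mountain whose name matches NAME (case-insensitive).
# Hypothetical helper name; the pipeline itself is the one from the answer.
first_ascent() {
    grep -i -A3 "$1" | grep 'First ascent:' | sed 's/.*by \([^>]*\)<.*/\1/'
}

# Made-up fragment standing in for the real page (layout is an assumption).
# The second entry has a space after the colon to show the looser match.
sample='<h2>Mount Everest</h2>
<strong>Height:</strong> 8848 m<br />
<strong>Range:</strong> Himalaya<br />
<strong>First ascent:</strong> May 29, 1953 by Sir Edmund Hillary and Tenzing Norgay<br />
<h2>K2</h2>
<strong>Height:</strong> 8611 m<br />
<strong>Range:</strong> Karakoram<br />
<strong>First ascent: </strong>July 31, 1954 by Achille Compagnoni and Lino Lacedelli<br />'

# Quoting the argument keeps a name with spaces as a single pattern.
printf '%s\n' "$sample" | first_ascent 'mount everest'
printf '%s\n' "$sample" | first_ascent k2
```

The search term is treated as a regular expression and is matched against any line, so a short name that also occurs elsewhere on the page could pull in extra results.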