How to surround all space-padded words with braces in AWK?

Question 1

This awk script works on the sample data:

awk '{ for (i = 1; i <= NF; i++)
         if ($i ~ /^[[:alpha:]]+$/ && (i != 1 || $0 ~ /^ /))
            $i = "{" $i "}"
       print $0
     }' data

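For the given input, the output is exactly the desired output. The condition requires each field to be entirely alphabetic and, in addition, either not to be the first field or the line as a whole to start with a blank. If an all-alphabetic word could also appear at the end of a line, add the condition && (i != NF || line ~ / $/) to the if statement, where line is a copy of $0 saved before the loop (for example, line = $0). As soon as any field is assigned, awk rebuilds $0 with single spaces, so testing $0 itself for a trailing blank would no longer work.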
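A minimal sketch of the script with both boundary checks, assuming a POSIX awk that supports bracket character classes. The function name brace_words and the variable line are illustrative choices, not part of the original answer.

```shell
# brace_words: hypothetical wrapper name; reads lines on stdin.
brace_words() {
  awk '{
    line = $0    # copy of the record before any field is reassigned
    # wrap all-alpha fields; the first/last field only if a blank precedes/follows it
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[[:alpha:]]+$/ && (i != 1 || line ~ /^ /) && (i != NF || line ~ / $/))
        $i = "{" $i "}"
    print
  }'
}

printf 'all these words\n one two \n' | brace_words
```

Because assigning to a field rebuilds $0, the printed line has single spaces between words and no leading or trailing blanks. If the original spacing must be preserved, a loop over the raw text (like the sed and gawk answers) is a better fit.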

I used [[:alpha:]] because the question suggests that, in your locale, ü counts as an alphabetic character. If you need only plain Latin letters plus ü (U+00FC, LATIN SMALL LETTER U WITH DIAERESIS) and Ü (U+00DC, LATIN CAPITAL LETTER U WITH DIAERESIS), you can replace that character class with [a-zA-ZüÜ] instead. The ranges a-z and A-Z are mainly a problem on EBCDIC systems, where the letters are not contiguous, and you would know if that applies to you. Adjust the class as necessary to cover the characters you are interested in.

Question 2

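To do this in a single substitution, you need lookbehind and lookahead assertions, because neighbouring words share one space and an ordinary match consumes it. awk and sed do not support these assertions, but Perl does. The -CSD and -Mutf8 options make Perl treat the input, the output and the script itself as UTF-8, so that ü and Ü are matched as single characters:

perl -CSD -Mutf8 -pe 's/(?<= )([a-zA-ZüÜ]+)(?= )/{$1}/g' file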

Question 3

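With GNU sed you can create a loop that puts braces around one word per pass: :a defines a label, the s command wraps the first space-padded alphabetic word it finds, and ta branches back to a whenever that substitution succeeded, so the loop ends once no unwrapped word is left.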

$ sed -r ':a;s/ ([[:alpha:]]+) / {\1} /;ta' file
all <div class="first">these</div> <div class="second">words</div> <div class="second">are</div> <div class="second">marked</div> <div class="second">but</div> {these} {words} {are} not.
<div class="first">this</div> {is} <div class="second">another</div> <div class="second">example</div> {with} <div class="second">some</div> {unmarked} words.

The character class can be modified to suit your requirements.
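The same loop written one command per line, as a sketch. It uses sed -E, which GNU and BSD sed both accept (-r is GNU-specific); brace_words_sed is a hypothetical name.

```shell
# brace_words_sed: hypothetical wrapper name; reads lines on stdin.
brace_words_sed() {
  sed -E '
    :a
    # wrap the first remaining space-padded, all-alphabetic word
    s/ ([[:alpha:]]+) / {\1} /
    # if that substitution succeeded, branch back to label a and try again
    ta
  '
}

printf 'all these words are not.\n' | brace_words_sed
```

Each pass wraps exactly one word, and a wrapped word contains braces, so it can never match again; that is what guarantees the loop terminates.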

Question 4

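With GNU awk (required for gensub() and \s), repeat a global substitution until the line stops changing. A single pass with "g" wraps only every other word, because neighbouring words share one blank and each match consumes it: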

awk '{while((new=gensub(/(\s)([[:alpha:]]+)(\s)/,"\\1{\\2}\\3","g")) != $0) $0=new}1' file
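If GNU awk is not available, the same repeat-until-done idea can be sketched with only POSIX awk features (match(), RSTART, RLENGTH and substr()). [[:space:]] stands in for gawk's \s, and brace_words_posix is a hypothetical name.

```shell
# brace_words_posix: hypothetical name; POSIX awk only, no gensub().
brace_words_posix() {
  awk '{
    # Wrap one blank-padded, all-alphabetic word per iteration until none is left.
    # A wrapped word contains braces, so it can never match again.
    while (match($0, /[[:space:]][[:alpha:]]+[[:space:]]/))
      $0 = substr($0, 1, RSTART) "{" substr($0, RSTART + 1, RLENGTH - 2) "}" substr($0, RSTART + RLENGTH - 1)
    print
  }'
}

printf 'this is another example with some unmarked words.\n' | brace_words_posix
```

Unlike the field-based script in the first answer, this edits $0 as a string, so the original blanks (including tabs) are kept exactly as they were.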