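It would be easier to match both the white-listed keywords and the unwanted tokens, and then replace each match with the captured keyword (which is empty whenever an unwanted token matched):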
gsub('(?:\\b(911\\b|E-COMMERCE\\b|K-12\\b|C\\b[+]{0,2})|[[:punct:]]|[A-Z-]*[0-9][A-Z0-9-]*)', '\\1', blah, perl = TRUE)
Output:
" "
"E-COMMERCE"
"AMAZON E-COMMERCE"
"K-12 911"
" "
""
" OFFICER" # Should this really be "K9 OFFICER"?
"WORK "
"DEVELOPER C++"
" C+ C "
"DEFAULT "
\b is a word boundary. It matches the empty string at the edges of a sequence of word characters ([A-Za-z0-9_]). It is an optimized version of (?<!\w)(?=\w)|(?<=\w)(?!\w).

[A-Z-]*[0-9][A-Z0-9-]* matches strings of letters, digits and dashes, with at least one digit in them.
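
To check these pieces in isolation, here is a minimal sketch. The input strings, the reduced two-keyword pattern, and the ^...$ anchors in the last step are assumptions made for illustration; they are not part of the question's data or of the pattern above.

```r
# 1. The capture-group trick, on a reduced pattern with two keywords.
#    A keyword match fills group 1, so "\\1" puts it back; any other
#    match leaves group 1 empty, so the match is deleted.
pattern <- "(?:\\b(911\\b|K-12\\b)|[[:punct:]]|[A-Z-]*[0-9][A-Z0-9-]*)"
x <- c("K-12 911", "ROOM 42B", "E-MAIL")  # hypothetical input
result <- gsub(pattern, "\\1", x, perl = TRUE)
# "K-12 911" is kept, "ROOM 42B" becomes "ROOM ", "E-MAIL" becomes "EMAIL"

# 2. \b compared with its lookaround expansion.
wb <- "(?:(?<!\\w)(?=\\w)|(?<=\\w)(?!\\w))"
y <- c("911", "K911", "9110", "CALL 911 NOW")  # hypothetical input
with_b <- grepl("\\b911\\b", y, perl = TRUE)
with_lookaround <- grepl(paste0(wb, "911", wb), y, perl = TRUE)
# Both give TRUE FALSE FALSE TRUE

# 3. The digit-bearing token pattern, anchored here so that each
#    token is tested as a whole (the anchors are an addition).
tokens <- c("K9", "A-1-B", "OFFICER", "E-COMMERCE", "42")
has_digit <- grepl("^[A-Z-]*[0-9][A-Z0-9-]*$", tokens, perl = TRUE)
# TRUE TRUE FALSE FALSE TRUE
```

The third result also shows why "K9" disappears from "K9 OFFICER" in the output above: it contains a digit and is not white-listed, so adding K9\\b to the keyword group would be one way to keep it.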