You can use gregexpr and regmatches:
regmatches(text, gregexpr('[[:punct:]]*/[[:alpha:][:punct:]]*', text))
# [[1]]
# [1] "/ART" "/NN" "/VAFIN" "/ADV" "/ADV" "/ADJD" "/PWS" "/ADV" "/APPR" "/NE" "./$." "/NE"
# [13] "/PTKNEG" "/ADJD" "/VAFIN" "/ADV" "/KOUS" "/PDAT" ",/$," "/APPR" "/ADJA" "/NN" ",/$;" "/APPR"
# [25] "/APPR" "/CARD" "/NN" "/ART" "./$:"
In words, the regex says: "find things that start with zero or more punctuation marks, followed by a slash, followed by zero or more letters or punctuation marks." If you want to include numbers, switch [:alpha:] to [:alnum:].
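To make the [:alpha:] versus [:alnum:] difference concrete, here is a minimal sketch on a made-up string. The variable `tagged` and the tag `N1` are hypothetical and not part of the original data.

```r
# Hypothetical sample: the first tag contains a digit
tagged <- "x/N1 y/NN"

# With [:alpha:], the match stops before the digit
alpha_tags <- regmatches(tagged, gregexpr('[[:punct:]]*/[[:alpha:][:punct:]]*', tagged))[[1]]
print(alpha_tags)
# [1] "/N"  "/NN"

# With [:alnum:], the digit is kept as part of the tag
alnum_tags <- regmatches(tagged, gregexpr('[[:punct:]]*/[[:alnum:][:punct:]]*', tagged))[[1]]
print(alnum_tags)
# [1] "/N1" "/NN"
```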
Per comments, if you want only uppercase letters the regex would become:
regmatches(text, gregexpr('[[:punct:]]*/[[:upper:][:punct:]]*', text))
As @eddi suggests, [A-Z] and [:upper:] are roughly equivalent. Again as @eddi suggests, this regex will catch the /LETTERS case as well as the /$punct case:
/[A-Z]+|[[:punct:]]/\\$[[:punct:]]
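As a rough usage sketch, that pattern can be passed to gregexpr the same way as the earlier ones. The string `tagged` below is a short hypothetical stand-in for your text, and `pattern` and `tags` are illustrative names only.

```r
# Hypothetical sample in the same word/TAG format
tagged <- "Das/ART Haus/NN ist/VAFIN gross/ADJD ./$."

# Either a slash followed by uppercase letters,
# or punctuation, a slash, a literal "$", and one more punctuation mark
pattern <- '/[A-Z]+|[[:punct:]]/\\$[[:punct:]]'

tags <- regmatches(tagged, gregexpr(pattern, tagged))[[1]]
print(tags)
# [1] "/ART"   "/NN"    "/VAFIN" "/ADJD"  "./$."
```

Note that the "\\$" in the R string is the regex "\$", i.e. a literal dollar sign rather than an end-of-line anchor.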