I have a character matrix filled with values that follow these general formats: A/-, A/B, I/A, /, A/, /A, -/B, A/B/C, A/-/C.
I need to clean this data set so that only values in the format A/B remain, in other words, two single characters separated by a forward slash. Anything that contains a -, an I, multiple forward slashes, a single forward slash with no letters, or a single forward slash with only one letter must be replaced with an empty string "".
I have tried numerous iterations of gsub() to replace any values not fitting the proper format with "".
This is the closest I have found that makes sense to me, but it only gets rid of values containing -, I, multiple forward slashes, and a single forward slash (no surrounding letters). The values that remain are in the formats A/B (the one I want to keep) and A/ and /B (which still need to be replaced):
data.matrix = as.matrix(data)
data.matrix.clean = gsub("/./|^/.|./$|^/$|-|I", "", data.matrix)
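To make the behavior reproducible, here is a minimal sketch that applies the same pattern to one made-up example of each format listed above. The vector name x is hypothetical and stands in for the real matrix.

```r
# Hypothetical sample: one value per format from the list above
x <- c("A/-", "A/B", "I/A", "/", "A/", "/A", "-/B", "A/B/C", "A/-/C")

# Same pattern as above; gsub() deletes only the part of each value
# that matches and keeps the rest of the string
cleaned <- gsub("/./|^/.|./$|^/$|-|I", "", x)

# Show each original value next to its result
print(data.frame(original = x, cleaned = cleaned))
```

On this sample, values that start out as A/ or /A do come back empty. The leftover A/ and /B values appear to come from inputs such as A/-, I/A and -/B, where only the - or the I is deleted and the remainder of the string is kept.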
Perhaps I should write this differently, without separating each of my independent criteria with a |? From what I've read, ^ signifies the beginning of a string and $ signifies the end of a string. It seems to work in the ^/$ case, but not in the ^/. or ./$ case, and I'm not sure why.
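One possible way to write this without alternation is to test each whole value against a single pattern anchored at both ends, and blank everything that fails, rather than deleting matched pieces. A minimal sketch, assuming every value that is not in the A/B format should become "" and that a valid character is anything other than -, I, or /; the names x and keep are made up for illustration:

```r
# Hypothetical sample matrix; the real data would take its place
x <- matrix(c("A/-", "A/B", "I/A", "/", "A/", "/A", "-/B", "A/B/C", "C/D"),
            nrow = 3)

# TRUE only when the entire value is: one allowed character, a slash,
# one allowed character. [^I/-] means any character except I, / or -
# (the - is placed last so it is read literally, not as a range).
keep <- grepl("^[^I/-]/[^I/-]$", x)

# Blank every element that fails the test; the matrix shape is preserved
x[!keep] <- ""
print(x)
```

Note that under this assumption values with no slash at all would also be blanked, and digits or lowercase letters would count as valid characters. A stricter class such as [A-HJ-Z] could be substituted if only uppercase letters other than I should be accepted.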
After I try something new, I check which formats the values containing a forward slash are in, using this code, which seems to work fine:
slash = grep("/", data.matrix.clean)
slash.t = data.matrix.clean[slash]
table(slash.t)
Any help in better understanding the symbols that can be used within gsub() to make this work properly would be greatly appreciated.
Thank you!