Using factor with levels gives me NA

https://stackoverflow.com/questions/21909137

14-10-2022

Question

There's something I can't figure out.
Here is my data set:

 Proband Lauf Interleukin Ansatz    Zeitpunkt  
    1        3    2        IFNy   stim         ZP21
    2        3    2         iL2   stim    ZP4        
    3        3    2         iL2   stim         ZP14  
    4        5    3         iL2   stim         ZP21  
    5        4    3         iL2   stim   ZP2         
    6        4    3         iL2   stim    ZP4        
    7        4    3         iL2   stim        ZP28   
    8        9    5         iL2   stim ZP0           
    9       13    6        IFNy   stim    ZP4        
    10      13    6         iL2   stim      ZP7      
    11      16    7         iL2   stim         ZP21  
    12      16    7         iL2   stim        ZP28

I want to sort by "Zeitpunkt", so this is what I did next:

pvalsig1 <- read.csv2(file="pvalsig.csv", fill=NA, na.strings="")
pvalsig1 <- pvalsig1[, 1:5]
pvalsig1$Zeitpunkt <- as.character(pvalsig1$Zeitpunkt)
pvalsig1$Zeitpunkt <- factor(pvalsig1$Zeitpunkt, levels=c("ZP0", "ZP2", "ZP4", "ZP7", "ZP14", "ZP21", "ZP28", "ZP35", "ZPM9", "ZPM9+1"))

which gives me:

Proband Lauf Interleukin Ansatz Zeitpunkt
1        3    2        IFNy   stim      ZP21
2        3    2         iL2   stim      <NA>
3        3    2         iL2   stim      ZP14
4        5    3         iL2   stim      ZP21
5        4    3         iL2   stim      <NA>
6        4    3         iL2   stim      <NA>
7        4    3         iL2   stim      <NA>
8        9    5         iL2   stim      <NA>
9       13    6        IFNy   stim      <NA>
10      13    6         iL2   stim      <NA>
11      16    7         iL2   stim      ZP21
12      16    7         iL2   stim      <NA>

I'm sure it has something to do with the irregular alignment of the "Zeitpunkt" column shown above, but I can't figure out what it is or how to get rid of it. Thanks.
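A minimal sketch that reproduces the symptom, assuming (hypothetically) that some Zeitpunkt values carry leading or trailing spaces. factor() maps any value that is not exactly one of the supplied levels to NA, and nchar() makes the hidden characters visible:

```r
# Hypothetical values: "ZP4 " has a trailing space, "  ZP14" two leading spaces
x <- c("ZP21", "ZP4 ", "  ZP14")
zp_levels <- c("ZP0", "ZP2", "ZP4", "ZP7", "ZP14", "ZP21", "ZP28")

# Values that do not exactly match a level become NA
f <- factor(x, levels = zp_levels)
print(f)
print(is.na(f))  # FALSE TRUE TRUE

# The padded strings are longer than they look
print(nchar(x))  # 4 4 6
```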

Solution

Try:

pvalsig1$Zeitpunkt <- factor(gsub("\\s*", "", pvalsig1$Zeitpunkt), levels=c("ZP0", "ZP2", "ZP4", "ZP7", "ZP14", "ZP21", "ZP28", "ZP35", "ZPM9", "ZPM9+1"))

This will remove all whitespace from that column. The problem is that you're creating a factor from values like "ZP0 " (with an extra space) while the level is "ZP0". factor() turns any value that doesn't exactly match one of the levels into NA, so every padded entry ends up as <NA>.
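Applied to a small hypothetical vector, the fix looks like this. Once the whitespace is gone every value matches a level, and because the levels are listed in chronological order, sort() follows that order rather than alphabetical order, which is what the question was after:

```r
# Hypothetical padded values, including a tab character
x <- c("ZP21", "ZP4 ", "  ZP14", " ZP0\t")
zp_levels <- c("ZP0", "ZP2", "ZP4", "ZP7", "ZP14", "ZP21", "ZP28",
               "ZP35", "ZPM9", "ZPM9+1")

# Remove every whitespace character, then build the factor
cleaned <- gsub("\\s*", "", x)
f <- factor(cleaned, levels = zp_levels)
print(anyNA(f))  # FALSE

# Sorting a factor uses the level order: ZP0 ZP4 ZP14 ZP21
print(sort(f))
```

With the real data frame, pvalsig1[order(pvalsig1$Zeitpunkt), ] would then order the rows by the same level order, since order() on a factor uses its level positions.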

Note that this will break if your factor levels can themselves contain spaces or other whitespace characters. In that case, you can easily adjust the regular expression so that gsub strips only leading and trailing whitespace, e.g.:

"(^\\s+|\\s*$)"
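A sketch of the ends-only variant, using a hypothetical value with an internal space that has to survive. Base R 3.2.0 and later also provides trimws(), which strips leading and trailing whitespace (spaces, tabs and newlines by default) without a hand-written regular expression:

```r
# Hypothetical values; "ZPM9 +1 " has an internal space that must be kept
x <- c("ZP4 ", "  ZP14", "ZPM9 +1 ")

# Strip only leading and trailing whitespace
trimmed <- gsub("(^\\s+|\\s*$)", "", x)
print(trimmed)  # "ZP4" "ZP14" "ZPM9 +1"

# trimws() gives the same result here
print(identical(trimmed, trimws(x)))  # TRUE
```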

Also, depending on where you're getting this data from, some of the input functions can strip extra whitespace for you: read.table (and therefore read.csv2) has a strip.white argument that removes leading and trailing whitespace from unquoted character fields.
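A sketch of the strip.white route. The inline text= string is a hypothetical stand-in for pvalsig.csv; it uses semicolons because that is read.csv2's default separator:

```r
# Hypothetical stand-in for pvalsig.csv with padded Zeitpunkt values
csv_text <- "Proband;Lauf;Interleukin;Ansatz;Zeitpunkt
3;2;IFNy;stim;ZP21
3;2;iL2;stim;   ZP4
9;5;iL2;stim;ZP0   "

# Default: leading/trailing spaces in unquoted fields are kept
raw <- read.csv2(text = csv_text)
# strip.white = TRUE removes them while reading
clean <- read.csv2(text = csv_text, strip.white = TRUE)

print(as.character(raw$Zeitpunkt))    # "ZP21" "   ZP4" "ZP0   "
print(as.character(clean$Zeitpunkt))  # "ZP21" "ZP4" "ZP0"

# The cleaned column now matches the levels
zp <- factor(clean$Zeitpunkt,
             levels = c("ZP0", "ZP2", "ZP4", "ZP7", "ZP14", "ZP21", "ZP28"))
print(anyNA(zp))  # FALSE
```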

Also, a quick search for "R trim" turns up a popular SO answer on exactly this problem.

Licensed under: CC-BY-SA with attribution
