Question

I often categorise times into day/night time using cut(). Because cut() doesn't understand that clock times go around zero, I first divide the hours into three groups (night either side of day), and then merge the two "night" factor levels. This can be done by giving the same "night" value twice to levels(). E.g.

x <- c(4, 10, 23) # i.e. 4 am, 10 am, 11 pm
x <- cut(x
         , breaks = c(0, 6, 22, 23)
         , include.lowest = FALSE
         , labels = c("night2", "day", "night1"))
# [1] night2 day    night1
# Levels: night2 day night1

levels(x) <- c("night", "day", "night")
x
# [1] night day   night
# Levels: night day

Now I'm trying to do the same thing with a huge dataset in an ff object:

require(ff)
require(ffbase)

y <- ff(c(4, 10, 23))
y <- ff(cut(y
            , breaks = c(0, 6, 22, 23)
            , include.lowest = FALSE
            , labels = c("night2", "day", "night1")))
y
# ff (open) integer length=3 (3) levels: night2 day night1
#    [1]    [2]    [3] 
# night2 day    night1 

levels(y) <- c("night", "day", "night")
y
# ff (open) integer length=3 (3) levels: night day night
#  [1]   [2]   [3] 
# night day   night

Note that in this case, levels() has retained three factor levels, two of which have the same label. recodeLevels looked promising but doesn't quite do the same thing:

y <- recodeLevels(y, c("night", "day", "night"))
y
# ff (open) integer length=3 (3) levels: night day night
# [1] [2] [3] 
# NA  day NA  

I've also tried duplicate "night" labels within cut() (actually cut.ff()), but it still returns three levels, plus a warning that duplicate levels in factors are deprecated.

Thanks for your advice.


Solution

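This might be what you are looking for: use recodeLevels from package ff. Assign the duplicated labels with levels() first, as you already do, then pass the vector of unique levels to recodeLevels() so that the two "night" levels collapse into one.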

require(ff)
y <- c(4, 10, 23)
y <- ff(cut(y, breaks = c(0, 6, 22, 23), include.lowest = FALSE, 
            labels = c("night2", "day", "night1")))
levels(y) <- c("night", "day", "night")
alllevs <- c("night", "day")
y <- recodeLevels(y, alllevs)
levels(y) <- alllevs
y
# ff (open) integer length=3 (3) levels: night day
# [1]   [2]   [3] 
# night day   night 
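
To see why a recode step is needed at all, it may help to look at what merging levels does to the underlying integer codes. The following is a minimal base-R sketch of the remapping idea only; it is not how ff implements recodeLevels, and the names codes, old_labels, new_levels and lookup are made up for illustration.

```r
# Integer codes as they would be stored for levels night2, day, night1
codes      <- c(1L, 2L, 3L)
# Desired label for each of the three old levels
old_labels <- c("night", "day", "night")
# Unique target levels, in order of first appearance: "night" "day"
new_levels <- unique(old_labels)

# Lookup table: old code -> new code (here 1 2 1)
lookup    <- match(old_labels, new_levels)
new_codes <- lookup[codes]

# Rebuild a factor from the remapped codes
merged <- factor(new_levels[new_codes], levels = new_levels)
merged
# [1] night day   night
# Levels: night day
```

Relabelling alone leaves three codes pointing at three level slots, which is what the question observed; the codes themselves have to be rewritten so that both "night" slots map to the same integer.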

OTHER TIPS

This may be too simple, but why not just do:

x <- c(4, 10, 23)
y <- c("day", "night")[(x <= 6 | x > 22) + 1]
y
# [1] "night" "day"   "night"
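
If a factor is wanted rather than a character vector, the same logical test can be wrapped in a small helper. This is only a sketch on plain vectors: the function name day_night is hypothetical, and whether the same expression works unchanged on an ff vector is not something I have checked. Hour 0 is included in the example because the cut() call in the question uses the interval (0, 6] with include.lowest = FALSE, which returns NA for midnight.

```r
# Classify clock hours (0-23) as "night" or "day" without cut()
day_night <- function(hours) {
  # Night is anything up to and including 6, or after 22
  is_night <- hours <= 6 | hours > 22
  factor(ifelse(is_night, "night", "day"), levels = c("night", "day"))
}

day_night(c(0, 4, 10, 23))
# [1] night night day   night
# Levels: night day
```

Because the test is a plain comparison, midnight needs no special handling and there are never duplicate levels to merge.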
Licensed under: CC-BY-SA with attribution