Question

Consider this set of dates, x:

set.seed(1234)
x <- sample(1980:2010, 100, replace = TRUE)  # 100 random years
x <- strptime(x, '%Y')                       # parse into date-time objects (POSIXlt)
x <- strftime(x, '%Y')                       # format back to character years

Here is the distribution of years in those dates:

> table(x)
x
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1994 
   4    4    3    3    6    4    3    4    5   12    1    1    1    2 
1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 
   9    4    2    1    4    4    2    1    4    1    4    3    4    3 
2010 
   1 

Now say I want to group them by decade. For this, I use the cut function:

> table(cut(x, seq(1980, 2010, 10)))
Error in cut.default(x, seq(1980, 2010, 10)) : 'x' must be numeric

Ok, so let's force x to numeric:

> table(cut(as.numeric(x), seq(1980, 2010, 10)))

(1.98e+03,1.99e+03]    (1.99e+03,2e+03]    (2e+03,2.01e+03] 
                 45                  28                  23 

Now, as you can see, the names of that table (the interval labels created by cut) are in scientific notation. How do I stop them from being shown in scientific notation? I've tried wrapping the whole command above in format, formatC and prettyNum, but all those do is format the frequencies.


Solution

Thanks to joran for pointing me to the answer. I'll elaborate on it here for the record:

Changing cut's dig.lab argument (the number of significant digits used in the interval labels) from its default of 3 to 4 solved both this mock-up and my real problem:

> table(cut(as.numeric(x), seq(1980, 2010, 10), dig.lab = 4))

(1980,1990] (1990,2000] (2000,2010] 
         45          28          23 
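Why does 3 versus 4 matter? As far as I can tell from the source of cut.default, the interval labels are built by formatting the breaks with formatC() to dig.lab significant digits (adding digits only if neighbouring labels would otherwise be identical). Under formatC's default "g" format for doubles, a four-digit year shown with only 3 significant digits switches to scientific notation. The sketch below reproduces that directly; breaks is just the vector from the question.

```r
# Breaks from the question, forced to double (cut.default formats 0 + breaks,
# which is always double, and formatC picks the "g" format for doubles)
breaks <- as.double(seq(1980, 2010, 10))

# 3 significant digits (cut's default dig.lab): "g" format goes scientific,
# giving "1.98e+03" "1.99e+03" "2e+03" "2.01e+03"
lab3 <- formatC(breaks, digits = 3, width = 1)

# 4 significant digits: the years print in full, "1980" "1990" "2000" "2010"
lab4 <- formatC(breaks, digits = 4, width = 1)

# The same strings end up in the factor levels that cut() creates
lv3 <- levels(cut(c(1985, 1995, 2005), breaks))
lv4 <- levels(cut(c(1985, 1995, 2005), breaks, dig.lab = 4))
print(lv3)  # "(1.98e+03,1.99e+03]" "(1.99e+03,2e+03]" "(2e+03,2.01e+03]"
print(lv4)  # "(1980,1990]" "(1990,2000]" "(2000,2010]"
```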

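By the way, for the 1980 values to be counted you also need the include.lowest argument. cut's intervals are right-closed by default, so the first interval, (1980,1990], excludes 1980 and those four values become NA: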

> table(cut(as.numeric(x), seq(1980, 2010, 10), dig.lab = 4, include.lowest = TRUE))

[1980,1990] (1990,2000] (2000,2010] 
         49          28          23 

Now it sums to 100! :)
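To see exactly which values include.lowest rescues, here is a small sketch that counts the NAs left by the default right-closed intervals. It works on the sampled integer years directly instead of going through strptime()/strftime(), which should yield the same years. Note that R 3.6.0 changed the default algorithm behind sample(), so on a current R the counts will likely differ from the tables above even with the same seed; the assertions only rely on relationships between the counts.

```r
set.seed(1234)
# R >= 3.6.0 uses a different default sample() algorithm, so the counts
# may not match the tables shown above
years <- sample(1980:2010, 100, replace = TRUE)
breaks <- seq(1980, 2010, 10)

# Default intervals are right-closed: (1980,1990], (1990,2000], (2000,2010],
# so every year equal to 1980 falls outside all of them and becomes NA
open_bins <- cut(years, breaks, dig.lab = 4)
n_dropped <- sum(is.na(open_bins))
print(table(open_bins, useNA = "ifany"))

# include.lowest = TRUE closes the first interval on the left: [1980,1990]
closed_bins <- cut(years, breaks, dig.lab = 4, include.lowest = TRUE)
print(table(closed_bins))
```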

OTHER TIPS

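This doesn't exactly answer the question you asked, but it shows a possible alternative. Since strptime returns a date-time (POSIXlt) object, you can use cut's method for date-times (cut.POSIXt; Date objects have the analogous cut.Date), which accepts breaks such as "10 years":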

set.seed(1234)
x <- sample(1980:2010, 100, replace = TRUE)
x <- strptime(x, '%Y')
out <- table(cut(x, "10 years"))
out
# 
# 1980-01-01 1990-01-01 2000-01-01 2010-01-01 
#         48         25         26          1 

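Here we also get what I would consider the "correct" values for each bin: the date-time method uses left-closed intervals that start on 1 January, so each bin is a calendar decade (1980 to 1989, and so on) and no values are dropped.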
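The date-time method of cut() defaults to right = FALSE, i.e. left-closed intervals, so each "10 years" bin runs from 1 January of one decade up to, but not including, 1 January of the next. If you would rather stay with plain numeric years, a rough equivalent (my own sketch, not part of either answer) is cut() with right = FALSE and breaks extending one decade past the data. As above, counts from R 3.6.0 or later may differ from the ones shown here because of the change to sample().

```r
set.seed(1234)
years <- sample(1980:2010, 100, replace = TRUE)

# Left-closed decade bins [1980,1990), [1990,2000), [2000,2010), [2010,2020),
# mirroring the calendar decades produced by cut(x, "10 years")
decades <- cut(years, seq(1980, 2020, 10), right = FALSE, dig.lab = 4)
tab <- table(decades)
print(tab)
```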


As a crude justification of my statement about "correct" values, consider the values we get when we manually calculate based on table:

y <- strftime(x, '%Y')
Tab <- table(y)
Tab
# y
# 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1994 1995 1996 
#    4    4    3    3    6    4    3    4    5   12    1    1    1    2    9    4 
# 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2010 
#    2    1    4    4    2    1    4    1    4    3    4    3    1 
sum(Tab[grepl("198", names(Tab))])
# [1] 48
sum(Tab[grepl("199", names(Tab))])
# [1] 25
sum(Tab[grepl("200", names(Tab))])
# [1] 26
sum(Tab[grepl("201", names(Tab))])
# [1] 1
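The four grepl() calls can be collapsed into a single tapply() that groups the per-year counts by the first three characters of each year. This is a hypothetical tidy-up of the check above, not part of the original answer; it rebuilds y from the sampled years with as.character(), which should match what strftime(x, '%Y') returns for these years.

```r
set.seed(1234)
y <- as.character(sample(1980:2010, 100, replace = TRUE))
Tab <- table(y)

# Decade prefix of each year name: "1987" -> "198", "2010" -> "201"
prefix <- substr(names(Tab), 1, 3)

# Sum the per-year counts within each prefix, one entry per decade
by_decade <- tapply(as.vector(Tab), prefix, sum)
print(by_decade)
```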
Licensed under: CC-BY-SA with attribution