Question

Consider this set of dates, x:

set.seed(1234)
x <- sample(1980:2010, 100, replace = TRUE)
x <- strptime(x, '%Y')
x <- strftime(x, '%Y')

The following is a distribution of the years of those dates:

> table(x)
x
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1994 
   4    4    3    3    6    4    3    4    5   12    1    1    1    2 
1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 
   9    4    2    1    4    4    2    1    4    1    4    3    4    3 
2010 
   1 

Now say I want to group them by decade. For this, I use the cut function:

> table(cut(x, seq(1980, 2010, 10)))
Error in cut.default(x, seq(1980, 2010, 10)) : 'x' must be numeric

Ok, so let's force x to numeric:

> table(cut(as.numeric(x), seq(1980, 2010, 10)))

(1.98e+03,1.99e+03]    (1.99e+03,2e+03]    (2e+03,2.01e+03] 
                 45                  28                  23 

Now, as you can see, the names of that table are in scientific notation. How do I force them not to be? I've tried wrapping the whole command above in format, formatC, and prettyNum, but all those do is format the frequencies.


Solution

Thanks to joran for pointing the way to the answer. I'll elaborate on it here for the record:

Changing cut's dig.lab parameter from the default 3 to 4 solved this particular mockup as well as my real problem:

> table(cut(as.numeric(x), seq(1980, 2010, 10), dig.lab = 4))

(1980,1990] (1990,2000] (2000,2010] 
         45          28          23 
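As a minimal sketch of why this works: when no labels are given, cut formats the break values using dig.lab digits, so four-digit years need dig.lab = 4 or more. The toy vector `years` and the hand-written labels below are made up for illustration; supplying `labels` explicitly is another option that sidesteps number formatting altogether.

```r
# Hypothetical toy data: one year per decade bin
years  <- c(1981, 1995, 2003)
breaks <- seq(1980, 2010, 10)

# Default dig.lab = 3: break values are squeezed into 3 significant digits
levels(cut(years, breaks))

# dig.lab = 4 leaves room for all four digits of the year
lab4 <- levels(cut(years, breaks, dig.lab = 4))
lab4
# [1] "(1980,1990]" "(1990,2000]" "(2000,2010]"

# Alternative: name the bins yourself (these label strings are illustrative)
named <- cut(years, breaks, labels = c("1981-1990", "1991-2000", "2001-2010"))
table(named)
```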

By the way, for 1980 to be counted, one should also set the include.lowest argument:

> table(cut(as.numeric(x), seq(1980, 2010, 10), dig.lab = 4, include.lowest = TRUE))

[1980,1990] (1990,2000] (2000,2010] 
         49          28          23 

Now it sums to 100! :)

Other suggestions

This doesn't exactly answer the question you asked, but it shows a possible alternative: cut has methods for date and date-time objects (cut.Date and cut.POSIXt), so you can bin the dates directly:

set.seed(1234)
x <- sample(1980:2010, 100, replace = TRUE)
x <- strptime(x, '%Y')
out <- table(cut(x, "10 years"))
out
# 
# 1980-01-01 1990-01-01 2000-01-01 2010-01-01 
#         48         25         26          1 

Here, we also get what I would consider the "correct" values for each bin.
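To see where the date-based bin edges fall, here is a small sketch on made-up Date values (not the sampled data above, which is POSIXlt). With a "10 years" specification the bins start on 1 January of the earliest year, so 1989-12-31 and 1990-01-01 land in different bins. The relabelling step at the end is an optional convenience, not part of the original answer.

```r
# Hypothetical toy dates, chosen to sit on either side of a decade boundary
d <- as.Date(c("1980-06-01", "1989-12-31", "1990-01-01", "2010-03-15"))

# Each level is named after the first day of its 10-year bin
bins <- cut(d, "10 years")
as.character(bins)

# Optionally shorten labels such as "1980-01-01" to "1980s"
levels(bins) <- paste0(substr(levels(bins), 1, 4), "s")
table(bins)
```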


As a crude justification of my statement about "correct" values, consider the values we get when we manually calculate based on table:

y <- strftime(x, '%Y')
Tab <- table(y)
Tab
# y
# 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1994 1995 1996 
#    4    4    3    3    6    4    3    4    5   12    1    1    1    2    9    4 
# 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2010 
#    2    1    4    4    2    1    4    1    4    3    4    3    1 
sum(Tab[grepl("198", names(Tab))])
# [1] 48
sum(Tab[grepl("199", names(Tab))])
# [1] 25
sum(Tab[grepl("200", names(Tab))])
# [1] 26
sum(Tab[grepl("201", names(Tab))])
# [1] 1
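
The same decade grouping can also be reached with cut on numeric years by closing the intervals on the left instead of the right, so that 1990 falls in the 1990s bin rather than the 1980s one. This is a sketch on made-up values, not the data above; extending the breaks to 2020 is an assumption made here so that 2010 gets a bin of its own.

```r
# Hypothetical toy years sitting on the decade boundaries
years <- c(1980, 1989, 1990, 1999, 2000, 2010)

# right = FALSE gives intervals of the form [a, b);
# dig.lab = 4 keeps the labels out of scientific notation
decade <- cut(years, breaks = seq(1980, 2020, 10), right = FALSE, dig.lab = 4)
levels(decade)
# [1] "[1980,1990)" "[1990,2000)" "[2000,2010)" "[2010,2020)"

table(decade)
```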
Licensed under: CC-BY-SA with attribution