Using no 3rd party packages, is there a way to calculate percentage of row for counts on categorical data?

StackOverflow https://stackoverflow.com/questions/22109028

Question

I have something of an abnormal situation where I can't currently download 3rd party packages to my setup of R. Taking this as a constraint, is there a way to summarize the following data of restaurant locations and closed/open status?

A count(business,vars=c("city","open")) on my data gives me something like this:

"City"       "Open"   "Frequency"
Wickenburg   False    2
Wickenburg   True     26
Wittmann     True     2
Wittmann     False    2
Youngtown    True     7
Yuma         True     1

This is a frequency table showing how many restaurants are open and how many are closed in each city.
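
Since count() comes from the plyr package, a frequency table like this one could also be built in base R. The following is only a sketch: the `business` data frame here is a small made-up stand-in for the asker's data, and the column name `Frequency` is chosen to match the table above.

```r
# Hypothetical stand-in for the asker's `business` data (one row per restaurant).
business <- data.frame(
  city = c("Wickenburg", "Wickenburg", "Wickenburg", "Wittmann", "Wittmann", "Yuma"),
  open = c("True", "True", "False", "True", "False", "True"),
  stringsAsFactors = FALSE
)

# table() cross-tabulates city by open; as.data.frame() reshapes it to long format.
counts <- as.data.frame(table(business[c("city", "open")]),
                        responseName = "Frequency",
                        stringsAsFactors = FALSE)

# table() keeps zero-count combinations (here Yuma/False); drop them so the
# result only lists combinations that actually occur.
counts <- counts[counts$Frequency > 0, ]
counts <- counts[order(counts$city, counts$open), ]
rownames(counts) <- NULL
print(counts)
```

The zero-count filter is a design choice: keep those rows instead if a 0% line for each missing combination is wanted.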

I want to find the percentage within each group (city). Example output would look like this:

"City"       "Open"   "Frequency"    "Pct of City"
Wickenburg   False    2               7.1
Wickenburg   True     26              92.9
Wittmann     True     2               50.0
Wittmann     False    2               50.0
Youngtown    True     7               100.0
Yuma         True     1               100.0

What's the easiest way to do that in vanilla R?

Solution

Try this:

transform(DF, Pct = 100 * ave(Frequency, City, FUN = prop.table))

which gives:

        City  Open Frequency        Pct
1 Wickenburg False         2   7.142857
2 Wickenburg  True        26  92.857143
3   Wittmann  True         2  50.000000
4   Wittmann False         2  50.000000
5  Youngtown  True         7 100.000000
6       Yuma  True         1 100.000000
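
To make the one-liner above easier to follow, here is a self-contained sketch that rebuilds the frequency table from the question and splits the call into steps. The name `DF` follows the answer; the intermediate variable `share` and the rounding to one decimal are additions for illustration only.

```r
# Frequency table copied from the question; the name DF follows the answer.
DF <- data.frame(
  City = c("Wickenburg", "Wickenburg", "Wittmann", "Wittmann", "Youngtown", "Yuma"),
  Open = c("False", "True", "True", "False", "True", "True"),
  Frequency = c(2, 26, 2, 2, 7, 1)
)

# ave() splits Frequency by City, applies FUN to each group, and returns the
# results in the original row order. On a plain vector, prop.table(x) is x / sum(x).
share <- ave(DF$Frequency, DF$City, FUN = prop.table)

# Round only for display; keep `share` if full precision is needed.
DF$Pct <- round(100 * share, 1)
print(DF)
```

Because ave() returns a vector aligned with the input rows, no merge or reordering step is needed afterwards.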

OTHER TIPS

Here's an entire solution in base R, including calculating the frequency, and including some reproducible sample data.

set.seed(1)
mydf <- data.frame(
  city = sample(LETTERS[1:3], 20, TRUE),
  open = sample(c("True", "False"), 20, TRUE))
head(mydf)
#   city  open
# 1    A False
# 2    B  True
# 3    B False
# 4    C  True
# 5    A  True
# 6    C  True

within(data.frame(table(mydf)), {
  Pct <- ave(Freq, city, FUN = function(x) x/sum(x) * 100)
})
#   city  open Freq      Pct
# 1    A False    2 40.00000
# 2    B False    4 57.14286
# 3    C False    2 25.00000
# 4    A  True    3 60.00000
# 5    B  True    3 42.85714
# 6    C  True    6 75.00000
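
If a wide cross-tabulation is acceptable instead of the long format above, prop.table() can be applied to the table directly with its margin argument. This is a sketch that reuses the sample data from this answer; the variable names `tab` and `pct` are made up here.

```r
set.seed(1)
mydf <- data.frame(
  city = sample(LETTERS[1:3], 20, TRUE),
  open = sample(c("True", "False"), 20, TRUE))

tab <- table(mydf)                        # rows = city, columns = open
pct <- 100 * prop.table(tab, margin = 1)  # margin = 1: divide each row by its row total

print(round(pct, 1))
print(rowSums(pct))                       # each city's percentages add up to 100
```

Wrapping `pct` in as.data.frame() would bring it back to one row per city/open combination if the long layout is preferred.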

I think this is a one-liner using tapply:

data = data.frame(City=c("Wickenburg", "Wickenburg", "Wittmann", "Wittmann", "Youngtown", "Yuma"),
                  Open=c(F, T, T, F, T, T), Frequency=c(2, 26, 2, 2, 7, 1))
data$Pct = data$Frequency / tapply(data$Frequency, data$City, sum)[data$City] * 100
data
#         City  Open Frequency        Pct
# 1 Wickenburg FALSE         2   7.142857
# 2 Wickenburg  TRUE        26  92.857143
# 3   Wittmann  TRUE         2  50.000000
# 4   Wittmann FALSE         2  50.000000
# 5  Youngtown  TRUE         7 100.000000
# 6       Yuma  TRUE         1 100.000000
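
The tapply one-liner works by looking up each row's city total in a named vector. The sketch below spells out that lookup step by step; `totals` and `per_row` are names introduced only for this illustration, and the as.character() and as.vector() calls are optional safeguards rather than part of the original answer.

```r
data <- data.frame(
  City = c("Wickenburg", "Wickenburg", "Wittmann", "Wittmann", "Youngtown", "Yuma"),
  Frequency = c(2, 26, 2, 2, 7, 1)
)

# One total per city, named by city.
totals <- tapply(data$Frequency, data$City, sum)

# Look up each row's city total by name; as.character() makes the lookup
# name-based even if City is stored as a factor.
per_row <- totals[as.character(data$City)]

# as.vector() drops the names carried over from the lookup.
data$Pct <- as.vector(data$Frequency / per_row * 100)
print(data)
```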

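What about using tapply, then merging, and then dividing? I think this might work:

# count() is from the plyr package, as in the question
countDF <- data.frame(count(business, vars = c("city", "open")))
colnames(countDF) <- c("City", "Open", "Frequency")

# Total frequency per city, as a data frame with an explicit City column
totals <- tapply(countDF$Frequency, countDF$City, sum)
tmp <- data.frame(City = names(totals), V1 = as.vector(totals))

# Attach each city's total to its rows
countDF <- merge(countDF, tmp, by = "City")

countDF$PctOfCity <- (countDF$Frequency / countDF$V1) * 100

countDF$V1 <- NULL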
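
The same merge-then-divide idea can be tried without plyr by starting from the frequency table in the question and computing the totals with aggregate(). This is a sketch under those assumptions; the column name `CityTotal` and the variable `merged` are made up for illustration.

```r
# Frequency table copied from the question.
countDF <- data.frame(
  City = c("Wickenburg", "Wickenburg", "Wittmann", "Wittmann", "Youngtown", "Yuma"),
  Open = c("False", "True", "True", "False", "True", "True"),
  Frequency = c(2, 26, 2, 2, 7, 1)
)

# aggregate() returns a data frame with a City column and the summed Frequency.
totals <- aggregate(Frequency ~ City, data = countDF, FUN = sum)
names(totals)[2] <- "CityTotal"   # hypothetical column name for this sketch

# merge() attaches each city's total to its rows (and sorts the result by City).
merged <- merge(countDF, totals, by = "City")
merged$PctOfCity <- merged$Frequency / merged$CityTotal * 100
merged$CityTotal <- NULL
print(merged)
```

Note that merge() may reorder rows, so this route suits cases where the original row order does not matter.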
Licensed under: CC-BY-SA with attribution