Question

I am trying to do something that in SQL might be done with "... having count(ID) > 2 ...".

I want to find the values of one column, in a two-column subset of a dataframe, that occur more than twice. The function table gives me a sparse matrix, and I am not sure how to turn its results into what I want. For a matrix, rowSums will give me the totals, but I want each total paired with its identifier.

The dataset mtcars shows more clearly what I mean.

x <- head(table( mtcars$hp, mtcars$disp), 20)
x[,1] <- as.numeric(rownames(x))
x
      71.1 75.7 78.7 79 95.1 108 120.1 120.3 121 140.8 145 146.7 160 167.6 225 258 275.8 301 304 318 350 351 360 400 440 460 472
  52    52    1    0  0    0   0     0     0   0     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   0   0
  62    62    0    0  0    0   0     0     0   0     0   0     1   0     0   0   0     0   0   0   0   0   0   0   0   0   0   0
  65    65    0    0  0    0   0     0     0   0     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   0   0
  66    66    0    1  1    0   0     0     0   0     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   0   0
  91    91    0    0  0    0   0     0     1   0     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   0   0
  93    93    0    0  0    0   1     0     0   0     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   0   0
  95    95    0    0  0    0   0     0     0   0     1   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   0   0
  97    97    0    0  0    0   0     1     0   0     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   0   0
  105  105    0    0  0    0   0     0     0   0     0   0     0   0     0   1   0     0   0   0   0   0   0   0   0   0   0   0
  109  109    0    0  0    0   0     0     0   1     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   0   0
  110  110    0    0  0    0   0     0     0   0     0   0     0   2     0   0   1     0   0   0   0   0   0   0   0   0   0   0
  113  113    0    0  0    1   0     0     0   0     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   0   0
  123  123    0    0  0    0   0     0     0   0     0   0     0   0     2   0   0     0   0   0   0   0   0   0   0   0   0   0
  150  150    0    0  0    0   0     0     0   0     0   0     0   0     0   0   0     0   0   1   1   0   0   0   0   0   0   0
  175  175    0    0  0    0   0     0     0   0     0   1     0   0     0   0   0     0   0   0   0   0   0   1   1   0   0   0
  180  180    0    0  0    0   0     0     0   0     0   0     0   0     0   0   0     3   0   0   0   0   0   0   0   0   0   0
  205  205    0    0  0    0   0     0     0   0     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   0   1
  215  215    0    0  0    0   0     0     0   0     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   0   1   0
  230  230    0    0  0    0   0     0     0   0     0   0     0   0     0   0   0     0   0   0   0   0   0   0   0   1   0   0
  245  245    0    0  0    0   0     0     0   0     0   0     0   0     0   0   0     0   0   0   0   1   0   1   0   0   0   0

The result I would like for this dataframe of 20 rows would be:

110 3
175 3
180 3

Solution

Something like this?

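# Keep only the two columns of interest
df <- mtcars[ , c("hp", "disp")]
# Count how often each hp value occurs
tt <- with(df, table(hp))
# Keep the hp values that occur more than twice
data.frame(count = tt[tt > 2])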

#     count
# 110     3
# 175     3
# 180     3
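
If the printed layout differs on your setup (how data.frame() handles a table object may depend on the R version), a minimal sketch that builds the result explicitly could look like the following. The names keep, result and rs are just illustrative. The last two lines also show that rowSums on the two-way table from the question keeps the hp values as names, so the totals stay attached to their identifiers.

```r
df <- mtcars[, c("hp", "disp")]

# One-way counts of hp; names(tt) holds the hp values as strings.
tt <- table(df$hp)
keep <- tt[tt > 2]

# Build the data frame by hand so the layout does not depend on
# how data.frame() treats table objects.
result <- data.frame(hp = as.numeric(names(keep)),
                     count = as.vector(keep))
result
#    hp count
# 1 110     3
# 2 175     3
# 3 180     3

# rowSums() on the two-way table gives the same totals as a named vector.
rs <- rowSums(table(df$hp, df$disp))
rs[rs > 2]
```

This is the same idea as the SQL "group by hp having count(*) > 2": count per group first, then filter on the count.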
Licensed under: CC-BY-SA with attribution