You might try the data.table
package for fast operations across relatively large datasets like yours. For example, my machine has no problem working through this:
library(data.table)
# Create a dataset like yours, but bigger
n.rows <- 2e6
n.sic2 <- 1e4
dfSum <- data.frame(gvkey=seq_len(n.rows),
sic2=sample.int(n.sic2, n.rows, replace=TRUE),
cdom="USA",
rscoreSum=rnorm(n.rows))
# Now make your dataset into a data.table
dfSum <- data.table(dfSum)
# Calculate the percentiles
# Note that there is no need to re-assign the result
dfSum[, indPct:=rank(rscoreSum)/length(rscoreSum), by="sic2"]
whereas the plyr
equivalent takes a while.
If you like the plyr
syntax (I do), you may also be interested in the dplyr package, which is billed as "the next generation of plyr", with support for faster data stores in the backend.