Question

I have a data.frame as shown below:

> a <- c(98:103, 998:1003)
> b <- 1:length(a)
> data <- data.frame(a,b)
> data
      a  b
1    98  1
2    99  2
3   100  3
4   101  4
5   102  5
6   103  6
7   998  7
8   999  8
9  1000  9
10 1001 10
11 1002 11
12 1003 12

I would like to add a new column whose value depends on column a:

if a is less than 100, assign "A" to the new column
if a is at least 100 and less than 1000, assign "B" to the new column
otherwise, assign "C"

My approach is

> data$c <- data$a
> 
> A <- 1:99
> B <- 100:999 
> for (i in 1:length(a)){
+ if (data[i,1] %in% A){
+ data[i,3] <- "A"
+ } else if (data[i,1] %in% B){
+ data[i,3] <- "B"
+ } else {data[i,3] <- "C"}
+ }
> data
      a  b c
1    98  1 A
2    99  2 A
3   100  3 B
4   101  4 B
5   102  5 B
6   103  6 B
7   998  7 B
8   999  8 B
9  1000  9 C
10 1001 10 C
11 1002 11 C
12 1003 12 C
> 

My real data has over 500,000 rows, so looping row by row is slow. Is there a better solution?


Solution

Find below a solution using data.table. This version might be especially useful if your key variable (here a) is not numeric.

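# Load the package
library(data.table)

# Set up data
a <- c(98:103, 998:1003)
b <- seq_along(a)

# Sets of values to look for
A <- 1:99
B <- 100:999

# Create data table and set key
DT <- data.table(a, b)
setkey(DT, a)

# Add the new column by joining on the key; rows matching neither set stay NA
DT[J(A), c := "A"]
DT[J(B), c := "B"]
# Everything still unassigned falls into the "otherwise" category
DT[is.na(c), c := "C"]

J() is needed here because a bare numeric vector in i would be treated as row numbers rather than key values. If your key variable is not numeric (for example, a character column), you can change DT[J(A), c := "A"] to DT[A, c := "A"].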
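
For a numeric column like a, the same binning can also be sketched in base R without a loop, using cut(). This is an alternative to the data.table approach above, not part of it. It assumes the thresholds as stated in the question (below 100, from 100 up to but not including 1000, and everything else), so unlike the 1:99 lookup set it would also label values below 1 as "A".

```r
# Rebuild the example data from the question
a <- c(98:103, 998:1003)
b <- seq_along(a)
data <- data.frame(a, b)

# right = FALSE makes each interval closed on the left:
# [-Inf, 100) -> "A", [100, 1000) -> "B", [1000, Inf) -> "C"
data$c <- cut(data$a,
              breaks = c(-Inf, 100, 1000, Inf),
              labels = c("A", "B", "C"),
              right = FALSE)

# cut() returns a factor; convert if a plain character column is preferred
data$c <- as.character(data$c)
```

Because cut() is vectorized, it processes the whole column in one call, which should scale far better to several hundred thousand rows than assigning one row at a time.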

License: CC-BY-SA with attribution
Not affiliated with StackOverflow