質問

I'm looking for a more efficient way to create subsets in R. Using a data set where rows = products and columns = time, I would like to find those rows (products) where an item started selling in Week 1, then make that a subset. Then do the same for Week 2, etc.

set.seed(4); d <- data.frame(
 product = seq(1:10),
 week1= sample(0:1,10,replace=TRUE), 
 week2= sample(0:3,10,replace=TRUE),
 week3=sample(0:5,10,replace=TRUE), 
 week4= sample(0:5,10,replace=TRUE),speed=sample(100:200,10),quality=sample(20:50,10)
)

The full data frame is d. So I need to know 2 things to find all the subsets: 1) that all previous weeks sales were ==0 , and then 2) that this week's sales were not zero.

No subsets should overlap since they group products by when they first entered the market.

I found a poor man's way to do this, but I know there has to be a better way!

INEFFICIENT WAY:

subset3<-d[d$week3 >0 & d$week2==0 & d$week1==0 ,]
subset4<-d[d$week4 >0 & d$week3 ==0 & d$week2==0 & d$week1==0,]

SLIGHTLY MORE EFFICIENT, BUT STILL POOR

subset3<-d[d$week3 >0 & d$week2+d$week1==0 ,]
subset4<-d[d$week4 >0 & d$week3 + d$week2 + d$week1==0,]

It feels like I should be able to do something like this but it doesn't work:

subset4<-d[d$week4 >0 & sum(d$week1:d$week3) ==0, ]

I don't think ddply or apply will work here, but maybe I'm wrong? The result I need are subsets of d, all the columns, like this:

subset3=

product week1 week2 week3 week4 speed quality
   2     0     0     5     1   124      42
   3     0     0     3     5   155      45
役に立ちましたか?

解決

You could use something like:

d$weekstart <- apply(d[,-1],1,function(x) which(x>0)[1] )

This will identify the first non-zero selling week for each product. You could then use this column to split your dataset up like so:

result <- split(d,d$weekstart)

You can then access each subset like:

result[[1]]

Changing the 1 in the above code to the starting week you want to access will be analogous to having subset1 subset2 etc.

他のヒント

I hope I understood what you try to do. Here an attempt using rle function. I apply it for each row.(each product).

ll <- apply(d,1,function(x){
  y <- rle(x)
  nn <- names(y$lengths[y$values ==0])
  vv <- y$lengths[y$values ==0]
  if(length(nn)==0)
    res <- data.frame(nbr=0,goodweek='week1')
  else
   res <- data.frame(nbr=vv,goodweek=nn)
})


do.call(rbind,ll)
       nbr goodweek
week3    2    week3  ## 2 bad weeks with 0 then week3 is good 0 0 value>0
week31   2    week3
3        0    week1
week4    1    week4
week2    1    week2
6        0    week1 ## all weeks are good
week41   1    week4
8        1          ## the last week is bad! I dont' know what to return here!
9        0    week1
week21   1    week2

Here I use your d :

d
   week1 week2 week3 week4
1      0     0     5     2
2      0     0     1     3
3      1     2     3     2
4      1     1     0     1
5      0     3     1     4
6      1     1     2     4
7      1     2     0     4
8      1     3     2     0
9      1     1     5     4
10     0     3     2     2
ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top