I totally agree with @joran's point there. I'll give an R answer here even though this question doesn't show any research effort. In the future, please show us the code you've tried as well.
For your problem, the first step is to use a base function or a package that helps you split your `data.frame` into groups, apply whatever function you want to each group, and combine the results (typically called the split-apply-combine strategy). There are a couple of nice external packages for this, namely `plyr` and `data.table`. I prefer `data.table` for `data.frame`-like operations, as it's generally a lot faster.
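To make the three steps concrete, here is a minimal base R sketch of split-apply-combine. The sample `df` below is hypothetical (only the column names `item` and `qty_sold` are taken from this answer), and the per-group sum is just a placeholder for the apply step:

```r
# Hypothetical sample data; not the data from the question.
df <- data.frame(
  item     = c("orange", "orange", "orange", "hammer", "hammer"),
  qty_sold = c(0, 5, 8, 3, 70),
  stringsAsFactors = FALSE
)

# Split: one data.frame per item
pieces <- split(df, df$item)

# Apply: compute something per group (here, the total quantity sold)
totals <- lapply(pieces, function(d) sum(d$qty_sold))

# Combine: collect the per-group results into one named vector
result <- unlist(totals)
result
```

`data.table` performs the same three steps in a single call through its `by` argument.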
So, first we'll convert your `data.frame` to a `data.table`. If you don't have this package installed, you can install it with `install.packages("data.table")`.
require(data.table) # load package
dt <- data.table(df) # convert data.frame to data.table
Now, to split a `data.table` into groups, we can use the `by` argument within the `data.table`. Our apply function will be `cummax`: within each group, the running maximum stays at 0 through the leading zeros and becomes positive from the first non-zero value onward (assuming you don't have negative values in your data). The results are then combined automatically. So, let's do this:
dt[, .SD[cummax(qty_sold) > 0], by = item]
item date qty_sold
1: orange day_5 5
2: orange day_6 0
3: orange day_7 8
4: orange day_8 0
5: hammer day_3 3
6: hammer day_4 0
7: hammer day_5 70
8: hammer day_6 70
9: hammer day_7 0
10: hammer Day_8 80
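To see why `cummax` works as the filter, it can help to try it on a single vector. This is a small sketch with made-up quantities for one item, not data from the question:

```r
# Hypothetical quantities for one item, with leading zeros
x <- c(0, 0, 5, 0, 8, 0)

# Running maximum: stays at 0 until the first non-zero value appears
running_max <- cummax(x)   # 0 0 5 5 8 8

# TRUE from the first positive value onward, so later zeros are kept
keep <- running_max > 0    # FALSE FALSE TRUE TRUE TRUE TRUE

x[keep]                    # 5 0 8 0
```

Only the leading zeros are dropped; the zeros that come after the first sale survive because the running maximum never falls back to 0.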
To sum up:
require(data.table)
dt <- data.table(df)
dt[, .SD[cummax(qty_sold)>0], by = item]
Some more explanation on the syntax. Let's consider `by = item` first. This is the part that internally splits the data for you by `item` (that is, the rows of the `data.table` for `item = orange` will be considered first, followed by the rows for `item = hammer`, and so on).
The middle part, `.SD[cummax(qty_sold) > 0]`, is where the magic happens: it's the equivalent of the apply function. Here, `.SD` is just the split part (the subset of the data corresponding to one `item` at a time). To see more clearly what's in `.SD` each time, do: `dt[, print(.SD), by = item]`.
This basically removes the rows that form a contiguous run of 0s at the start of each group and retains everything else (the solution is guaranteed to work as long as there are no negative values).
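If negative values could occur (returns, say), the `cummax` test would also drop rows until the first positive value. One possible variant, which is my own suggestion rather than part of the solution above, counts non-zero entries with `cumsum` instead. A sketch on a made-up vector:

```r
# Hypothetical quantities including a negative value
x <- c(0, 0, -2, 0, 3)

# cummax only turns positive at the 3, so the -2 and the 0 after it are lost
x[cummax(x) > 0]        # 3

# Counting non-zero entries: TRUE from the first non-zero value onward
x[cumsum(x != 0) > 0]   # -2 0 3

# The same condition used inside data.table would look like:
# dt[, .SD[cumsum(qty_sold != 0) > 0], by = item]
```

With non-negative data both conditions select the same rows, so this only matters if the no-negatives assumption doesn't hold.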