deleting observations in pooled time series [closed]

Question 1

I totally agree with @joran's point there. I'll give an R answer here, even though the question doesn't show any research effort. In future, please show us the code you've tried as well.

For your problem, the first step is to use a base function or a package that helps you split your data.frame into groups, apply a function to each group, and combine the results (usually called the split-apply-combine strategy). A couple of good external packages for this are plyr and data.table. I prefer data.table for data.frame-like operations because it's generally a lot faster.

So, first we'll convert your data.frame to a data.table. If you don't have the package installed, you can install it with install.packages("data.table").

require(data.table) # load package
dt <- data.table(df) # convert data.frame to data.table

Now, to split a data.table into groups, we can use the by argument of data.table. Our apply function will be cummax, the cumulative maximum: within each group it stays 0 for the leading run of zeros and is positive from the first non-zero value onwards (if you don't have negative values in your data, which I assume here). The results are then combined automatically. So, let's do this:

dt[, .SD[cummax(qty_sold) > 0], by = item]

      item  date qty_sold
 1: orange day_5        5
 2: orange day_6        0
 3: orange day_7        8
 4: orange day_8        0
 5: hammer day_3        3
 6: hammer day_4        0
 7: hammer day_5       70
 8: hammer day_6       70
 9: hammer day_7        0
10: hammer Day_8       80
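
To see why cummax works as the filter, here is a minimal sketch on a plain vector. The values are made up for illustration and are not taken from the question's data:

```r
# Toy vector (hypothetical values): two leading zeros, then sales with zeros in between
qty <- c(0, 0, 5, 0, 8, 0)

running_max <- cummax(qty)  # running maximum: 0 0 5 5 8 8
keep <- running_max > 0     # FALSE FALSE TRUE TRUE TRUE TRUE

qty[keep]                   # 5 0 8 0: only the leading zeros are dropped
```

Zeros that come after the first positive value are kept, because the running maximum never falls back to 0.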

To sum up:

require(data.table)
dt <- data.table(df)
dt[, .SD[cummax(qty_sold)>0], by = item]

Some more explanation of the syntax. Let's first consider by = item. This is the part that internally splits the data for you by item (that is, the rows for item = orange are considered first, followed by the rows for item = hammer, etc.).

The middle part, .SD[cummax(qty_sold) > 0], is where the magic happens: it is the equivalent of the apply function. Here, .SD is just the split part (the subset of rows for one item at a time). To see more clearly what's in .SD each time, do: dt[, print(.SD), by = item].

This removes the contiguous 0 rows at the start of each item and retains everything else (the solution is guaranteed as long as there are no negative values).
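
If you'd rather not depend on a package, the same idea can be sketched in base R with ave(). This is only an illustration under assumptions: the data frame below is hypothetical (it is not your data), and the cumsum variant is my own suggestion for the case where negative values can occur:

```r
# Hypothetical data, made up for illustration (not the asker's data)
df <- data.frame(
  item     = rep(c("orange", "hammer"), each = 5),
  date     = rep(paste0("day_", 1:5), times = 2),
  qty_sold = c(0, 0, 5, 0, 8,   0, 3, 0, 70, 0),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each item and returns a vector aligned with df's rows
keep <- ave(df$qty_sold, df$item, FUN = cummax) > 0
res  <- df[keep, ]   # leading zeros removed per item, later zeros kept

# Variant that does not rely on non-negative values:
# count the non-zero entries seen so far within each item
keep_any <- ave(as.integer(df$qty_sold != 0), df$item, FUN = cumsum) > 0
res_any  <- df[keep_any, ]
```

With non-negative data both filters select the same rows. They differ only when an item starts with a negative value, which cummax would treat like a leading zero.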

Question 2

The SAS approach would be something like this: keep track, in a retained variable, of whether you have already encountered a positive value for the current item. If not, do not output the row. If so, note it in the tracking variable. After the last row of an item, reset the tracking variable. For example (sort by item and date first if necessary):

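data RESULT (drop=found_first_positive);
    set DATASET;                                 /* must be sorted by item and date */
    by item date;
    retain found_first_positive 0;               /* flag keeps its value across rows */
    if qty_sold>0 then found_first_positive=1;   /* first positive value seen */
    if found_first_positive;                     /* subsetting IF: skip rows until then */
    if last.item then found_first_positive=0;    /* reset for the next item */
run;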