Excluding rows in a sorted data.frame in R

https://stackoverflow.com/questions/21652044

08-10-2022

Question

I have a two-column data frame with 99 rows that I have sorted in ascending order by the second column ("Change"). Now I want to keep only the first and last ten rows (1:10 and 90:99) of the data frame.

E.g.

ID_NUM    Change
1         -55223
42         -2321
6           -201
20            17
99            93
53          1009

...etc.

How do I create a new data frame that excludes the middle rows (11:89) of the existing one?

Solution

Assuming your data frame is already sorted (as appears to be the case here), you can simply build an index vector. Either of these two methods produces the same result, so pick your favorite:

df[c(1:10, 90:99), ]   # Include only begin and end
df[-c(11:89), ]        # Exclude middle
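To check that the two index vectors really select the same rows, here is a minimal sketch on made-up data. The `Change` values are hypothetical random integers, and the data frame is sorted with `order()` first, since the answer assumes sorting has already happened.

```r
# Hypothetical 99-row data frame standing in for the asker's data
set.seed(1)
df <- data.frame(ID_NUM = 1:99,
                 Change = sample(-60000:2000, 99))

# Sort ascending by the second column, as described in the question
df <- df[order(df$Change), ]

keep <- df[c(1:10, 90:99), ]  # include only the beginning and end
drop <- df[-c(11:89), ]       # exclude the middle

identical(keep, drop)  # TRUE: same 20 rows, same order, same row names
nrow(keep)             # 20

# Negative subscripts drop positions; mixing signs is not allowed
(1:5)[-c(2, 3)]        # 1 4 5
try((1:5)[c(-1, 2)])   # error: positive and negative subscripts cannot be mixed
```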

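`c` concatenates multiple vectors into a single index vector. If the index vector contains only negative values, the rows at those positions are omitted; R does not allow positive and negative indices to be mixed in the same subscript. The following versions work for a data frame of any length, provided it has more than 20 rows (with fewer, the ranges overlap or run backwards and the wrong rows are returned):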

df[-c(11:(nrow(df) - 10)), ]              # exclude the middle
df[c(1:10, (nrow(df) - 9):nrow(df)), ]    # include only the first and last ten
df[c(head(seq_len(nrow(df)), 10), tail(seq_len(nrow(df)), 10)), ]  # borrowing a bit from rawr
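The `head()`/`tail()` approach can be wrapped in a small helper. `keep_ends()` is a hypothetical name, not a base R function, and the sketch assumes the data frame is already sorted. Wrapping the index in `unique()` means a data frame with fewer than `2 * n` rows returns each row once instead of duplicating the overlap, and `drop = FALSE` keeps a one-column data frame from collapsing to a vector.

```r
# Hypothetical helper: keep the first n and last n rows of a sorted data frame
keep_ends <- function(df, n = 10) {
  idx <- seq_len(nrow(df))
  # unique() removes overlapping positions when nrow(df) < 2 * n
  df[unique(c(head(idx, n), tail(idx, n))), , drop = FALSE]
}

# Made-up, already-sorted example data (99 rows)
df <- data.frame(ID_NUM = 1:99, Change = seq(-980, 0, by = 10))

res <- keep_ends(df)
nrow(res)                    # 20
res$ID_NUM                   # 1 to 10, then 90 to 99
nrow(keep_ends(df[1:15, ]))  # 15: short inputs come back whole
```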

Licensed under: CC-BY-SA with attribution
