Sort data by row based on a range of values

https://stackoverflow.com/questions/23474866

15-07-2023

Question

My data is:

phone  colour  length  weight  rating
100    5       3       3       0
200                    1       4
303    3       30              9
302    2       43      0       2
106    43
203    23      3       1       7

I want my data to look like this:

Variable A (sort_by_model_100):

phone   colour  length  weight  rating
  100         5      3        3      0
  106        43

Variable B (sort_by_model_200):

phone   colour  length  weight  rating
200       4      20       1      4
203      23      3        1      7

Variable C (sort_by_model_300):

     phone  colour  length  weight  rating
      303     3       30       0      9
      302     2       43       0      2

My R code:

data <- read.csv(file.choose(),header=TRUE)

sort_by_model_100 <- split (data, data$phone[100:200])
sort_by_model_200 <- split (data, data$phone[200:300])
sort_by_model_300 <- split (data, data$phone[300:400])

I get this warning, and my code doesn't work:

Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
data length is not a multiple of split variable

Please help.

Solution

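data$phone[100:200] selects elements 100 through 200 of the phone column by position, not the rows whose phone value lies between 100 and 200, so split receives a grouping vector of the wrong length. To filter by value, you can use subset: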

var_a = subset(data, phone >= 100 & phone < 200)
var_b = subset(data, phone >= 200 & phone < 300)

And so on for the remaining ranges. The code could be improved to avoid hard-coding each range.
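One possible way to avoid hard-coding each range, as a minimal sketch: derive the bucket from the phone number with integer division and pass it to split. The name `width`, the assumption that every range is 100 wide, and the `sort_by_model_` prefix on the list names are illustrative choices here, not part of the original answer.

```r
# Sample data from the question; blank cells are represented as NA.
data <- data.frame(
  phone  = c(100, 200, 303, 302, 106, 203),
  colour = c(5, NA, 3, 2, 43, 23),
  length = c(3, NA, 30, 43, NA, 3),
  weight = c(3, 1, NA, 0, NA, 1),
  rating = c(0, 4, 9, 2, NA, 7)
)

# Assumed bucket width: phones 100-199 form one group, 200-299 the next, etc.
width <- 100

# Integer division maps each phone number to the lower bound of its range.
bucket <- (data$phone %/% width) * width

# split() returns a named list with one data frame per bucket.
by_model <- split(data, bucket)
names(by_model) <- paste0("sort_by_model_", names(by_model))

print(by_model$sort_by_model_100)
```

Each group is then available as an element of `by_model`, for example `by_model$sort_by_model_300`, and new ranges appear automatically if the data contains phones in the 400s or above.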

Other tips

With this data

data<-data.frame(
    phone=c(100,200,303,302,106,203),
    colour=c(5,NA,3,2,43,23),
    length=c(3,NA,30,43,NA,3),
    weight=c(3,1,NA,0,NA,1),
    rating=c(0,4,9,2,NA,7)
)

I'd use cut to create a factor that indicates the model class

model <- cut(data$phone, breaks=c(100,200,300,400), include.lowest=TRUE, right=FALSE)

Then you can use split to create a list of sub-data.frames

split(data, model)

This is likely to be easier to work with than a bunch of different data.frame variables.
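To illustrate why a list can be more convenient, here is a small sketch that names the groups through the `labels` argument of cut and then works on all of them at once. The label names `model_100`, `model_200`, and `model_300` are made up for this example.

```r
# Sample data from this answer.
data <- data.frame(
  phone  = c(100, 200, 303, 302, 106, 203),
  colour = c(5, NA, 3, 2, 43, 23),
  length = c(3, NA, 30, 43, NA, 3),
  weight = c(3, 1, NA, 0, NA, 1),
  rating = c(0, 4, 9, 2, NA, 7)
)

# Same breaks as above, with readable (hypothetical) labels for each class.
model <- cut(
  data$phone,
  breaks = c(100, 200, 300, 400),
  labels = c("model_100", "model_200", "model_300"),
  include.lowest = TRUE,
  right = FALSE
)

groups <- split(data, model)

# Access one group by name.
print(groups$model_200)

# Apply the same operation to every group without repeating code.
rows_per_group <- sapply(groups, nrow)
print(rows_per_group)
```

With separate variables, the last step would have to be written once per variable; with a list, sapply or lapply covers every group in a single call.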

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow