Question

I have written an R script in which the user chooses a file containing two columns (V1 and V2) and a variable number of rows, anywhere from 10 to 1000 depending on the input file. The script calculates the rsq of the relationship between the two variables.

I want to code the following: the code should loop through all rows, omitting one row at a time and calculating the new rsq with that row missing. For example:

There are 10 rows of data and the total rsq = 0.97.
Step 1: The first row of data is removed and the rsq is calculated again, this time for 9 rows, giving rsq = 0.98.
Step 2: The 1st row is re-added, the 2nd row is removed, and the rsq is calculated again.
Step 3: The 2nd row is re-added, the 3rd row is removed, and the rsq is calculated again.

After each loop the "new rsq" will be placed in a new column next to the row that was removed.

Can anyone advise how to do this? I have this working in Excel, but it is cumbersome and therefore not ideal.


Solution

Do you want to do something like this?

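# Make some sample data
set.seed(1095)
data <- data.frame( V1 = 1:10 , V2 = sample.int( 5 , 10 , replace = TRUE ) )

# Use sapply to get r2, leaving out one row at a time:
# data[-x, ] drops row x, and the squared correlation of the
# remaining rows is the r2 of a simple linear fit
r2 <- sapply( seq_len(nrow(data)) , function(x) ( cor( data[-x,1] , data[-x,2] ) )^2 )
# Combine into a data frame, so each r2 sits next to the row that was removed
newdata <- cbind( data , r2 )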
newdata
#      V1 V2        r2
#   1   1  5 0.2526316
#   2   2  3 0.4657601
#   3   3  5 0.3204721
#   4   4  5 0.3691612
#   5   5  1 0.5405405
#   6   6  3 0.3769480
#   7   7  3 0.3840426
#   8   8  2 0.3409425
#   9   9  1 0.2725806
#   10 10  3 0.4986702
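
If you would rather take the rsq from a fitted model than from cor(), the same leave-one-out idea can be sketched with lm(). This is only a sketch: loo_rsq is a made-up helper name, the example values are invented for illustration, and the commented-out read.table(file.choose()) line assumes the input file has no header row, so that R names the columns V1 and V2.

```r
# Leave-one-out r2 using lm(); `loo_rsq` is a hypothetical helper name.
# Assumes the data frame has columns named V1 and V2.
loo_rsq <- function(df) {
  vapply(seq_len(nrow(df)), function(i) {
    # Fit the model with row i left out, then read off its r2
    fit <- lm(V2 ~ V1, data = df[-i, , drop = FALSE])
    summary(fit)$r.squared
  }, numeric(1))
}

# Invented example values (not taken from the question's file)
df <- data.frame(V1 = 1:10, V2 = c(2, 4, 5, 4, 5, 7, 8, 9, 10, 12))

# To use a file chosen by the user instead (assumption: no header row):
# df <- read.table(file.choose())

# New column: r2 obtained when that row is removed
df$rsq_without_row <- loo_rsq(df)
df
```

For a straight-line fit with an intercept, this gives the same numbers as squaring cor(), so the sapply version above is the shorter route; the lm() form is mainly useful if the model might later include more than one predictor.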
Licensed under: CC-BY-SA with attribution