Replacing just '0' (single zeros) in a column, without replacing the zeros in larger numbers (e.g. 10, 20, 30 etc.)

StackOverflow https://stackoverflow.com/questions/22509672

  •  17-06-2023

Question

I want to replace any 0 values in my data frame with a small positive number so that my model will work.

However, when I try to replace all zero values, I also replace the zeros inside strings that belong to larger numbers such as 10, 20, 30, 40, ..., 100, 1000, etc.
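A minimal reproduction of the problem, using a hypothetical toy vector in place of the real mediansp/connections columns. Because gsub() treats the pattern "0" as a regular expression that can match anywhere in a string, every 0 character is replaced, including those inside larger numbers:

```r
# Hypothetical toy values standing in for a character column
x <- c("0", "10", "20", "100")

# Unanchored pattern: each "0" character is replaced wherever it appears
gsub("0", "0.00001", x)
# → "0.00001" "10.00001" "20.00001" "10.000010.00001"
```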

How do I specify that I only want to replace values that are exactly zero, and not every string that contains the digit zero?

Thanks!

Here's the code:

total <- read.csv("total.csv")
total.rm <- na.omit(total)

# convert NaN and Inf values to NA
total.rm$mediansp[which(is.nan(total.rm$mediansp))] = NA
total.rm$mediansp[which(total.rm$mediansp==Inf)] = NA
total.rm$connections[which(is.nan(total.rm$connections))] = NA
total.rm$connections[which(total.rm$connections==Inf)] = NA

#make all 0 values positive
total.rm$mediansp <- gsub("0", "0.00001", total.rm$mediansp)
total.rm$connections <- gsub("0", "0.00001", total.rm$connections)

# replace "NA" strings with 0, then replace 0 with 0.01
total.rm$mediansp <- gsub("NA", "0", total.rm$mediansp)
total.rm$connections <- gsub("NA", "0", total.rm$connections)
total.rm$mediansp <- gsub("0", "0.01", total.rm$mediansp)
total.rm$connections <- gsub("0", "0.01", total.rm$connections)

#convert character variables to numeric variables  
total.rm$mediansp <- as.numeric(total.rm$mediansp) 
total.rm$connections <- as.numeric(total.rm$connections)

#plot maps with fitted values and with residuals
sc.lm <- lm(log(mediansp) ~ log(connections), total.rm, na.action="na.exclude")
total.rm$fitted.s <- predict(sc.lm, total.rm) - mean(predict(sc.lm, total.rm))
total.rm$residuals <- residuals(sc.lm)

Here's the structure:

'data.frame':    133537 obs. of  19 variables:
$ pcd           : Factor w/ 1736958 levels "AB101AA","AB101AB",..: 
$ pcdstatus     : Factor w/ 5 levels "Insufficient Data",..: 5 5 5 5 5 5 5 5 5 5 ...
$ mbps2         : num  0 0 0 0 1 0 1 1 0 0 ...
$ averagesp     : chr  "16" "19.3" "14.1" "14.9" ...
$ mediansp      : chr  "16.2" "20" "18.7" "16.8" ...
$ maxsp         : chr  "23.8" "24" "20.2" "19.7" ...
$ nga           : num  0 0 0 1 0 1 1 1 1 1 ...
$ connections   : chr  "54" "14" "98" "43" ...
$ oslaua        : Factor w/ 407 levels "","95A","95B",..: 326 326 326 326 326 326 326 
$ x             : int  540194 540194 540300 539958 540311 539894 540311 540379 540310 
$ y             : int  169201 169201 169607 169584 168997 169713 168997 168749 168879 
$ ctry          : Factor w/ 4 levels "E92000001","N92000002",..: 1 1 1 1 1 1 1 1 1 1 
$ hro2          : Factor w/ 13 levels "","E12000001",..: 8 8 8 8 8 8 8 8 8 8 ...
$ soa2          : Factor w/ 7197 levels "","E02000001",..: 145 145 135 135 145 135 145 
$ urindew       : int  5 5 5 5 5 5 5 5 5 5 ...
$ averagesp.lt  : num  2.77 2.96 2.65 2.7 2.05 ...
$ mediansp.lt   : num  2.79 3 2.93 2.82 2.09 ...
$ maxsp.lt      : num  3.17 3.18 3.01 2.98 2.68 ...
$ connections.lt: num  3.99 2.64 4.58 3.76 3.22 ...
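The str() output shows that mediansp and connections are stored as character (chr), which is why string functions such as gsub() apply to them at all. As an alternative to string matching (a different technique from the regex fix in the answer, offered only as a sketch), the columns can be converted to numeric first and exact zeros replaced by numeric comparison. The data frame `toy` is a hypothetical stand-in for total.rm; the 0.01 constant is taken from the question's code.

```r
# Hypothetical stand-in for total.rm: both columns stored as character,
# matching the chr type that str() reports for mediansp and connections
toy <- data.frame(
  mediansp    = c("0", "16.2", "20", "NaN", "Inf"),
  connections = c("54", "0", "10", "100", "0"),
  stringsAsFactors = FALSE
)

for (col in c("mediansp", "connections")) {
  x <- as.numeric(toy[[col]])    # "NaN" and "Inf" parse to NaN and Inf
  x[!is.finite(x)] <- NA         # NaN, Inf and -Inf become NA
  x[!is.na(x) & x == 0] <- 0.01  # replace only values exactly equal to zero
  toy[[col]] <- x
}

# toy$connections is now c(54, 0.01, 10, 100, 0.01)
```

Because the comparison is numeric, a string such as "0.0" would also count as zero here, whereas a pattern that matches the literal string "0" would leave it alone.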

Solution

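In your code above, gsub performs a regular-expression substitution, so the pattern "0" matches every 0 character anywhere in a string. To replace only strings that are exactly "0", anchor the pattern: pattern = "^0$" (^ matches the start of the string and $ the end). This should solve your problem.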
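A minimal sketch of the anchored pattern on a hypothetical toy vector; because the match must span the whole string, only elements that are exactly "0" are replaced:

```r
# Hypothetical toy values, including a decimal that merely contains a 0
x <- c("0", "10", "20", "100", "0.5")

# Anchored pattern: the entire string must be "0" for a match
gsub("^0$", "0.00001", x)
# → "0.00001" "10" "20" "100" "0.5"

# In the question's code this would read, e.g.:
# total.rm$mediansp <- gsub("^0$", "0.00001", total.rm$mediansp)
```

Strings such as "0.0", "00" or " 0" are not exactly "0", so this pattern leaves them unchanged; if such values can occur in the data, comparing after as.numeric() is the safer check.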

As an added note, it's almost certainly bad form to simply replace 0s with very small numbers just to make your model run, because the result can depend heavily on which arbitrary constant you pick. Pick a better model.
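To make the point concrete: the question's code uses two different constants, 0.00001 and 0.01, and on the log scale used by lm(log(mediansp) ~ log(connections)) they land far apart, while log(0) itself is -Inf. This is only arithmetic illustrating why the choice matters, not a recommendation of either value.

```r
# The two replacement constants that appear in the question's code
eps <- c(0.00001, 0.01)

round(log(eps), 2)
# → -11.51  -4.61

log(0)
# → -Inf
```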

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow