In my large dataset I have a factor column, CAR_Density, whose values look like:

"001: 0-3.8998943958"
"061:2290.611052-2391.7437"

I want to replace each of these with the median of its range (for example, 1.9499 for the first one). There are nearly 10,000 observations. I tried the long way, "1.9499"<-sb$CAR_Density[sb$CAR_Density == "001: 0-3.8998943958"], but it did not work. I also should not put quotes around 1.9499, because I want the resulting values to be numeric.

Is there an efficient and understandable way to do this? I am not very good at programming, so any help is appreciated.


Solution

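I think you have the assignment the wrong way around. Also, because CAR_Density is a factor, convert it to character first: a factor only accepts its existing levels, so assigning 1.9499 to it directly gives NA with an "invalid factor level" warning. Then you want:

sb$CAR_Density <- as.character(sb$CAR_Density)
sb$CAR_Density[sb$CAR_Density == "001: 0-3.8998943958"] <- 1.9499

Once every range has been replaced, sb$CAR_Density <- as.numeric(sb$CAR_Density) turns the column into numbers.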
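If you would rather type the medians yourself, one line per range gets tedious. A minimal sketch, assuming you know every label in advance: a named vector maps each label to its value in a single assignment. The data frame sb below and the name lookup are hypothetical stand-ins, the values are hand-typed (rounded), and any label missing from lookup becomes NA.

```r
# Hypothetical example data standing in for the poster's `sb`
sb <- data.frame(CAR_Density = factor(c("001: 0-3.8998943958",
                                        "061:2290.611052-2391.7437",
                                        "001: 0-3.8998943958")))

# Hand-typed (rounded) midpoints, one entry per label; names must match exactly
lookup <- c("001: 0-3.8998943958"       = 1.9499,
            "061:2290.611052-2391.7437" = 2341.1774)

# Index the lookup by the labels as text; unname() drops the names from the result
sb$CAR_Density <- unname(lookup[as.character(sb$CAR_Density)])
```

Indexing by as.character() matters: indexing by the factor itself would use its integer codes instead of the labels.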

If you tell us a bit more about your data, we can show an automated way of replacing each unique value with its median, but I am not sure how the densities convert into medians given your example.
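Assuming every label has the form "NNN:low-high" with non-negative bounds (an assumption drawn only from the two examples shown) and that the median means the midpoint of the range, the automated version can parse each unique level once and then index by the factor codes, which stays cheap with 10,000 rows. A sketch with a hypothetical three-row sb; the new column name CAR_mid is also made up:

```r
# Hypothetical example data standing in for the poster's `sb`
sb <- data.frame(CAR_Density = factor(c("001: 0-3.8998943958",
                                        "061:2290.611052-2391.7437",
                                        "001: 0-3.8998943958")))

lev <- levels(sb$CAR_Density)                        # each unique label, once
range_part <- sub("^[^:]*:", "", lev)                # drop the "NNN:" prefix
lower <- as.numeric(sub("-.*$", "", range_part))     # text before the dash
upper <- as.numeric(sub("^[^-]*-", "", range_part))  # text after the dash
mid <- (lower + upper) / 2                           # midpoint per level

# as.integer() on a factor gives each row's level index, so this expands
# the per-level midpoints back to one value per row
sb$CAR_mid <- mid[as.integer(sb$CAR_Density)]
```

A negative lower bound would add a second "-" and break this parsing, so check the real labels before relying on it.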

Other tips

I'm not sure how you define median, but I think you're trying to achieve something like this (taking the midpoint of each range):

df <- data.frame(
  a = c("001: 0-3.8998943958","061:2290.611052-2391.7437")
  )
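df$a <- as.character(df$a)  # data.frame() stores strings as factors by default before R 4.0
for (i in seq_len(nrow(df))) {
  # keep the text after ":" and split it at "-" into lower and upper bounds
  bounds <- as.numeric(strsplit(strsplit(df$a[i], ":")[[1]][2], "-")[[1]])
  df[i, "a1"] <- bounds[1]
  df[i, "a2"] <- bounds[2]
}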

df$amedian <- (df$a1 + df$a2)/2

Output

> df
                          a       a1          a2     amedian
1       001: 0-3.8998943958    0.000    3.899894    1.949947
2 061:2290.611052-2391.7437 2290.611 2391.743700 2341.177376
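The loop above can also be written without a loop, which matters with nearly 10,000 rows. A sketch of a hypothetical helper, range_midpoint(), making the same assumptions about the label format (a prefix ending in ":", then low-high with non-negative bounds); it handles the whole column in one call and accepts either a factor or a character vector:

```r
# Hypothetical helper: midpoint of labels shaped like "NNN:low-high"
range_midpoint <- function(x) {
  x <- as.character(x)                          # factors become their labels
  ranges <- sub("^[^:]*:", "", x)               # drop the "NNN:" prefix
  parts <- strsplit(ranges, "-", fixed = TRUE)  # split each range at the dash
  lower <- as.numeric(vapply(parts, `[`, character(1), 1))  # as.numeric ignores the leading space
  upper <- as.numeric(vapply(parts, `[`, character(1), 2))
  (lower + upper) / 2
}

x <- c("001: 0-3.8998943958", "061:2290.611052-2391.7437")
range_midpoint(x)
```

On the original data this would be sb$CAR_Density <- range_midpoint(sb$CAR_Density), which leaves a numeric column.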
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow