Question


I'm trying to do this:

function(pollutant) {
    ## some code here

    bad <- is.na(dataset$pollutant)
    mean(dataset$pollutant[!bad])
}

where dataset is created with

dataset <- read.csv(file, header=TRUE)

The file has several pollutants as column names. If I type the pollutant name explicitly instead of using the variable "pollutant", the code works.

For example:

function() {
    ## some code here

    bad <- is.na(dataset$CO2)
    mean(dataset$CO2[!bad])
}

What is the correct syntax so I can have a variable pollutant name?
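A minimal sketch reproducing the problem, assuming a small hypothetical in-memory data frame in place of the CSV (the CO2/NO2 columns and their values are made up for illustration). The $ operator does not evaluate the name after it, so dataset$pollutant looks for a column literally called "pollutant" and returns NULL, while [[ evaluates its argument and uses the string stored in the variable:

```r
# Hypothetical stand-in for read.csv(file, header = TRUE); values are made up.
dataset <- data.frame(CO2 = c(400, NA, 410), NO2 = c(10, 12, NA))
pollutant <- "CO2"

# `$` does not evaluate `pollutant`: it looks for a column literally named
# "pollutant", which does not exist, so the result is NULL.
wrong <- dataset$pollutant

# `[[` evaluates its argument, so the string held in `pollutant` is used.
bad <- is.na(dataset[[pollutant]])
right <- mean(dataset[[pollutant]][!bad])
print(right)  # [1] 405
```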


Solution

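You seem to be asking how to use a column name passed as an argument to a function. The $ operator does not evaluate what follows it, so dataset$pollutant looks for a column literally named "pollutant". Index with [ or [[ instead, which do evaluate their argument:

myfunction <- function(df, col) mean(df[,col], na.rm=TRUE)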

# test
set.seed(1)
df <- data.frame(x=rnorm(10),y=rnorm(10))
myfunction(df,"x")
# [1] 0.1322028

This also works if you pass a column number.

myfunction(df,1)
# [1] 0.1322028
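Applied to the workflow in the question, a sketch might look like the following. The function name pollutantmean and the temporary CSV with its values are hypothetical; the body is the question's own code with dataset$pollutant replaced by dataset[[pollutant]], which, like [, accepts either a column name or a column number:

```r
# Hypothetical function name; the body mirrors the question's code,
# with dataset$pollutant replaced by dataset[[pollutant]].
pollutantmean <- function(file, pollutant) {
  dataset <- read.csv(file, header = TRUE)
  bad <- is.na(dataset[[pollutant]])
  mean(dataset[[pollutant]][!bad])
}

# Small throwaway CSV with made-up values; "NA" is read as a missing value.
tmp <- tempfile(fileext = ".csv")
writeLines(c("CO2,NO2", "400,10", "NA,12", "410,NA"), tmp)

pollutantmean(tmp, "CO2")  # [1] 405
pollutantmean(tmp, "NO2")  # [1] 11
```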

OTHER TIPS

You may wish to avoid writing a function altogether and just use the with() function in R:

> DF
#   col1 pollutant
# 1    1         4
# 2    2         5
# 3    3        NA
# 4    4         7
# 5    5         8
# 6    6        NA
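To run these examples yourself, the DF printed above can be rebuilt with a data.frame() call; the values below are simply read off that printed output:

```r
# Recreate the example data frame shown above.
DF <- data.frame(col1 = 1:6, pollutant = c(4, 5, NA, 7, 8, NA))
print(DF)
```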

> with(DF, mean(pollutant, na.rm = TRUE))
# [1] 6

and

> with(DF, mean(col1, na.rm = TRUE))
# [1] 3.5

If you do want a function, you can pass the column itself to it:

f <- function(column){
    mean(column, na.rm = TRUE)
}

> f(DF[, 'pollutant'])
# [1] 6
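Since the file in the question holds several pollutant columns, it may help that f takes a plain vector: one possible approach is to apply it to every column at once with sapply(), which returns a numeric vector named after the columns. This is a sketch reusing the DF and f from this answer; base R's colMeans(DF, na.rm = TRUE) gives the same numbers for all-numeric data:

```r
DF <- data.frame(col1 = 1:6, pollutant = c(4, 5, NA, 7, 8, NA))

f <- function(column){
    mean(column, na.rm = TRUE)
}

# Apply f to each column; the result is named by column.
means <- sapply(DF, f)
means

# A single column can still be picked by a name held in a variable.
name <- "pollutant"
f(DF[[name]])  # [1] 6
```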

You can also forward na.rm (or any other argument) through the ... (dots) argument of your own function. This is convenient when the function makes several calculations that all need the same argument.

f2 <- function(column, ...){
    list(mean = mean(column, ...), 
         stDev = sd(column, ...), 
         var = var(column, ...))
}

> f2(DF[, 'pollutant'], na.rm = TRUE)
# $mean
# [1] 6

# $stDev
# [1] 1.825742

# $var
# [1] 3.333333
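To see why forwarding na.rm matters, compare calling f2 with and without it. This sketch reuses the DF and f2 from this answer; without na.rm = TRUE, the NA values propagate into mean(), sd() and var(), so every entry of the list is NA:

```r
DF <- data.frame(col1 = 1:6, pollutant = c(4, 5, NA, 7, 8, NA))

f2 <- function(column, ...){
    list(mean = mean(column, ...),
         stDev = sd(column, ...),
         var = var(column, ...))
}

# Nothing forwarded: the NAs make every summary NA.
without <- f2(DF[, 'pollutant'])

# na.rm = TRUE is passed through ... to mean(), sd() and var().
with_rm <- f2(DF[, 'pollutant'], na.rm = TRUE)
```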
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow