Your question comes at R from a typical object-oriented perspective, where a function or method modifies an object. (It looks like you want `MyFunction` to add columns to whatever data.frame you give it.)
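R is a functional programming language, which means it tends not to do this: a function works on its own copy of its arguments and returns a result rather than changing the caller's object. There are ways to modify objects in place, but they're difficult to use well and are generally considered bad practice.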
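To make that concrete, here is a minimal sketch of what happens when a function changes its argument. The names `add_flag` and `df` are made up for illustration and are not from your question.

```r
# Hypothetical function that tries to modify its argument "in place"
add_flag <- function(data) {
  data$flag <- TRUE   # changes only the function's local copy
  invisible(NULL)     # nothing is handed back to the caller
}

df <- data.frame(a = 1:3)
add_flag(df)

names(df)  # still just "a": the caller's data.frame is unchanged
```

The only way for the new column to reach the caller is for the function to return something and for the caller to assign it.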
Let's do a quick example in an R-like way:
# sample data
mydata <- data.frame(a = rnorm(10), b = runif(10))
Then let's say there's a calculation on two columns that you want to do a lot:
common_task <- function(x, y) {
  ((x - 1) * y + (y - 1) * x) / (x + y - 2)
}
The easiest/most common way to add this to your data.frame is
mydata$calc <- common_task(x = mydata$a, y = mydata$b)
If you want to refer to columns by name, then strings work well. If your task will always be performed on a data.frame with columns named `a` and `b`, then you can write a function assuming the data.frame has those column names:
common_task2 <- function(data) {
  ((data$a - 1) * data$b + (data$b - 1) * data$a) /
    (data$a + data$b - 2)
}
A better way is to let the column names be passed in as strings. The `$` subset shortcut won't work for this, so we need to use `[` instead.
common_task3 <- function(data, x = "a", y = "b") {
  ((data[, x] - 1) * data[, y] + (data[, y] - 1) * data[, x]) /
    (data[, x] + data[, y] - 2)
}
This last function will assume the column names you want to work on are "a" and "b", unless you tell it otherwise.
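As a rough illustration of telling it otherwise, the sketch below repeats the function so it runs on its own and calls it on a made-up data.frame, `other_data`, whose column names `height` and `width` are assumptions for the example.

```r
# Same function as above, repeated so this snippet is self-contained
common_task3 <- function(data, x = "a", y = "b") {
  ((data[, x] - 1) * data[, y] + (data[, y] - 1) * data[, x]) /
    (data[, x] + data[, y] - 2)
}

# Hypothetical data.frame whose columns are not named "a" and "b"
other_data <- data.frame(height = c(2, 3, 4), width = c(5, 6, 7))

# Override the defaults by passing the column names as strings
res <- common_task3(data = other_data, x = "height", y = "width")
res
```

Calling `common_task3(other_data)` without the `x` and `y` arguments would fail here, because the defaults look for columns named "a" and "b".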
However, in all three cases, the function just returns a new column. To get it in your data.frame outside of the function, you need to assign it, i.e.,
mydata$new_col3 <- common_task3(data = mydata)
mydata$new_col2 <- common_task2(data = mydata)
You could assign the column inside the function, but you'll still need to assign the function's result to a data.frame. It won't modify the data.frame outside of your function:
common_task4 <- function(data, x = "a", y = "b") {
  data$result <- ((data[, x] - 1) * data[, y] + (data[, y] - 1) * data[, x]) /
    (data[, x] + data[, y] - 2)
  return(data)
}
my_modified_data <- common_task4(data = mydata)
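A quick check, sketched below with the function and data repeated so it runs on its own, shows that the original data.frame is left alone and that overwriting it is one way to get the "modify in place" feel.

```r
common_task4 <- function(data, x = "a", y = "b") {
  data$result <- ((data[, x] - 1) * data[, y] + (data[, y] - 1) * data[, x]) /
    (data[, x] + data[, y] - 2)
  return(data)
}

mydata <- data.frame(a = rnorm(10), b = runif(10))
my_modified_data <- common_task4(data = mydata)

"result" %in% names(mydata)            # FALSE: the original is untouched
"result" %in% names(my_modified_data)  # TRUE: the returned copy has the column

# To "update" mydata itself, overwrite it with the function's return value
mydata <- common_task4(data = mydata)
```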
In all of these cases, there are nice existing functions that can do this for you. @Jilber's answer recommends `transform`, which is a good one. The `dplyr` package is also very nice and easy to use. You can write your own versions, but the existing ones will usually be faster and more robust.
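For comparison, here is roughly how the same column could be added with `transform` and with `dplyr::mutate`. This is only a sketch: the result names `with_transform` and `with_mutate` are made up, and the dplyr part is guarded because it assumes the package is installed.

```r
mydata <- data.frame(a = rnorm(10), b = runif(10))

# Base R: transform() evaluates the expression using the columns of mydata
# and returns a new data.frame with the extra column
with_transform <- transform(
  mydata,
  calc = ((a - 1) * b + (b - 1) * a) / (a + b - 2)
)

# dplyr: mutate() does the same job; skipped if dplyr is not installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  with_mutate <- dplyr::mutate(
    mydata,
    calc = ((a - 1) * b + (b - 1) * a) / (a + b - 2)
  )
}
```

Both follow the same rule as everything above: they return a new data.frame, so you still assign the result to keep it.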
For lots more detail and examples, see Advanced R Programming: Functions.