Question

I always use "with" rather than "within" in my research, and I had assumed the two were interchangeable. Just now I typed "within" by mistake instead of "with", and the results were quite different. Why is that?

I am using the baseball data from the plyr package, so I first load the package:

 require(plyr)

Then I want to select all rows with the id "ansonca01". At first, as I said, I used "within" and ran the following:

within(baseball, baseball[id=="ansonca01", ])

I got very strange results, which basically include everything:

             id year stint team lg   g  ab   r   h X2b X3b hr rbi  sb cs  bb  so ibb hbp sh sf gidp
4     ansonca01 1871     1  RC1     25 120  29  39  11   3  0  16   6  2   2   1  NA  NA NA NA   NA
44    forceda01 1871     1  WS3     32 162  45  45   9   4  0  29   8  0   4   0  NA  NA NA NA   NA
68    mathebo01 1871     1  FW1     19  89  15  24   3   1  0  10   2  1   2   0  NA  NA NA NA   NA
99    startjo01 1871     1  NY2     33 161  35  58   5   1  1  34   4  2   3   0  NA  NA NA NA   NA
102   suttoez01 1871     1  CL1     29 128  35  45   3   7  3  23   3  1   1   0  NA  NA NA NA   NA
106   whitede01 1871     1  CL1     29 146  40  47   6   5  1  21   2  2   4   1  NA  NA NA NA   NA
113    yorkto01 1871     1  TRO     29 145  36  37   5   7  2  23   2  2   9   1  NA  NA NA NA   NA
.........

Then I used "with" instead of "within":

 with(baseball, baseball[id=="ansonca01",])

and got the results that I expected:

            id year stint team lg   g  ab   r   h X2b X3b hr rbi sb cs  bb so ibb hbp sh sf gidp
4    ansonca01 1871     1  RC1     25 120  29  39  11   3  0  16  6  2   2  1  NA  NA NA NA   NA
121  ansonca01 1872     1  PH1     46 217  60  90  10   7  0  50  6  6  16  3  NA  NA NA NA   NA
276  ansonca01 1873     1  PH1     52 254  53 101   9   2  0  36  0  2   5  1  NA  NA NA NA   NA
398  ansonca01 1874     1  PH1     55 259  51  87   8   3  0  37  6  0   4  1  NA  NA NA NA   NA
525  ansonca01 1875     1  PH1     69 326  84 106  15   3  0  58 11  6   4  2  NA  NA NA NA   NA

I checked the documentation for with and within by typing help(with) in the R console, and got the following:

with is a generic function that evaluates expr in a local environment constructed from data. The environment has the caller's environment as its parent. This is useful for simplifying calls to modeling functions. (Note: if data is already an environment then this is used with its existing parent.)

Note that assignments within expr take place in the constructed environment and not in the user's workspace.

within is similar, except that it examines the environment after the evaluation of expr and makes the corresponding modifications to data (this may fail in the data frame case if objects are created which cannot be stored in a data frame), and returns it. within can be used as an alternative to transform.

From this explanation of the differences, I don't see why I got different results from such a simple operation. Does anyone have any ideas?

Solution 2

The documentation is quite clear about the semantics and return values (and nicely matches the everyday meanings of the words “with” and “within”):

Value:

For ‘with’, the value of the evaluated ‘expr’. For ‘within’, the modified object.

Since your code doesn’t assign to anything inside baseball, within discards the value of the subsetting expression and returns baseball unmodified. with, on the other hand, doesn’t return the object; it returns the value of expr, which here is the subset.

Here’s an example where the expression modifies the object:

> head(within(cars, speed[dist < 20] <- 1))
  speed dist
1     1    2
2     1   10
3     1    4
4     7   22
5     1   16
6     1   10
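
To tie this back to the subsetting call in the question, here is a small sketch using the built-in cars data instead of baseball (so it runs without plyr). The names slow and same are just illustrative:

```r
# with() returns the value of the expression: only the filtered rows
slow <- with(cars, cars[dist < 20, ])

# within() evaluates the same expression but ignores its value;
# nothing was assigned, so the data comes back with the same contents
same <- within(cars, cars[dist < 20, ])

nrow(slow) < nrow(cars)           # the subset has fewer rows
isTRUE(all.equal(same, cars))     # within() changed nothing
```

This mirrors what happened with baseball: the "strange results which include everything" are simply the original data frame returned by within.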

OTHER TIPS

I find simple examples often work to highlight the difference. Something like:

df <- data.frame(a=1:5,b=2:6)
df
  a b
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

with(df, {c <- a + b; df;} )
  a b
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

within(df, {c <- a + b; df;} )
# equivalent to: within(df, c <- a + b)
# I've just made the return of df explicit
# for comparison's sake
  a b  c
1 1 2  3
2 2 3  5
3 3 4  7
4 4 5  9
5 5 6 11

As above, with returns the value of the last evaluated expression. It is handy for one-liners such as:

with(cars, summary(lm(speed ~ dist)))

but is less suitable for blocks of several expressions, since only the value of the last one is returned and any intermediate assignments are discarded.
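
A minimal sketch of that difference, again on the built-in cars data. The variable names ratio, res_with and res_within are made up for illustration:

```r
# with(): the braced block runs, but only the last value comes back;
# the intermediate variable ratio is not kept anywhere
res_with <- with(cars, {
    ratio <- dist / speed
    mean(ratio)
})

# within(): the assignment is captured as a new column of the result
res_within <- within(cars, {
    ratio <- dist / speed
})

length(res_with)      # a single number
names(res_within)     # now includes "ratio"
```

So the choice mostly comes down to whether you want a computed value (with) or a modified copy of the data (within).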

I often find within useful for manipulating a data.frame or list (or data.table) as I find the syntax easy to read.

I feel that the documentation could be improved by adding examples of use in this regard, e.g.:

df1 <- data.frame(a=1:3,
              b=4:6,
              c=letters[1:3])
## library("data.table")  
## df1 <- as.data.table(df1)
df1 <- within(df1, {
    a <- 10:12
    b[1:2] <- letters[25:26]
    c <- a
})
df1

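giving (printed here in data.table format, i.e. with the two commented lines enabled; a plain data.frame holds the same values and prints with row names 1, 2, 3)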

    a b  c
1: 10 y 10
2: 11 z 11
3: 12 6 12

and

df1 <- as.list(df1)
df1 <- within(df1, {
    a <- 20:23
    b[1:2] <- letters[25:26]
    c <- paste0(a, b)
})
df1

giving

$a
[1] 20 21 22 23

$b
[1] "y" "z" "6"

$c
[1] "20y" "21z" "226" "23y"

Note also that methods("within") lists methods for only these object types:

  • within.data.frame
  • within.list
  • (and within.data.table if the package is loaded).

Other packages may define additional methods.

Perhaps unexpectedly for some, with and within are generally not appropriate choices when manipulating variables within defined environments...

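To address the comment: there is no within.environment method. with does accept an environment, but here it requires the function you're calling to exist within that environment, because as.environment applied to a list creates an environment whose parent is the empty environment, so even ls cannot be found. That somewhat defeats the purpose for me, e.g.: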

df1 <- as.environment(df1)
## with(df1, ls()) ## Error
assign("ls", ls, envir=df1)
with(df1, ls())
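
One possible workaround, offered as a sketch rather than a recommendation: build the environment with list2env and give it a parent that can see base functions. The names vals, e_empty, e_child and total are hypothetical:

```r
vals <- list(a = 1:3, b = 4:6)

# as.environment() on a list: the parent is the empty environment,
# so not even sum() or ls() can be found from inside it
e_empty <- as.environment(vals)
identical(parent.env(e_empty), emptyenv())
failed <- inherits(try(with(e_empty, sum(a)), silent = TRUE), "try-error")

# list2env() lets you choose the parent, so function lookup works
e_child <- list2env(vals, parent = globalenv())
total21 <- with(e_child, sum(a + b))

# when data is an environment, assignments in with() happen in it directly
with(e_child, total <- sum(a))
exists("total", envir = e_child, inherits = FALSE)
```

Note that this modifies e_child in place, which is different from the copy-and-return behaviour of within on a data.frame or list.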
Licensed under: CC-BY-SA with attribution