Question

I've been looking through the R Task Views a lot lately and have found that some packages of interest are not included in any task views. Is there an established way to find the complement of the packages listed in the task views?

I realize that, by parsing the ctv files as XML (e.g., http://cran.r-project.org/web/views/Econometrics.ctv), I can take the union of all of the packages listed in the <packagelist> nodes, and that available.packages() lists all of the packages available for download. Is comparing the two the way to do it, or am I missing a shortcut on a site like CRANberries or CRANtastic?

Update 1 (don't do this - see my answer below): I overlooked mentioning that CRAN lists "In views:" for packages. So, it seems that behind the scenes some information is kept matching packages to the views that they're in. One could easily (and rudely) scrape all of the CRAN package pages and grep for "In views:". That was my initial idea until I came across ctv, which is a bit more elegant.

Update 2: I overlooked linking to ctv. The package documentation is interesting if you're into Task Views.

Solution

No hidden tricks: just re-create something like CRANberries, which starts by calling available.packages() and comparing the result to the state data it stored in a local database.

In your case you may want to compute set differences between what available.packages() gets you and what you get from the ctv package concerning the Task View selections.
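
A minimal sketch of that set difference, run on toy data so it works offline. The view names and package lists below are made up for illustration, and the commented-out calls show where the real inputs would plausibly come from (I am assuming ctv::available.views() returns view objects with a packagelist data frame, as the ctv documentation describes):

```r
# Toy stand-ins for the real inputs (hypothetical data, not actual CRAN contents).
# The real inputs would come from something like:
#   views <- ctv::available.views(repos = "http://cran.r-project.org")
#   avail <- rownames(available.packages())
views <- list(
  list(name = "Econometrics",
       packagelist = data.frame(name = c("AER", "plm"),
                                core = c(TRUE, TRUE))),
  list(name = "Finance",
       packagelist = data.frame(name = c("plm", "quantmod"),
                                core = c(FALSE, TRUE)))
)
avail <- c("AER", "plm", "quantmod", "somepkg")

# Union of every package named in any view
in_views <- unique(unlist(lapply(views, function(v) {
  as.character(v$packagelist$name)
})))

# Packages available on the repository but absent from every view
not_in_views <- setdiff(avail, in_views)
print(not_in_views)
```

With the toy data, the only package left over is the one that no view mentions.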

Edit 1: Your 'Update 1' idea is crude. Too crude. The meta-information at CRAN comes, methinks, from proper accounting: the first set is all packages, then add a set for each Task View, possibly split between 'listed' and 'recommended', and aggregate up.
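
To make the "aggregate up" step concrete, here is a rough sketch that inverts the view-to-packages sets into a package-to-views lookup, which is roughly the information behind "In views:". This is a guess at the bookkeeping, not CRAN's actual code. The data is made up, and I am assuming the core column of packagelist is the flag that separates the two kinds of listing:

```r
# Toy view objects (hypothetical data shaped like ctv view objects)
views <- list(
  list(name = "Econometrics",
       packagelist = data.frame(name = c("AER", "plm"),
                                core = c(TRUE, TRUE))),
  list(name = "Finance",
       packagelist = data.frame(name = c("plm", "quantmod"),
                                core = c(FALSE, TRUE)))
)

# One row per (package, view) pair
pairs <- do.call(rbind, lapply(views, function(v) {
  data.frame(package = as.character(v$packagelist$name),
             view    = v$name,
             core    = v$packagelist$core,
             stringsAsFactors = FALSE)
}))

# Aggregate up: for each package, the views it appears in
in_views_by_pkg <- vapply(split(pairs$view, pairs$package),
                          paste, character(1), collapse = ", ")

# Packages flagged as core in at least one view (assumed meaning of 'core')
core_pkgs <- unique(pairs$package[pairs$core])

print(in_views_by_pkg)
```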

Edit 2: I think you just use code from ctv to parse its files, and out come sets. We used that in cran2deb to define smaller test sets for package creation. Given those sets (and the other data), they can generate the web pages. I think you may be over-complicating things. R makes that possible, as I also know too darn well ;-)

OTHER TIPS

I should have read the ctv documentation more carefully. The answer was right there: there's a .rds file called Views.rds. Here's a step-by-step method:

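myRepos         <- "http://cran.r-project.org"
tmpfile         <- tempfile()
# mode = "wb" keeps the binary .rds file intact on Windows
download.file(paste(myRepos, "src/contrib/Views.rds", sep = "/"), destfile = tmpfile, mode = "wb")

# readRDS replaces .readRDS as of R 2.13
myViews         <- readRDS(tmpfile)
# Pull the package names out of a single view
func_listPkgs   <- function(x){return(x$packagelist$name)}
aggRaw          <- lapply(myViews, func_listPkgs)
# Union of all packages mentioned in any view
aggInViews      <- unique(unlist(aggRaw))

availRaw        <- available.packages(contriburl = paste(myRepos, "src/contrib", sep = "/"))
availPkgs       <- rownames(availRaw)
# Available packages that appear in no view
notInViews      <- setdiff(availPkgs, aggInViews)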

Here's what this is doing:

  1. It gets the Views.rds file from a CRAN mirror.
  2. It loads Views.rds, which holds a list with one entry per Task View (not a data frame). Note: before R 2.13 the function for this was .readRDS; from 2.13 on it is readRDS, and .readRDS was deprecated at that point, so use readRDS in any current version of R.
  3. It gets the list of available packages. This could be made more direct: there is a file called Packages.gz that could be downloaded, but then we have to parse it. Let's stick with tools already available. :)
  4. It takes the set difference of the two lists, leaving the available packages that appear in no Task View. For fun, try the opposite difference: oddPackages <- setdiff(aggInViews, availPkgs). Some of these are packages in base R. The others are anyone's guess.
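
If you want to sort out that opposite diff, something like the following sketch separates the base R packages from the truly unexplained ones. The two input vectors here are toy stand-ins (made-up values) for aggInViews and availPkgs from the steps above, so the snippet runs without a download:

```r
# Toy stand-ins (hypothetical); in the steps above these come from
# Views.rds and available.packages()
aggInViews <- c("AER", "plm", "stats", "notARealPkg123")
availPkgs  <- c("AER", "plm", "quantmod")

# Packages named in some view but not available from the repository
oddPackages <- setdiff(aggInViews, availPkgs)

# Packages that ship with R itself (priority "base")
basePkgs <- rownames(installed.packages(priority = "base"))

oddInBase  <- intersect(oddPackages, basePkgs)  # explained: part of base R
oddUnknown <- setdiff(oddPackages, basePkgs)    # still unexplained

print(oddInBase)
print(oddUnknown)
```

Whatever lands in oddUnknown on the real data may be archived or renamed packages, or ones hosted elsewhere, but that is only a guess.
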
Licensed under: CC-BY-SA with attribution