Question

Is there a nice way to extract get the R-help page from an installed package in the form of an R object (e.g a list). I would like to expose help pages in the form of standardized JSON or XML schemas. However getting the R-help info from the DB is harder than I thought.

I hacked together a while ago to get the HTML of an R help manual page. However I would rather have a general R object that contains this information, that I can render to JSON/XML/HTML, etc. I looked into the helpr package from Hadley, but this seems to be a bit of overkill for my purpose.

Was it helpful?

Solution 2

So below what I hacked together. However I yet have to test it on many help files to see if it generally works.

Rd2list <- function(Rd){
    names(Rd) <- substring(sapply(Rd, attr, "Rd_tag"),2);
    temp_args <- Rd$arguments;

    Rd$arguments <- NULL;
    myrd <- lapply(Rd, unlist);
    myrd <- lapply(myrd, paste, collapse="");

    temp_args <- temp_args[sapply(temp_args , attr, "Rd_tag") == "\\item"];
    temp_args <- lapply(temp_args, lapply, paste, collapse="");
    temp_args <- lapply(temp_args, "names<-", c("arg", "description"));
    myrd$arguments <- temp_args;
    return(myrd);
}

getHelpList <- function(...){
    thefile <- help(...)
    myrd <- utils:::.getHelpFile(thefile);
    Rd2list(myrd);
}

And then you would do something like:

myhelp <- getHelpList("qplot", package="ggplot2");
cat(jsonlite::toJSON(myhelp));

OTHER TIPS

Edited with suggestion of Hadley

You can do this a bit easier by:

getHTMLhelp <- function(...){
    thefile <- help(...)
    capture.output(
      tools:::Rd2HTML(utils:::.getHelpFile(thefile))
    )
}

Using tools:::Rd2txt instead of tools:::Rd2HTML will give you plain text. Just getting the file (without any parsing) gives you the original Rd format, so you can write your custom parsing function to parse it into an object (see the solution of @Jeroen, which does a good job in extracting all info into a list).

This function takes exactly the same arguments as help() and returns a vector with every element being a line in the file, eg:

> head(HelpAnova)
[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">"      
[2] "<html><head><title>R: Anova Tables</title>"                             
[3] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
[4] "<link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\">"             
[5] "</head><body>"                                                          
[6] ""           

Or :

> HelpGam <- getHTMLhelp(gamm,package=mgcv)
> head(HelpGam)
[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">"      
[2] "<html><head><title>R: Generalized Additive Mixed Models</title>"        
[3] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
[4] "<link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\">"             
[5] "</head><body>"                                                          
[6] ""           
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top