What to do with imperfect-but-useful functions?

https://stackoverflow.com/questions/6828937

26-10-2019

Question

I could equally have titled this question, "Is it good enough for CRAN?"

I have a collection of functions that I've built up for specific tasks. Some of these are convenience functions:

# Return the odd / even elements of an integer vector
odds <- function(vec) {
    stopifnot(is.integer(vec))
    vec[vec %% 2L != 0L]
}
evens <- function(vec) {
    stopifnot(is.integer(vec))
    vec[vec %% 2L == 0L]
}

Some are minor additions that have proven useful in answering common SO questions:

# Shift a vector over by n spots (n > 0 shifts left, n < 0 shifts right)
# wrap moves the entries shifted off one end onto the other end
# pad is only used when wrap is FALSE; it fills the vacated spots with NA
shift <- function(vec, n = 1, wrap = TRUE, pad = FALSE) {
    len <- length(vec)
    if (abs(n) > len) {
        stop("The magnitude of n must not exceed the length of the vector")
    }
    if (n == 0) {
        return(vec)
    }
    k <- abs(n)
    if (n > 0) {
        dropped <- head(vec, k)        # entries shifted off the front
        kept <- tail(vec, len - k)
        if (wrap) {
            return(c(kept, dropped))
        } else if (pad) {
            return(c(kept, rep(NA, k)))
        }
    } else {
        dropped <- tail(vec, k)        # entries shifted off the back
        kept <- head(vec, len - k)
        if (wrap) {
            return(c(dropped, kept))
        } else if (pad) {
            return(c(rep(NA, k), kept))
        }
    }
    # neither wrap nor pad: return the shortened vector
    kept
}

The most important are extensions to existing classes that can't be found anywhere else (e.g. a CDF panel function for lattice plots, various xtable and LaTeX output functions, classes for handling and converting between geospatial object types and performing various GIS-like operations such as overlays).

I would like to make these available somewhere on the internet in R-ized form (e.g. posting them on a blog as plain text functions is not what I'm looking for), so that maintenance is easier and so that I and others can access them from any computer that I go to. The logical thing to do is to make a package out of them and post them to CRAN--and indeed I already have them packaged up. But is this collection of functions suitable for a CRAN package?

I have two main concerns:

1. The functions don't seem to have any coherent overall theme. It's just a collection of functions that do lots of different things.
2. My code isn't always the prettiest. I've tried to clean it up as I learned better coding practices, but producing R Core-worthy beautiful code is not in the cards.

The CRAN webpage is surprisingly bereft of guidelines on posting. Should I post to CRAN, given that some people will find it useful but that it will in some sense forever lock R into having some pretty basic function names taken up? Or is there another place I can use an install.packages-like command to install from? Note I'd rather avoid posting the package to a webpage and having people have to memorize the URL to install the package (not least for version control issues).

Solution

Most packages should be collections of related functions with an obvious purpose, so a useful thing to do would be to try and group what you have together, and see if you can classify them. Several smaller packages are better than one huge incoherent package.
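As a rough sketch of what splitting a grab-bag into themed packages could look like, base R's package.skeleton() can generate one skeleton per group of functions. The package name "vecutils" and the grouping are hypothetical, and the two functions are simplified stand-ins for the odds/evens helpers in the question.

```r
# Hypothetical grouping: put the vector helpers into their own small package.
vec_env <- new.env()
assign("odds",  function(vec) vec[vec %% 2L != 0L], envir = vec_env)
assign("evens", function(vec) vec[vec %% 2L == 0L], envir = vec_env)

# package.skeleton() is part of the utils package that ships with R.
# "vecutils" is an illustrative name, not an existing package.
pkg_root <- file.path(tempdir(), "skeleton-demo")
dir.create(pkg_root, showWarnings = FALSE)
utils::package.skeleton(
    name = "vecutils",
    list = c("odds", "evens"),
    environment = vec_env,
    path = pkg_root,
    force = TRUE
)

# The skeleton contains DESCRIPTION, NAMESPACE, R/ and man/ to be filled in.
list.files(file.path(pkg_root, "vecutils"))
```

Repeating this for each group (say, one package for the lattice panels and another for the geospatial classes) would give several small packages rather than one incoherent one.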

That said, there are some packages that are collections of miscellaneous utility functions, most notably Hmisc and gregmisc, so it is okay to do that sort of thing. If you just have a few functions like that, it might be worth contacting the author of one of the misc packages to see if they'll include your code in their package.

As for writing pretty code, the most important thing you can do is to use a style guide.

OTHER TIPS

I would use http://r-forge.r-project.org/. From the top of the page:

R-Forge offers a central platform for the development of R packages, R-related software and further projects. It is based on FusionForge offering easy access to the best in SVN, daily built and checked packages, mailing lists, bug tracking, message boards/forums, site hosting, permanent file archival, full backups, and total web-based administration.
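On the question's wish for an install.packages-like command: a package hosted on R-Forge can usually be installed by pointing the repos argument of install.packages() at R-Forge, provided the package currently builds there. A minimal sketch follows; the package name is a hypothetical placeholder, and the call is wrapped in a function so that nothing is downloaded until it is actually called.

```r
# R-Forge repository URL; CRAN is listed as well so that dependencies
# hosted on CRAN can still be resolved.
rforge_repos <- c(
    RForge = "http://R-Forge.R-project.org",
    CRAN   = "https://cloud.r-project.org"
)

# Thin wrapper around install.packages(); needs network access when called.
install_from_rforge <- function(pkg) {
    utils::install.packages(pkg, repos = rforge_repos)
}

# "miscfuns" is a hypothetical package name used only for illustration:
# install_from_rforge("miscfuns")
```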

In my opinion it is not a good idea to make this type of material into a package.
Misc packages do exist, but mostly for historical reasons and/or because of their authoritative contributors; see Frank Harrell's Hmisc.

I see three main reasons why a package is not a good fit for a disparate collection of functions.

1. There are roughly 7000 packages on CRAN alone. It is unlikely that your package will be chosen if it does not target a specific field and, even if it does, it is quite possible that other established packages already do the same. Therefore your package should also offer an original or better solution to the problem it deals with.
2. Repositories, and CRAN in particular, are task-oriented, which suggests that a package's functions should address a coherent task. And for good reason: there is no point in downloading a whole package with, say, 50 unrelated functions when I need just a couple of them. If instead a package solves a specific data problem of mine, then I will most likely need most (if not all) of its functions.
3. R repositories tend to hide the content. Unlike tech blogs, you do not immediately see the functions' source. You need to download a separate source package, and the package structure adds a lot of overhead that buries the functions you want to show and that others need to read.

In my opinion the best place for general convenience functions is a site like GitHub. In fact:

Readers can view them immediately, with the comfort of syntax highlighting. If they look interesting, they can be pasted into R to try out and possibly keep; otherwise one simply moves on to the next function.
You can organise the code, but without all the constraints of an actual package. Similar functions can go in the same file, and related files in the same subfolder.
You can show your ideas to others in a simple way. The readme file can immediately become a sort of mini webpage (via Markdown). In comparison, CRAN is quite rigid.
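For what it's worth, a folder of plain .R files can be loaded almost as conveniently as a package. The sketch below fakes a cloned repository in a temporary directory (the folder and file names are made up) and sources every file into its own environment.

```r
# Stand-in for a cloned repository: a folder holding plain .R files.
# The folder and file names are made up for this example.
repo_dir <- file.path(tempdir(), "miscfuns-demo", "vectors")
dir.create(repo_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(
    "evens <- function(vec) vec[vec %% 2L == 0L]",
    file.path(repo_dir, "parity.R")
)

# Load every .R file in the folder into a dedicated environment, so that
# names such as `evens` do not overwrite anything in the global workspace.
misc <- new.env()
for (f in list.files(repo_dir, pattern = "\\.R$", full.names = TRUE)) {
    sys.source(f, envir = misc)
}

misc$evens(1:10)
# [1]  2  4  6  8 10
```

Keeping the functions in a separate environment also sidesteps the worry about basic names like shift or odds clashing with other code.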

There are a lot of other benefits (revision history, accepting contributions, GitHub pages), which may or may not interest you.

Of course, once several functions grow in a stable, coherent direction, you can turn them into an actual CRAN package, not least because the copy-and-paste method of trying them then becomes inconvenient.

EDIT: Nowadays there are alternatives to GitHub that can be considered too, and GitHub has become a common way to distribute packages that are not yet ready for CRAN, or to complement the official CRAN distribution page.
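If the functions are kept on GitHub in package form, one common route is install_github() from the remotes package, which is not part of base R. A minimal sketch, assuming remotes is available; the user/repository pair is hypothetical, and the wrapper only touches the network when called.

```r
# Installing straight from GitHub relies on the 'remotes' package, which is
# not part of base R and must itself be installed first.
install_from_github <- function(repo) {
    if (!requireNamespace("remotes", quietly = TRUE)) {
        stop("Install the 'remotes' package first: install.packages(\"remotes\")")
    }
    remotes::install_github(repo)
}

# "someuser/miscfuns" is a hypothetical user/repository pair:
# install_from_github("someuser/miscfuns")
```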

Licensed under: CC-BY-SA with attribution
