Reproducible Research: Convert sas7bdat data files to csv files by invoking statTransfer using GNU make

StackOverflow https://stackoverflow.com/questions/20616666

Question

QUESTION:

I'm very new to GNU Make. Is there a better way to programmatically convert statistical datasets from sas7bdat to csv files and keep them in sync with each other using GNU Make to promote reproducible research? Would you approach this problem differently from a coding perspective or is there a better way to promote reproducible research? Can I add an additional pre-requisite (i.e. statTransferOptions.txt) while using static pattern rules?

The solution needs to:

  • Find all sas7bdat files in all subdirectories
  • Read statTransfer options
  • Convert the sas7bdat file to csv file using statTransfer command line tool with options
  • Given the current limitations of statTransfer, I think this will require a two-step process:
    • Build a statTransfer command file (.stcmd) for each SAS data file (.sas7bdat)
    • Build a csv file for each stcmd file by executing statTransfer (st) with the options in the stcmd file
    • Target stcmd and csv files should reside in the same subdirectory as the prerequisite sas7bdat file
    • Find out-of-date stcmd and csv files and update them if a new sas7bdat file exists or if the base option file changes

CONTEXT:

I have inherited a large statistical report which is published annually. In previous years, analysis was done in SAS. We are now using R. Some of the sas7bdat files generated by SAS Enterprise Guide do not import correctly with the sas7bdat package. StatTransfer, a commercial product, has a command-line interface and does convert sas7bdat files to csv files properly; however, there are options that improve conversion (e.g., writing of date formats). The sas7bdat files are in multiple subdirectories corresponding to the type of dataset and the year.

This approach was further prompted by:

Gandrud, Christopher (2013-06-21). Reproducible Research with R and RStudio (Chapman & Hall/CRC The R Series) (pp. 104-105). Chapman and Hall/CRC. Kindle Edition.

TROUBLESHOOTING:

SUGGESTED MAKEFILE?

RDIR := .

######
#PREP#
######
# Use BASH shell to create list of source sas7bdat files
SASDATA = $(shell find $(RDIR) -type f -name '*.sas7bdat')

# Use pattern substring functions to define variable list of filenames
# to be used as targets in recipes
STCMD_OUT = $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.stcmd, $(SASDATA))
CSV_OUT = $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.csv, $(SASDATA))

#########
#TARGETS#
#########

all: $(STCMD_OUT) $(CSV_OUT)

# I think the name "static pattern rules" is misleading
# but I found this to be helpful:
# http://www.gnu.org/software/make/manual/make.html#Static-Pattern

# can I add statTransferOptions.txt as a pre-requisite while using static pattern rules?

$(STCMD_OUT): $(RDIR)/$(@D)/%.stcmd: $(RDIR)/$(@D)/%.sas7bdat
	cp $(RDIR)/statTransferOptions.txt $@
	echo copy $(RDIR)/$< delim $(RDIR)/$(basename $<).csv -v >> $@
	echo quit >> $@

$(CSV_OUT): $(RDIR)/$(@D)/%.csv: $(RDIR)/$(@D)/%.stcmd
	st $(RDIR)/$<

clean:
	rm $(STCMD_OUT)
	rm $(CSV_OUT)

REVISED MAKEFILE AFTER INPUT FROM SO:

RDIR := .

######
#PREP#
######
# Create list of source sas7bdat files
SASDATA := $(shell find $(RDIR) -type f -name '*.sas7bdat')

STCMD_OUT := $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.stcmd, $(SASDATA))
CSV_OUT := $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.csv, $(SASDATA))

#########
#TARGETS#
#########

all: $(STCMD_OUT) $(CSV_OUT)

$(STCMD_OUT): %.stcmd: %.sas7bdat statTransferOptions.txt
	cp $(RDIR)/statTransferOptions.txt $@
	echo copy $(RDIR)/$< delim $(RDIR)/$(basename $<).csv -v -y >> $@
	echo quit >> $@

$(CSV_OUT): %.csv: %.stcmd
	st $(RDIR)/$<

clean:
	rm $(STCMD_OUT)
	rm $(CSV_OUT)

However, the correct option might be to debug the CRAN sas7bdat package, so that the entire toolchain is available, rather than invoking the proprietary statTransfer.


Solution

On SO, we generally don't have the time or energy (or, often, the interest) to read related papers, options, alternatives, and so on. It works best if you simply and clearly specify: the code you have problems with (here, the makefile, which you provided, so that's great); the exact problem you have, including error messages or incorrect output (this is not obvious from your question); what you wanted to happen that did not happen, because this is not always clear; and perhaps any other approaches you've tried that did not work.

I'm not sure exactly what problem you're having, but I see a number of issues with your makefile. First, this will work but is highly inefficient:

SASDATA = $(shell find $(RDIR) -type f -name '*.sas7bdat')

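You should use the := form of assignment here. With =, the variable is recursively expanded, so the find command is re-run every time SASDATA is referenced; with :=, it runs once, when the makefile is read. You should probably use := when setting STCMD_OUT and CSV_OUT as well, although this is less critical.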
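
To make the difference concrete, here is a minimal sketch. The variable names SASDATA_LAZY and SASDATA_ONCE and the show target are made up for illustration and do not come from the question's makefile:

```makefile
# Recursively expanded (=): make stores the text of the right-hand side and
# re-runs the find command every time $(SASDATA_LAZY) is expanded.
SASDATA_LAZY = $(shell find . -type f -name '*.sas7bdat')

# Simply expanded (:=): find runs once, when make reads this line, and the
# resulting file list is stored.
SASDATA_ONCE := $(shell find . -type f -name '*.sas7bdat')

# Each of the two references to the lazy variable below triggers another
# find; the := variable is simply read back.
STCMD_LAZY := $(SASDATA_LAZY:.sas7bdat=.stcmd)
CSV_LAZY   := $(SASDATA_LAZY:.sas7bdat=.csv)
STCMD_ONCE := $(SASDATA_ONCE:.sas7bdat=.stcmd)
CSV_ONCE   := $(SASDATA_ONCE:.sas7bdat=.csv)

.PHONY: show
show:
	@echo "stcmd targets: $(STCMD_ONCE)"
	@echo "csv targets:   $(CSV_ONCE)"
```

Running make show prints the derived target lists. The lists come out the same with either flavor; only the number of find invocations differs, which starts to matter once the directory tree is large.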

Most important, though, these rules are not right:

$(STCMD_OUT): $(RDIR)/$(@D)/%.stcmd: $(RDIR)/$(@D)/%.sas7bdat

You cannot use automatic variables like $@ (or any of their alternative forms, such as $(@D)) in the target or prerequisite lists. The automatic variables are only defined within the recipe of the rule. You can use secondary expansion for this, but I'm not sure why you're trying to do this. Why not just use the following?

$(STCMD_OUT): %.stcmd: %.sas7bdat

The same goes for the other static pattern rule.
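
The reason the plain % form is enough is that the stem it matches includes the directory part of the target name. A minimal sketch, assuming two hypothetical source files (the paths are placeholders standing in for the find output, and the recipe only echoes what it would do):

```makefile
# Hypothetical file list standing in for the output of find.
SASDATA   := 2012/a.sas7bdat 2013/survey/b.sas7bdat
STCMD_OUT := $(SASDATA:.sas7bdat=.stcmd)

.PHONY: all
all: $(STCMD_OUT)

# The stem matched by % includes the directory part, so every target is
# paired with the .sas7bdat file next to it. Automatic variables are fine
# on the line below because it is the recipe (it must start with a tab).
$(STCMD_OUT): %.stcmd: %.sas7bdat
	@echo "would build $@ from $< (dir: $(@D), stem: $*)"

# If automatic variables are really wanted in the prerequisite list,
# secondary expansion makes them available; the doubled $$ defers expansion.
# Untested alternative, left commented out:
# .SECONDEXPANSION:
# $(STCMD_OUT): %.stcmd: $$(@D)/$$(*F).sas7bdat
```

With the two placeholder source files in place, running make echoes one line per target. Because the recipe never creates the .stcmd files, it runs on every invocation, which is fine for a demonstration but not for the real rule.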

As for your question, yes, it's perfectly fine to add extra prerequisites such as statTransferOptions.txt to the static pattern rule.
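
Putting the pieces together, one possible shape for the whole makefile is sketched below. The OPTIONS variable is introduced here for illustration; the st invocation and the copy ... delim ... -v -y command line are taken as-is from the question and have not been checked against the statTransfer documentation:

```makefile
RDIR    := .
# Hypothetical helper variable; the question hard-codes this path instead.
OPTIONS := $(RDIR)/statTransferOptions.txt

SASDATA   := $(shell find $(RDIR) -type f -name '*.sas7bdat')
STCMD_OUT := $(SASDATA:.sas7bdat=.stcmd)
CSV_OUT   := $(SASDATA:.sas7bdat=.csv)

.PHONY: all clean
all: $(CSV_OUT)

# $(OPTIONS) contains no %, so it is the same extra prerequisite for every
# target. $< is still the first prerequisite, the matching .sas7bdat file,
# and $* is the stem (path without the extension).
$(STCMD_OUT): %.stcmd: %.sas7bdat $(OPTIONS)
	cp $(OPTIONS) $@
	echo copy $< delim $*.csv -v -y >> $@
	echo quit >> $@

# Command line as used in the question.
$(CSV_OUT): %.csv: %.stcmd
	st $<

clean:
	rm -f $(STCMD_OUT) $(CSV_OUT)
```

Because every .stcmd file depends on the options file, editing statTransferOptions.txt makes all .stcmd files out of date, and the .csv files that depend on them are rebuilt in turn. Listing only $(CSV_OUT) under all is enough, since each csv file pulls in its stcmd file as a prerequisite. File names containing spaces would still break this, as make splits lists on whitespace.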

Licensed under: CC-BY-SA with attribution