Question

I'm using make to control the data flow in a statistical analysis. I have my raw data in a directory ./data/raw_data_files, and I've got a data manipulation script that creates a cleaned data cache at ./cache/clean_data. The make rule is something like:

cache/clean_data:
  scripts/clean_data 

I do not want to touch the data in ./data/, either with make, or any of my data munging scripts. Is there any way in make to create a dependency for the cache/clean_data that just checks whether specific files in ./data/ are newer than last time make ran?


Solution

If clean_data is a single file, just let it depend on all data files:

cache/clean_data: data/*
    scripts/clean_data
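
If only specific raw files matter, they can be listed explicitly instead of using a glob. The following is a minimal sketch, not a drop-in rule: the two file names and the RAW variable are hypothetical, and listing the script itself as a prerequisite is an optional extra. Note that make compares the modification time of each prerequisite with that of the target (not with the time make last ran), and it only reads those timestamps, so nothing under data/ is modified.

```make
# Hypothetical file names: replace with the raw files the script actually reads.
RAW := data/raw_data_files/survey.csv data/raw_data_files/sites.csv

# Rebuild the cache when any listed raw file (or the script) is newer than it.
cache/clean_data: $(RAW) scripts/clean_data
	scripts/clean_data
```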

If it is a directory containing multiple cleaned files, the easiest way is to write a stamp file and have that depend on your data files:

cache/clean_data-stamp: data/*
    scripts/clean_data
    touch cache/clean_data-stamp
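
The stamp rule can be hooked to a convenience target so that you do not have to type the stamp's path. This is a sketch under assumptions: the RAW variable and the clean-data target name are made up for illustration, and the raw files are assumed to live directly in data/raw_data_files as described in the question.

```make
# Assumed location of the raw files (taken from the question).
RAW := $(wildcard data/raw_data_files/*)

# 'clean-data' is a hypothetical alias, not a real file, so mark it phony.
.PHONY: clean-data
clean-data: cache/clean_data-stamp

# The stamp records when the cleaning script last completed successfully.
cache/clean_data-stamp: $(RAW)
	scripts/clean_data
	touch $@
```

Because make stops a recipe at the first failing command, the stamp is only touched when scripts/clean_data succeeds, so a failed run is retried on the next invocation.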

Note that this regenerates all clean_data files if one data file changes. A more elaborate approach is possible if you have a 1-to-1 mapping between data and cleaned files. The GNU Make Manual has a decent example of this. Here is an adaptation:

DATAFILES := $(wildcard data/*)
CACHEFILES := $(patsubst data/%,cache/clean_data/%,$(DATAFILES))

cache/clean_data/% : data/%
    scripts/clean_data --input $< --output $@

all: $(CACHEFILES)

Here, we use wildcard to get a list of all files under data. Then we replace the data path with the cache path using patsubst. We tell make how to generate each cache file from its data file via a pattern rule, and finally, we define a target all which depends on all the required cache files, so that building it generates them.
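
A variant of the rules above can be sketched with GNU Make's explicit `targets: target-pattern: prereq-pattern` form, which applies the pattern only to the files listed in CACHEFILES. Treat it as an illustration under assumptions: data/ is assumed to contain only plain files (a subdirectory would also be matched by the wildcard), the --input and --output flags are taken from the example above, and the mkdir line is an addition that creates the output directory if it is missing.

```make
DATAFILES  := $(wildcard data/*)
CACHEFILES := $(patsubst data/%,cache/clean_data/%,$(DATAFILES))

# 'all' is not a real file, so declare it phony; listed first, it is the default goal.
.PHONY: all
all: $(CACHEFILES)

# Applies only to the targets named in $(CACHEFILES).
# $< is the matching data file, $@ the cache file, $(@D) the cache file's directory.
$(CACHEFILES): cache/clean_data/%: data/%
	mkdir -p $(@D)
	scripts/clean_data --input $< --output $@
```

Running `make -n` prints the commands that would be executed without running them, which is a cheap way to check that the file lists expand as expected.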

Of course you can also list your CACHEFILES explicitly in the Makefile (CACHEFILES:= cache/clean_data/a cache/clean_data/b), but it is typically more convenient to let make handle that automatically, if possible.

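Note that this more complex example relies on GNU Make features (the wildcard and patsubst functions and pattern rules), so it will probably not work in other make implementations such as Windows' nmake. For further information, consult the GNU Make Manual; it is a great resource for all your Makefile needs.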

Licensed under: CC-BY-SA with attribution