Question

Maybe it's something trivial and I have simply been looking at the same code for too long. When I source the R module getFLOSSmoleDataXML.R in RStudio, the code correctly detects the .Rdata files in the cache directory and skips the download and parsing phases. When the same module is run by R via GNU make (sudo -u ruser make), however, the result is, well, strange:

Rscript --no-save --no-restore --verbose getFLOSSmoleDataXML.R
running
  '/usr/lib/R/bin/R --slave --no-restore --no-save --no-restore --file=getFLOSSmoleDataXML.R'

Loading required package: RCurl
Loading required package: methods
Loading required package: bitops
Loading required package: XML
Loading required package: digest

Verifying repository: FreeCode

Checking file "http://flossdata.syr.edu/data/fc/2013/2013-Dec/fcProjectAuthors2013-Dec.txt.bz2"...

rdataFile = "./cache/5802dbd08ebefadf70fbb826776f9f0f.Rdata"...

trying URL 'http://flossdata.syr.edu/data/fc/2013/2013-Dec/fcProjectAuthors2013-Dec.txt.bz2'
Content type 'application/x-bzip2' length 514960 bytes (502 Kb)
opened URL
==================================================
downloaded 502 Kb

Error in gzfile(file, "wb") : cannot open the connection
Calls: print ... FUN -> importRepoFiles -> lapply -> FUN -> save -> gzfile
In addition: Warning message:
In gzfile(file, "wb") :
  cannot open compressed file './cache/5802dbd08ebefadf70fbb826776f9f0f.Rdata', probable reason 'No such file or directory'
Timing stopped at: 0.74 0.068 1.134
Execution halted
make[1]: *** [importFLOSSmole] Error 1
make[1]: Leaving directory `/home/ruser/diss-floss/import'
make: *** [collection] Error 2
ubuntu@ip-10-164-108-61:/home/ruser/diss-floss$ ls -l cache/5802*
-rw-r--r-- 1 ruser ruser 1968939 Feb 19 05:47 cache/5802dbd08ebefadf70fbb826776f9f0f.Rdata

As the last two lines show, I have verified that the file does exist. What is going on here? Any ideas or advice? Thank you!

Solution

After a brief investigation, I found the source of this problem myself. As I expected, it is a small and simple mistake, which I describe here so that others can avoid it.

When I call file.exists() in my code, I pass it a relative path to the file in question. I construct that path by concatenating the hard-coded cache directory (RDATA_DIR) and the dynamically determined file name:

# calculate the URL's digest and generate the corresponding RData file name
# (FALSE is spelled out because F is an ordinary variable that can be reassigned)
fileDigest <- digest(url, algo = "md5", serialize = FALSE)
rdataFile <- paste(RDATA_DIR, "/", fileDigest, RDATA_EXT, sep = "")

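However, I forgot that make leaves the top-level project directory and enters the sub-directory (import) to build the code. The hard-coded relative path to the cache directory (RDATA_DIR="./cache") is then resolved against that sub-directory, where no cache directory exists, which is why save() fails with 'No such file or directory' even though the file is present in the top-level cache. Changing the value to RDATA_DIR="../cache" fixed the problem.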
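
To make the effect easy to reproduce, here is a minimal sketch that mimics the layout in a throwaway directory. The directory and file names (project, example.Rdata) are made up for illustration; only the cache and import names come from the log above. It assumes, as in my case, that RStudio was running with the project root as its working directory.

```r
# Build a throwaway layout: <tmp>/project/cache and <tmp>/project/import
project <- file.path(tempdir(), "project")
dir.create(file.path(project, "cache"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "import"), showWarnings = FALSE)
invisible(file.create(file.path(project, "cache", "example.Rdata")))

# Case 1: working directory is the project root (assumed RStudio situation)
oldwd <- setwd(project)
fromRoot <- file.exists("./cache/example.Rdata")

# Case 2: working directory is the sub-directory that make enters
setwd(file.path(project, "import"))
fromImportDot    <- file.exists("./cache/example.Rdata")   # resolves to import/cache
fromImportDotDot <- file.exists("../cache/example.Rdata")  # resolves to project/cache

# Restore the original working directory
setwd(oldwd)

print(c(fromRoot = fromRoot,
        fromImportDot = fromImportDot,
        fromImportDotDot = fromImportDotDot))
```

The same relative string points at two different places depending on getwd(), so printing getwd() next to the constructed path is a quick first check when a file "exists" in the shell but not in the script.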

That explains the "magic" :-) of the same code running successfully when invoked manually (R or RStudio) but failing when run via make. Having said that, I recognize that relying on a predetermined directory structure might not be the best practice, but due to time constraints I have to make compromises (and add items to my TODO list of potential improvements). I would gladly hear your advice on best practices in this area.
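
One possible direction for that TODO item, sketched here under stated assumptions rather than as a tested part of the project: derive the cache location from where the script lives instead of from the current working directory. The helper name resolveCacheDir and the environment variable FLOSS_CACHE_DIR are hypothetical. The sketch relies on Rscript passing the script path as a --file= argument (visible in the verbose output above) and on the cache directory sitting one level above the script's directory.

```r
# Hypothetical helper: work out the cache directory without depending on
# whichever directory the caller happened to start R from.
resolveCacheDir <- function(args = commandArgs(trailingOnly = FALSE),
                            envValue = Sys.getenv("FLOSS_CACHE_DIR")) {
  # 1. An explicit override wins (FLOSS_CACHE_DIR is a made-up variable name)
  if (nzchar(envValue)) return(envValue)

  # 2. Under Rscript, the script path arrives as a --file= argument;
  #    the cache is assumed to be a sibling of the script's directory
  fileArg <- grep("^--file=", args, value = TRUE)
  if (length(fileArg) > 0) {
    scriptDir <- dirname(sub("^--file=", "", fileArg[1]))
    return(file.path(scriptDir, "..", "cache"))
  }

  # 3. Interactive fallback: assume the working directory is the project root
  file.path(".", "cache")
}

RDATA_DIR <- resolveCacheDir()
```

With this in place, make could either export the variable before calling Rscript or rely on the --file= branch. The interactive fallback still assumes the session was started in the project root, so it narrows the dependence on the directory layout rather than removing it.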

Licensed under: CC-BY-SA with attribution