Question

Firstly, some details:

  • I am using a combination of C++ (Armadillo library) and R.
  • I am using Ubuntu as my operating system.
  • I am not using Rcpp

Suppose that I have some C++ code called cpp_code which:

  • Reads, as input from R, an integer,
  • Performs some calculations,
  • Saves, as output for R, a CSV file "out.csv". (I use Armadillo's .save(name, csv_ascii).)

Some simplified R code would be:

for(i in 1:10000)
{
 system(paste0("echo ", toString(i), " | ./cpp_code")) ## produces out.csv
 output[i,,] <- read.csv("out.csv") ## reads out.csv
}

My Problem:

99% of the time, everything works fine. However, every now and then, I see unusual .tmp files such as "out.csv.tmp_a0ac9806ff7f0000703a". These .tmp files only exist for a second or so, then disappear.

Questions:

  • What is causing this?
  • Is there a way to stop this from happening?

Please go easy on me since computing is not my main discipline.

Thank you very much for your time.


Solution

Many programs write their output to a temporary file, then rename it to the destination file. This is often done to avoid leaving a half-written output file if the process is killed while writing. Because the temporary file is renamed to the output file name in a single atomic step, the result is one of the following:

  • the entire output file is properly written or
  • no change is made to the output file

Note that some race conditions usually remain; for example, the output file could be deleted without the temporary file having been renamed. Still, one of the two outcomes above is the general goal.
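
To make the pattern concrete, here is a minimal C++ sketch of "write to a temporary file, then rename". It is not Armadillo's code: the helper names write_via_temp and read_all and the fixed ".tmp" suffix are made up for illustration. It also assumes a POSIX system such as Ubuntu, where rename() replaces an existing destination atomically.

```cpp
#include <cassert>
#include <cstdio>    // std::rename, std::remove
#include <fstream>
#include <iterator>  // std::istreambuf_iterator
#include <string>

// Hypothetical helper (not part of Armadillo): write `contents` to a sibling
// temporary file first, then rename that file to `final_name`.
inline bool write_via_temp(const std::string& final_name, const std::string& contents)
{
    assert(!final_name.empty());
    const std::string tmp_name = final_name + ".tmp";  // assumed suffix, illustration only

    {
        std::ofstream f(tmp_name, std::ios::out | std::ios::trunc);
        if (!f.is_open()) { return false; }

        f << contents;
        f.flush();

        if (!f.good())
        {
            f.close();
            std::remove(tmp_name.c_str());  // do not leave a half-written temporary behind
            return false;
        }
    }  // f is closed here, before the rename

    // Assumption: POSIX rename(), which atomically replaces an existing destination.
    if (std::rename(tmp_name.c_str(), final_name.c_str()) != 0)
    {
        std::remove(tmp_name.c_str());
        return false;
    }
    return true;
}

// Hypothetical helper: read a whole file into a string.
inline std::string read_all(const std::string& name)
{
    std::ifstream f(name);
    return std::string(std::istreambuf_iterator<char>(f), std::istreambuf_iterator<char>());
}
```

Calling write_via_temp("out.csv", data) twice in a row replaces the first version with the second. A reader that opens "out.csv" at any moment should see either the old contents or the new ones, never a partially written file. While the write is in progress, "out.csv.tmp" is briefly visible in the directory, which matches the short-lived .tmp files described in the question.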

OTHER TIPS

I believe you're using the .save function in Armadillo.

http://arma.sourceforge.net/docs.html#save_load_field

Two functions in include/armadillo_bits/diskio_meat.hpp are worth a look. save_raw_ascii first writes the data to a file whose name comes from diskio::gen_tmp_name and, if save_okay is true, renames that file with safe_rename. If the write or safe_rename fails, the temporary file is left behind. The temporary file name is the final file name + .tmp_ + hex digits built from the bytes of a pointer, a checksum of the file name's characters, and the file name's length.

//! Save a matrix as raw text (no header, human readable).
//! Matrices can be loaded in Matlab and Octave, as long as they don't have complex elements.
template<typename eT>
inline
bool
diskio::save_raw_ascii(const Mat<eT>& x, const std::string& final_name)
  {
  arma_extra_debug_sigprint();

  const std::string tmp_name = diskio::gen_tmp_name(final_name);

  std::fstream f(tmp_name.c_str(), std::fstream::out);

  bool save_okay = f.is_open();

  if(save_okay == true)
    {
    save_okay = diskio::save_raw_ascii(x, f);

    f.flush();
    f.close();

    if(save_okay == true)
      {
      save_okay = diskio::safe_rename(tmp_name, final_name);
      }
    }

  return save_okay;
  }

//! Append a quasi-random string to the given filename.
//! The rand() function is deliberately not used,
//! as rand() has an internal state that changes
//! from call to call. Such states should not be
//! modified in scientific applications, where the
//! results should be reproducable and not affected 
//! by saving data.
inline
std::string
diskio::gen_tmp_name(const std::string& x)
  {
  const std::string* ptr_x     = &x;
  const u8*          ptr_ptr_x = reinterpret_cast<const u8*>(&ptr_x);

  const char* extra      = ".tmp_";
  const uword extra_size = 5;

  const uword tmp_size   = 2*sizeof(u8*) + 2*2;
        char  tmp[tmp_size];

  uword char_count = 0;

  for(uword i=0; i<sizeof(u8*); ++i)
    {
    conv_to_hex(&tmp[char_count], ptr_ptr_x[i]);
    char_count += 2;
    }

  const uword x_size = static_cast<uword>(x.size());
  u8 sum = 0;

  for(uword i=0; i<x_size; ++i)
    {
    sum += u8(x[i]);
    }

  conv_to_hex(&tmp[char_count], sum);
  char_count += 2;

  conv_to_hex(&tmp[char_count], u8(x_size));


  std::string out;
  out.resize(x_size + extra_size + tmp_size);


  for(uword i=0; i<x_size; ++i)
    {
    out[i] = x[i];
    }

  for(uword i=0; i<extra_size; ++i)
    {
    out[x_size + i] = extra[i];
    }

  for(uword i=0; i<tmp_size; ++i)
    {
    out[x_size + extra_size + i] = tmp[i];
    }

  return out;
  }
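
As a worked example, the suffix from the question ("a0ac9806ff7f0000703a") can be split back into the pieces that gen_tmp_name writes. The sketch below is only an illustration. It assumes 8-byte pointers stored in little-endian order (typical for 64-bit Ubuntu on x86-64) and that conv_to_hex emits the high hex digit first; conv_to_hex is not shown above, so that part is a guess. The names TmpSuffix, decode_tmp_suffix and name_checksum are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical struct holding the three parts of the suffix.
struct TmpSuffix
{
    std::uint64_t address;         // pointer bytes, reassembled as little-endian
    unsigned      checksum;        // sum of the file name's characters, modulo 256
    unsigned      length_mod_256;  // file name length, modulo 256
};

// Convert one lowercase hex digit to its value.
inline unsigned hex_digit(char c)
{
    if (c >= '0' && c <= '9') { return static_cast<unsigned>(c - '0'); }
    assert(c >= 'a' && c <= 'f');
    return 10u + static_cast<unsigned>(c - 'a');
}

// Convert the two hex digits starting at `pos` to a byte value (high digit first, assumed).
inline unsigned hex_byte(const std::string& s, std::size_t pos)
{
    return 16u * hex_digit(s[pos]) + hex_digit(s[pos + 1]);
}

// Split a 20-character suffix into pointer bytes, checksum and length.
inline TmpSuffix decode_tmp_suffix(const std::string& suffix)
{
    assert(suffix.size() == 20);  // 8 pointer bytes + checksum + length, two digits each

    TmpSuffix out{0, 0, 0};
    for (std::size_t i = 0; i < 8; ++i)
    {
        const std::uint64_t byte = hex_byte(suffix, 2 * i);
        out.address |= byte << (8 * i);
    }
    out.checksum       = hex_byte(suffix, 16);
    out.length_mod_256 = hex_byte(suffix, 18);
    return out;
}

// Same arithmetic as the `sum` loop in gen_tmp_name: add the characters modulo 256.
inline unsigned name_checksum(const std::string& name)
{
    std::uint8_t sum = 0;
    for (char c : name)
    {
        sum = static_cast<std::uint8_t>(sum + static_cast<std::uint8_t>(c));
    }
    return sum;
}
```

Under these assumptions, decode_tmp_suffix("a0ac9806ff7f0000703a") gives the address 0x00007fff0698aca0, which looks like an ordinary Linux stack address, plus a checksum of 0x70 and a length of 58. The name "out.csv" on its own would give a length of 7 and a checksum of 0xd2, so the name passed to save in the real program was possibly longer (a full path, say) than in the simplified example. In any case the hex digits are not random: they depend only on where the string lives in memory and on the file name.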

What “Dark Falcon” hypothesises is exactly true: when calling save, Armadillo creates a temporary file to which it saves the data, and then renames the file to the final name.

This can be seen in the source code (this is save_raw_ascii but the other save* functions work analogously):

const std::string tmp_name = diskio::gen_tmp_name(final_name);

std::fstream f(tmp_name.c_str(), std::fstream::out);
  
bool save_okay = f.is_open();
  
if(save_okay == true)
  {
  save_okay = diskio::save_raw_ascii(x, f);
   
  f.flush();
  f.close();
    
  if(save_okay == true)
    {
    save_okay = diskio::safe_rename(tmp_name, final_name);
    }
  }

The comment on safe_rename says this:

Safely rename a file. Before renaming, test if we can write to the final file. This should prevent:

  1. overwriting files that are write protected,
  2. overwriting directories.

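It’s worth noting, however, that this will not prevent a race condition: the write test and the rename are separate steps, so another process can touch the final file in between.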
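
To show where such a race can occur, here is a rough C++ sketch of a "test, then rename" helper that follows the behaviour the comment describes. It is not Armadillo's safe_rename; the name check_then_rename and the exact steps are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdio>   // std::rename
#include <fstream>
#include <string>

// Hypothetical sketch (not Armadillo's implementation): first test that the
// final file can be opened for writing, then rename the temporary onto it.
inline bool check_then_rename(const std::string& old_name, const std::string& new_name)
{
    assert(old_name != new_name);

    {
        // Step 1: append mode tests writability without truncating existing data.
        // This fails for a write-protected file or for a directory.
        // Side effect: if new_name does not exist yet, an empty file is created.
        std::ofstream probe(new_name, std::ios::out | std::ios::app);
        if (!probe.is_open()) { return false; }
    }

    // Gap: between the test above and the rename below, another process
    // (for example, a reader polling the directory) can still see the temporary
    // file, or can change the final file. The test does not lock anything.

    // Step 2: rename the temporary file to its final name.
    return std::rename(old_name.c_str(), new_name.c_str()) == 0;
}
```

The two steps are independent system operations, so the test only guards against the two cases listed above. For the R loop in the question this should be harmless as long as R reads out.csv only after the C++ process has exited, which is the default behaviour of system().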

Licensed under: CC-BY-SA with attribution