Question

I have a repository where old files are stored, like an archive.
Users use a simple web application to fetch those files.
I maintain a simple file system cache on the server where my web app is running.
At least it looked simple while it was just an idea :)

I need to synchronize file creation in that cache in a way that only one thread at a time is allowed to fetch the same file from the archive.

All other threads that happen to need that file must wait until the first thread finishes writing it to cache and then fetch it from there.
At first I used File.exists() method, but that's no good, because it returns true immediately after a thread (lock owner) creates an empty file (so it can start writing to it from a repository stream).

I'm not sure whether that's the right method, but I'm using static Map (which maps file_ID to syncDummyObject) to track which files are currently being fetched.
Then I (tried to) sync file fetching on that syncDummyObject.

Is this the right way to do this? The code is working, but before I put it in production I need to be sure it's good to go.

I considered using staging directory where I create files and transfer them in cache when they are complete, but that would open another set of problems...

I removed logging and non-relevant parts of error handling for better readability.

Thanks!

public class RepoFileFetcher{

    private static volatile ConcurrentHashMap<String, Object> syncStrings = new ConcurrentHashMap<String, Object>();    
    private static final Object mapSync = new Object(); // map access sync
    private Boolean isFileBeingCreated = new Boolean(false);
    private Boolean isFileReadyInCache = new Boolean(false);


    public File getFileById(MikFileIdentifier cxfi){        
        File theFile = null; // file I'm going to return in the end

        try{
            Object syncObject = null;

            // sync map access
            synchronized(mapSync){

                if(syncStrings.containsKey(cxfi.getFilePath())){
                    // if the key exists in the map it means that
                    // it's being created by another thread
                    // fetch the object from the map 
                    // and use it to wait until file is created in cache
                    syncObject = syncStrings.get(cxfi.getFilePath());

                    isFileBeingCreated = true;


                }else if(!(new File(cxfi.getFilePath())).exists()){
                    // if it doesn't exist in map nor in cache it means that
                    // I'm the first one that fetches it from repo
                    // create new dummyLockObject and put it in the map
                    syncObject = new Object();
                    syncStrings.put(cxfi.getFilePath(), syncObject);

                }else{
                    // if it's not being created and exists in cache
                    // set flag so I can fetch if from the cache bellow
                    isFileReadyInCache = true;
                }
            }


            // potential problem that I'm splitting the critical section in half,
            // but I don't know how to avoid locking the whole fetching process
            // I want to lock only on the file that's being fetched, not fetching of all files (which I'd get if the mapSync was still locked)
            // What if, at this very moment, some other thread starts fetching the file and isFileBeingCreated becomes stale? Is it enough to check whether I succeeded renaming it and if not then fetch from cache? 


            if(!isFileBeingCreated && !isFileReadyInCache){

                // skip fetching from repo if another thread is currently fetching it
                // sync only on that file's map object

                synchronized(syncObject){
                    File pFile = new File(cxfi.getFilePath());
                    pFile.createNewFile();

                    // ...
                    // ... the part where I write to pFile from repo stream
                    // ...

                    if(!pFile.renameTo(theFile)){
                        // file is created by someone else 
                        // fetch it from cache
                        theFile = fetchFromCache(cxfi, syncObject);
                    }

                    syncStrings.remove(cxfi.getFilePath());

                    // notify all threads in queue that the file creation is over
                    syncObject.notifyAll();
                }//sync

            }else{

                theFile = fetchFromCache(cxfi, syncObject);
            }

            return theFile;


        }catch(...{
            // removed for better readability
        }finally{
            // remove from the map, otherwise I'll lock that file indefinitely
            syncStrings.remove(cxfi.getFilePath());
        }

        return null;
    }


    /**
     * Fetches the file from cache
     * @param cxfi File identification object
     * @param syncObject Used to obtain lock on file
     * @return File from cache
     * @throws MikFileSynchronizationException
     * @author mbonaci
     */
    private File fetchFromCache(FileIdentifier cxfi, Object syncObject)
            throws MikFileSynchronizationException{

        try{
            // wait till lock owner finishes creating the file
            // then fetch it from the cache

            synchronized(syncObject){   

                // wait until lock owner removes dummyObject from the map
                // while(syncStrings.containsKey(cxfi.getFilePath()))
                // syncObject.wait();                   

                File existingFile = new File(cxfi.getFilePath());
                if(existingFile.exists()){
                    return existingFile;
                }else{
                    // this should never happen
                    throw new MikFileSynchronizationException();
                }
            }

        }catch(InterruptedException ie){
            logger.error("Synchronization error", ie);
        }
        return null;
    }

EDIT I: Thank you all for your help. The suggestion about using putIfAbsent() on CHM was the key. I ended up doing it like this (any additional comments welcome):

EDIT II: Added CHM element removal in other if branches of the class (cuz' now I put the elem in the map even if I don't need to).

EDIT III: Moved checking of file existence above, in variable isFileInCache.

public class RepoFileFetcher{               
    private static volatile ConcurrentHashMap<String, Object> syncStrings = new ConcurrentHashMap<String, Object>();    

    // save some time so I can lock syncObject earlier 
    private boolean isFileInCache = false;

    // remember whether we put the elem in the map or not 
    // so we know whether to remove it later 
    private boolean insertedMapElem = false; // added in EDIT II 

    /**
     * Fetches the file from repository (anc caches it) or directly from cache if available
     * @param cxfi File identification object
     * @return File
     * @author mbonaci
     */
    public File getFileById(FileIdentifier cxfi){
        String fileId = cxfi.getFileId();
        String fileName = cxfi.getOnlyPath() + fileId;
        File theFile = null; // file I'm going to return in the end

        try{
            Object syncObject = null;
            Object dummyObject = new Object();                  
            isFileInCache = (new File(fileName)).exists();
            syncObject = syncStrings.putIfAbsent(fileId, dummyObject);

            if(syncObject == null){ // wasn't in the map                            
                insertedMapElem = true; // we put the new object in

                if(!isFileInCache){ // not in cache

                    // if it doesn't exist in map nor in cache it means that
                    // I'm the first one that fetches it from repo (or cache was deleted)
                    // syncObject = new lock object I placed in the map
                    syncObject = dummyObject;

                    synchronized(syncObject){
                        File pFile = new File(cxfi.getFilePath());
                        pFile.createNewFile();

                        // ...
                        // ... the part where I write to pFile from repo stream
                        // ...

                        pFile.renameTo(theFile)
                        theFile = pFile;

                        syncStrings.remove(cxfi.getFilePath());

                        // notify all threads in queue that the file is now ready to be fetched from cache
                        syncObject.notifyAll();
                    }//sync

                }else{
                    // if it's not being created and exists in cache it means that it's complete
                    // fetch it from cache without blocking (only reading)
                    syncStrings.remove(cxfi.getFilePath()); // added in EDIT II
                    theFile = new File(fileName);
                }

            }else{
                // if the key exists in the map it means that
                // it's being created by another thread
                // fetch the object from the map 
                // and use it to wait until file is created in cache
                // don't touch the map (I haven't added anything)
                // the lock owner will remove the elem
                // syncObject = the object that what was in the map when I called putIfAbsent()
                theFile = fetchFromCache(cxfi, syncObject);
            }

            return theFile;

        }catch(...{
            // removed for better readability
        }finally{ 
            // no good cuz' this way I'd sometimes remove the map elem
            // while lock owner still writes to a file
            // only the one who placed the elem in the map should remove it
            // remove from the map, otherwise I'll lock that file indefinitely
            // syncStrings.remove(fileId); // commented out in EDIT II
        }
        // remove in case of exception (but only if we added it)
        if(insertedMapElem)
            syncStrings.remove(fileId);
        return null;
    }


    /**
     * Fetches the file from cache after it obtains lock on <code>syncObject</code>
     * @param cxfi File identification object
     * @param syncObject Used to obtain lock on file
     * @return File from cache
     * @author mbonaci
     */
    private File fetchFromCache(FileIdentifier cxfi, Object syncObject){
        String fileId = cxfi.getFileId();
        String fileName =  fileId + ".tif";

        synchronized(syncObject){

            File existingFile = new File(cxfi.getAbsPath() + fileName);

            if(existingFile.exists()){
                return existingFile;
            }
        }
    }
Was it helpful?

Solution

May I suggest a couple of tweaks:

  1. You are already using a ConcurrentHashMap, there is no need for an additional lock.
  2. I would wrap the "file" in a smarter object which has it's own synchronization. So you can do something like:

    • Call putIfAbsent() on the map with the path and the "smart" object which wraps the file.
    • The above will return the value (either the new one if the path did not exist, or the existing wrapper)
    • Have state in the wrapper which knows if it has already been cached
    • call cache() which checks if it has been cached, and if so, does nothing, else caches
    • Then return the "file" from the wrapper (say a getFile() method)

Then ensure that you use a lock inside the wrapper for the public functions, which means that one will block whilst cache() is happening concurrently.

Here is a sketch:

class CachedFile
{
  File realFile;
  // Initially not cached
  boolean cached = false;

  // Construct with file

  public synchronized boolean isCached()
  { return cached; }

  public synchronized void cache()
  {
    if (!cached)
    {
      // now load - safe in the knowledge that no one can get the file (or cache())
      ..
      cached = true; // done
    }
  }

  public synchronized <File> getFile()
  {
    // return the "file"
  }
}

Now your code becomes something like:

ConcurrentHashMap<String, CachedFile> myCache = ConcurrentHashMap<>();
CachedFile newFile = new CachedFile(<path>);
CachedFile file = myCache.putIfAbsent(<path>, newFile);
// Use the new file if it did not exist
if (file == null) file = newFile;
// This will be no-op if already cached, or will block is someone is caching this file.
file.cache();
// Now return the cached file.
return file.getFile();

Do my suggestions make sense?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top