Question

Someone else's process is creating a CSV file by appending a line at a time to it, as events occur. I have no control over the file format or the other process, but I know it will only append.

In a Java program, I would like to monitor this file, and when a line is appended read the new line and react according to the contents. Ignore the CSV parsing issue for now. What is the best way to monitor the file for changes and read a line at a time?

Ideally this will use the standard library classes. The file may well be on a network drive, so I'd like something robust to failure. I'd rather not use polling if possible - I'd prefer some sort of blocking solution instead.

Edit -- given that a blocking solution is not possible with standard classes (thanks for that answer), what is the most robust polling solution? I'd rather not re-read the whole file each time as it could grow quite large.


Solution

Since Java 7 there has been the newWatchService() method on the FileSystem class.

However, there are some caveats:

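  • It requires Java 7 or later
  • It is an optional method: a file system that does not support watching throws UnsupportedOperationException
  • It only watches directories, so you have to do the file handling yourself, and worry about the file being moved, etc.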

Before Java 7 it is not possible with standard APIs.
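
To illustrate the "optional method" caveat, here is a minimal sketch of guarding the call so the program can fall back to polling. The class name WatchSupport and method tryOpen are hypothetical names chosen for this example, and Optional assumes Java 8 or later.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.WatchService;
import java.util.Optional;

/** Hypothetical helper, not part of the JDK. */
class WatchSupport {

    /**
     * Returns a WatchService for the default file system, or an empty
     * Optional if that file system does not support watching.
     */
    static Optional<WatchService> tryOpen() throws IOException {
        try {
            return Optional.of(FileSystems.getDefault().newWatchService());
        } catch (UnsupportedOperationException e) {
            // Watching is not supported here; the caller should poll instead.
            return Optional.empty();
        }
    }
}
```

A caller would check isPresent() and use the polling approach when the result is empty. Note that the default file system may support watching even when events for a file on a network drive are not delivered reliably, so this check alone does not settle the network-drive concern.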

I tried the following (polling at a one-second interval) and it works; the processing step just prints each line:

  private static void monitorFile(File file) throws IOException {
    final int POLL_INTERVAL = 1000; // milliseconds to wait at end of file
    BufferedReader buffered = new BufferedReader(new FileReader(file));
    try {
      while (true) {
        // Note: readLine() can return a partial line if the writer is mid-line.
        String line = buffered.readLine();
        if (line == null) {
          // end of file, start polling
          Thread.sleep(POLL_INTERVAL);
        } else {
          System.out.println(line);
        }
      }
    } catch (InterruptedException ex) {
      // Restore the interrupt flag and stop monitoring.
      Thread.currentThread().interrupt();
    } finally {
      buffered.close();
    }
  }

As no one else has suggested a solution that works on a current production Java, I thought I'd add it. If there are flaws, please point them out in the comments.

OTHER TIPS

You can register to be notified by the file system whenever the file changes, using the WatchService class. This requires Java 7. Here is the link to the documentation: http://docs.oracle.com/javase/tutorial/essential/io/notification.html

Here is a code snippet that does this:

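import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class FileWatcher {
    private final WatchService watcher;
    private final Path dir;

    public FileWatcher(Path dir) throws IOException {
        this.dir = dir;
        this.watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, ENTRY_MODIFY);
    }

    void processEvents() {
        for (;;) {
            // wait for key to be signalled
            WatchKey key;
            try {
                key = watcher.take();
            } catch (InterruptedException x) {
                Thread.currentThread().interrupt();
                return;
            }

            for (WatchEvent<?> event : key.pollEvents()) {
                WatchEvent.Kind<?> kind = event.kind();

                if (kind == OVERFLOW) {
                    continue;
                }
                // Context for directory entry event is the file name of entry
                @SuppressWarnings("unchecked")
                WatchEvent<Path> ev = (WatchEvent<Path>) event;
                Path name = ev.context();
                Path child = dir.resolve(name);
                // print out event
                System.out.format("%s: %s%n", kind.name(), child);
            }
            // reset key; stop if the directory is no longer accessible
            if (!key.reset()) {
                break;
            }
        }
    }
}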

Use Java 7's WatchService, part of NIO.2

The WatchService API is designed for applications that need to be notified about file change events.

This is not possible with standard library classes. See this question for details.

For efficient polling it is better to use random access (java.io.RandomAccessFile). Remember the position of the end of the file at the last poll and start reading from there on the next one, so the whole file is never re-read.
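
A minimal sketch of that idea follows. The class name OffsetTailer is hypothetical, and the code rests on assumptions the question does not confirm: the file is UTF-8, lines end in \n or \r\n, and a file that becomes shorter has been replaced and should be read from the start. It only advances past complete lines, so a half-written line is picked up on a later poll.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: returns only the complete lines appended since the last call. */
class OffsetTailer {
    private final File file;
    private long position = 0; // offset just past the last complete line consumed

    OffsetTailer(File file) {
        this.file = file;
    }

    List<String> readNewLines() throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            if (raf.length() < position) {
                position = 0; // assumption: a shorter file was replaced, so start over
            }
            raf.seek(position);
            // The channel shares its position with the RandomAccessFile,
            // so this buffered stream starts reading at 'position'.
            InputStream in = new BufferedInputStream(Channels.newInputStream(raf.getChannel()));
            ByteArrayOutputStream current = new ByteArrayOutputStream();
            long offset = position;
            int b;
            while ((b = in.read()) != -1) {
                offset++;
                if (b == '\n') {
                    String line = new String(current.toByteArray(), StandardCharsets.UTF_8);
                    if (line.endsWith("\r")) {
                        line = line.substring(0, line.length() - 1);
                    }
                    lines.add(line);
                    current.reset();
                    position = offset; // advance only past complete lines
                } else {
                    current.write(b);
                }
            }
            // Bytes left in 'current' belong to a line still being written;
            // they are re-read on the next call.
        }
        return lines;
    }
}
```

A polling loop would call readNewLines() and then sleep for the poll interval. Reopening the file on every poll is a deliberate choice in this sketch: on a network drive it avoids holding a handle that may go stale, at the cost of one open per poll.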

Just to expand on Nick Fortescue's last entry, below are two classes that you can run concurrently (e.g. in two different shell windows), which show that a given File can simultaneously be written to by one process and read by another.

Here, the two processes will be executing these Java classes, but I presume that the writing process could be any other application. (This assumes that it does not hold an exclusive lock on the file. Are there such file system locks on certain operating systems?)

I have successfully tested these two classes on both Windows and Linux. I would very much like to know if there is some condition (e.g. an operating system) under which they fail.

Class #1:

import java.io.File;
import java.io.FileWriter;
import java.io.PrintWriter;

public class FileAppender {

    public static void main(String[] args) throws Exception {
        if ((args != null) && (args.length != 0)) throw
            new IllegalArgumentException("args is not null and is not empty");

        File file = new File("./file.txt");
        int numLines = 1000;
        writeLines(file, numLines);
    }

    private static void writeLines(File file, int numLines) throws Exception {
        PrintWriter pw = null;
        try {
            pw = new PrintWriter( new FileWriter(file), true );
            for (int i = 0; i < numLines; i++) {
                System.out.println("writing line number " + i);
                pw.println("line number " + i);
                Thread.sleep(100);
            }
        }
        finally {
            if (pw != null) pw.close();
        }
    }

}

Class #2:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

public class FileMonitor {

    public static void main(String[] args) throws Exception {
        if ((args != null) && (args.length != 0)) throw
            new IllegalArgumentException("args is not null and is not empty");

        File file = new File("./file.txt");
        readLines(file);
    }

    private static void readLines(File file) throws Exception {
        BufferedReader br = null;
        try {
            br = new BufferedReader( new FileReader(file) );
            while (true) {
                String line = br.readLine();
                if (line == null) { // end of file, start polling
                    System.out.println("no file data available; sleeping..");
                    Thread.sleep(2 * 1000);
                }
                else {
                    System.out.println(line);
                }
            }
        }
        finally {
            if (br != null) br.close();
        }
    }

}

Unfortunately, the TailInputStream class, which can be used to monitor the end of a file, is not one of the standard Java platform classes, but there are a few implementations on the web. You can find an implementation of TailInputStream together with a usage example at http://www.greentelligent.com/java/tailinputstream.

Poll, either on a consistent cycle or on a random cycle; 200-2000 ms should be a good range for a random poll interval.

Check two things...

If you have to watch for file growth, then check the EOF / byte count, and compare that and the file's last-access or last-write time with the values from the last poll. If either is greater than before, the file has been written to.

Then, combine that with checking for exclusive lock / read access. If the file can be read-locked and it has grown, then whatever was writing to it has finished.

Checking either property alone won't necessarily guarantee that the file has both been written to and is actually finished and available for use.
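
The growth check described above could be sketched roughly as follows. FileGrowthDetector is a hypothetical name, and the sketch covers only the size and timestamp comparison. The lock check is left out because file locking behaviour is platform dependent, and on a network drive the reported length and modification time may be cached, so treat this as a starting point rather than a guarantee.

```java
import java.io.File;

/** Hypothetical helper: compares size and modification time with the previous poll. */
class FileGrowthDetector {
    private final File file;
    private long lastLength;
    private long lastModified;

    FileGrowthDetector(File file) {
        this.file = file;
        this.lastLength = file.length();
        this.lastModified = file.lastModified();
    }

    /** Returns true if the file is longer or more recently modified than at the last poll. */
    boolean changedSinceLastPoll() {
        long length = file.length();         // 0 if the file does not exist
        long modified = file.lastModified(); // 0 if the file does not exist
        boolean changed = length > lastLength || modified > lastModified;
        lastLength = length;
        lastModified = modified;
        return changed;
    }
}
```

Call changedSinceLastPoll() once per poll interval and read the new data only when it returns true.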

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow