I have been writing a program that watches a directory and, when files are created in it, renames them and moves them to a new directory. In my first implementation I used Java's Watch Service API, which worked fine when I was testing with 1 KB files. The problem is that in reality the files being created are anywhere from 50 to 300 MB. When this happened, the watcher would find the file right away but could not move it because it was still being written. I tried putting the watcher in a loop (which generated exceptions until the file could be moved), but this seemed pretty inefficient.

Since that didn't work, I tried using a timer that checks the folder every 10 seconds and moves files when it can. This is the approach I ended up going with.

Question: Is there any way to get a signal when a file is done being written, without doing an exception check or continually comparing the size? I like the idea of using the Watcher API just once for each file instead of continually checking with a timer (and running into exceptions).

"I tried putting the watcher in a loop (which generated exceptions until the file could be moved) but this seemed pretty inefficient." Yes, this is an awful solution. Exceptions are not meant for managing control flow.
– Sean Patrick Floyd, Jul 30 '10 at 8:53

Sadly @ntmp, from what I've tested so far, looking for exceptions was the best way to tell that the OS was still "writing" or "copying" the file. But I agree with @Sean Patrick Floyd that it is a terrible way to make it work. Personally, I wish this check were part of the java.io.File API; I'm not sure why it isn't. It would be up to the JVM developers to implement it and make things easier for us developers.
– Chris Aldrich, Jan 20 '11 at 20:44

The "check for exception" approach won't even work on UNIX, since UNIX file systems do not lock files that are being written. On UNIX, Java will happily move the partially written file, resulting in corrupted data.
– Raman, Aug 23 '12 at 15:18

It looks like Apache Camel handles the file-not-done-uploading problem by trying to rename the file (java.io.File.renameTo). If the rename fails, the read lock has not been acquired, so it keeps trying. When the rename succeeds, it renames the file back and then proceeds with the intended processing.
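A minimal Java sketch of that rename probe, assuming java.io.File.renameTo as the check. This is not Camel's actual code; the `RenameProbe` class and the `.probe` suffix are hypothetical. As Raman notes above, on UNIX the rename will usually succeed even while the file is still being written, so this only helps on platforms (typically Windows) that refuse to rename a file another process has open.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RenameProbe {

    /**
     * Returns true if the file could be renamed to a temporary name and back,
     * which is taken as a sign that no other process is holding it.
     */
    static boolean tryRenameProbe(File file) {
        File probe = new File(file.getPath() + ".probe"); // hypothetical probe name
        // renameTo returns false (it does not throw) when the rename fails.
        if (!file.renameTo(probe)) {
            return false;
        }
        // Rename it back so the rest of the pipeline sees the original name.
        if (!probe.renameTo(file)) {
            throw new IllegalStateException("Could not restore " + file);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        File f = Files.createTempFile("upload", ".dat").toFile();
        System.out.println(tryRenameProbe(f)); // true: nothing is writing to f
        f.delete();
    }
}
```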

Depending on how urgently you need to move the file once it is done being written, you can also check for a stable last-modified timestamp and only move the file once it is quiesced. How long it needs to be stable is implementation dependent, but I would presume that a file whose last-modified timestamp hasn't changed for 15 seconds is stable enough to be moved.
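The stable-timestamp check can be sketched with java.nio.file.Files.getLastModifiedTime. The `QuiescenceCheck` class and `isQuiescent` method are hypothetical names; the current time is passed in so the check is easy to test, and the 15-second quiet period is the guess from the answer, not a guaranteed safe value.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

public class QuiescenceCheck {

    /**
     * True if the file's last-modified time is at least quietPeriod before now,
     * i.e. the file appears to have stopped changing.
     */
    static boolean isQuiescent(Path file, Duration quietPeriod, Instant now) throws IOException {
        Instant lastModified = Files.getLastModifiedTime(file).toInstant();
        return !lastModified.plus(quietPeriod).isAfter(now);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("upload", ".dat");
        Duration quiet = Duration.ofSeconds(15);
        // Just created, so it has not been quiet for 15 seconds yet.
        System.out.println(isQuiescent(f, quiet, Instant.now()));                // expected: false
        // Pretend 20 seconds have passed with no further writes.
        System.out.println(isQuiescent(f, quiet, Instant.now().plusSeconds(20))); // expected: true
        Files.delete(f);
    }
}
```

A timer (like the 10-second one in the question) would call `isQuiescent` for each candidate file and move only those that return true.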

Since the Watch Service API cannot notify you when the OS finishes copying a file, all options seem to be workarounds (including this one!).

As commented above,

1) Trying to move or copy the file and waiting for an exception is not an option on UNIX, which does not lock files that are being written;

2) File.canWrite always returns true if you have permission to write, even if the file is still being copied;

3) Waiting until a timeout or a new event occurs would be an option, but what if the system is overloaded and the copy has not finished when the timeout expires? And if the timeout is large, the program would wait a long time.

4) Writing another file to 'flag' that the copy has finished is not an option if you are only consuming the file, not creating it.

I had to deal with a similar situation when I implemented a file system watcher to transfer uploaded files. The solution I implemented to solve this problem consists of the following:

1- First, maintain a Map of unprocessed files. (As long as a file is still being copied, the file system generates ENTRY_MODIFY events, so you can ignore them while its flag is false.)

2- In your file processor, pick up a file from the map and check whether it is locked by the file system. If it is, you will get an exception; catch it, put the thread in a wait state (e.g. 10 seconds), and retry until the lock is released. After processing the file, either set its flag to true or remove it from the map.

This solution will not be efficient if many versions of the same file are transferred during the wait time slot.
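A rough sketch of the two steps above, assuming "locked" is detected with java.nio.channels.FileChannel.tryLock (opening or locking fails, or tryLock returns null, while another process holds the file). The class and the `onCreated`, `processAll`, and `process` names are hypothetical. Per the comments above, this test is only meaningful when the writer holds the file exclusively, which is common on Windows but not guaranteed, and not the case on most UNIX systems.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PendingFileProcessor {

    // Files reported by ENTRY_CREATE that have not been processed yet.
    // Later ENTRY_MODIFY events for a file already in the map can be ignored.
    final Map<Path, Boolean> pending = new ConcurrentHashMap<>();

    void onCreated(Path file) {
        pending.putIfAbsent(file, Boolean.FALSE);
    }

    /** Tries to take (and immediately release) an exclusive lock on the file. */
    static boolean tryLock(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = ch.tryLock()) {
            return lock != null; // null: another program holds an overlapping lock
        } catch (IOException | OverlappingFileLockException e) {
            return false;        // still in use (or already locked by this JVM)
        }
    }

    /** Processes each pending file, sleeping between retries until its lock is free. */
    void processAll(long retryMillis) throws InterruptedException {
        for (Path file : pending.keySet()) {
            while (Files.exists(file) && !tryLock(file)) {
                Thread.sleep(retryMillis); // e.g. 10_000 ms, as in the answer
            }
            if (Files.exists(file)) {
                process(file);
            }
            pending.remove(file); // or pending.put(file, Boolean.TRUE) to keep a record
        }
    }

    // Hypothetical placeholder for the real work (rename and move the file).
    void process(Path file) {
        System.out.println("processing " + file.getFileName());
    }
}
```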

I ran into the same problem today. In my use case, a small delay before the file is actually imported was not a big problem, and I still wanted to use the NIO.2 API. The solution I chose was to wait until a file has not been modified for 10 seconds before performing any operations on it.

The important part of the implementation is as follows. The program waits until the wait time expires or a new event occurs. The expiration time is reset every time a file is modified. If a file is deleted before the wait time expires, it is removed from the list. I use the poll method with a timeout equal to the expected expiration time, that is, (lastModified + waitTime) - currentTime.
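The original implementation is not shown, so here is an approximate sketch of that event loop using java.nio.file.WatchService. It records the time of the latest event per file rather than reading each file's last-modified attribute, and `QuietPeriodWatcher`, `onEvent`, `nextTimeout`, and `drainExpired` are hypothetical names. WatchService.take() is used when nothing is pending, and poll(timeout, TimeUnit) otherwise.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

public class QuietPeriodWatcher {

    private final long waitMillis;
    // File -> time (ms) of the most recent CREATE/MODIFY event for it.
    private final Map<Path, Long> lastEvent = new HashMap<>();

    QuietPeriodWatcher(long waitMillis) {
        this.waitMillis = waitMillis;
    }

    void onEvent(WatchEvent.Kind<?> kind, Path file, long nowMillis) {
        if (kind == ENTRY_DELETE) {
            lastEvent.remove(file);         // deleted before the wait time expired
        } else {
            lastEvent.put(file, nowMillis); // every modification resets the expiry
        }
    }

    /** (lastModified + waitTime) - currentTime for the earliest file, or -1 if none. */
    long nextTimeout(long nowMillis) {
        return lastEvent.values().stream()
                .mapToLong(t -> Math.max(0, t + waitMillis - nowMillis))
                .min()
                .orElse(-1);
    }

    /** Hands every file that has been quiet for waitMillis to the handler. */
    void drainExpired(long nowMillis, Consumer<Path> handler) {
        Iterator<Map.Entry<Path, Long>> it = lastEvent.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Path, Long> e = it.next();
            if (e.getValue() + waitMillis <= nowMillis) {
                it.remove();
                handler.accept(e.getKey());
            }
        }
    }

    void run(Path dir, Consumer<Path> handler) throws IOException, InterruptedException {
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
            while (true) {
                long timeout = nextTimeout(System.currentTimeMillis());
                // Wait for the next event, or until the earliest pending file expires.
                WatchKey key = timeout < 0 ? ws.take() : ws.poll(timeout, TimeUnit.MILLISECONDS);
                if (key != null) {
                    for (WatchEvent<?> ev : key.pollEvents()) {
                        if (ev.kind() == OVERFLOW) continue;
                        onEvent(ev.kind(), dir.resolve((Path) ev.context()), System.currentTimeMillis());
                    }
                    if (!key.reset()) return; // directory is no longer accessible
                }
                drainExpired(System.currentTimeMillis(), handler);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new QuietPeriodWatcher(10_000).run(Paths.get(args[0]), f -> System.out.println("ready: " + f));
    }
}
```

Keeping the bookkeeping in methods that take the current time as a parameter means the expiry logic can be tested without a real directory.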

This is a very interesting discussion, as this is certainly a bread-and-butter use case: wait for a new file to be created and then react to it in some fashion. The race condition is interesting because the high-level requirement is to receive an event and then actually obtain (at least) a read lock on the file. With large files, or simply lots of file creations, this could require a whole pool of worker threads that periodically try to get locks on newly created files and, when they succeed, actually do the work. But as I am sure NT realizes, one would have to do this carefully to make it scale, since it is ultimately a polling approach, and scalability and polling don't go together well.
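One way to structure such a worker pool is a ScheduledExecutorService that re-schedules a readiness check instead of sleeping in a loop, so a few threads can poll many files. This is a hypothetical sketch: `RetryingWorkerPool`, the `isReady` predicate (for example, a lock attempt or a quiet-period check), and the `handler` are assumptions, and it remains a polling approach with the scaling caveat above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class RetryingWorkerPool {

    private final ScheduledExecutorService pool;
    private final Predicate<Path> isReady; // e.g. a lock attempt or quiet-period check
    private final Consumer<Path> handler;  // the real work once the file is ready
    private final long retryMillis;

    RetryingWorkerPool(int threads, Predicate<Path> isReady, Consumer<Path> handler, long retryMillis) {
        this.pool = Executors.newScheduledThreadPool(threads);
        this.isReady = isReady;
        this.handler = handler;
        this.retryMillis = retryMillis;
    }

    /** Called once per ENTRY_CREATE event. */
    void submit(Path file) {
        pool.execute(() -> attempt(file));
    }

    private void attempt(Path file) {
        if (!Files.exists(file)) {
            return; // deleted before it became ready
        }
        if (isReady.test(file)) {
            handler.accept(file);
        } else {
            // Not ready: schedule another check rather than blocking a worker thread.
            pool.schedule(() -> attempt(file), retryMillis, TimeUnit.MILLISECONDS);
        }
    }

    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```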

I tried a quick test with one thread writing to the file while another thread checks canWrite(), but it always returns true.
– Serhii, Oct 28 '10 at 10:29

Actually, I believe it just asks the OS whether you have permission to write. You may have permission from a security standpoint, but that says nothing about whether the file has finished being written to.
– Chris Aldrich, Jan 20 '11 at 20:40
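The canWrite() observation is easy to reproduce in a single thread. In this hypothetical `CanWriteDemo`, File.canWrite() is called while a FileOutputStream is still open on the file, and it reports true as long as the file's permissions allow writing.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class CanWriteDemo {

    /** Returns File.canWrite() as observed while a stream is still writing the file. */
    static boolean canWriteWhileOpen(File file) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(new byte[1024]); // the "copy" is still in progress
            return file.canWrite();    // reflects permissions, not in-progress writes
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("copy", ".dat");
        System.out.println(canWriteWhileOpen(f)); // true on a writable temp file
        f.delete();
    }
}
```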

One option is to use a unique suffix for incomplete files, something like myhugefile.zip.inc instead of myhugefile.zip. Rename the files when the upload or creation is finished, and exclude .inc files from the watch.
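If the client can follow this convention, both sides are short. A sketch assuming java.nio.file: the writer writes to `name.inc` and renames it with Files.move(..., StandardCopyOption.ATOMIC_MOVE), and the watcher skips names ending in `.inc`. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class IncSuffixConvention {

    static final String INCOMPLETE_SUFFIX = ".inc";

    /** Writer side: write to name.inc, then rename it to its final name in one step. */
    static Path writeThenPublish(Path dir, String name, byte[] data) throws IOException {
        Path incomplete = dir.resolve(name + INCOMPLETE_SUFFIX);
        Files.write(incomplete, data); // a real client would stream the large file here
        return Files.move(incomplete, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Watcher side: skip events for files that are still being written. */
    static boolean shouldProcess(Path file) {
        return !file.getFileName().toString().endsWith(INCOMPLETE_SUFFIX);
    }
}
```

On Linux, a rename inside the watched directory is typically reported as ENTRY_DELETE for the old name and ENTRY_CREATE for the new one, so the watcher only needs to act on the final name.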

The second option is to use a different folder on the same drive to create, upload, or write the files, and move them to the watched folder once they are ready. Moving should be an atomic action if both folders are on the same drive (this is file-system dependent, I guess).
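The staging-folder variant looks much the same on the writer side. A hypothetical sketch: with StandardCopyOption.ATOMIC_MOVE, Files.move either performs a single atomic move or throws AtomicMoveNotSupportedException (for example, when the two folders are on different file stores), rather than silently falling back to a copy that the watcher could observe half-finished.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StagingDirectoryPublish {

    /** Writes the file in a staging folder, then moves it into the watched folder. */
    static Path publish(Path stagingDir, Path watchedDir, String name, byte[] data) throws IOException {
        Path staged = stagingDir.resolve(name);
        Files.write(staged, data);
        // ATOMIC_MOVE: a single move, or AtomicMoveNotSupportedException if the
        // folders are not on the same file store, never a partial copy.
        return Files.move(staged, watchedDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }
}
```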

Either way, the clients that create the files will have to do some extra work.

The problem is that I have very little control over the client creating the files. I cannot add a unique suffix. I can specify the folder the files are written to, but I can't tell the client to move them to another folder when they are done writing.
– nite, Jul 31 '10 at 1:08

@ntmp Did you find a solution to this problem? Please share it, as I am also facing the same issue.
– SAM, Jan 16 '12 at 7:41