Description

Only throw an exception if it can't delete everything (listing everything that it can't delete).

Reasoning...
Unlike Unix, the Microsoft Windows OS does not allow a file to be deleted while another process has it open. This causes delete operations to fail.
Furthermore, most installations of Windows have software that monitors the filesystem for activity and then inspects the contents of recently added/removed files (which means it will lock them, albeit temporarily), e.g. the Windows Search service and anti-virus software, to name but two (and Windows Vista and Windows 7 seem to add further complications).

This means that builds which rely on cleaning a workspace before they start will sometimes fail, claiming that they couldn't delete everything because a file was locked, with output like the following:

Started by an SCM change
Building remotely on jenkinsslave27 in workspace C:\hudsonSlave\workspace\MyProject
Purging workspace...
hudson.util.IOException2: remote file operation failed: C:\hudsonSlave\workspace\MyProject at hudson.remoting.Channel@6f0564d7:jenkinsslave27
at hudson.FilePath.act(FilePath.java:835)
at hudson.FilePath.act(FilePath.java:821)
at hudson.plugins.accurev.AccurevSCM.checkout(AccurevSCM.java:331)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1218)
at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:586)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:475)
at hudson.model.Run.run(Run.java:1434)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:239)
Caused by: java.io.IOException: Unable to delete C:\hudsonSlave\workspace\MyProject\...\src\...\foo - files in dir: [C:\hudsonSlave\workspace\MyProject\...\src\...\foo\bar]
at hudson.Util.deleteFile(Util.java:236)
at hudson.Util.deleteRecursive(Util.java:287)
at hudson.Util.deleteContentsRecursive(Util.java:198)
at hudson.Util.deleteRecursive(Util.java:278)
at hudson.Util.deleteContentsRecursive(Util.java:198)
at hudson.Util.deleteRecursive(Util.java:278)
at hudson.Util.deleteContentsRecursive(Util.java:198)
at hudson.Util.deleteRecursive(Util.java:278)
at hudson.Util.deleteContentsRecursive(Util.java:198)
at hudson.Util.deleteRecursive(Util.java:278)
at hudson.Util.deleteContentsRecursive(Util.java:198)
at hudson.Util.deleteRecursive(Util.java:278)
at hudson.Util.deleteContentsRecursive(Util.java:198)
at hudson.Util.deleteRecursive(Util.java:278)
at hudson.Util.deleteContentsRecursive(Util.java:198)
at hudson.Util.deleteRecursive(Util.java:278)
at hudson.Util.deleteContentsRecursive(Util.java:198)
at hudson.plugins.accurev.PurgeWorkspaceContents.invoke(PurgeWorkspaceContents.java:28)
at hudson.plugins.accurev.PurgeWorkspaceContents.invoke(PurgeWorkspaceContents.java:11)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2161)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:287)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:60)
at java.lang.Thread.run(Unknown Source)

What's needed is a retry mechanism, i.e. the equivalent of Ant's <retry><delete file="foo"/></retry>, but with a (small) delay between attempts (and maybe a call to the garbage collector, just in case the process holding the file open is the build slave process itself).
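Sketched in Java, such a retry loop might look like the following (method names and defaults here are illustrative, not the actual Jenkins patch):

```java
import java.io.File;
import java.io.IOException;

public class RetryingDelete {
    /**
     * Attempts to delete a file, retrying a few times with a short pause,
     * because on Windows a transient lock (search indexer, anti-virus)
     * often clears within a fraction of a second.
     */
    public static void deleteWithRetries(File f, int maxAttempts, long waitMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            if (!f.exists() || f.delete()) {
                return; // gone (or was never there) - success
            }
            if (attempt >= maxAttempts) {
                throw new IOException(
                        "Unable to delete " + f + " after " + maxAttempts + " attempts");
            }
            Thread.sleep(waitMillis); // give whatever holds the lock time to let go
        }
    }
}
```

The key design point is that the exception is only thrown once all attempts are exhausted, so a transient lock never fails the build.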

Activity

Note: This file locking behavior also causes non-Jenkins issues, e.g. deleting multiple folders using Windows Explorer will sometimes leave one (usually empty) folder behind, and even a simple "RD /S /Q MyFolder" will sometimes fail to delete the folder on its first attempt. In these cases, simply retrying the operation will succeed. Personally, I think it's a Windows "feature".

As a workaround, I've wrapped most of my calls to Ant's <delete> task in <retry>, and this has eliminated the problem from all of my builds that manage to start, BUT this doesn't help if Jenkins doesn't get as far as running my builds.
e.g. I'm using the accurev plugin for my SCM and it cleans the working directory before it grabs the source - I typically get about a 1% failure rate at this stage. Whilst 1% is not a blocking issue, it's not reliable, which is not what one wants from a build system.

Personally, I've found that excluding the build areas from Search & anti-virus helps reduce the problem, but it is insufficient to stop these failures completely (at least on Windows 7) - something, somewhere, will still lock files, sometimes, but any investigation (after the build has failed) shows that no process has the file "open".

pjdarton
added a comment - 2012-09-27 10:20

Added two new system properties that control behavior: "Util.deletionRetries" (an integer, defaults to 3) and "Util.deletionRetryWait" (an integer, defaults to 500ms).

Delete operations that affect directories now try to delete the entire contents of the directory, continuing on to subfolders etc. even after encountering files that wouldn't die, before eventually throwing an exception listing what wouldn't die. i.e. if a folder has files "a", "b" and "c", and "b" can't be deleted, then "a" and "c" still get deleted (and you still get the exception about "b").

Delete operations now make multiple attempts at deleting things, so if not everything could be deleted the first time around, maybe it'll get deleted on the 2nd/3rd etc. attempt. An exception is only thrown if all retry attempts are exhausted and there are still files/directories that won't delete.
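A rough sketch of that "delete what you can, remember what you can't" traversal (hypothetical code, not the actual patch):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class BestEffortDelete {
    /**
     * Deletes a directory tree, collecting paths that would not delete
     * instead of stopping at the first failure, so one locked file
     * doesn't shield its siblings from deletion.
     */
    public static List<File> deleteRecursive(File dir) {
        List<File> failures = new ArrayList<>();
        deleteInto(dir, failures);
        return failures; // caller retries these, then throws if any remain
    }

    private static void deleteInto(File f, List<File> failures) {
        File[] children = f.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) {
                deleteInto(child, failures); // depth-first, so dirs empty out
            }
        }
        if (!f.delete() && f.exists()) {
            failures.add(f); // remember it; siblings still get their chance
        }
    }
}
```

A caller would loop: if `deleteRecursive` returns a non-empty list, wait, retry, and only throw (listing the leftovers) once the retry budget is spent.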

Added some unit tests for these methods.

After posting this back in October 2012, I built a version of Jenkins LTS with this patch applied. I've been using it at work for all our development stuff and I've not had file locking problems since. I'm pretty confident that it fixes the problem.

Disclaimers:

I've not tested this on Linux (nor run the unit tests there). It should be harmless (the behavioral changes are conditional on being on Windows), but it'd be worth running the unit tests on Linux just to verify that.

pjdarton
added a comment - 2012-10-01 23:11 - edited

Uploaded git patch file; this was produced using the git command-line and isn't claiming to change the entire file. This will probably be a lot easier to merge.

This is my "New-and-improved" solution.
In addition to retrying the deletes, this also calls System.gc() if it's on Windows (a tactic that's also used in Apache Ant's Delete task to work around the same problem).
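The Windows-only GC nudge could be as simple as this sketch (method names are made up for illustration):

```java
public class GcNudge {
    /** True when running on Microsoft Windows. */
    public static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase().contains("windows");
    }

    /**
     * Called between failed delete attempts: on Windows, prompt the GC to
     * finalize any streams this JVM forgot to close, releasing their file
     * locks - the same tactic Apache Ant's Delete task uses.
     */
    public static void onDeleteFailure() {
        if (isWindows()) {
            System.gc();
        }
    }
}
```

System.gc() is only a hint to the JVM, which is why it is paired with the delay-and-retry loop rather than relied on alone.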

pjdarton
added a comment - 2012-11-09 16:56

pjdarton
added a comment - 2012-11-12 16:59 Have re-done my GitHub pull request to reflect the new changes (and to fix the CRLF issue with the previous pull request).
New pull request is https://github.com/jenkinsci/jenkins/pull/615

I've now been running the LTS Jenkins build (1.480.1) with this patch applied at work for a while.
I've not seen any builds failing due to "file in use" since.
I would therefore recommend that this patch / pull-request be incorporated into the main branch ASAP, and to the next LTS release.

pjdarton
added a comment - 2012-12-07 17:42

I believe this is also the root cause of JENKINS-15852. The Git Plugin has a call in GitAPI to FilePath.deleteRecursive(), which in turn calls Util.deleteRecursive(). It is almost immediately trying to delete a workspace that has just been created. Additionally, we have encryption and McAfee software monitoring files that could be locking them.

Daniel Kirkdorffer
added a comment - 2013-01-29 21:45

File-locking is the bane of anyone running any kind of automated system on Windows, so I'd agree that this might well solve the problem (as long as you're sure that the Git code doesn't use the workspace as its current directory, as no amount of retrying will change that).

I also have anti-virus stuff running on my build slaves, and despite that I've not noticed any builds fail due to file-locking issues since I started running a custom build of Jenkins LTS that has this fix in it.
I think that this amounts to a fair amount of circumstantial evidence that this fix works.

pjdarton
added a comment - 2013-01-29 22:09 - edited

We are encountering a similar problem that I originally attributed to some kind of weird conflict between

Use private Maven repository

and

SCM / Subversion / Check-out Strategy / Always checkout a fresh copy

Not sure why a Maven repo entry local to the workspace would be locked before the code is even checked out. Maven shouldn't even be running yet, and no process other than the Jenkins job which uses this workspace should be referencing a workspace-private Maven repo entry.

Environment:

Jenkins 1.517

Maven 3.4 (-Xmx1536m -XX:MaxPermSize=256m)

Java 1.7.0_15-b03 Oracle JVM 64-bit

Windows 2008 Server 64-bit

Clean server with no virus scanner, indexing, etc.

Dell PowerEdge 2950

PERC 5i Serial Attached SCSI controller
This machine has 2 CPUs with 4 cores each (a total of 8 cores).
This server is configured with a single C: partition formed from two physical drives in RAID 1.

Build Console Output

Started by timer
Building in workspace C:\Jenkins\jobs\Maxview-Daily-Build-6.2-WINDOWS-Trunk\workspace
Cleaning local Directory .
java.nio.file.FileSystemException: C:\Jenkins\jobs\Maxview-Daily-Build-6.2-WINDOWS-Trunk\workspace\.\.repository\ant\ant-antlr\1.6.5\ant-antlr-1.6.5.jar: The process cannot access the file because it is being used by another process.
at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsFileSystemProvider.implDelete(Unknown Source)
at sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source)
at java.nio.file.Files.delete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at hudson.Util.deleteFile(Util.java:237)
at hudson.Util.deleteRecursive(Util.java:305)
at hudson.Util.deleteContentsRecursive(Util.java:202)
at hudson.Util.deleteRecursive(Util.java:296)
at hudson.Util.deleteContentsRecursive(Util.java:202)
at hudson.Util.deleteRecursive(Util.java:296)
at hudson.Util.deleteContentsRecursive(Util.java:202)
at hudson.Util.deleteRecursive(Util.java:296)
at hudson.Util.deleteContentsRecursive(Util.java:202)
at hudson.Util.deleteRecursive(Util.java:296)
at hudson.Util.deleteContentsRecursive(Util.java:202)
at hudson.scm.subversion.CheckoutUpdater$1.perform(CheckoutUpdater.java:75)
at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:153)
at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:903)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:884)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:867)
at hudson.FilePath.act(FilePath.java:905)
at hudson.FilePath.act(FilePath.java:878)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:843)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:781)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1369)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:676)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:581)
at hudson.model.Run.execute(Run.java:1576)
at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:486)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:241)

Brian Brooks
added a comment - 2013-07-10 14:32 - edited

Yes, that's the kind of error you can get when doing any filesystem access on Windows (whether from Java or anything else) - basically, if you're on Windows, ANY file operation can fail (at any point) with a "file locked by another process" error and you need to catch these and retry (as, if you retry after a small delay, whatever process was sabotaging your operation will have moved on).
It's also the kind of error that I kept getting that prompted me to create this patch, and I can state (with some confidence now) that this fixed it for me.

Note: under Java, the process sabotaging your file operation might well be your own - if you don't manually close file handles but just rely on the garbage collector to do so, attempts to delete those files will fail until the GC has run. This is why I run the GC as well, just in case (not sure if that was a deciding factor, but it's what Ant does and it worked for me).
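For example, using try-with-resources keeps the handle's lifetime explicit, so deletion never has to wait for the GC to finalize a forgotten stream (illustrative code, not from Jenkins):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Path;

public class HandleHygiene {
    /**
     * Reads the first byte of a file, closing the handle deterministically.
     * A stream that is merely dropped stays open (and, on Windows, keeps
     * the file locked) until the garbage collector finalizes it.
     */
    public static int firstByte(Path p) throws IOException {
        try (FileInputStream in = new FileInputStream(p.toFile())) {
            return in.read(); // handle is closed on exit, so the file is
                              // immediately deletable afterwards
        }
    }
}
```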

pjdarton
added a comment - 2013-07-10 14:45

To make things even worse, the message "file locked by another process" doesn't necessarily mean that the file IS locked by another process. Notepad++, for example, prints this message even if the real error is "permission denied". Took me quite some time to find out...

Dirk Heinrichs
added a comment - 2013-07-12 06:19

I've just attached a new patch file "0001-Proposed-solution-to-JENKINS-15331.patch"
This one is based on the current Jenkins master trunk (at the time of writing, that's aimed at 1.560-SNAPSHOT).

This is slightly different than the earlier patch:
1) The configuration for garbage-collection when deletes fail now defaults to "false" on all platforms.
2) The garbage-collection should now get called if it's enabled (the previous version had a bug).

pjdarton
added a comment - 2014-04-11 16:03

I also see this issue (and have for a while), up to at least the current 1.5776 release.

As mentioned by others above, it's a common gripe with NTFS, and sadly your chances of hitting the issue increase considerably with large checkouts/workspaces (it struggles to delete a large number of files efficiently).

It would be great if this patch could finally be merged into the head.

Laurent Malvert
added a comment - 2014-08-27 19:43

Also, I'd like to recommend an alternative to the "delay and retry" strategy... While it works most of the time, it's not entirely foolproof, as you can only hope that NTFS will release the lock within the timeframe of your delays/retries.

Generally what serves me best on NTFS systems is to NOT delete large folders (at first), but instead to rename/move them to a different location (where they can be deleted by a batch job). And possibly to recreate the desired folder.

I actually do this for my Maven local repository and most of my development checkouts on my development machine. I have a custom alias that moves things to a temp folder instead of deleting them, and a cron job that regularly deletes that folder. This way you have no lock on the folder you're currently working on.

Jenkins could very well use a similar approach by moving the data to be disposed of to the Windows temp folder, or to a trash folder of its own choosing to be regularly emptied by an internal task.

This approach has multiple advantages:

solves the locking for sure,

no garbage collection required,

no artificial delay required,

and actually the "delete" operation is now perceived to be considerably faster (as it doesn't really happen, and move operations are close to instantaneous on most file systems).

Of course it means that at a given time, a lengthy and possibly intensive deletion process will occur in the background, but depending on how you implement it this could be scheduled to be done during periods of inactivity, or according to a planned schedule, or only when running out of disk space, etc...

Just my 2 cents, but considering that it's not atypical for Jenkins to deal with large folders, it would seem like a good approach for a number of scenarios (new/clean workspaces, deleting build records, deleting jobs, etc...).
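A sketch of that move-aside-then-purge idea (hypothetical names; note that java.io.File.renameTo only works within one volume, and a Windows lock taken without delete-sharing can still occasionally block the rename, so it's a mitigation rather than a guarantee):

```java
import java.io.File;
import java.io.IOException;
import java.util.UUID;

public class TrashThenPurge {
    /**
     * "Deletes" a workspace by renaming it into a trash area. The rename is
     * near-instant, and a background task can purge the trash at leisure -
     * so transiently locked files no longer delay or fail the build.
     */
    public static File moveToTrash(File workspace, File trashDir) throws IOException {
        trashDir.mkdirs();
        // unique grave name so repeated cleans of the same job never collide
        File grave = new File(trashDir, workspace.getName() + "." + UUID.randomUUID());
        if (!workspace.renameTo(grave)) {
            throw new IOException("Could not move " + workspace
                    + " aside (cross-volume move, or the directory itself is pinned?)");
        }
        return grave;
    }
}
```

The caller would then recreate the (now free) workspace directory and let a scheduled task delete everything under the trash directory with the usual retry logic.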

Laurent Malvert
added a comment - 2014-08-27 19:55 - edited

tsondergaard
added a comment - 2014-11-17 06:01 The problem appears to be eliminated or at least significantly reduced by disabling the "Windows Search" indexing service. Also look out for anti-virus programs causing problems.
http://www.pcmag.com/slideshow_viewer/0,3253,l=251692&a=251692&po=4,00.asp

"Been there, done that"
In my experience, disabling Windows Search and anti-virus merely reduces the problem, e.g. down from a 5% failure rate to a 0.5% failure rate.
On all my Windows build slaves, I've configured Windows Search to only search the start menu, then disabled the search service entirely; I've configured the anti-virus to exclude the Jenkins build area from its scans and on-access checking, and I was still seeing builds fail every week due to transient file-locking problems.

pjdarton
added a comment - 2014-11-17 11:37
After implementing the fix for this ( https://github.com/jenkinsci/jenkins/pull/1209 ) and applying it to my local Jenkins server, I haven't seen a single build fail due to these transient file-locking problems.

Update:
I split the code changes into a refactor of the unit-test code (to make it easier to test this), and the actual enhancement to the deletion code.
The refactor has been incorporated into Jenkins' core code already. The actual enhancement code changes are in https://github.com/jenkinsci/jenkins/pull/1800 and awaiting merge.

pjdarton
added a comment - 2015-10-19 10:16

We have a job that deploys a war file and starts the service on a Windows build machine. However, if one of the files in the destination directory is held open by any process, those files fail to deploy, yet the build still ends up successful. I would like the build to fail instead of succeed when the log contains entries like those below. Is there any workaround?

Log:
02:26:08 c:\resin-3.1.12\webapps\ROOT\WEB-INF\lib\ridl-3.2.1.jar - The process cannot access the file because it is being used by another process.
02:26:08 c:\resin-3.1.12\webapps\ROOT\WEB-INF\lib\unoil-3.2.1.jar - The process cannot access the file because it is being used by another process.

praveen kumar jogi
added a comment - 2016-05-02 17:41 - edited I would like to fail the job if it is unable to complete.
Jenkins architecture:
Master 1.656 (linux)
couple of windows build slaves

Code changed in jenkins
User: Peter Darton
Path:
core/src/main/java/hudson/Util.java
core/src/test/java/hudson/UtilTest.java
http://jenkins-ci.org/commit/jenkins/310c6747625a5e5605ac87c68d02eddaacdc8e0e
Log:
FIXED JENKINS-15331 by changing Util.deleteContentsRecursive, Util.deleteFile and Util.deleteRecursive so that they can retry failed deletions.
The number of deletion attempts and the time it waits between deletes are configurable via system properties (like hudson.Util.noSymlink etc).
Util.DELETION_MAX is set by -Dhudson.Util.deletionMax. Default is 3 attempts.
Util.WAIT_BETWEEN_DELETION_RETRIES is set by -Dhudson.Util.deletionRetryWait. Defaults is 100 milliseconds.
Util.GC_AFTER_FAILED_DELETE is set by -Dhudson.Util.performGCOnFailedDelete. Default is false.

SCM/JIRA link daemon
added a comment - 2016-05-03 23:32 Code changed in jenkins
User: Peter Darton
Path:
core/src/main/java/hudson/Util.java
core/src/test/java/hudson/UtilTest.java
http://jenkins-ci.org/commit/jenkins/310c6747625a5e5605ac87c68d02eddaacdc8e0e
Log:
FIXED JENKINS-15331 by changing Util.deleteContentsRecursive, Util.deleteFile and Util.deleteRecursive so that they can retry failed deletions.
The number of deletion attempts and the time it waits between deletes are configurable via system properties (like hudson.Util.noSymlink etc).
Util.DELETION_MAX is set by -Dhudson.Util.deletionMax. Default is 3 attempts.
Util.WAIT_BETWEEN_DELETION_RETRIES is set by -Dhudson.Util.deletionRetryWait. Default is 100 milliseconds.
Util.GC_AFTER_FAILED_DELETE is set by -Dhudson.Util.performGCOnFailedDelete. Default is false.
Added unit-tests for new functionality.
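The retry behaviour described in the commit message can be sketched roughly as follows. This is an illustrative stand-in, not the actual hudson.Util code: the class name `RetryingDelete` is hypothetical, and only the system-property names and defaults are taken from the commit message above.

```java
import java.io.IOException;
import java.nio.file.*;

public class RetryingDelete {
    // Defaults mirror the commit message; read from the documented system properties.
    static final int DELETION_MAX =
            Integer.getInteger("hudson.Util.deletionMax", 3);
    static final long WAIT_BETWEEN_DELETION_RETRIES =
            Long.getLong("hudson.Util.deletionRetryWait", 100L);
    static final boolean GC_AFTER_FAILED_DELETE =
            Boolean.getBoolean("hudson.Util.performGCOnFailedDelete");

    /** Deletes a file, retrying on failure as Windows file locks are often transient. */
    static void deleteFile(Path file) throws IOException {
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= DELETION_MAX; attempt++) {
            try {
                Files.deleteIfExists(file);
                return;                          // deleted (or already gone)
            } catch (IOException e) {
                lastFailure = e;
                if (GC_AFTER_FAILED_DELETE) {
                    System.gc();                 // may close leaked file handles in this JVM
                }
                try {
                    Thread.sleep(WAIT_BETWEEN_DELETION_RETRIES);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw lastFailure;                       // all attempts failed; report the last error
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("retrydemo", ".txt");
        deleteFile(tmp);
        System.out.println(Files.exists(tmp));   // the file is gone after a successful delete
    }
}
```

The key design point is that a brief wait between attempts gives virus scanners and indexing services time to release their temporary locks, so most transient failures resolve within the default three attempts.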

pjdarton
added a comment - 2016-05-19 10:09 Code changes are in Jenkins 2.2 onwards.
Parameters that control this functionality have been documented on https://wiki.jenkins-ci.org/display/JENKINS/Features+controlled+by+system+properties


pjdarton
added a comment - 2016-06-01 12:49 The only reason this was logged as an "improvement" is because the fault really lies within the Windows OS / JRE and not within Jenkins itself, but all the symptoms (the issues that link to this) are bugs from an end-user's point of view - Jenkins builds "fail at random" on Windows (which is a bug), and this "improvement" is the cure.
i.e. For anyone trying to do builds on Windows, this is a bugfix (as evidenced by all the issues that link to this).
So, sure, this is an "improvement" - Jenkins now works reliably on Windows, and that's a huge improvement - but the reason I coded this was to fix a whole load of unreliability (aka "bugs") that are seen on Windows.
This was flagged as an lts-candidate, so I was rather hoping that it'd be backported to the LTS release.
As it stands now, either all Windows users have to upgrade to Jenkins 2, or they have to build their own LTS version (as I had to) ... or it gets included in the next LTS - you can probably guess which option I'm in favour of.

Jörg Ziegler
added a comment - 2016-06-01 12:58 Thanks pjdarton - this bug is pretty much killing our productivity as it requires manually restarting slaves every few hours. I strongly agree that it's more than an improvement.

Daniel Beck
added a comment - 2016-06-01 13:42 pjdarton Not my fault – Oliver Gondža filters for issue type and resolution, and anything that's not a fixed bug doesn't qualify, label or not.
This could have been corrected before the RC was published, by now it's too late for .3.


Oleg Nenashev
added a comment - 2016-06-01 14:03 Actually we still can merge it to .3 if Oliver Gondža agrees. But I'm not so happy about it since RC is under testing now.
Regarding .4, it will unlikely happen according to the current release model. Needs a wide discussion in the developer list.
BR, Oleg

Oliver Gondža
added a comment - 2016-06-08 08:11 I decided not to squeeze this into .3 (last in its line) for stability's sake. We need to be extra careful as we do not do much testing on Windows, unfortunately.