When I manually removed the lock and repeated the checkout operation, it indeed took 11 minutes 15 seconds on the node where it had failed.

The global timeout does work, so this is no longer a blocker. It is, however, rather non-obvious configuration: the -Dorg.jenkinsci.plugins.gitclient.Git.timeOut=30 option (or whatever sufficiently large value) needs to be added to both the JVM options of the master and the JVM options of all slaves. The master options can only be configured in the servlet container, and while the slave options can be configured in the node settings (hidden under the "Advanced" button), slaves running as a Windows service don't take this into account without reinstalling the service.
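For readers landing here, a sketch of where that property goes — the war/jar paths and the agent URL below are invented placeholders, not taken from this report:

```shell
# Sketch: pass the git-client timeout (in minutes) to both JVMs.
# Paths and the agent URL are placeholders.

# Master: add the property to the servlet container's java command line
java -Dorg.jenkinsci.plugins.gitclient.Git.timeOut=30 -jar jenkins.war

# Slave started from a command line: the property must reach the slave JVM too
java -Dorg.jenkinsci.plugins.gitclient.Git.timeOut=30 -jar slave.jar \
    -jnlpUrl https://jenkins.example.com/computer/node1/slave-agent.jnlp
```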

Attachments

Issue Links

duplicates

JENKINS-37185 Checkout timeout is not honored when used with local branch parameter

Resolved

JENKINS-20387 git submodule update timeout value should be configurable per job

In the issue requesting that there should be a timeout, it is mentioned that builds probably shouldn't be using it, which I agree with; jobs can hang for many reasons. Also, interrupting the clone causes additional damage, because the next attempt often fails since the repository remained locked.

Jan Hudec
added a comment - 2014-04-09 13:01 - edited

I'm not understanding your description, or I can't duplicate the bug you're describing.

I created a multi-configuration job which runs on a Windows machine and a Linux machine. The multi-configuration job is cloning a 3 GB git repository.

The Linux machine has a reference copy of the repository stored at a known location, and that reference location is included in the job definition. That allows the Linux clone to complete very quickly (much less than a minute to clone).

The Windows machine does not have a reference copy, so the clone takes much longer than the Linux machine. On my network, that clone seems to take as much as 2 minutes.

I set the clone timeout to 1 minute. The Linux clone completes in less than 1 minute and is successful. The Windows machine performs the clone for 1 minute and then is interrupted at 1 minute (as expected by the timeout setting). The clone timeout value set on the multi-configuration job was honored by the job running on the Windows slave.

Can you give more description about how you're configuring the longer timeout, or any other hints that may explain why I see timeout honored by the multi-configuration jobs and you do not see it being honored by multi-configuration jobs?

Mark Waite
added a comment - 2014-04-09 13:41

No reference copies involved. Well, I want to involve them, but I wanted to create them with a job.

The clone takes about an hour for me. It is a local network, but the server is a slow virtual machine. I am configuring the timeout via the advanced clone behaviours option in the project configuration. It uses native (msys) git and passes ssh credentials.

I have Jenkins 1.557 (it's always a rather big pain to update, as redeploy does not work correctly on glassfish on Windows), git-client-plugin 1.8.0 and git-plugin 2.2.0.

I had problems with configurations run on a different node than master (with or without shallow clone), and with configurations with shallow clone selected even on master, while some builds on master seem to have passed.

Jan Hudec
added a comment - 2014-04-09 14:52


Mark Waite
added a comment - 2014-04-09 15:08

I used the reference copy only as a way to assure that one of the multi-configuration jobs would complete before the timeout, while the other would exceed the timeout value.

The msysgit client has a known bandwidth limit: it can only transfer about 1 MB/second over the ssh transport. It is much faster over the git transport, and I believe it is also faster over the https transport. The msysgit port uses a very old version of OpenSSH that has that bandwidth limit. Unfortunately, updating the OpenSSH version inside the msysgit port is very difficult, so no one has made that change yet.

I still don't understand the difference between my configuration (where multi-configuration jobs honor the git timeout) and yours. Some of the differences you might try exploring include:

I used Linux, Windows 7 and Windows 8.1 as target operating systems, while yours seems to be Windows 8
I used a timeout less than the default 10 minutes, while you use a timeout greater than the default 10 minutes
I used a git protocol URL while yours is ssh

Can you upload the job definition file for further comparison?
Can you upload a log from the failed build?

Jan Hudec Jan, I can't duplicate the problem you've reported and I haven't seen any response from you on my request for more information. I intend to close this bug in a week as "Could not reproduce", unless more details from you can help reproduce the bug.

Mark Waite
added a comment - 2014-04-12 13:58 - edited

I finally got around to trying it again. The important point is that the operation that fails is checkout.

I created a job to check out a reference copy on each node; it worked for an hour the first time around and succeeded just fine. But I had set it up with a sparse checkout of just a few files.

Then I set up the actual job. It succeeded in building some configurations, but not others. It has problems specifically on one slave node. It might be that its disks are slower and the repository is so large that checking it out takes just over 10 minutes on that node and just under 10 minutes on the others, or that that node happened to be more loaded, or something.

It should also be noted that killing the checkout operation leaves the lock behind, so Jenkins won't recover from this without serious manual surgery.

Below is the relevant log. Note that the build took 26 minutes in total, but it clearly says it timed out after 10 minutes, so the limit was not applied cumulatively. It is possible that the checkout command alone indeed took 10 minutes.

Jan Hudec
added a comment - 2014-06-24 14:37 - edited
Building remotely on Win8-builder (various labels) in workspace D:\Jenkins\workspace\Project\LABEL\Android
Cloning the remote Git repository
Using shallow clone
Cloning repository git@git.company.com:project
> git init D:\Jenkins\workspace\Project\LABEL\Android\src
Fetching upstream changes from git@git.company.com:project
> git --version
using GIT_SSH to set credentials jenkins key for git
> git fetch --tags --progress git@git.company.com:project +refs/heads/*:refs/remotes/origin/* --depth=1
> git config remote.origin.url git@git.company.com:project
> git config remote.origin.fetch +refs/heads/*:refs/remotes/origin/*
> git config remote.origin.url git@git.company.com:project
Pruning obsolete local branches
Fetching upstream changes from git@git.company.com:project
using GIT_SSH to set credentials jenkins key for git
> git fetch --tags --progress git@git.company.com:project +refs/heads/*:refs/remotes/origin/* --prune
Checking out Revision 159bc2b21669bc7b5217341fc8de9cd6b48439b2 (origin/dev/jan.hudec/pu)
> git config core.sparsecheckout
> git checkout -f 159bc2b21669bc7b5217341fc8de9cd6b48439b2
ERROR: Timeout after 10 minutes
FATAL: Could not checkout null with start point 159bc2b21669bc7b5217341fc8de9cd6b48439b2
hudson.plugins.git.GitException: Could not checkout null with start point 159bc2b21669bc7b5217341fc8de9cd6b48439b2
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1479)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:326)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:63)
at java.lang.Thread.run(Unknown Source)
Caused by: hudson.plugins.git.GitException: Command "git checkout -f 159bc2b21669bc7b5217341fc8de9cd6b48439b2" returned status code -1:
stdout:
stderr:
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1307)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1283)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1279)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1084)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1094)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1474)
... 11 more

Jan Hudec
added a comment - 2014-06-24 17:05

Indeed, when looking at the source (https://github.com/jenkinsci/git-client-plugin/blob/master/src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java), the configurable timeout only applies to commands launched via launchCommandWithCredentials, but local commands, including checkout, are launched via launchCommand, which just uses the TIMEOUT constant.

The org.jenkinsci.plugins.gitclient.CliGitAPIImpl.TIMEOUT global does apply, and so should the `org.jenkinsci.plugins.gitclient.Git.timeOut` global property. Unfortunately, it was changed to final between its introduction and the current version, so I can't easily test whether it works using the script console. Mind you, one can't restart a production build server on a whim.

Jan Hudec
added a comment - 2014-06-24 17:41

So I restarted the server. It now has the correct value on the master. But getting it to the appropriate slave was rather complicated. The JVM options in the node config had no effect on the web-start-installed service (it would probably need to be reinstalled). Only editing jenkins-slave.xml worked.

Jan Hudec
added a comment - 2014-06-25 05:49 - edited
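For anyone else stuck at the same point, this is roughly the kind of edit involved — a sketch only; the service id, paths and agent URL are invented placeholders, and the key detail is that the -D property must appear before -jar so it applies to the slave JVM rather than being passed to slave.jar:

```xml
<!-- jenkins-slave.xml sketch (winsw service definition);
     ids, paths and the agent URL are placeholders -->
<service>
  <id>jenkinsslave</id>
  <executable>java</executable>
  <arguments>-Xrs -Dorg.jenkinsci.plugins.gitclient.Git.timeOut=30 -jar "%BASE%\slave.jar" -jnlpUrl https://jenkins.example.com/computer/node1/slave-agent.jnlp</arguments>
</service>
```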

Jan Hudec
added a comment - 2014-06-25 09:02

I finally managed to update the global timeout. It appears to fix the issue, so I downgraded the severity, but it is rather inconvenient. At the very least it needs to be prominently documented.

I am also seeing this behavior. The clone operation completes quickly (I am using a reference copy since my repository is over 3 GB in size), but the checkout tends to take 15-20 minutes, and that is what fails. It is a blocker for me to upgrade, as the workaround of setting a global timeout is not practical in my environment.

We have about 20 repositories and over 30 Unix and Windows slaves. Most of our repositories are small and the ten-minute timeout is sufficient. But we have three large legacy projects, and for those projects only we need to increase or eliminate the timeout.

Don Ross
added a comment - 2014-07-15 14:31

Another workaround which may work for some users is to use sparse checkout and only check out the subtree which is actually used. If the checkout of the subtree you need can be completed in less than 10 minutes, then sparse checkout would be a solution.

Add the "Additional Behaviours" - "Sparse Checkout paths" entry, then list the directories needed. The plugin will copy the entire repository, but only check out the directories specified by the sparse checkout.

Mark Waite
added a comment - 2014-07-17 12:22
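For Pipeline jobs, the same behaviour can be expressed as a checkout extension. A sketch, assuming the git plugin's SparseCheckoutPaths extension; the URL, branch and paths are placeholders:

```groovy
// Sketch: sparse checkout via the checkout step; URL and paths are placeholders.
checkout([
    $class: 'GitSCM',
    branches: [[name: '*/master']],
    userRemoteConfigs: [[url: 'ssh://git@example.com/myRepo.git']],
    extensions: [[
        $class: 'SparseCheckoutPaths',
        sparseCheckoutPaths: [[path: 'src/'], [path: 'docs/']]
    ]]
])
```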

The next git-client-plugin release after 1.10.0 will include the API support (and unit tests) for timeout on the checkout command. That is necessary to add timeout on checkout, but it is not sufficient.

I hope to prepare a merge request for the git plugin which will provide the user interface necessary to access that API. While reviewing the git-plugin, it looks like the simplest approach will be to add a new "Additional Behaviour" for "Advanced checkout behaviours". The initial implementation would contain a single field for the user provided value of the checkout timeout in minutes.

Mark Waite
added a comment - 2014-07-28 04:09

Mark Waite
added a comment - 2014-09-05 12:22

Use -Dorg.jenkinsci.plugins.gitclient.CliGitAPIImpl.TIMEOUT=30 from the command line which starts a slave agent, and from the command line which starts the Jenkins master.

Setting a Jenkins startup property requires command line access. The TIMEOUT variable is immutable, defined at process startup, and cannot be changed after that. Without command line access, you can't change that variable.

With the changes included in git plugin 2.2.3 and later, the checkout timeout can now be set from the user interface. The clone timeout has been adjustable for a long time.

The only operation requested in this bug report whose timeout can't yet be adjusted is the "git clean" operation. If clean is timing out for you, then you could instead uncheck the "Clean before checkout" and "Clean after checkout" boxes, and add a first build step running "git clean -xfd" (or "git clean -xffd" if you use submodules). Commands performed as part of build steps have no timeout.

Mark Waite
added a comment - 2014-09-05 12:37
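As a concrete sketch of that suggestion, the replacement build step could be as simple as an "Execute shell" step containing:

```shell
# First build step replacing "Clean before checkout"; build-step commands
# are not subject to the git plugin's timeout.
git clean -xfd        # use "git clean -xffd" instead if the job uses submodules
```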

Paulo Matos
added a comment - 2014-09-05 13:41

@Mark Waite: Thanks. I didn't notice the clone and checkout timeouts were different, so I was setting the checkout timeout and thinking it was strange that the timeout while cloning was still set to 10 minutes.

It's working now.

I don't like to be the devil's advocate, but I re-hit this issue in 3.0.0-beta2 and some versions before (at least the 2.5 beta).

This is a multibranch pipeline project. The clone, checkout and submodule timeouts are all set to 45 minutes.

Quentin Dufour
added a comment - 2016-07-13 15:03

Same problem as Mark Waite.

I've set the Checkout and Clone timeouts to 120 minutes. It fails on the checkout command with timeout=10. I use Jenkins Pipeline with Jenkins 2.11.

Refer to pull request 423 for a proposed fix. The pull request still needs further code review and investigation into the history to understand why I missed this regression during reviews of earlier pull requests.

Mark Waite
added a comment - 2016-07-18 11:38

After some investigation, I've fixed my problem by removing the git lfs filters.

Now I'm manually calling "git lfs pull" in my build after the checkout step.

Indeed, without git lfs the checkout step is very fast and should not take much time, but it's during this step that git lfs tries to download its files.

It might be the reason why you didn't need to set a timeout on this step during your development.

Quentin Dufour
added a comment - 2016-07-18 14:50
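A sketch of that kind of workaround — note these are not Quentin's exact commands; `git lfs install --skip-smudge` is one way to disable the LFS smudge filter so the checkout stays fast:

```shell
# Sketch (assumed commands, not the reporter's exact ones):
# disable the LFS smudge filter so checkout only writes pointer files...
git lfs install --skip-smudge
# ...then, as a build step after the checkout (where no plugin timeout
# applies), download the real file contents explicitly:
git lfs pull
```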

Quentin Dufour has shown that the fix I inserted into 2.5.3 was incomplete. He has also detected that the test I wrote to catch the problem is hardly testing the problem at all. Special thanks to Quentin Dufour!

He's working on a correct and complete fix as part of his pull request to the git client plugin.

The partial fix has not been included in any of the beta releases yet. I'd prefer to have a complete fix and then release in both the main line and the beta releases.

Mark Waite
added a comment - 2016-08-05 11:54 - edited

Mark Waite: I am using git plugin 3.0.1 and git client plugin 2.2.0 and can reproduce the timeout issue with sparse checkout. I tried setting the timeout in Additional Behaviors >> Timeout=120, but it's still not working; find the Jenkins output below. Let me know if I am missing anything here.


Luke Lussenden
added a comment - 2017-09-18 16:21

Oded Arbel can you provide an example or link of what you mean by advanced clone behaviour? I'm still getting the 10-minute timeout using the following:

git branch: '$BRANCH',
    credentialsId: 'XXXX',
    url: 'ssh://git@example.com/myRepo.git',
    extensions: [[$class: 'CheckoutOption', timeout: 100]]

I tried the `scm checkout` syntax you reference in your first comment and I see the 10-minute timeout just the same as you were describing.

Additionally, how did you confirm the "advanced behaviour" was working? Will the log messages begin to display the configured timeout value, or do I just have to test it and hope my network is running slow enough to validate the test?
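For comparison, a scripted-Pipeline sketch that raises both the clone and checkout timeouts — the URL, branch and credentials id are placeholders, and CloneOption/CheckoutOption are the extensions discussed earlier in this thread:

```groovy
// Sketch: raise both timeouts (in minutes); URL and credentialsId are placeholders.
checkout([
    $class: 'GitSCM',
    branches: [[name: '*/master']],
    userRemoteConfigs: [[url: 'ssh://git@example.com/myRepo.git',
                         credentialsId: 'XXXX']],
    extensions: [
        [$class: 'CloneOption', timeout: 120],     // clone/fetch timeout
        [$class: 'CheckoutOption', timeout: 120]   // checkout timeout
    ]
])
```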


Mark Waite
added a comment - 2017-10-12 11:30

Vadivel Natarajan, the bug was fixed over 2 years ago, and is as described by Tim Knight in the comment above the one with which you're reopening the bug.

It is not enough to say "the same issue occurring". You'll need to provide much more context than "same issue occurring". What have you tried? What is the context where timeouts are not behaving as you expect? What is the log content when the timeout does not behave as you expect? What job type are you using?

Please gather those details and submit a new bug report, rather than reopening this report.

There are many, many users that are successfully using extended timeouts to clone and checkout large repositories. I've presented talks at Jenkins World 2016 and Jenkins World 2017, and at 2016 and 2017 online meetups, that describe techniques to better manage large repositories. All of those talks depend on adjusting timeout values as needed, and they work.

In addition to those resources, CloudBees support has provided detailed instructions for configuring a reference repository to speed clone operations.

Yarden Bar
added a comment - 2018-11-08 12:04

Hi,

We encountered this timeout as well, but we suspect that running git config core.sparsecheckout without an explicit true argument causes this. Adding the CloneOption with a timeout didn't help in our case.

Is there a way to configure git config core.sparsecheckout true in the checkout step? I've read the documentation but couldn't figure out how to do so...

Thank you,
Yarden

Yarden Bar, it is generally considered bad form to ask an unrelated question in a bug report. It clutters the bug report and wastes the time of the maintainers who are notified when the unrelated comment is added. Please don't do that in the future. Use the mailing list or chat for questions and answers, so that more than a few people are notified and might be able to assist.

The question interested me enough that I added it to my JENKINS-52746 test case. Refer to that example for details that will allow you to use a sparse checkout path definition in a declarative Pipeline checkout statement.

If you are using declarative Pipeline, you may also need `skipDefaultCheckout(true)` in the options section; otherwise the full repository checkout happens implicitly before the first step.

Mark Waite
added a comment - 2018-11-08 12:28
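Putting the two suggestions together, a declarative sketch might look like this — the URL, branch and sparse paths are placeholders:

```groovy
// Sketch: skip the implicit checkout, then perform an explicit sparse checkout.
pipeline {
    agent any
    options {
        skipDefaultCheckout(true)   // otherwise the full checkout runs before the first step
    }
    stages {
        stage('Checkout') {
            steps {
                checkout([
                    $class: 'GitSCM',
                    branches: [[name: '*/master']],
                    userRemoteConfigs: [[url: 'ssh://git@example.com/myRepo.git']],
                    extensions: [[$class: 'SparseCheckoutPaths',
                                  sparseCheckoutPaths: [[path: 'docs/']]]]
                ])
            }
        }
    }
}
```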