Description

Since the SCM 2.0 release, I can't find a way, in a pipeline, to get a human-readable job & branch name.

For example, if I had a BitBucket team called 'Foo', a repository called 'Bar' and a branch 'feature/KEY-Baz', it used to be possible to get a nearly human readable branch - env.BRANCH_NAME would be 'Foo/Bar%2FKEY-Baz', which was easy enough to escape to a human readable form.
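
As a rough illustration of why the old form was easy to work with (a sketch only; the class and method names are invented for this example and are not part of Jenkins or any plugin), turning the 1.x-style escaped value back into readable text needs nothing more than standard percent-decoding:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Jenkins or any plugin.
final class EscapedNameDecoder {

    // Turns a percent-escaped 1.x-style name back into readable text.
    static String toHumanReadable(String escaped) {
        // URLDecoder follows form-encoding rules and would turn '+' into a
        // space, so protect literal plus signs before decoding.
        String protectedPlus = escaped.replace("+", "%2B");
        // The Charset overload of decode() requires Java 10 or later.
        return URLDecoder.decode(protectedPlus, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints "Foo/Bar/KEY-Baz"
        System.out.println(toHumanReadable("Foo/Bar%2FKEY-Baz"));
    }
}
```

No comparable reversal exists for the 2.0 names, because the hashed portion cannot be decoded back to the original text.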

Now, however, it's a shortened form, with numbers in it - something like 'Foo/Bar.qxyz123/feature-KEY.abcd12345ghij-Baz'. And there appears to be no way to get the human readable version. This has broken all of my build notification templates.

[Ack! I just realised something more important - I can't find related projects to do things such as copy artefacts from upstream builds, because the repository name itself now has a suffix. I'll be raising another bug for that.]

Robert Watkins
added a comment - 2017-01-17 13:18
This is also breaking my use of the S3BucketPublisher: it used to publish to S3 with a path prefix of (using the above) 'Foo/Bar/feature%2FBaz'.
It now publishes with 'Foo/Bar.qxyz123/feature-KEY.abcd12345ghij-Baz'. I could probably live with the suffix on the project name (though that sucks, it is at least consistent), but the mangling of the branch name is horrible, as it's non-deterministic.
At this point, I'm about to roll back to my backup and stick with the v1.0 version; notification templates were painful, but this is a problem that completely breaks my ops infrastructure.

Robert Watkins
added a comment - 2017-01-17 13:24
I know the mangling of the paths is there to ensure uniqueness - but it's causing problems for me like crazy. An option to disable it would help a lot.

Robert Watkins
added a comment - 2017-01-17 20:18
The S3BucketPublisher, BTW, uses run.getParent().getFullName() to determine the prefix. If I can get an unmangled job name, I can do my own S3 uploads, but if I want to keep using this plugin, I need a way to make the parent's full name unmangled.

Ryan Campbell
added a comment - 2017-01-17 20:27
The impact of this issue is that users previously expected to be able to construct the URL on their own, but now it is generated by a non-transparent algorithm.
Mic says he's had 10 people in the last 24 hours mention that they are surprised by the new URL scheme.
For example, here is a new URL:
https://ci.blueocean.io/blue/organizations/jenkins/blueocean/detail/feature-J.hunjes6l7b7p.om-github/5/pipeline/
Which is not love-able.
There is also external tooling which depends on this structure.

Stephen Connolly
added a comment - 2017-01-17 21:17
The job display name should be the unmangled name (unless the user has customized it, but you cannot customize the display name for multibranch).
We can expose the display name as an environment variable too.
The Job name cannot stay as is currently.
% is not a valid character on some file systems.
.. is a valid branch name on some source control systems.
Even seemingly "safe" filenames like aux can cause issues for users.
There is a very limited set of characters that are safe for filesystems: A-Za-z0-9_.- is all there is to play with.
Furthermore, the name length causes lots of issues, so we need to ensure that the branch name is not "too long".
The only way out of that with minimal risk of collisions is a deterministic hashing algorithm... which becomes ugly... or part name, part hash, with some characters used to disambiguate to prevent the smaller hash from causing issues.
The current algorithm was chosen to reduce the risk of collision to less than 1 in 33 million, which is about as low as we can go before things get ugly.
If somebody else has a wonderful magical scheme for mapping names from the entire Unicode space in a predictable pattern into the character set A-Za-z0-9_.- that also takes care to ensure silly branch names like .. are not given the name .. and dangerous branch names like AUX, CON, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9, PRN, NUL or their lower case equivalents are mapped into something sensible, and doesn't have risk of collisions (e.g. when JENKINS-36240 gets merged and then we can potentially start building "trusted" fork branches such that the branch name is now controlled by the fork)... oh and can we not actually change the job name from the branch name... and can we keep the job names to something short... well I'm all ears... but I think you will find it an impossible and thankless task.
What we need to do is expose the human readable bits correctly and ensure that plugins do correct things rather than blindly assuming things.
We probably also need to apply name mangling in core to all jobs... but that is a different story.
Blue Ocean could use getItemByBranch() to resolve branches if Blue Ocean wants to handle the case of branch names like ..; other than that, Jenkins core will still resolve the old URLs.
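
To make the "part name, part hash" idea above concrete, here is a minimal sketch. It is NOT the algorithm the branch-api plugin uses; the class name, the length limits, the separator and the choice of SHA-256 are all assumptions made for illustration. Names that are already safe pass through, everything else becomes a cleaned prefix plus a short Base32 hash of the original name:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; NOT the algorithm used by the branch-api plugin.
final class SafeNameSketch {
    private static final int MAX_LENGTH = 32;     // assumed length limit
    private static final int PREFIX_LENGTH = 12;  // assumed readable prefix size
    private static final String BASE32 = "0123456789abcdefghijklmnopqrstuv";
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "con", "prn", "aux", "nul",
            "com1", "com2", "com3", "com4", "com5", "com6", "com7", "com8", "com9",
            "lpt1", "lpt2", "lpt3", "lpt4", "lpt5", "lpt6", "lpt7", "lpt8", "lpt9"));

    // True if the name can be used on disk unchanged.
    static boolean isSafeAsIs(String name) {
        if (name.isEmpty() || name.length() > MAX_LENGTH) return false;
        if (name.equals(".") || name.equals("..")) return false;
        if (!name.matches("[A-Za-z0-9_.-]+")) return false;
        // Windows reserves device names even with an extension (e.g. aux.txt).
        int dot = name.indexOf('.');
        String base = (dot < 0 ? name : name.substring(0, dot)).toLowerCase(Locale.ROOT);
        return !RESERVED.contains(base);
    }

    // Returns the name unchanged when safe, else "<cleaned prefix>_<hash>".
    // Limitation: a safe name that happens to look like a mangled one could
    // still collide with it; a real scheme has to rule that out as well.
    static String mangle(String name) {
        if (isSafeAsIs(name)) return name;
        String cleaned = name.replaceAll("[^A-Za-z0-9_-]", "-");
        String prefix = cleaned.substring(0, Math.min(PREFIX_LENGTH, cleaned.length()));
        return prefix + "_" + hash(name);
    }

    // Deterministic 10-character Base32 string (50 bits) from SHA-256.
    static String hash(String name) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory in every JRE
        }
        long bits = 0;
        for (int i = 0; i < 7; i++) {
            bits = (bits << 8) | (digest[i] & 0xFF); // collect 56 bits
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append(BASE32.charAt((int) (bits & 31))); // emit the low 50 bits
            bits >>>= 5;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle("master"));          // unchanged
        System.out.println(mangle("feature/KEY-Baz")); // prefix plus hash
        System.out.println(mangle(".."));              // never ".."
    }
}
```

The output is deterministic for a given input, but as the comment above notes, it is not reversible: recovering the original name requires storing it somewhere (for example as the display name).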

Stephen Connolly
added a comment - 2017-01-17 21:20
If people want, we can increase the length of the hash but reduce the bits encoded by making those hashes "pronounceable"... but that limits the bit space that we have to play with. Base32 is what we use currently.

Robert Watkins
added a comment - 2017-01-17 21:28 - edited
Is there a reason we can't have an option (disabled by default) to leave the job & branch name the way the user specified it? A lot of those problems aren't problems for a lot of users, a lot of the time, and can be worked around by choice of repository & branch name, whereas the mangling of the job name cannot be.
I understand that in environments like CloudBees this will be a problem - but in a standard self-hosted development environment, we would be extremely unlikely to encounter these issues.
[Also, I believe you're unnecessarily conflating the Job name with the workspace name - the Job name isn't part of any filespace other than the web URL (and HTML escaping works fine for that)]
I've rolled back now, so I can't easily check, but from my recollection env.JOB_DISPLAY_NAME didn't have the unmangled version - I did a dump of all of the environment properties, and not one was unmangled except for the BRANCH_NAME property. It's possible that JOB_DISPLAY_NAME wasn't set yet, as I did the dump at the start of the job; I'll double-check that tonight.

Michael Neale
added a comment - 2017-01-17 21:55
Robert Watkins it looks like this is going to be temporarily moved out of the main update center until this (and one other issue) is addressed and tested. For now it seems you are in an OK place after you restored?

Stephen Connolly
added a comment - 2017-01-17 21:56
Check your JENKINS_HOME:
JENKINS_HOME/jobs/${orgFolderName}/jobs/${mangledMultibranchProjectName}/branches/${mangledBranchName}
It's not the workspace name, as that is already mangled (with an unnecessarily long hash that causes issues for Windows builds, but that is a different story).
In the above path the only user-controlled name is orgFolderName.
The other two names come from the SCMNavigator and the SCMSource respectively, and we have no control over what names they give us back.
We already saw one issue with names containing Korean and Chinese characters on Unix filesystems.

Robert Watkins
added a comment - 2017-01-17 22:57
It still sounds like what you need is to escape/transform/mangle the job name only when it comes to writing to the filesystem, though. Yes, transforming the job name is probably the easiest way to do that without changing the existing plugins, but it has serious knock-on impacts.
You seem to have a choice:
(optionally) transform the job name, then fix every plugin that really needs access to the untransformed job name, or
keep the job name as specified by the user, and only mangle it when needing to build a filesystem path.

Michael Neale
added a comment - 2017-01-17 23:04
Robert Watkins it does need some mangling on the URI - so that a path like:
foo/bar/baz/boo
isn't ambiguous if the branch name is "foo/bar" (hence it used to have the simple escape).

Robert Watkins
added a comment - 2017-01-17 23:11
You may (heck, probably do) need that if you're hosting a Jenkins server for use by others. You don't need that if you're hosting it for yourself - your chances of conflict are infinitesimal, and you can solve them if you get them. So putting up with a lot of hassle to work around this problem isn't worth it for most installations. (Though the simple escaping of the branch name was tolerable enough.)

Michael Neale
added a comment - 2017-01-18 00:01
Robert Watkins not due to naming collisions but for routing. In any case I think a better scheme will result that fits the "human readable" requirement.

Stephen Connolly
added a comment - 2017-01-18 11:50
I think I have found a way...
By changing the cloudbees folders plugin we can have the on-disk file name != the URL path segment name.
That way we can keep the on-disk filename using the current 2.0.x mangled names, but keep the URLs using (mostly) the 1.x names...
I say (mostly) because we have to guard against a small subset of problematic names. Specifically:
"", "." and ".."
or any name that contains "/", "?", "#", "[" or "]" (plus, for good measure, because browsers try to be helpful, we will probably also have to guard against names containing "\")
All those names need to be "double" encoded. I'm hoping to use % encoding because that should keep the names mostly compatible with before.
So a branch name like característica/nuevo 1 (~ Spanish for "feature/new 1") will be given a "Job name" of característica%2Fnuevo 1; then all the nice friendly URL encoding code will convert that name into the path segment caracter%C3%ADstica%252Fnuevo%201, which the browsers may "helpfully" display in the URL bar as característica%2Fnuevo 1 or característica%252Fnuevo%201 depending on how they feel.
Need to write tests and then do some manual testing.
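
The two encoding steps described above can be sketched as follows. This is an illustration only, not the cloudbees-folder implementation: the class and method names are invented, escaping "%" itself in the first step is an added assumption (so the mapping stays reversible), the "%2E" form for "." and ".." is likewise assumed, and the empty name is left unhandled because the comment does not say how it would be encoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the two-step ("double") encoding; not plugin code.
final class DoubleEncodingSketch {
    // Characters guarded in step 1. '%' is an added assumption so that a
    // branch literally named "a%2Fb" cannot collide with "a/b".
    private static final String GUARDED = "/?#[]\\%";

    // Step 1: branch name -> job name. Only problematic characters change.
    static String toJobName(String branchName) {
        if (branchName.equals(".")) return "%2E";     // assumed encoding
        if (branchName.equals("..")) return "%2E%2E"; // assumed encoding
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < branchName.length(); i++) {
            char c = branchName.charAt(i);
            if (GUARDED.indexOf(c) >= 0) {
                sb.append(String.format("%%%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Step 2: job name -> URL path segment, using ordinary percent-encoding
    // of every UTF-8 byte outside the RFC 3986 "unreserved" set.
    static String toPathSegment(String jobName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : jobName.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean unreserved = (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9')
                    || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String jobName = toJobName("caracter\u00EDstica/nuevo 1");
        // prints "caracter%C3%ADstica%252Fnuevo%201"
        System.out.println(toPathSegment(jobName));
    }
}
```

Because the "/" inside the branch name is escaped in step 1, a branch called foo/bar ends up as the single path segment foo%252Fbar, which also removes the routing ambiguity mentioned earlier in the thread.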

SCM/JIRA link daemon
added a comment - 2017-01-22 11:27
Code changed in jenkins
User: Stephen Connolly
Path:
src/test/java/com/cloudbees/hudson/plugins/folder/ChildNameGeneratorTest.java
http://jenkins-ci.org/commit/cloudbees-folder-plugin/93e075a54da18f01b3c1336304eb1ddeea5e8768
Log:
JENKINS-41124 Remove normalization left in by mistake when trying to make the test deterministic on all File systems
Unnecessary now that we detect the normalization scheme of the system under test

SCM/JIRA link daemon
added a comment - 2017-01-22 11:27
Code changed in jenkins
User: Stephen Connolly
Path:
src/test/java/com/cloudbees/hudson/plugins/folder/ChildNameGeneratorTest.java
src/test/resources/com/cloudbees/hudson/plugins/folder/ChildNameGeneratorTest/upgradeNFD.zip
http://jenkins-ci.org/commit/cloudbees-folder-plugin/ea7eb5d05fb9ad76775f42474be367492a5297dd
Log:
JENKINS-41124 Confirmed OS-X normalizes to NFC and Linux to NFD
So here is a second test set using NFD names so that if we end up on a magical filesystem that doesn't mess with the name encoding we will cover both variants.
With respect to the issue driving all of this:
MultiBranch projects already store the name of the branch in unmolested form within the Branch object, so we only need to worry about OrganizationFolder's children.
There are only two current OrganizationFolder navigators:
GitHub which helpfully replaces any non-url safe characters with `-`, so will not be an issue
BitBucket which also replaces any non-url safe characters with `-` (but without a tooltip giving advance notice), so will also not be an issue.
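
For readers unfamiliar with the NFC/NFD distinction the commit log refers to, here is a small sketch using the JDK's java.text.Normalizer (the class and method names are invented for this example). The same visible name can be stored as different character sequences, so a test that compares on-disk names has to normalize both sides first:

```java
import java.text.Normalizer;

// Hypothetical illustration of Unicode normalization forms; not plugin code.
final class NormalizationSketch {

    // Composed form: "í" is the single code point U+00ED.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Decomposed form: "í" becomes "i" followed by U+0301 (combining acute).
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Compares two names while ignoring which normalization form each uses.
    static boolean sameName(String a, String b) {
        return toNfc(a).equals(toNfc(b));
    }

    public static void main(String[] args) {
        String composed = "caracter\u00EDstica";
        String decomposed = toNfd(composed);
        System.out.println(composed.equals(decomposed));   // prints "false"
        System.out.println(sameName(composed, decomposed)); // prints "true"
    }
}
```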

SCM/JIRA link daemon
added a comment - 2017-01-22 11:30
Code changed in jenkins
User: Stephen Connolly
Path:
pom.xml
src/main/java/jenkins/branch/Branch.java
src/main/java/jenkins/branch/MultiBranchProject.java
src/main/java/jenkins/branch/MultiBranchProjectDescriptor.java
src/main/java/jenkins/branch/NameEncoder.java
src/main/java/jenkins/branch/OrganizationFolder.java
src/test/java/integration/BrandingTest.java
src/test/java/integration/EventsTest.java
src/test/java/integration/MigrationTest.java
src/test/java/jenkins/branch/WorkspaceLocatorImplTest.java
http://jenkins-ci.org/commit/branch-api-plugin/1d254fad966168ad5dfff97c829fda458c917968
Log:
JENKINS-41124 Move name mangling support responsibility to AbstractFolder
Needs more tests... like way more...
[ ] Need a test that verifies after migration and then a restart that all is still as expected.
[ ] Need a test that verifies after a migration and then a reload that all is still as expected.
[ ] Need a test that verifies after a migration and then a restart and then a reload that all is still as expected.
[ ] Need a set of all those migration tests using the migration from a 2.0.0 pristine set of items
[ ] Need a set of all those migration tests using the migration from a 1.x set of items that was migrated to 2.0.0 and then lands on this set

SCM/JIRA link daemon
added a comment - 2017-01-22 11:30
Code changed in jenkins
User: Stephen Connolly
Path:
src/main/java/jenkins/branch/MultiBranchProject.java
src/main/java/jenkins/branch/MultiBranchProjectDescriptor.java
src/main/java/jenkins/branch/OrganizationFolder.java
http://jenkins-ci.org/commit/branch-api-plugin/e59d1474cd8751668e9fd84edc85df4933801efe
Log:
JENKINS-41124 So it seems there are some bold side-effects going on in item constructors and we need to ensure those side effects take place in the correct directory