The -f flag, valid only when -u is set, forces get -u not to verify that each package has been checked out from the source control repository implied by its import path. This can be useful if the source is a local fork of the original.

However, for packages with a non-custom import path (custom import path being one that is defined by HTML meta tags, see isCustom), go get -u now behaves as if -f is always provided.

For custom import paths (e.g., rsc.io/pdf), there is no such issue.

What version of Go are you using (go version)?

$ go version
go version go1.8.1 darwin/amd64

What operating system and processor architecture are you using (go env)?

(For demonstration purposes, assume that https://github.com/wrong/repo contains arbitrary git history, such that it's git pull --ff-only-compatible with the repo I'm trying to update. I don't want to spend time creating GitHub repos that have this property, but it's possible.)

The proposed fix by @rsc was to relax things so that non-custom import path doesn't get a custom import path check unless it has an import comment. The issue was closed by CL 10693.

Later, #10952 was reopened as #16471, that suggested that the fix in CL 10693 was not optimal, and suggested a better fix. This new issue #16471 was then closed by CL 31658, which changed the behavior so that the remote URL check applied only to custom import paths.

This issue is essentially a reopen of both those issues. I think I have a suggested fix for the original problem that was reported in #10952. Let me explain the problem as I understand it.

Problem

Let's consider GitHub repos, since this is where the issue comes up most often.

GitHub supports cloning repos via HTTPS and SSH. The latter can be expressed as a normal URL (ssh://github.com/user/repo) or more commonly as a SCP-style address (git@github.com:user/repo).

A weird/unfortunate thing about GitHub repos is that the remote URL they're cloned from can have an optional ".git" suffix. In other words, these two remote URLs are equivalent:

git clone https://github.com/user/repo

git clone https://github.com/user/repo.git

When doing go get github.com/user/repo for the first time, cmd/go will use https://github.com/user/repo as the default remote URL (without suffix).

The change in CL 10693 effectively allowed using SSH URLs instead of HTTPS, by making remote URL checking apply only to packages with an import path comment (at the time). Later, it got changed in CL 31658 to apply to custom import paths only.

I think we should allow SSH URLs, but I believe there is a better way to do it without completely removing the remote URL check for non-custom import paths.

Proposed Fix

I think we should fix this issue, as well as the original issue from #10952, by applying the remote URL check to all repositories, but:

Ignore schema during URL comparison.

These three remote URLs should be considered equivalent and valid for import path github.com/user/repo:

If we do this, these URLs would be considered equivalent and valid for import path github.com/user/repo:

https://github.com/user/repo

https://github.com/user/repo.git

ssh://github.com/user/repo

ssh://github.com/user/repo.git

git@github.com:user/repo

git@github.com:user/repo.git

I would really really prefer NOT to do this step, and instead tell people not to use the unneeded ".git" suffix. go get doesn't set it, so it only happens when people modify the remote URL manually. I outlined my rationale for why I think this should not be done in shurcooL/gostatus#37 (comment).

However, I recognize that if we don't do this, some people may complain and report bugs (as has happened to me in my tools here, here, here). I don't want this fix to get reverted because of that.

When displaying error message about mismatching URL, format the expected repo URL using the schema and URL that the user has. Doing this is critical to support users who use non-default schema, otherwise the error message is confusing.

For example, the following error message is confusing because it's hard to see where the difference in URL is:

This comment has been minimized.

I agree with pretty much everything you've described here. 1 and 3 seem like net wins.

As far as 2:

Consider ignoring ".git" suffix for "github.com/xxx/yyy" remote URLs.

What about always ignoring .git suffixes? I know this is technically incorrect and could make the tool lax when it shouldn't be, but the tool is way more lax at the moment anyway. I don't think being lax here could cause any issues for people trying to publish two packages or two repos called foo and foo.git that were different, as that would make little sense to begin with.

I'm simply trying to keep the rules simpler and not dependant on hard-coding github.com and other well-known domains. Always ignoring .git would also make it work for self-hosted GitLab instances, for example.

This comment has been minimized.

edited

That is an option, but it seems worse than ignoring ".git" suffixes for "github.com/xxx/yyy" URLs only.

I know this is technically incorrect and could make the tool lax when it shouldn't be, but the tool is way more lax at the moment anyway.

I'd prefer to do things correctly. And this issue is about fixing the lax.

I'm simply trying to keep the rules simpler and not dependant on hard-coding github.com and other well-known domains.

The simplest solution is not to do anything special regarding ".git" suffixes.

When fetching new packages using go get, it doesn't add the ".git" suffix. So as long as the user doesn't manually modify the remote URL to be different, there shouldn't be a problem. Most people who I've seen doing this were doing it because they didn't know it was okay not to leave out ".git" suffix or that it was causing issues. Once they learned that, they fixed their remote URLs with agreement that it's best thing to do in long term (despite the annoyance of updating many repos).

I know that GitHub does not allow the repository name to end with ".git" at this time, so it's relatively safe to treat ".git" suffix as harmless for "github.com/xxx/yyy" URLs:

However, if you also ignore ".git" suffix for non-github.com URLs, that means it will apply to other VCSes, including Bazaar, Mercurial, and Subversion. It's possible some code host will not have such rule, so example-mercurial-host.com/user/repo and example-mercurial-host.com/user/repo.git may be very different repos.

If you still want not to verify that each package has been checked out from the source control repository implied by its import path, the -f flag lets you do that.

Always ignoring .git would also make it work for self-hosted GitLab instances, for example.

There shouldn't be any issues with doing go get or go get -u on packages on self-hosted GitLab instances. If there are, please report it as a separate issue.

This comment has been minimized.

I agree with you then - especially given how go get is supposed to work with repos that aren't git and that it already has the -f option.

I still think this could cause people reporting bugs like "go get fails due to .git suffix" on this issue tracker. Perhaps the error it gives in that case could be improved, to tell the user to either fix the remote URL or use -f?

This comment has been minimized.

edited

Now that that Go 1.10 tree is open, I'd like to begin working on this, if this gets approval. It's better to get this done sooner in the cycle rather than later, so that the fix can be evaluated and adjusted as neccessary before feature freeze.

Should I include the optional step 2 or skip it? Unless there are strong opinions, I'd prefer to leave it out initially and see how that goes. If there's a lot of complaints and this entire bug fix is at risk of being reverted, then I'll implement optional step 2 to resolve those complaints.

Ideally, I'd want to hear back from the people who worked on the previous versions of this issue. /cc @rsc@zombiezen

This comment has been minimized.

I'd kind of prefer not to do anything with this until package management is in a clearer state. It seems like there is no way to satisfy everyone here and making changes will just change which subset of people are satisfied.

This comment has been minimized.

edited

It seems like there is no way to satisfy everyone here and making changes will just change which subset of people are satisfied.

I take it you mean that this would negatively affect people who try to use a github fork as a remote, without changing the import path of the original package, right?

I was going to say that "I realize that security and convenience are often at odds with each other, but shouldn't we prioritize security over convenience anyway?", but after thinking more, I think it's important to consider the threat level and amount of inconvenience caused. In this case, I agree that the threat level is low, so I'm okay with waiting.

One of the reason I'm not worried about doing go get -u on various github.com/... packages myself is because I often use gostatus, which would warn me if the remote doesn't match the import path. If I didn't have that, I would feel slightly more nervous when doing go get -u knowing I effectively did go get -u -f instead due to this issue. But I think that it's quite hard to abuse this bug to attack someone, and people who care very strongly about it can rely on an external tool to verify that the remote URLs of their Go packages match the import path.

If/when you think it's a better time to revisit this, please let me know, I might want to work on it.

This comment has been minimized.

OK, sounds good. Once we have a decent package management story I think that will probably eliminate "automatic checkouts from version control systems" as a distribution mechanism, and with it many of these problems. (Of course we will retain the ability to work from checkouts if you do them yourself, but the more we get into problems with the subtleties here the more clear it is that version control systems are too complex to be the general default mechanism.)