[PATCH v2 1/4] eclass/go-module: add support for building based on go.sum

[PATCH v2 1/4] eclass/go-module: add support for building based on go.sum

EGO_SUM mode now supplements the existing EGO_VENDOR mode.

EGO_SUM should be populated by the maintainer, directly from the go.sum
file of the root package. See eclass and conversion examples for further
details: dev-go/go-tour, app-admin/kube-bench, dev-vcs/cli

The go-module_set_globals function performs validation of
inputs and dies on fatal errors.

diff --git eclass/go-module.eclass eclass/go-module.eclass
index 80ff2902b3ad..50aff92735af 100644
--- eclass/go-module.eclass
+++ eclass/go-module.eclass
@@ -4,22 +4,45 @@
# @ECLASS: go-module.eclass
# @MAINTAINER:
# William Hubbs <[hidden email]>
+# @AUTHOR:
+# William Hubbs <[hidden email]>
+# Robin H. Johnson <[hidden email]>
# @SUPPORTED_EAPIS: 7
# @BLURB: basic eclass for building software written as go modules
# @DESCRIPTION:
-# This eclass provides basic settings and functions
-# needed by all software written in the go programming language that uses
-# go modules.
+# This eclass provides basic settings and functions needed by all software
+# written in the go programming language that uses go modules.
+#
+# You might know the software you are packaging uses modules because
+# it has files named go.sum and go.mod in its top-level source directory.
+# If it does not have these files, try use the golang-* eclasses FIRST!
+# There ARE legacy Golang packages that use external modules with none of
+# go.mod, go.sum, vendor/ that can use this eclass regardless.
+#
+# Guidelines for usage:
+# "vendor/":
+# - pre-vendored package. Do NOT set EGO_SUM or EGO_VENDOR.
#
-# You will know the software you are packaging uses modules because
-# it will have files named go.sum and go.mod in its top-level source
-# directory. If it does not have these files, use the golang-* eclasses.
+# "go.mod" && "go.sum":
+# - Populate EGO_SUM with entries from go.sum
+# - Append license:${GENTOO_LICENSE} to any modules needed at runtime.
+# dev-go/golicense can tell you which modules in a Golang binary are used at
+# runtime (requires network connectivity).
#
-# If it has these files and a directory named vendor in its top-level
-# source directory, you only need to inherit the eclass since upstream
-# is vendoring the dependencies.
+# None of the above:
+# - Did you try golang-* eclasses first? Upstream has undeclared dependencies
+# (perhaps really old source). You can use either EGO_SUM or EGO_VENDOR.
+
+#
+# If it has these files AND a directory named "vendor" in its top-level source
+# directory, you only need to inherit the eclass since upstream has already
+# vendored the dependencies.
+
+# If it does not have a vendor directory, you should use the EGO_SUM
+# variable and the go-module_gosum_uris function as shown in the
+# example below to handle dependencies.
#
-# If it does not have a vendor directory, you should use the EGO_VENDOR
+# Alternatively, older versions of this eclass used the EGO_VENDOR
# variable and the go-module_vendor_uris function as shown in the
# example below to handle dependencies.
#
@@ -28,6 +51,28 @@
# dependencies. So please make sure it is accurate.
#
# @EXAMPLE:
+# @CODE
+#
+# inherit go-module
+#
+# EGO_SUM=(
+# "github.com/aybabtme/rgbterm v0.0.0-20170906152045-cc83f3b3ce59/go.mod h1:q/89r3U2H7sSsE2t6Kca0lfwTK8JdoNGS/yzM/4iH5I= license:BSD-2,MIT"
+# "github.com/aybabtme/rgbterm v0.0.0-20170906152045-cc83f3b3ce59 h1:WWB576BN5zNSZc/M9d/10pqEx5VHNhaQ/yOVAkmj5Yo="
+# )
+# S="${WORKDIR}/${MY_P}"
+# go-module_set_globals
+#
+# SRC_URI="https://github.com/example/${PN}/archive/v${PV}.tar.gz -> ${P}.tar.gz
+# ${EGO_SUM_SRC_URI}"
+#
+# LICENSE="some-license ${EGO_SUM_LICENSES}"
+#
+# src_unpack() {
+# unpack ${P}.tar.gz
+# go-module_src_unpack
+# }
+#
+# @CODE
#
# @CODE
#
@@ -35,7 +80,7 @@
#
# EGO_VENDOR=(
# "github.com/xenolf/lego 6cac0ea7d8b28c889f709ec7fa92e92b82f490dd"
-# "golang.org/x/crypto 453249f01cfeb54c3d549ddb75ff152ca243f9d8 github.com/golang/crypto"
+# "golang.org/x/crypto 453249f01cfeb54c3d549ddb75ff152ca243f9d8 github.com/golang/crypto"
# )
#
# SRC_URI="https://github.com/example/${PN}/archive/v${PV}.tar.gz -> ${P}.tar.gz
@@ -64,10 +109,12 @@ export GO111MODULE=on
export GOCACHE="${T}/go-build"

# The following go flags should be used for all builds.
-# -mod=vendor stopps downloading of dependencies from the internet.
# -v prints the names of packages as they are compiled
# -x prints commands as they are executed
-export GOFLAGS="-mod=vendor -v -x"
+# -mod=vendor use the vendor directory instead of downloading dependencies
+# -mod=readonly do not update go.mod/go.sum but fail if updates are needed
+export GOFLAGS="-v -x -mod=readonly"
+[[ ${#EGO_VENDOR[@]} -gt 0 ]] && GOFLAGS+=" -mod=vendor"

# Do not complain about CFLAGS etc since go projects do not use them.
QA_FLAGS_IGNORED='.*'
@@ -77,6 +124,42 @@ RESTRICT="strip"

Re: [PATCH v2 1/4] eclass/go-module: add support for building based on go.sum

> EGO_SUM mode now supplements the existing EGO_VENDOR mode.
>
> EGO_SUM should be populated by the maintainer, directly from the go.sum
> file of the root package. See eclass and conversion examples for further
> details: dev-go/go-tour, app-admin/kube-bench, dev-vcs/cli
>
> The go-module_set_globals function performs validation of
> inputs and dies on fatal errors.
>
> Signed-off-by: Robin H. Johnson <[hidden email]>
> ---
> eclass/go-module.eclass | 419 +++++++++++++++++++++++++++++++++++--
> profiles/thirdpartymirrors | 1 +
> 2 files changed, 397 insertions(+), 23 deletions(-)
>
> diff --git eclass/go-module.eclass eclass/go-module.eclass
> index 80ff2902b3ad..50aff92735af 100644
> --- eclass/go-module.eclass
> +++ eclass/go-module.eclass
> @@ -4,22 +4,45 @@
> # @ECLASS: go-module.eclass
> # @MAINTAINER:
> # William Hubbs <[hidden email]>
> +# @AUTHOR:
> +# William Hubbs <[hidden email]>
> +# Robin H. Johnson <[hidden email]>
> # @SUPPORTED_EAPIS: 7
> # @BLURB: basic eclass for building software written as go modules
> # @DESCRIPTION:
> -# This eclass provides basic settings and functions
> -# needed by all software written in the go programming language that uses
> -# go modules.
> +# This eclass provides basic settings and functions needed by all software
> +# written in the go programming language that uses go modules.
> +#
> +# You might know the software you are packaging uses modules because
> +# it has files named go.sum and go.mod in its top-level source directory.
> +# If it does not have these files, try use the golang-* eclasses FIRST!
> +# There ARE legacy Golang packages that use external modules with none of
> +# go.mod, go.sum, vendor/ that can use this eclass regardless.
> +#
> +# Guidelines for usage:
> +# "vendor/":
> +# - pre-vendored package. Do NOT set EGO_SUM or EGO_VENDOR.
> #
> -# You will know the software you are packaging uses modules because
> -# it will have files named go.sum and go.mod in its top-level source
> -# directory. If it does not have these files, use the golang-* eclasses.
> +# "go.mod" && "go.sum":
> +# - Populate EGO_SUM with entries from go.sum
> +# - Append license:${GENTOO_LICENSE} to any modules needed at runtime.
> +# dev-go/golicense can tell you which modules in a Golang binary are used at
> +# runtime (requires network connectivity).
> #
> -# If it has these files and a directory named vendor in its top-level
> -# source directory, you only need to inherit the eclass since upstream
> -# is vendoring the dependencies.
> +# None of the above:
> +# - Did you try golang-* eclasses first? Upstream has undeclared dependencies
> +# (perhaps really old source). You can use either EGO_SUM or EGO_VENDOR.
> +
> +#
> +# If it has these files AND a directory named "vendor" in its top-level source
> +# directory, you only need to inherit the eclass since upstream has already
> +# vendored the dependencies.
> +
> +# If it does not have a vendor directory, you should use the EGO_SUM
> +# variable and the go-module_gosum_uris function as shown in the
> +# example below to handle dependencies.
> #
> -# If it does not have a vendor directory, you should use the EGO_VENDOR
> +# Alternatively, older versions of this eclass used the EGO_VENDOR
> # variable and the go-module_vendor_uris function as shown in the
> # example below to handle dependencies.

I think we can remove the example with EGO_VENDOR and
go-module_vendor_uris; we really don't want people to continue following
that example.

Re: [PATCH v2 1/4] eclass/go-module: add support for building based on go.sum

On Tue, Feb 18, 2020 at 11:46:45PM -0600, William Hubbs wrote:
> > -# If it does not have a vendor directory, you should use the EGO_VENDOR
> > +# Alternatively, older versions of this eclass used the EGO_VENDOR
> > # variable and the go-module_vendor_uris function as shown in the
> > # example below to handle dependencies.
> I think we can remove the example with EGO_VENDOR and
> go-module_vendor_uris; we really don't want people to continue following
> that example.
I tried to handle more cases here, but now I'm wondering if it would be
cleaner just to put all of new way into a distinct eclass, and sunset
the old eclass entirely. I found unforeseen interactions, see below.

> > +# S="${WORKDIR}/${MY_P}"
> The default setting of S should be fine for most ebuilds, so I don't
> think we need this in the example.
I'd copied it, but yes in this case.

The go.sum unpack only applies special handling to distfiles that it
knows about. It does NOT process any other distfiles at all.

EAPI8 or future Portage improvements might have annotations to disable
any automatic unpacking for specific distfiles, which would resolve this
issue.

Hence, you need to explicitly unpack any distfiles that are NOT part of
the go.sum dependencies. There are some ebuilds that do unpack & rename
in src_unpack already, so they need extra care as well.

The EGO_VENDOR src_unpack unpacked EVERYTHING, so it didn't have this
issue.

>
> > +# The extra metadata keys accepted at this time are:
> > +# - license: for dependencies built into the final runtime, the value field is
> > +# a comma seperated list of Gentoo licenses to apply to the LICENSE variable,
> > +#
> There are two lines for each module in go.sum, the one with /go.mod as a
> suffix to the version and the one without. We do not need both right?
This is not entirely correct, and it's the warnings from golang upstream
about some hidden complexity in the /go.mod that lead me to list both of
them.

If we intend to verify the h1: in future, then we need to list all
/go.mod entries explicitly, so have somewhere to put the h1: hash.
If you're verifying the h1: hash, you must verify it on the
{version}.mod ALWAYS, and if the {version}.zip is present, then also on
that file (otherwise it could sneak in some evil metadata via the
{version}.mod).

If we omit h1: entirely, then we can get away with listing ONE line in
EGO_SUM for each dependency.
- If it contains /go.mod, it will fetch ONLY the {version}.mod file.
- If it does not contain /go.mod, it will fetch the {version}.mod file
AND the {version}.zip file

> > +# @EXAMPLE:
> > +# # github.com/BurntSushi/toml is a build-time only dep
> > +# # github.com/aybabtme/rgbterm is a runtime dep, annotated with licenses
>
> I'm not sure we can distinguish between buildtime only and runtime deps.
The 'golicense' tool will take a Golang binary and print out all of the
Golang modules that got linked into it. I consider those to be the
runtime deps in this case.

> > +# @ECLASS-VARIABLE: _GOMODULE_GOPROXY_BASEURI
...
> > +# This variable should NOT be present in user-level configuration e.g.
> > +# /etc/portage/make.conf, as it will violate metadata immutability!
> > +: "${_GOMODULE_GOPROXY_BASEURI:=mirror://goproxy/}"
>
> If this isn't supposed to be in user-level configuration, where should
> it be set?
Maybe I'll just prefix it with 'readonly' and force the value for now.

> > # @FUNCTION: go-module_src_unpack
> > # @DESCRIPTION:
> > +# Extract & configure Go modules for consumpations.
> > +# - Modules listed in EGO_SUM are configured as a local GOPROXY via symlinks (fast!)
> > +# - Modules listed in EGO_VENDOR are extracted to "${S}/vendor" (slow)
> > +#
> > +# This function does NOT unpack the base distfile of a Go-based package.
> > +# While the entries in EGO_SUM will be listed in ${A}, they should NOT be
> > +# unpacked, Go will directly consume the files, including zips.
> > +go-module_src_unpack() {
>
> If possible, this function should unpack the base distfile. That would
> keep us from having to write src_unpack for every go ebuild that uses
> the eclass.

That's fine until we get to multiple base distfiles and handling them.

Maybe pass a flag to go-module_src_unpack to tell it not to unpack any
distfile that it does not explicitly know about?

> > + die "Neither EGO_SUM nor EGO_VENDOR are set!"
> This shouldn't die, having neither one set is valid.
Yes, I caught this in later testing: a Golang package in the tree that
inherit go-module, but didn't use EGO_VENDOR, EGO_SUM or have a vendor
directory! This is another reason why I think bumping the eclass to a
new name would be safer now.

>> ...
> If EGO_SUM is up to date, this could also mean that upstream forgot to
> run go mod tidy and commit the go.mod/go.sum before the release, so it
> could be a bug that needs to be reported.
Yes. I found at least one package where the upstream had failed to run
go mod tidy in the release: I already reported it, and they say they
will include it in the next release, not doing a release just for it.

>
> > +DESCRIPTION="Kubernetes Bench for Security runs the CIS Kubernetes Benchmark"
> > +HOMEPAGE="https://github.com/aquasecurity/kube-bench"
> > +
> > +EGO_SUM=(
> > + "cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw="
> > + "cloud.google.com/go v0.34.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw="
> > + "cloud.google.com/go v0.37.4 h1:glPeL3BQJsbF6aIIYfZizMwc5LTYz250bDMjttbBGAU="
> > + "cloud.google.com/go v0.37.4/go.mod h1:NHPJ89PdicEuT9hdPXMROBD91xc5uRDxsMtSB16k7hw="
...
> > +)
>
> There's a lot of duplication in here. For example, the only difference
> between the last two lines is one has /go.mod tacked onto the end of the
> version. Do we need both lines for each module?
If we intend on validating h1: then yes, because we need to do it for
the .zip and .mod.

> > +src_unpack() {
> > + unpack ${P}.tar.gz
> > + go-module_src_unpack
> > +}
> Can we do this some how in go-module_src_unpack so we don't have to
> boiler-plate it to every consumer?
See discussion in eclass thread.

Re: [PATCH v2 4/4] dev-vcs/cli: new package

On Wed, Feb 19, 2020 at 12:18:24AM -0600, William Hubbs wrote:
> > +RDEPEND+=">=dev-vcs/git-1.7.3"
> > +BDEPEND+=">=dev-lang/go-1.13"
> > +GOPATH="${WORKDIR}"
> You don't need += here
I've taken to += as a cleaner variant than xDEPEND="${xDEPEND} cat/pn"
when the eclass MAY be setting some dependencies already.

> or any value of GOPATH.
GOPATH is required in fact!

Without it being set, it inherits GOPATH from my scope when I run emerge
or ebuild, and since the portage user doesn't have permission to write
in that path, the Golang mod tooling fails during src_unpack for the
tidy/get calls. If those are skipped, then it fails during the build.
The vendor mode didn't have this issue, because it handles differently.

I do note that this is despite what the Golang docs say about
GO111MODULE envvar being set should cause GOPATH to be ignored.

The other two packages, go-tour and kube-bench explicitly set GOPATH in
their builds, so didn't run into this.

Re: [PATCH v2 1/4] eclass/go-module: add support for building based on go.sum

> On Tue, Feb 18, 2020 at 11:46:45PM -0600, William Hubbs wrote:
> > > -# If it does not have a vendor directory, you should use the EGO_VENDOR
> > > +# Alternatively, older versions of this eclass used the EGO_VENDOR
> > > # variable and the go-module_vendor_uris function as shown in the
> > > # example below to handle dependencies.
> > I think we can remove the example with EGO_VENDOR and
> > go-module_vendor_uris; we really don't want people to continue following
> > that example.
> I tried to handle more cases here, but now I'm wondering if it would be
> cleaner just to put all of new way into a distinct eclass, and sunset
> the old eclass entirely. I found unforeseen interactions, see below.
>
> > > +# S="${WORKDIR}/${MY_P}"
> > The default setting of S should be fine for most ebuilds, so I don't
> > think we need this in the example.
> I'd copied it, but yes in this case.
>
> >
> > > +# go-module_set_globals
> > > +#
> > > +# SRC_URI="https://github.com/example/${PN}/archive/v${PV}.tar.gz -> ${P}.tar.gz
> > > +# ${EGO_SUM_SRC_URI}"
> > > +#
> > > +# LICENSE="some-license ${EGO_SUM_LICENSES}"
> > > +#
> > > +# src_unpack() {
> > > +# unpack ${P}.tar.gz
> > > +# go-module_src_unpack
> > > +# }
> > I don't think I would put an src_unpack() in the example.
> This is one of the unforeseen interactions.
> The go.sum unpack only applies special handling to distfiles that it
> knows about. It does NOT process any other distfiles at all.
>
> EAPI8 or future Portage improvements might have annotations to disable
> any automatic unpacking for specific distfiles, which would resolve this
> issue.
>
> Hence, you need to explicitly unpack any distfiles that are NOT part of
> the go.sum dependencies. There are some ebuilds that do unpack & rename
> in src_unpack already, so they need extra care as well.
>
> The EGO_VENDOR src_unpack unpacked EVERYTHING, so it didn't have this
> issue.

I will look at that in a bit and comment on it.

>
> >
> > > +# The extra metadata keys accepted at this time are:
> > > +# - license: for dependencies built into the final runtime, the value field is
> > > +# a comma seperated list of Gentoo licenses to apply to the LICENSE variable,
> > > +#
> > There are two lines for each module in go.sum, the one with /go.mod as a
> > suffix to the version and the one without. We do not need both right?
> This is not entirely correct, and it's the warnings from golang upstream
> about some hidden complexity in the /go.mod that lead me to list both of
> them.
>
> If we intend to verify the h1: in future, then we need to list all
> /go.mod entries explicitly, so have somewhere to put the h1: hash.
> If you're verifying the h1: hash, you must verify it on the
> {version}.mod ALWAYS, and if the {version}.zip is present, then also on
> that file (otherwise it could sneak in some evil metadata via the
> {version}.mod).
>
> If we omit h1: entirely, then we can get away with listing ONE line in
> EGO_SUM for each dependency.
> - If it contains /go.mod, it will fetch ONLY the {version}.mod file.
> - If it does not contain /go.mod, it will fetch the {version}.mod file
> AND the {version}.zip file

If Go itself does that verification during the build, do we need to do
it also?

> > > +# @EXAMPLE:
> > > +# # github.com/BurntSushi/toml is a build-time only dep
> > > +# # github.com/aybabtme/rgbterm is a runtime dep, annotated with licenses
> >
> > I'm not sure we can distinguish between buildtime only and runtime deps.
> The 'golicense' tool will take a Golang binary and print out all of the
> Golang modules that got linked into it. I consider those to be the
> runtime deps in this case.
>
> > > +# @ECLASS-VARIABLE: _GOMODULE_GOPROXY_BASEURI
> ...
> > > +# This variable should NOT be present in user-level configuration e.g.
> > > +# /etc/portage/make.conf, as it will violate metadata immutability!
> > > +: "${_GOMODULE_GOPROXY_BASEURI:=mirror://goproxy/}"
> >
> > If this isn't supposed to be in user-level configuration, where should
> > it be set?
> Maybe I'll just prefix it with 'readonly' and force the value for now.
>
> > > # @FUNCTION: go-module_src_unpack
> > > # @DESCRIPTION:
> > > +# Extract & configure Go modules for consumpations.
> > > +# - Modules listed in EGO_SUM are configured as a local GOPROXY via symlinks (fast!)
> > > +# - Modules listed in EGO_VENDOR are extracted to "${S}/vendor" (slow)
> > > +#
> > > +# This function does NOT unpack the base distfile of a Go-based package.
> > > +# While the entries in EGO_SUM will be listed in ${A}, they should NOT be
> > > +# unpacked, Go will directly consume the files, including zips.
> > > +go-module_src_unpack() {
> >
> > If possible, this function should unpack the base distfile. That would
> > keep us from having to write src_unpack for every go ebuild that uses
> > the eclass.
> That's fine until we get to multiple base distfiles and handling them.
> Maybe pass a flag to go-module_src_unpack to tell it not to unpack any
> distfile that it does not explicitly know about?

Maybe, but again I'll think about this more.

> > > + die "Neither EGO_SUM nor EGO_VENDOR are set!"
> > This shouldn't die, having neither one set is valid.
> Yes, I caught this in later testing: a Golang package in the tree that
> inherit go-module, but didn't use EGO_VENDOR, EGO_SUM or have a vendor
> directory! This is another reason why I think bumping the eclass to a
> new name would be safer now.

Which ebuild was that? this is actually an invalid case for go-module.
A go module requires the presence of go.mod and optionally go.sum (if
there are no external deps) and optionally vendor/.

If you want an invalid case, you would have to be able to test for
something like:

Re: [PATCH v2 1/4] eclass/go-module: add support for building based on go.sum

I did find a way to apply your patch to the eclass today, so I'm working
with it locally now.

I would find it much more difficult to add license info to EGO_SUM than
to add it to LICENSE= directly. The lines in EGO_SUM are already pretty
long and adding info to them manually is more tedious than adding it to
LICENSE=.

Also, I do not see an advantage to adding license info
to EGO_SUM over adding it to LICENSE=. Either way yoou have to use
golicense or something similar to extract the info from the binary and manually
add it to the ebuild.