*Re: [PATCH 04/26] upload-pack: convert to a builtin
2018-01-03 20:39 ` Brandon Williams@ 2018-02-21 21:47 ` Jonathan Nieder
2018-02-21 23:35 ` Junio C Hamano0 siblings, 1 reply; 362+ messages in thread
From: Jonathan Nieder @ 2018-02-21 21:47 UTC (permalink / raw)
To: Brandon Williams
Cc: Stefan Beller, git, Junio C Hamano, Jeff King, Philip Oakley,
Derrick Stolee, Sitaram Chamarty
Brandon Williams wrote:
> On 01/03, Stefan Beller wrote:
> > On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams <bmwill@google.com> wrote:
>>> In order to allow for code sharing with the server-side of fetch in
>>> protocol-v2 convert upload-pack to be a builtin.
>>
>> What is the security aspect of this patch?
>>
>> By making upload-pack builtin, it gains additional abilities,
>> such as answers to '-h' or '--help' (which would start a pager).
>> Is there an easy way to sooth my concerns? (best put into the
>> commit message)
>
> receive-pack is already a builtin, so theres that.
*nod*
Since v2.4.12~1^2 (shell: disallow repo names beginning with dash,
2017-04-29), git-shell refuses to pass --help to upload-pack, limiting
the security impact in configurations that use git-shell (e.g.
gitolite installations).
If you're not using git-shell, then hopefully you have some other form
of filtering preventing arbitrary options being passed to
git-upload-pack. If you don't, then you're in trouble, for the
reasons described in that commit.
Since some installations may be allowing access to git-upload-pack
(for read-only access) without allowing access to git-receive-pack,
this does increase the chance of attack. On the other hand, I suspect
the maintainability benefit is worth it.
For defense in depth, it would be comforting if the git wrapper had
some understanding of "don't support --help in handle_builtin when
invoked as a dashed command". That is, I don't expect that anyone has
been relying on
git-add --help
acting like
git help add
instead of printing the usage message from
git add -h
It's a little fussy because today we rewrite "git add --help" to
"git-add --help" before rewriting it to "git help add"; we'd have to
skip that middle hop for this to work.
I don't think that has to block this patch or series, though --- it's
just a separate thought about hardening.
Cc-ing Sitaram Chamarty since he tends to be wiser about this kind of
thing than I am.
What do you think?
Thanks,
Jonathan
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 04/26] upload-pack: convert to a builtin
2018-02-21 21:47 ` Jonathan Nieder@ 2018-02-21 23:35 ` Junio C Hamano0 siblings, 0 replies; 362+ messages in thread
From: Junio C Hamano @ 2018-02-21 23:35 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Brandon Williams, Stefan Beller, git, Jeff King, Philip Oakley,
Derrick Stolee, Sitaram Chamarty
Jonathan Nieder <jrnieder@gmail.com> writes:
> For defense in depth, it would be comforting if the git wrapper had
> some understanding of "don't support --help in handle_builtin when
> invoked as a dashed command". That is, I don't expect that anyone has
> been relying on
>
> git-add --help
>
> acting like
>
> git help add
>
> instead of printing the usage message from
>
> git add -h
Sounds like a neat trick.
> It's a little fussy because today we rewrite "git add --help" to
> "git-add --help" before rewriting it to "git help add"; we'd have to
> skip that middle hop for this to work.
I do not quite get this part. "git add --help" goes through run_argv()
and then to handle_builtin() which is what does this "git help add"
swapping.
"git-add --help" does get thrown into the same codepath by
pretending as if we got "add --help" as an argument to "git"
command, and that happens without going through run_argv(),
so presumably we can add another perameter to handle_builtin()
so that the callee can tell these two invocation sites apart, no?
> I don't think that has to block this patch or series, though --- it's
> just a separate thought about hardening.
Yeah, I agree with this assessment.
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 11/26] serve: introduce git-serve
2018-01-09 22:16 ` Brandon Williams@ 2018-01-09 22:28 ` Jonathan Tan
2018-01-09 22:34 ` Brandon Williams0 siblings, 1 reply; 362+ messages in thread
From: Jonathan Tan @ 2018-01-09 22:28 UTC (permalink / raw)
To: Brandon Williams
Cc: git, sbeller, gitster, peff, philipoakley, stolee, jrnieder
On Tue, 9 Jan 2018 14:16:42 -0800
Brandon Williams <bmwill@google.com> wrote:
> All good documentation changes.
Thanks!
> > > + /*
> > > + * Function called when a client requests the capability as a command.
> > > + * The command request will be provided to the function via 'keys', the
> > > + * capabilities requested, and 'args', the command specific parameters.
> > > + *
> > > + * This field should be NULL for capabilities which are not commands.
> > > + */
> > > + int (*command)(struct repository *r,
> > > + struct argv_array *keys,
> > > + struct argv_array *args);
> >
> > Looking at the code below, I see that the command is not executed unless
> > advertise returns true - this means that a command cannot be both
> > supported and unadvertised. Would this be too restrictive? For example,
> > this would disallow a gradual across-multiple-servers rollout in which
> > we allow but not advertise a capability, and then after some time,
> > advertise the capability.
>
> One way to change this would be to just add another function to the
> struct which is called to check if the command is allowed, instead of
> relying on the same function to do that for both advertise and
> allow...though I don't see a big win for allowing a command but not
> advertising it.
My rationale for allowing a command but not advertising it is in the
paragraph above (that you quoted), but if that is insufficient
rationale, then I agree that we don't need to do this.
> > If we change this, then the value parameter of advertise can be
> > mandatory instead of optional.
>
> I don't see how this fixes the issue you bring up.
This is a consequence, not a fix - if we were to do as I suggested, then
we no longer need to invoke advertise to check whether something is
advertised except when we are advertising them, in which case "value"
never needs to be NULL.
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 11/26] serve: introduce git-serve
2018-01-09 22:28 ` Jonathan Tan@ 2018-01-09 22:34 ` Brandon Williams0 siblings, 0 replies; 362+ messages in thread
From: Brandon Williams @ 2018-01-09 22:34 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, sbeller, gitster, peff, philipoakley, stolee, jrnieder
On 01/09, Jonathan Tan wrote:
> On Tue, 9 Jan 2018 14:16:42 -0800
> Brandon Williams <bmwill@google.com> wrote:
>
> > All good documentation changes.
>
> Thanks!
>
> > > > + /*
> > > > + * Function called when a client requests the capability as a command.
> > > > + * The command request will be provided to the function via 'keys', the
> > > > + * capabilities requested, and 'args', the command specific parameters.
> > > > + *
> > > > + * This field should be NULL for capabilities which are not commands.
> > > > + */
> > > > + int (*command)(struct repository *r,
> > > > + struct argv_array *keys,
> > > > + struct argv_array *args);
> > >
> > > Looking at the code below, I see that the command is not executed unless
> > > advertise returns true - this means that a command cannot be both
> > > supported and unadvertised. Would this be too restrictive? For example,
> > > this would disallow a gradual across-multiple-servers rollout in which
> > > we allow but not advertise a capability, and then after some time,
> > > advertise the capability.
> >
> > One way to change this would be to just add another function to the
> > struct which is called to check if the command is allowed, instead of
> > relying on the same function to do that for both advertise and
> > allow...though I don't see a big win for allowing a command but not
> > advertising it.
>
> My rationale for allowing a command but not advertising it is in the
> paragraph above (that you quoted), but if that is insufficient
> rationale, then I agree that we don't need to do this.
I have no issues with adding that functionality, i don't really feel
that strongly one way or another. Just seemed like additional work for
not much gain right now, key being right now. It very well may be worth
it for the use case you specified. If so I can definitely make the
change.
>
> > > If we change this, then the value parameter of advertise can be
> > > mandatory instead of optional.
> >
> > I don't see how this fixes the issue you bring up.
>
> This is a consequence, not a fix - if we were to do as I suggested, then
> we no longer need to invoke advertise to check whether something is
> advertised except when we are advertising them, in which case "value"
> never needs to be NULL.
Oh I understand what you are trying to explain, yes you're right.
--
Brandon Williams
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 11/26] serve: introduce git-serve
2018-01-03 0:18 ` [PATCH 11/26] serve: introduce git-serve Brandon Williams
2018-01-09 20:24 ` Jonathan Tan@ 2018-02-01 18:48 ` Jeff Hostetler
2018-02-01 18:57 ` Stefan Beller1 sibling, 1 reply; 362+ messages in thread
From: Jeff Hostetler @ 2018-02-01 18:48 UTC (permalink / raw)
To: Brandon Williams, git
Cc: sbeller, gitster, peff, philipoakley, stolee, jrnieder
On 1/2/2018 7:18 PM, Brandon Williams wrote:
> Introduce git-serve, the base server for protocol version 2.
>
> Protocol version 2 is intended to be a replacement for Git's current
> wire protocol. The intention is that it will be a simpler, less
> wasteful protocol which can evolve over time.
>
> Protocol version 2 improves upon version 1 by eliminating the initial
> ref advertisement. In its place a server will export a list of
> capabilities and commands which it supports in a capability
> advertisement. A client can then request that a particular command be
> executed by providing a number of capabilities and command specific
> parameters. At the completion of a command, a client can request that
> another command be executed or can terminate the connection by sending a
> flush packet.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
> .gitignore | 1 +
> Documentation/technical/protocol-v2.txt | 91 ++++++++++++
> Makefile | 2 +
> builtin.h | 1 +
> builtin/serve.c | 30 ++++
> git.c | 1 +
> serve.c | 239 ++++++++++++++++++++++++++++++++
> serve.h | 15 ++
> 8 files changed, 380 insertions(+)
> create mode 100644 Documentation/technical/protocol-v2.txt
> create mode 100644 builtin/serve.c
> create mode 100644 serve.c
> create mode 100644 serve.h
>
> diff --git a/.gitignore b/.gitignore
> index 833ef3b0b..2d0450c26 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -140,6 +140,7 @@
> /git-rm
> /git-send-email
> /git-send-pack
> +/git-serve
> /git-sh-i18n
> /git-sh-i18n--envsubst
> /git-sh-setup
> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> new file mode 100644
> index 000000000..b87ba3816
> --- /dev/null
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -0,0 +1,91 @@
> + Git Wire Protocol, Version 2
> +==============================
> +
> +This document presents a specification for a version 2 of Git's wire
> +protocol. Protocol v2 will improve upon v1 in the following ways:
> +
> + * Instead of multiple service names, multiple commands will be
> + supported by a single service.
> + * Easily extendable as capabilities are moved into their own section
> + of the protocol, no longer being hidden behind a NUL byte and
> + limited by the size of a pkt-line (as there will be a single
> + capability per pkt-line).
> + * Separate out other information hidden behind NUL bytes (e.g. agent
> + string as a capability and symrefs can be requested using 'ls-refs')
> + * Reference advertisement will be omitted unless explicitly requested
> + * ls-refs command to explicitly request some refs
> +
> + Detailed Design
> +=================
> +
> +A client can request to speak protocol v2 by sending `version=2` in the
> +side-channel `GIT_PROTOCOL` in the initial request to the server.
> +
> +In protocol v2 communication is command oriented. When first contacting a
> +server a list of capabilities will advertised. Some of these capabilities
> +will be commands which a client can request be executed. Once a command
> +has completed, a client can reuse the connection and request that other
> +commands be executed.
> +
> + Special Packets
> +-----------------
> +
> +In protocol v2 these special packets will have the following semantics:
> +
> + * '0000' Flush Packet (flush-pkt) - indicates the end of a message
> + * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
Previously, a 0001 pkt-line meant that there was 1 byte of data
following, right? Does this change that and/or prevent 1 byte
packets? (Not sure if it is likely, but the odd-tail of a packfile
might get sent in a 0001 line, right?) Or is it that 0001 is only
special during the V2 negotiation stuff, but not during the packfile
transmission?
(I'm not against having this delimiter -- I think it is useful, but
just curious if will cause problems elsewhere.)
Should we also consider increasing the pkt-line limit to 5 hex-digits
while we're at it ? That would let us have 1MB buffers if that would
help with large packfiles. Granted, we're throttled by the network,
so it might not matter. Would it be interesting to have a 5 digit
prefix with parts of the high bits of first digit being flags ?
Or is this too radical of a change?
> +
> + Capability Advertisement
> +--------------------------
> +
> +A server which decides to communicate (based on a request from a client)
> +using protocol version 2, notifies the client by sending a version string
> +in its initial response followed by an advertisement of its capabilities.
> +Each capability is a key with an optional value. Clients must ignore all
> +unknown keys. Semantics of unknown values are left to the definition of
> +each key. Some capabilities will describe commands which can be requested
> +to be executed by the client.
> +
> + capability-advertisement = protocol-version
> + capability-list
> + flush-pkt
> +
> + protocol-version = PKT-LINE("version 2" LF)
> + capability-list = *capability
> + capability = PKT-LINE(key[=value] LF)
> +
> + key = 1*CHAR
> + value = 1*CHAR
> + CHAR = 1*(ALPHA / DIGIT / "-" / "_")
> +
> +A client then responds to select the command it wants with any particular
> +capabilities or arguments. There is then an optional section where the
> +client can provide any command specific parameters or queries.
> +
> + command-request = command
> + capability-list
> + (command-args)
> + flush-pkt
> + command = PKT-LINE("command=" key LF)
> + command-args = delim-pkt
> + *arg
> + arg = 1*CHAR
> +
> +The server will then check to ensure that the client's request is
> +comprised of a valid command as well as valid capabilities which were
> +advertised. If the request is valid the server will then execute the
> +command.
> +
> +A particular command can last for as many rounds as are required to
> +complete the service (multiple for negotiation during fetch or no
> +additional trips in the case of ls-refs).
> +
> +When finished a client should send an empty request of just a flush-pkt to
> +terminate the connection.
> +
> + Commands in v2
> +~~~~~~~~~~~~~~~~
> +
> +Commands are the core actions that a client wants to perform (fetch, push,
> +etc). Each command will be provided with a list capabilities and
> +arguments as requested by a client.
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 11/26] serve: introduce git-serve
2018-02-01 18:48 ` Jeff Hostetler@ 2018-02-01 18:57 ` Stefan Beller
2018-02-01 19:09 ` Jeff Hostetler
2018-02-01 19:45 ` Randall S. Becker0 siblings, 2 replies; 362+ messages in thread
From: Stefan Beller @ 2018-02-01 18:57 UTC (permalink / raw)
To: Jeff Hostetler
Cc: Brandon Williams, git, Junio C Hamano, Jeff King, Philip Oakley,
Derrick Stolee, Jonathan Nieder
On Thu, Feb 1, 2018 at 10:48 AM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>
>
> On 1/2/2018 7:18 PM, Brandon Williams wrote:
>>
>> Introduce git-serve, the base server for protocol version 2.
>>
>> Protocol version 2 is intended to be a replacement for Git's current
>> wire protocol. The intention is that it will be a simpler, less
>> wasteful protocol which can evolve over time.
>>
>> Protocol version 2 improves upon version 1 by eliminating the initial
>> ref advertisement. In its place a server will export a list of
>> capabilities and commands which it supports in a capability
>> advertisement. A client can then request that a particular command be
>> executed by providing a number of capabilities and command specific
>> parameters. At the completion of a command, a client can request that
>> another command be executed or can terminate the connection by sending a
>> flush packet.
>>
>> Signed-off-by: Brandon Williams <bmwill@google.com>
>> ---
>> .gitignore | 1 +
>> Documentation/technical/protocol-v2.txt | 91 ++++++++++++
>> Makefile | 2 +
>> builtin.h | 1 +
>> builtin/serve.c | 30 ++++
>> git.c | 1 +
>> serve.c | 239
>> ++++++++++++++++++++++++++++++++
>> serve.h | 15 ++
>> 8 files changed, 380 insertions(+)
>> create mode 100644 Documentation/technical/protocol-v2.txt
>> create mode 100644 builtin/serve.c
>> create mode 100644 serve.c
>> create mode 100644 serve.h
>>
>> diff --git a/.gitignore b/.gitignore
>> index 833ef3b0b..2d0450c26 100644
>> --- a/.gitignore
>> +++ b/.gitignore
>> @@ -140,6 +140,7 @@
>> /git-rm
>> /git-send-email
>> /git-send-pack
>> +/git-serve
>> /git-sh-i18n
>> /git-sh-i18n--envsubst
>> /git-sh-setup
>> diff --git a/Documentation/technical/protocol-v2.txt
>> b/Documentation/technical/protocol-v2.txt
>> new file mode 100644
>> index 000000000..b87ba3816
>> --- /dev/null
>> +++ b/Documentation/technical/protocol-v2.txt
>> @@ -0,0 +1,91 @@
>> + Git Wire Protocol, Version 2
>> +==============================
>> +
>> +This document presents a specification for a version 2 of Git's wire
>> +protocol. Protocol v2 will improve upon v1 in the following ways:
>> +
>> + * Instead of multiple service names, multiple commands will be
>> + supported by a single service.
>> + * Easily extendable as capabilities are moved into their own section
>> + of the protocol, no longer being hidden behind a NUL byte and
>> + limited by the size of a pkt-line (as there will be a single
>> + capability per pkt-line).
>> + * Separate out other information hidden behind NUL bytes (e.g. agent
>> + string as a capability and symrefs can be requested using 'ls-refs')
>> + * Reference advertisement will be omitted unless explicitly requested
>> + * ls-refs command to explicitly request some refs
>> +
>> + Detailed Design
>> +=================
>> +
>> +A client can request to speak protocol v2 by sending `version=2` in the
>> +side-channel `GIT_PROTOCOL` in the initial request to the server.
>> +
>> +In protocol v2 communication is command oriented. When first contacting
>> a
>> +server a list of capabilities will advertised. Some of these
>> capabilities
>> +will be commands which a client can request be executed. Once a command
>> +has completed, a client can reuse the connection and request that other
>> +commands be executed.
>> +
>> + Special Packets
>> +-----------------
>> +
>> +In protocol v2 these special packets will have the following semantics:
>> +
>> + * '0000' Flush Packet (flush-pkt) - indicates the end of a message
>> + * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
>
>
> Previously, a 0001 pkt-line meant that there was 1 byte of data
> following, right?
No, the length was including the length field, so 0005 would indicate that
there is one byte following, (+4 bytes of "0005" included)
> Does this change that and/or prevent 1 byte
> packets? (Not sure if it is likely, but the odd-tail of a packfile
> might get sent in a 0001 line, right?) Or is it that 0001 is only
> special during the V2 negotiation stuff, but not during the packfile
> transmission?
0001 is invalid in the current protocol v0.
>
> (I'm not against having this delimiter -- I think it is useful, but
> just curious if will cause problems elsewhere.)
>
> Should we also consider increasing the pkt-line limit to 5 hex-digits
> while we're at it ? That would let us have 1MB buffers if that would
> help with large packfiles.
AFAICT there is a static allocation of one pkt-line (of maximum size),
such that the code can read in a full packet and then process it.
If we'd increase the packet size we'd need the static buffer to be 1MB,
which sounds good for my developer machine. But I suspect it may be
too much for people using git on embedded devices?
pack files larger than 64k are put into multiple pkt-lines, which is
not a big deal, as the overhead of 4bytes per 64k is negligible.
(also there is progress information in the side channel, which
would come in as a special packet in between real packets,
such that every 64k transmitted you can update your progress
meter; Not sure I feel strongly on fewer progress updates)
> Granted, we're throttled by the network,
> so it might not matter. Would it be interesting to have a 5 digit
> prefix with parts of the high bits of first digit being flags ?
> Or is this too radical of a change?
What would the flags be for?
As an alternative we could put the channel number in one byte,
such that we can have a side channel not just while streaming the
pack but all the time. (Again, not sure if that buys a lot for us)
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 11/26] serve: introduce git-serve
2018-02-01 18:57 ` Stefan Beller@ 2018-02-01 19:09 ` Jeff Hostetler
2018-02-01 20:05 ` Brandon Williams
2018-02-01 19:45 ` Randall S. Becker1 sibling, 1 reply; 362+ messages in thread
From: Jeff Hostetler @ 2018-02-01 19:09 UTC (permalink / raw)
To: Stefan Beller
Cc: Brandon Williams, git, Junio C Hamano, Jeff King, Philip Oakley,
Derrick Stolee, Jonathan Nieder
On 2/1/2018 1:57 PM, Stefan Beller wrote:
> On Thu, Feb 1, 2018 at 10:48 AM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>>
>> On 1/2/2018 7:18 PM, Brandon Williams wrote:
>>>
>>> Introduce git-serve, the base server for protocol version 2.
[...]
>>> + Special Packets
>>> +-----------------
>>> +
>>> +In protocol v2 these special packets will have the following semantics:
>>> +
>>> + * '0000' Flush Packet (flush-pkt) - indicates the end of a message
>>> + * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
>>
>>
>> Previously, a 0001 pkt-line meant that there was 1 byte of data
>> following, right?
>
> No, the length was including the length field, so 0005 would indicate that
> there is one byte following, (+4 bytes of "0005" included)
d'oh. right. thanks!
>> Should we also consider increasing the pkt-line limit to 5 hex-digits
>> while we're at it ? That would let us have 1MB buffers if that would
>> help with large packfiles.
>
> AFAICT there is a static allocation of one pkt-line (of maximum size),
> such that the code can read in a full packet and then process it.
> If we'd increase the packet size we'd need the static buffer to be 1MB,
> which sounds good for my developer machine. But I suspect it may be
> too much for people using git on embedded devices?
I got burned by that static buffer once upon a time when I wanted
to have 2 streams going at the same time. Hopefully, we can move
that into the new reader structure at some point (if it isn't already).
>
> pack files larger than 64k are put into multiple pkt-lines, which is
> not a big deal, as the overhead of 4bytes per 64k is negligible.
> (also there is progress information in the side channel, which
> would come in as a special packet in between real packets,
> such that every 64k transmitted you can update your progress
> meter; Not sure I feel strongly on fewer progress updates)
>
>> Granted, we're throttled by the network,
>> so it might not matter. Would it be interesting to have a 5 digit
>> prefix with parts of the high bits of first digit being flags ?
>> Or is this too radical of a change?
>
> What would the flags be for?
>
> As an alternative we could put the channel number in one byte,
> such that we can have a side channel not just while streaming the
> pack but all the time. (Again, not sure if that buys a lot for us)
>
Delimiters like the 0001 and the side channel are a couple of
ideas, but I was just thinking out loud. And right, I'm not sure
it gets us much right now.
Jeff
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 11/26] serve: introduce git-serve
2018-02-01 19:09 ` Jeff Hostetler@ 2018-02-01 20:05 ` Brandon Williams0 siblings, 0 replies; 362+ messages in thread
From: Brandon Williams @ 2018-02-01 20:05 UTC (permalink / raw)
To: Jeff Hostetler
Cc: Stefan Beller, git, Junio C Hamano, Jeff King, Philip Oakley,
Derrick Stolee, Jonathan Nieder
On 02/01, Jeff Hostetler wrote:
>
>
> On 2/1/2018 1:57 PM, Stefan Beller wrote:
> > On Thu, Feb 1, 2018 at 10:48 AM, Jeff Hostetler <git@jeffhostetler.com> wrote:
> > >
> > >
> > > On 1/2/2018 7:18 PM, Brandon Williams wrote:
> > > >
> > > > Introduce git-serve, the base server for protocol version 2.
> [...]
> > > > + Special Packets
> > > > +-----------------
> > > > +
> > > > +In protocol v2 these special packets will have the following semantics:
> > > > +
> > > > + * '0000' Flush Packet (flush-pkt) - indicates the end of a message
> > > > + * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
> > >
> > >
> > > Previously, a 0001 pkt-line meant that there was 1 byte of data
> > > following, right?
> >
> > No, the length was including the length field, so 0005 would indicate that
> > there is one byte following, (+4 bytes of "0005" included)
>
> d'oh. right. thanks!
>
> > > Should we also consider increasing the pkt-line limit to 5 hex-digits
> > > while we're at it ? That would let us have 1MB buffers if that would
> > > help with large packfiles.
> >
> > AFAICT there is a static allocation of one pkt-line (of maximum size),
> > such that the code can read in a full packet and then process it.
> > If we'd increase the packet size we'd need the static buffer to be 1MB,
> > which sounds good for my developer machine. But I suspect it may be
> > too much for people using git on embedded devices?
>
> I got burned by that static buffer once upon a time when I wanted
> to have 2 streams going at the same time. Hopefully, we can move
> that into the new reader structure at some point (if it isn't already).
Yeah the reader struct could easily be extended to take in the
buffer to read the data into. Because I'm not trying to do any of that
atm I decided to have it default to using the static buffer, but it
would be as simple as changing the reader->buffer variable to use a
different buffer.
>
> >
> > pack files larger than 64k are put into multiple pkt-lines, which is
> > not a big deal, as the overhead of 4bytes per 64k is negligible.
> > (also there is progress information in the side channel, which
> > would come in as a special packet in between real packets,
> > such that every 64k transmitted you can update your progress
> > meter; Not sure I feel strongly on fewer progress updates)
> >
> > > Granted, we're throttled by the network,
> > > so it might not matter. Would it be interesting to have a 5 digit
> > > prefix with parts of the high bits of first digit being flags ?
> > > Or is this too radical of a change?
> >
> > What would the flags be for?
> >
> > As an alternative we could put the channel number in one byte,
> > such that we can have a side channel not just while streaming the
> > pack but all the time. (Again, not sure if that buys a lot for us)
> >
>
> Delimiters like the 0001 and the side channel are a couple of
> ideas, but I was just thinking out loud. And right, I'm not sure
> it gets us much right now.
>
> Jeff
--
Brandon Williams
^permalinkrawreply [flat|nested] 362+ messages in thread

*RE: [PATCH 11/26] serve: introduce git-serve
2018-02-01 18:57 ` Stefan Beller
2018-02-01 19:09 ` Jeff Hostetler@ 2018-02-01 19:45 ` Randall S. Becker
2018-02-01 20:08 ` Brandon Williams1 sibling, 1 reply; 362+ messages in thread
From: Randall S. Becker @ 2018-02-01 19:45 UTC (permalink / raw)
To: Stefan Beller, Jeff Hostetler
Cc: Brandon Williams, git, Junio C Hamano, Jeff King, Philip Oakley,
Derrick Stolee, Jonathan Nieder
On February 1, 2018 1:58 PM, Stefan Beller wrote:
> On Thu, Feb 1, 2018 at 10:48 AM, Jeff Hostetler <git@jeffhostetler.com>
> wrote:
> >
> >
> > On 1/2/2018 7:18 PM, Brandon Williams wrote:
> >>
> >> Introduce git-serve, the base server for protocol version 2.
> >>
> >> Protocol version 2 is intended to be a replacement for Git's current
> >> wire protocol. The intention is that it will be a simpler, less
> >> wasteful protocol which can evolve over time.
> >>
> >> Protocol version 2 improves upon version 1 by eliminating the initial
> >> ref advertisement. In its place a server will export a list of
> >> capabilities and commands which it supports in a capability
> >> advertisement. A client can then request that a particular command
> >> be executed by providing a number of capabilities and command
> >> specific parameters. At the completion of a command, a client can
> >> request that another command be executed or can terminate the
> >> connection by sending a flush packet.
> >>
> >> Signed-off-by: Brandon Williams <bmwill@google.com>
> >> ---
> >> .gitignore | 1 +
> >> Documentation/technical/protocol-v2.txt | 91 ++++++++++++
> >> Makefile | 2 +
> >> builtin.h | 1 +
> >> builtin/serve.c | 30 ++++
> >> git.c | 1 +
> >> serve.c | 239
> >> ++++++++++++++++++++++++++++++++
> >> serve.h | 15 ++
> >> 8 files changed, 380 insertions(+)
> >> create mode 100644 Documentation/technical/protocol-v2.txt
> >> create mode 100644 builtin/serve.c
> >> create mode 100644 serve.c
> >> create mode 100644 serve.h
> >>
> >> diff --git a/.gitignore b/.gitignore
> >> index 833ef3b0b..2d0450c26 100644
> >> --- a/.gitignore
> >> +++ b/.gitignore
> >> @@ -140,6 +140,7 @@
> >> /git-rm
> >> /git-send-email
> >> /git-send-pack
> >> +/git-serve
> >> /git-sh-i18n
> >> /git-sh-i18n--envsubst
> >> /git-sh-setup
> >> diff --git a/Documentation/technical/protocol-v2.txt
> >> b/Documentation/technical/protocol-v2.txt
> >> new file mode 100644
> >> index 000000000..b87ba3816
> >> --- /dev/null
> >> +++ b/Documentation/technical/protocol-v2.txt
> >> @@ -0,0 +1,91 @@
> >> + Git Wire Protocol, Version 2
> >> +==============================
> >> +
> >> +This document presents a specification for a version 2 of Git's wire
> >> +protocol. Protocol v2 will improve upon v1 in the following ways:
> >> +
> >> + * Instead of multiple service names, multiple commands will be
> >> + supported by a single service.
> >> + * Easily extendable as capabilities are moved into their own section
> >> + of the protocol, no longer being hidden behind a NUL byte and
> >> + limited by the size of a pkt-line (as there will be a single
> >> + capability per pkt-line).
> >> + * Separate out other information hidden behind NUL bytes (e.g. agent
> >> + string as a capability and symrefs can be requested using
> >> + 'ls-refs')
> >> + * Reference advertisement will be omitted unless explicitly
> >> + requested
> >> + * ls-refs command to explicitly request some refs
> >> +
> >> + Detailed Design
> >> +=================
> >> +
> >> +A client can request to speak protocol v2 by sending `version=2` in
> >> +the side-channel `GIT_PROTOCOL` in the initial request to the server.
> >> +
> >> +In protocol v2 communication is command oriented. When first
> >> +contacting
> >> a
> >> +server a list of capabilities will advertised. Some of these
> >> capabilities
> >> +will be commands which a client can request be executed. Once a
> >> +command has completed, a client can reuse the connection and request
> >> +that other commands be executed.
> >> +
> >> + Special Packets
> >> +-----------------
> >> +
> >> +In protocol v2 these special packets will have the following semantics:
> >> +
> >> + * '0000' Flush Packet (flush-pkt) - indicates the end of a message
> >> + * '0001' Delimiter Packet (delim-pkt) - separates sections of a
> >> + message
> >
> >
> > Previously, a 0001 pkt-line meant that there was 1 byte of data
> > following, right?
>
> No, the length was including the length field, so 0005 would indicate that
> there is one byte following, (+4 bytes of "0005" included)
>
> > Does this change that and/or prevent 1 byte packets? (Not sure if it
> > is likely, but the odd-tail of a packfile might get sent in a 0001
> > line, right?) Or is it that 0001 is only special during the V2
> > negotiation stuff, but not during the packfile transmission?
>
> 0001 is invalid in the current protocol v0.
>
> >
> > (I'm not against having this delimiter -- I think it is useful, but
> > just curious if will cause problems elsewhere.)
> >
> > Should we also consider increasing the pkt-line limit to 5 hex-digits
> > while we're at it ? That would let us have 1MB buffers if that would
> > help with large packfiles.
>
> AFAICT there is a static allocation of one pkt-line (of maximum size), such
> that the code can read in a full packet and then process it.
> If we'd increase the packet size we'd need the static buffer to be 1MB, which
> sounds good for my developer machine. But I suspect it may be too much for
> people using git on embedded devices?
>
> pack files larger than 64k are put into multiple pkt-lines, which is not a big
> deal, as the overhead of 4bytes per 64k is negligible.
> (also there is progress information in the side channel, which would come in
> as a special packet in between real packets, such that every 64k transmitted
> you can update your progress meter; Not sure I feel strongly on fewer
> progress updates)
Can I request, selfishly from my own platform's (NonStop) performance heartache, that we don't require 1Mb? We're not embedded on this platform, but there is an optimized message system packet size down at 50Kb that I would like to stay under. Although above that is no problem, there is a significant cost incurred above that size point. And please make sure xread/xwrite are used in any event.
> > Granted, we're throttled by the network, so it might not matter.
> > Would it be interesting to have a 5 digit prefix with parts of the
> > high bits of first digit being flags ?
> > Or is this too radical of a change?
>
> What would the flags be for?
>
> As an alternative we could put the channel number in one byte, such that we
> can have a side channel not just while streaming the pack but all the time.
> (Again, not sure if that buys a lot for us)
Cheers,
Randall
-- Brief whoami:
NonStop developer since approximately 211288444200000000
UNIX developer since approximately 421664400
-- In my real life, I talk too much.
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 11/26] serve: introduce git-serve
2018-02-01 20:37 ` Randall S. Becker@ 2018-02-01 20:50 ` Stefan Beller0 siblings, 0 replies; 362+ messages in thread
From: Stefan Beller @ 2018-02-01 20:50 UTC (permalink / raw)
To: Randall S. Becker
Cc: Brandon Williams, Jeff Hostetler, git, Junio C Hamano, Jeff King,
Philip Oakley, Derrick Stolee, Jonathan Nieder
On Thu, Feb 1, 2018 at 12:37 PM, Randall S. Becker
<rsbecker@nexbridge.com> wrote:
>> I think that it would be too much of a change to up to 1MB lines at the
>> moment so I'm planning on leaving it right where it is :)
>
> In for a kilo, in for a tonne. Once we're way up there, it's not a problem
> or much of a difference. :)
What benefit does a larger buffer have?
I outlined the negatives above (large static buffer, issues with
progress meter).
And it seems to me that Brandon wants to keep this series as small as possible
w.r.t. bait for endless discussions and only deliver innovation, that solves the
immediate needs. Are there issues with too small buffers? (Can you link to
the performance measurements or an analysis?)
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 20/26] fetch-pack: perform a fetch using v2
2018-01-03 0:18 ` [PATCH 20/26] fetch-pack: perform a fetch using v2 Brandon Williams
2018-01-04 1:23 ` Stefan Beller@ 2018-01-10 0:05 ` Jonathan Tan1 sibling, 0 replies; 362+ messages in thread
From: Jonathan Tan @ 2018-01-10 0:05 UTC (permalink / raw)
To: Brandon Williams
Cc: git, sbeller, gitster, peff, philipoakley, stolee, jrnieder
On Tue, 2 Jan 2018 16:18:22 -0800
Brandon Williams <bmwill@google.com> wrote:
> +static enum ack_type process_ack(const char *line, struct object_id *oid)
> +{
> + const char *arg;
> +
> + if (!strcmp(line, "NAK"))
> + return NAK;
> + if (skip_prefix(line, "ACK ", &arg)) {
> + if (!parse_oid_hex(arg, oid, &arg)) {
> + if (strstr(arg, "continue"))
> + return ACK_continue;
This function seems to be only used for v2, so I don't think we need to
parse "continue".
Also, maybe describe the plan for supporting functionality not supported
yet (e.g. server-side declaration of shallows and client-side "deepen").
It may be possible to delay support for server-side shallows on the
server (that is, only implement support for it in the client) since the
server can just declare that it doesn't support protocol v2 when serving
such repos (although it might just be easier to implement server-side
support in this case).
For "deepen", we need support for it both on the client and the server
now unless we plan to declare a "deepen" capability in the future (then,
as of these patches, clients that require "deepen" will use protocol v1;
when a new server declares "deepen", old clients will ignore it and keep
the status quo, and new clients can then use "deepen").
There may be others that I've missed.
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 00/26] protocol version 2
2018-01-03 0:18 [PATCH 00/26] protocol version 2 Brandon Williams
` (25 preceding siblings ...)
2018-01-03 0:18 ` [PATCH 26/26] remote-curl: implement connect-half-duplex command Brandon Williams
@ 2018-01-09 17:55 ` Jonathan Tan
2018-01-11 0:23 ` Brandon Williams
2018-01-25 23:58 ` [PATCH v2 00/27] " Brandon Williams
27 siblings, 1 reply; 362+ messages in thread
From: Jonathan Tan @ 2018-01-09 17:55 UTC (permalink / raw)
To: Brandon Williams
Cc: git, sbeller, gitster, peff, philipoakley, stolee, jrnieder
On Tue, 2 Jan 2018 16:18:02 -0800
Brandon Williams <bmwill@google.com> wrote:
> The following patches extend what I sent out as an WIP
> (https://public-inbox.org/git/20171204235823.63299-1-bmwill@google.com/) and
> implement protocol version 2.
Summarizing (for myself) the rationale for protocol version 2:
The existing protocol has a few pain points: (a) limit on the length of
the capability line (the capability line can be used to include
additional parameters in a backwards-compatible way), (b) difficulty in
creating proxies because of inconsistent flush semantics, and (c) the
need to implement clients twice - once for HTTP and once for
connect-supporting transports. To which we can add another: (d) if we
want to support something entirely new (for example, a server-side "git
log"), we will need a new protocol anyway.
The new functionality introduced in this patch set is probably best done
using a new protocol. If it were done using the existing protocol (by
adding a parameter in the capabilities line), we would still run into
(a) and (c), so we might as well introduce the new protocol now.
Some of the above points are repeats from my previous e-mail:
https://public-inbox.org/git/20171110121347.1f7c184c543622b60164e9fb@google.com/> Some changes from that series are as follows:
> * Lots of various cleanup on the ls-refs and fetch command code, both server
> and client.
> * Fetch command now supports a stateless-rpc mode which enables communicating
> with a half-duplex connection.
Good to hear about fetch support.
> * Introduce a new remote-helper command 'connect-half-duplex' which is
> implemented by remote-curl (the http remote-helper). This allows for a
> client to establish a half-duplex connection and use remote-curl as a proxy
> to wrap requests in http before sending them to the remote end and
> unwrapping the responses and sending them back to the client's stdin.
I'm not sure about the "half-duplex" name - it is half-duplex in that
each side must terminate their communications with a flush, but not
half-duplex in that request-response pairs can overlap each other (e.g.
during negotation during fetch - there is an optimization in which the
client tries to keep two requests pending at a time). I think that the
idea we want to communicate is that requests and responses are always
packetized, stateless, and always happen as a pair.
I wonder if "stateless-connect" is a better keyword - it makes sense to
me (once described) that "stateless" implies that the client sends
everything the server needs at once (thus, in a packet), the server
sends everything the client needs back at once (thus, in a packet), and
then the client must not assume any state-storing on the part of the
server or transport.
> * The transport code is refactored for ls-remote, fetch, and push to provide a
> list of ref-patterns (based on the refspec being used) when requesting refs
> from the remote end. This allows the ls-refs code to send this list of
> patterns so the remote end and filter the refs it sends back.
Briefly looking at the implementation, the client seems to incur an
extra roundtrip when using ls-remote (and others) with a v2-supporting
server. I initially didn't like this, but upon further reflection, this
is probably fine for now. The client can be upgraded later, and I think
that clients will eventually want to query git-serve directly for
"ls-refs" first, and then fall back to v0 for ancient servers, instead
of checking git-upload-pack first (as in this patch set) - so, the
support for "ls-refs" here won't be carried forward merely for backwards
compatibility, but will eventually be actively used.
As for the decision to use a new endpoint "git-serve" instead of reusing
"git-upload-pack" (or "git-receive-pack"), reusing the existing one
might allow some sort of optimization later in which the first
"git-upload-pack" query immediately returns with the v2 answer (instead
of redirecting the client to "git-serve"), but this probably doesn't
matter in practice (as I stated above, I think that eventually clients
will query git-serve first).
> This series effectively implements protocol version 2 for listing a remotes
> refs (ls-remote) as well as for fetch for the builtin transports (ssh, git,
> file) and for the http/https transports. Push is not implemented yet and
> doesn't need to be implemented at the same time as fetch since the
> receive-pack code can default to using protocol v0 when v2 is requested by the
> client.
Agreed - push can be done later.
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH 00/26] protocol version 2
2018-01-09 17:55 ` [PATCH 00/26] protocol version 2 Jonathan Tan
@ 2018-01-11 0:23 ` Brandon Williams0 siblings, 0 replies; 362+ messages in thread
From: Brandon Williams @ 2018-01-11 0:23 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, sbeller, gitster, peff, philipoakley, stolee, jrnieder
On 01/09, Jonathan Tan wrote:
> On Tue, 2 Jan 2018 16:18:02 -0800
> Brandon Williams <bmwill@google.com> wrote:
>
> > * Introduce a new remote-helper command 'connect-half-duplex' which is
> > implemented by remote-curl (the http remote-helper). This allows for a
> > client to establish a half-duplex connection and use remote-curl as a proxy
> > to wrap requests in http before sending them to the remote end and
> > unwrapping the responses and sending them back to the client's stdin.
>
> I'm not sure about the "half-duplex" name - it is half-duplex in that
> each side must terminate their communications with a flush, but not
> half-duplex in that request-response pairs can overlap each other (e.g.
> during negotation during fetch - there is an optimization in which the
> client tries to keep two requests pending at a time). I think that the
> idea we want to communicate is that requests and responses are always
> packetized, stateless, and always happen as a pair.
>
> I wonder if "stateless-connect" is a better keyword - it makes sense to
> me (once described) that "stateless" implies that the client sends
> everything the server needs at once (thus, in a packet), the server
> sends everything the client needs back at once (thus, in a packet), and
> then the client must not assume any state-storing on the part of the
> server or transport.
I like that name much better, I think I'll change it to use
'stateless-connect'. Thanks :)
--
Brandon Williams
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v2 12/27] serve: introduce git-serve
2018-01-25 23:58 ` [PATCH v2 12/27] serve: introduce git-serve Brandon Williams
@ 2018-01-26 10:39 ` Duy Nguyen
2018-02-27 5:46 ` Jonathan Nieder
2018-01-31 15:39 ` Derrick Stolee1 sibling, 1 reply; 362+ messages in thread
From: Duy Nguyen @ 2018-01-26 10:39 UTC (permalink / raw)
To: Brandon Williams
Cc: Git Mailing List, Stefan Beller, Junio C Hamano, Jeff King,
Philip Oakley, stolee, Jonathan Nieder
On Fri, Jan 26, 2018 at 6:58 AM, Brandon Williams <bmwill@google.com> wrote:
> + Detailed Design
> +=================
> +
> +A client can request to speak protocol v2 by sending `version=2` in the
> +side-channel `GIT_PROTOCOL` in the initial request to the server.
> +
> +In protocol v2 communication is command oriented. When first contacting a
> +server a list of capabilities will advertised. Some of these capabilities
s/will advertised/will be advertised/
> + Capability Advertisement
> +--------------------------
> +
> +A server which decides to communicate (based on a request from a client)
> +using protocol version 2, notifies the client by sending a version string
> +in its initial response followed by an advertisement of its capabilities.
> +Each capability is a key with an optional value. Clients must ignore all
> +unknown keys.
With have a convention in $GIT_DIR/index file format that's probably a
good thing to follow here: lowercase keys are optional, such unknown
keys can (and must) be ignored. Uppercase keys are mandatory. If a
client can't understand one of those keys, abort. This gives the
server a way to "select" clients and introduce incompatible changes if
we ever have to.
> Semantics of unknown values are left to the definition of
> +each key. Some capabilities will describe commands which can be requested
> +to be executed by the client.
> +
> + capability-advertisement = protocol-version
> + capability-list
> + flush-pkt
> +
> + protocol-version = PKT-LINE("version 2" LF)
> + capability-list = *capability
> + capability = PKT-LINE(key[=value] LF)
> +
> + key = 1*CHAR
> + value = 1*CHAR
> + CHAR = 1*(ALPHA / DIGIT / "-" / "_")
Is this a bit too restricted for "value"? Something like "." (e.g.
version) or "@" (I wonder if anybody will add an capability that
contains an email address). Unless there's a good reason to limit it,
should we just go full ascii (without control codes)?
> +A client then responds to select the command it wants with any particular
> +capabilities or arguments. There is then an optional section where the
> +client can provide any command specific parameters or queries.
> +
> + command-request = command
> + capability-list
> + (command-args)
> + flush-pkt
> + command = PKT-LINE("command=" key LF)
> + command-args = delim-pkt
> + *arg
> + arg = 1*CHAR
> +
> +The server will then check to ensure that the client's request is
> +comprised of a valid command as well as valid capabilities which were
> +advertised. If the request is valid the server will then execute the
> +command.
What happens when the request is not valid? Or..
> +When a command has finished
How does the client know a command has finished? Is it up to each
command design?
More or less related it bugs me that I have a translated git client,
but I still receive remote error messages in English. It's a hard
problem, but I'm hoping that we won't need to change the core protocol
to support that someday. Although we could make rule now that side
channel message could be sent in "printf"-like form, where the client
can translate the format string and substitutes placeholders with real
values afterward...
> a client can either request that another
> +command be executed or can terminate the connection by sending an empty
> +request consisting of just a flush-pkt.
> +
> + Capabilities
> +~~~~~~~~~~~~~~
> +
> +There are two different types of capabilities: normal capabilities,
> +which can be used to to convey information or alter the behavior of a
> +request, and command capabilities, which are the core actions that a
> +client wants to perform (fetch, push, etc).
> +
> + agent
> +-------
> +
> +The server can advertise the `agent` capability with a value `X` (in the
> +form `agent=X`) to notify the client that the server is running version
> +`X`. The client may optionally send its own agent string by including
> +the `agent` capability with a value `Y` (in the form `agent=Y`) in its
> +request to the server (but it MUST NOT do so if the server did not
> +advertise the agent capability). The `X` and `Y` strings may contain any
> +printable ASCII characters except space (i.e., the byte range 32 < x <
> +127), and are typically of the form "package/version" (e.g.,
> +"git/1.8.3.1"). The agent strings are purely informative for statistics
> +and debugging purposes, and MUST NOT be used to programmatically assume
> +the presence or absence of particular features.
> +
> + stateless-rpc
> +---------------
> +
> +If advertised, the `stateless-rpc` capability indicates that the server
> +supports running commands in a stateless-rpc mode, which means that a
> +command lasts for only a single request-response round.
> +
> +Normally a command can last for as many rounds as are required to
> +complete it (multiple for negotiation during fetch or no additional
> +trips in the case of ls-refs). If the client sends the `stateless-rpc`
> +capability with a value of `true` (in the form `stateless-rpc=true`)
> +then the invoked command must only last a single round.
Speaking of stateless-rpc, I remember last time this topic was brought
up, there was some discussion to kind of optimize it for http as well,
to fit the "client sends request, server responds data" model and
avoid too many round trips (ideally everything happens in one round
trip). Does it evolve to anything real? All the cool stuff happened
while I was away, sorry if this was discussed and settled.
--
Duy
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v2 12/27] serve: introduce git-serve
2018-01-26 10:39 ` Duy Nguyen@ 2018-02-27 5:46 ` Jonathan Nieder0 siblings, 0 replies; 362+ messages in thread
From: Jonathan Nieder @ 2018-02-27 5:46 UTC (permalink / raw)
To: Duy Nguyen
Cc: Brandon Williams, Git Mailing List, Stefan Beller,
Junio C Hamano, Jeff King, Philip Oakley, stolee
Hi Duy,
Duy Nguyen wrote:
> On Fri, Jan 26, 2018 at 6:58 AM, Brandon Williams <bmwill@google.com> wrote:
>> + stateless-rpc
>> +---------------
>> +
>> +If advertised, the `stateless-rpc` capability indicates that the server
>> +supports running commands in a stateless-rpc mode, which means that a
>> +command lasts for only a single request-response round.
>> +
>> +Normally a command can last for as many rounds as are required to
>> +complete it (multiple for negotiation during fetch or no additional
>> +trips in the case of ls-refs). If the client sends the `stateless-rpc`
>> +capability with a value of `true` (in the form `stateless-rpc=true`)
>> +then the invoked command must only last a single round.
>
> Speaking of stateless-rpc, I remember last time this topic was brought
> up, there was some discussion to kind of optimize it for http as well,
> to fit the "client sends request, server responds data" model and
> avoid too many round trips (ideally everything happens in one round
> trip). Does it evolve to anything real? All the cool stuff happened
> while I was away, sorry if this was discussed and settled.
We have a few different ideas for improving negotiation. They were
speculative enough that we didn't want to make them part of the
baseline protocol v2. Feel free to poke me in a new thread. :)
Some teasers:
- allow both client and server to suggest commits in negotiation,
instead of just the client?
- send a bloom filter for the peer to filter their suggestions
against?
- send other basic information like maximum generation number or
maximum commit date?
- exponential backoff in negotiation instead of linear walking?
prioritizing ref tips? Imitating the bitmap selection algorithm?
- at the "end" of negotiation, sending a graph data structure instead
of a pack, to allow an extra round trip to produce a truly minimal
pack?
Those are some initial ideas, but it's also likely that someone can
come up with some other experiments to try, too. (E.g. we've looked
at various papers on set reconciliation, but they don't make enough
use of the graph structure to help much.)
Thanks,
Jonathan
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v2 00/27] protocol version 2
2018-01-25 23:58 ` [PATCH v2 00/27] " Brandon Williams
` (26 preceding siblings ...)
2018-01-25 23:58 ` [PATCH v2 27/27] remote-curl: implement stateless-connect command Brandon Williams
@ 2018-01-31 16:00 ` Derrick Stolee
2018-02-07 0:58 ` Brandon Williams
2018-02-01 19:40 ` Jeff Hostetler
2018-02-07 1:12 ` [PATCH v3 00/35] " Brandon Williams
29 siblings, 1 reply; 362+ messages in thread
From: Derrick Stolee @ 2018-01-31 16:00 UTC (permalink / raw)
To: Brandon Williams, git; +Cc: sbeller, gitster, peff, philipoakley, jrnieder
Sorry for chiming in with mostly nitpicks so late since sending this
version. Mostly, I tried to read it to see if I could understand the
scope of the patch and how this code worked before. It looks very
polished, so I the nits were the best I could do.
On 1/25/2018 6:58 PM, Brandon Williams wrote:
> Changes in v2:
> * Added documentation for fetch
> * changes #defines for state variables to be enums
> * couple code changes to pkt-line functions and documentation
> * Added unit tests for the git-serve binary as well as for ls-refs
I'm a fan of more unit-level testing, and I think that will be more
important as we go on with these multiple configuration options.
> Areas for improvement
> * Push isn't implemented, right now this is ok because if v2 is requested the
> server can just default to v0. Before this can be merged we may want to
> change how the client request a new protocol, and not allow for sending
> "version=2" when pushing even though the user has it configured. Or maybe
> its fine to just have an older client who doesn't understand how to push
> (and request v2) to die if the server tries to speak v2 at it.
>
> Fixing this essentially would just require piping through a bit more
> information to the function which ultimately runs connect (for both builtins
> and remote-curl)
Definitely save push for a later patch. Getting 'fetch' online did
require 'ls-refs' at the same time. Future reviews will be easier when
adding one command at a time.
>
> * I want to make sure that the docs are well written before this gets merged
> so I'm hoping that someone can do a through review on the docs themselves to
> make sure they are clear.
I made a comment in the docs about the architectural changes. While I
think a discussion on that topic would be valuable, I'm not sure that's
the point of the document (i.e. documenting what v2 does versus selling
the value of the patch). I thought the docs were clear for how the
commands work.
> * Right now there is a capability 'stateless-rpc' which essentially makes sure
> that a server command completes after a single round (this is to make sure
> http works cleanly). After talking with some folks it may make more sense
> to just have v2 be stateless in nature so that all commands terminate after
> a single round trip. This makes things a bit easier if a server wants to
> have ssh just be a proxy for http.
>
> One potential thing would be to flip this so that by default the protocol is
> stateless and if a server/command has a state-full mode that can be
> implemented as a capability at a later point. Thoughts?
At minimum, all commands should be designed with a "stateless first"
philosophy since a large number of users communicate via HTTP[S] and any
decisions that make stateless communication painful should be rejected.
> * Shallow repositories and shallow clones aren't supported yet. I'm working
> on it and it can be either added to v2 by default if people think it needs
> to be in there from the start, or we can add it as a capability at a later
> point.
I'm happy to say the following:
1. Shallow repositories should not be used for servers, since they
cannot service all requests.
2. Since v2 has easy capability features, I'm happy to leave shallow for
later. We will want to verify that a shallow clone command reverts to v1.
I fetched bw/protocol-v2 with tip 13c70148, built, set
'protocol.version=2' in the config, and tested fetches against GitHub
and VSTS just as a compatibility test. Everything worked just fine.
Is there an easy way to test the existing test suite for clone and fetch
using protocol v2 to make sure there are no regressions with
protocol.version=2 in the config?
Thanks,
-Stolee
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v2 00/27] protocol version 2
2018-01-31 16:00 ` [PATCH v2 00/27] protocol version 2 Derrick Stolee
@ 2018-02-07 0:58 ` Brandon Williams0 siblings, 0 replies; 362+ messages in thread
From: Brandon Williams @ 2018-02-07 0:58 UTC (permalink / raw)
To: Derrick Stolee; +Cc: git, sbeller, gitster, peff, philipoakley, jrnieder
On 01/31, Derrick Stolee wrote:
> Sorry for chiming in with mostly nitpicks so late since sending this
> version. Mostly, I tried to read it to see if I could understand the scope
> of the patch and how this code worked before. It looks very polished, so I
> the nits were the best I could do.
>
> On 1/25/2018 6:58 PM, Brandon Williams wrote:
> > Changes in v2:
> > * Added documentation for fetch
> > * changes #defines for state variables to be enums
> > * couple code changes to pkt-line functions and documentation
> > * Added unit tests for the git-serve binary as well as for ls-refs
>
> I'm a fan of more unit-level testing, and I think that will be more
> important as we go on with these multiple configuration options.
>
> > Areas for improvement
> > * Push isn't implemented, right now this is ok because if v2 is requested the
> > server can just default to v0. Before this can be merged we may want to
> > change how the client request a new protocol, and not allow for sending
> > "version=2" when pushing even though the user has it configured. Or maybe
> > its fine to just have an older client who doesn't understand how to push
> > (and request v2) to die if the server tries to speak v2 at it.
> >
> > Fixing this essentially would just require piping through a bit more
> > information to the function which ultimately runs connect (for both builtins
> > and remote-curl)
>
> Definitely save push for a later patch. Getting 'fetch' online did require
> 'ls-refs' at the same time. Future reviews will be easier when adding one
> command at a time.
>
> >
> > * I want to make sure that the docs are well written before this gets merged
> > so I'm hoping that someone can do a through review on the docs themselves to
> > make sure they are clear.
>
> I made a comment in the docs about the architectural changes. While I think
> a discussion on that topic would be valuable, I'm not sure that's the point
> of the document (i.e. documenting what v2 does versus selling the value of
> the patch). I thought the docs were clear for how the commands work.
>
> > * Right now there is a capability 'stateless-rpc' which essentially makes sure
> > that a server command completes after a single round (this is to make sure
> > http works cleanly). After talking with some folks it may make more sense
> > to just have v2 be stateless in nature so that all commands terminate after
> > a single round trip. This makes things a bit easier if a server wants to
> > have ssh just be a proxy for http.
> >
> > One potential thing would be to flip this so that by default the protocol is
> > stateless and if a server/command has a state-full mode that can be
> > implemented as a capability at a later point. Thoughts?
>
> At minimum, all commands should be designed with a "stateless first"
> philosophy since a large number of users communicate via HTTP[S] and any
> decisions that make stateless communication painful should be rejected.
I agree with this and my next version will run with this philosophy in
mind (v2 will be stateless by default).
>
> > * Shallow repositories and shallow clones aren't supported yet. I'm working
> > on it and it can be either added to v2 by default if people think it needs
> > to be in there from the start, or we can add it as a capability at a later
> > point.
>
> I'm happy to say the following:
>
> 1. Shallow repositories should not be used for servers, since they cannot
> service all requests.
>
> 2. Since v2 has easy capability features, I'm happy to leave shallow for
> later. We will want to verify that a shallow clone command reverts to v1.
>
>
> I fetched bw/protocol-v2 with tip 13c70148, built, set 'protocol.version=2'
> in the config, and tested fetches against GitHub and VSTS just as a
> compatibility test. Everything worked just fine.
>
> Is there an easy way to test the existing test suite for clone and fetch
> using protocol v2 to make sure there are no regressions with
> protocol.version=2 in the config?
Yes there already exist interop tests for testing the addition of
requesting a new protocol at //t/interop/i5700-protocol-transition.sh
>
> Thanks,
> -Stolee
--
Brandon Williams
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 02/35] pkt-line: introduce struct packet_reader
2018-02-07 1:12 ` [PATCH v3 02/35] pkt-line: introduce struct packet_reader Brandon Williams
2018-02-13 0:49 ` Jonathan Nieder@ 2018-02-27 5:57 ` Jonathan Nieder
2018-02-27 6:12 ` Jonathan Nieder1 sibling, 1 reply; 362+ messages in thread
From: Jonathan Nieder @ 2018-02-27 5:57 UTC (permalink / raw)
To: Brandon Williams; +Cc: git, sbeller, peff, gitster, stolee, git, pclouds
Hi,
Brandon Williams wrote:
> Sometimes it is advantageous to be able to peek the next packet line
> without consuming it (e.g. to be able to determine the protocol version
> a server is speaking). In order to do that introduce 'struct
> packet_reader' which is an abstraction around the normal packet reading
> logic. This enables a caller to be able to peek a single line at a time
> using 'packet_reader_peek()' and having a caller consume a line by
> calling 'packet_reader_read()'.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
> pkt-line.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> pkt-line.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 117 insertions(+)
I like it!
The questions and nits from
https://public-inbox.org/git/20180213004937.GB42272@aiede.svl.corp.google.com/
still apply. In particular, the ownership of the buffers inside the
'struct packet_reader' is still unclear; could the packet_reader create
its own (strbuf) buffers so that the contract around them (who is allowed
to write to them; who is responsible for freeing them) is more obvious?
Thanks,
Jonathan
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 02/35] pkt-line: introduce struct packet_reader
2018-02-27 5:57 ` Jonathan Nieder@ 2018-02-27 6:12 ` Jonathan Nieder0 siblings, 0 replies; 362+ messages in thread
From: Jonathan Nieder @ 2018-02-27 6:12 UTC (permalink / raw)
To: Brandon Williams; +Cc: git, sbeller, peff, gitster, stolee, git, pclouds
Jonathan Nieder wrote:
> Brandon Williams wrote:
>> Sometimes it is advantageous to be able to peek the next packet line
>> without consuming it (e.g. to be able to determine the protocol version
>> a server is speaking). In order to do that introduce 'struct
>> packet_reader' which is an abstraction around the normal packet reading
>> logic. This enables a caller to be able to peek a single line at a time
>> using 'packet_reader_peek()' and having a caller consume a line by
>> calling 'packet_reader_read()'.
>>
>> Signed-off-by: Brandon Williams <bmwill@google.com>
>> ---
>> pkt-line.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> pkt-line.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 117 insertions(+)
>
> I like it!
>
> The questions and nits from
> https://public-inbox.org/git/20180213004937.GB42272@aiede.svl.corp.google.com/
> still apply. In particular, the ownership of the buffers inside the
> 'struct packet_reader' is still unclear; could the packet_reader create
> its own (strbuf) buffers so that the contract around them (who is allowed
> to write to them; who is responsible for freeing them) is more obvious?
Just to be clear: I sent that review after you sent this patch, so
there should not have been any reason for me to expect the q's and
nits to magically not apply. ;-)
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 03/35] pkt-line: add delim packet support
2018-02-07 1:12 ` [PATCH v3 03/35] pkt-line: add delim packet support Brandon Williams
@ 2018-02-22 19:13 ` Stefan Beller
2018-02-22 19:37 ` Brandon Williams0 siblings, 1 reply; 362+ messages in thread
From: Stefan Beller @ 2018-02-22 19:13 UTC (permalink / raw)
To: Brandon Williams
Cc: git, Jeff King, Junio C Hamano, Jonathan Nieder, Derrick Stolee,
Jeff Hostetler, Duy Nguyen
On Tue, Feb 6, 2018 at 5:12 PM, Brandon Williams <bmwill@google.com> wrote:
> One of the design goals of protocol-v2 is to improve the semantics of
> flush packets. Currently in protocol-v1, flush packets are used both to
> indicate a break in a list of packet lines as well as an indication that
> one side has finished speaking. This makes it particularly difficult
> to implement proxies as a proxy would need to completely understand git
> protocol instead of simply looking for a flush packet.
>
> To do this, introduce the special deliminator packet '0001'. A delim
> packet can then be used as a deliminator between lists of packet lines
> while flush packets can be reserved to indicate the end of a response.
Please mention where this can be found in the documentation.
(Defer to later patch?)
As the commit message states, this is only to be used for v2,
in v0 it is still an illegal pkt.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
The code is
Reviewed-by: Stefan Beller <sbeller@google.com>
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 03/35] pkt-line: add delim packet support
2018-02-22 19:13 ` Stefan Beller@ 2018-02-22 19:37 ` Brandon Williams0 siblings, 0 replies; 362+ messages in thread
From: Brandon Williams @ 2018-02-22 19:37 UTC (permalink / raw)
To: Stefan Beller
Cc: git, Jeff King, Junio C Hamano, Jonathan Nieder, Derrick Stolee,
Jeff Hostetler, Duy Nguyen
On 02/22, Stefan Beller wrote:
> On Tue, Feb 6, 2018 at 5:12 PM, Brandon Williams <bmwill@google.com> wrote:
> > One of the design goals of protocol-v2 is to improve the semantics of
> > flush packets. Currently in protocol-v1, flush packets are used both to
> > indicate a break in a list of packet lines as well as an indication that
> > one side has finished speaking. This makes it particularly difficult
> > to implement proxies as a proxy would need to completely understand git
> > protocol instead of simply looking for a flush packet.
> >
> > To do this, introduce the special deliminator packet '0001'. A delim
> > packet can then be used as a deliminator between lists of packet lines
> > while flush packets can be reserved to indicate the end of a response.
>
> Please mention where this can be found in the documentation.
> (Defer to later patch?)
Yeah the documentation does get added in a future patch, I'll make a
comment to that effect.
> As the commit message states, this is only to be used for v2,
> in v0 it is still an illegal pkt.
>
> >
> > Signed-off-by: Brandon Williams <bmwill@google.com>
>
> The code is
> Reviewed-by: Stefan Beller <sbeller@google.com>
--
Brandon Williams
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-07 1:12 ` [PATCH v3 04/35] upload-pack: convert to a builtin Brandon Williams
@ 2018-02-21 21:44 ` Jonathan Tan
2018-02-22 9:58 ` Jeff King0 siblings, 1 reply; 362+ messages in thread
From: Jonathan Tan @ 2018-02-21 21:44 UTC (permalink / raw)
To: Brandon Williams
Cc: git, sbeller, peff, gitster, jrnieder, stolee, git, pclouds
On Tue, 6 Feb 2018 17:12:41 -0800
Brandon Williams <bmwill@google.com> wrote:
> In order to allow for code sharing with the server-side of fetch in
> protocol-v2 convert upload-pack to be a builtin.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
As Stefan mentioned in [1], also mention in the commit message that this
means that the "git-upload-pack" invocation gains additional
capabilities (for example, invoking a pager for --help).
Having said that, the main purpose of this patch seems to be to libify
upload-pack, and the move to builtin is just a way of putting the
program somewhere - we could have easily renamed upload-pack.c and
created a new upload-pack.c containing the main(), preserving the
non-builtin-ness of upload-pack, while still gaining the benefits of
libifying upload-pack.
If the community does want to make upload-pack a builtin, I would write
the commit message this way:
upload-pack: libify
Libify upload-pack. The main() function is moved to
builtin/upload-pack.c, thus making upload-pack a builtin. Note that
this means that "git-upload-pack" gains functionality such as the
ability to invoke a pager when passed "--help".
And if not:
upload-pack: libify
Libify upload-pack by moving most of the functionality in
upload-pack.c into a file upload-pack-lib.c (or some other name),
to be used in subsequent patches.
[1] https://public-inbox.org/git/CAGZ79kb2=uU0_K8wr27gNdNX-T+P+7gVdgc5EBdYc3zBobsR8w@mail.gmail.com/> -static void upload_pack(void)
> -{
> - struct string_list symref = STRING_LIST_INIT_DUP;
> -
> - head_ref_namespaced(find_symref, &symref);
> -
> - if (advertise_refs || !stateless_rpc) {
> - reset_timeout();
> - head_ref_namespaced(send_ref, &symref);
> - for_each_namespaced_ref(send_ref, &symref);
> - advertise_shallow_grafts(1);
> - packet_flush(1);
> - } else {
> - head_ref_namespaced(check_ref, NULL);
> - for_each_namespaced_ref(check_ref, NULL);
> - }
> - string_list_clear(&symref, 1);
> - if (advertise_refs)
> - return;
> -
> - receive_needs();
> - if (want_obj.nr) {
> - get_common_commits();
> - create_pack_file();
> - }
> -}
I see that this function had to be moved to the bottom because it now
also needs to make use of functions like upload_pack_config() - that's
fine.
> +struct upload_pack_options {
> + int stateless_rpc;
> + int advertise_refs;
> + unsigned int timeout;
> + int daemon_mode;
> +};
I would have expected "unsigned stateless_rpc : 1" etc., but I see that
this makes it easier to use with OPT_BOOL (which needs us to pass it a
pointer-to-int).
As for what existing code does, files like fetch-pack and diff use
"unsigned : 1", but they also process arguments without OPT_, so I don't
think they are relevant.
I think that we should decide if we're going to prefer "unsigned : 1" or
"int" for flags in new code. Personally, I prefer "unsigned : 1"
(despite the slight inconvenience in that argument parsers will need to
declare their own temporary "int" and then assign its contents to the
options struct) because of the stronger type, but I'm OK either way.
Whatever the decision, I don't think it needs to block the review of
this patch set.
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-21 21:44 ` Jonathan Tan@ 2018-02-22 9:58 ` Jeff King
2018-02-22 18:07 ` Brandon Williams0 siblings, 1 reply; 362+ messages in thread
From: Jeff King @ 2018-02-22 9:58 UTC (permalink / raw)
To: Jonathan Tan
Cc: Brandon Williams, git, sbeller, gitster, jrnieder, stolee, git, pclouds
On Wed, Feb 21, 2018 at 01:44:22PM -0800, Jonathan Tan wrote:
> On Tue, 6 Feb 2018 17:12:41 -0800
> Brandon Williams <bmwill@google.com> wrote:
>
> > In order to allow for code sharing with the server-side of fetch in
> > protocol-v2 convert upload-pack to be a builtin.
> >
> > Signed-off-by: Brandon Williams <bmwill@google.com>
>
> As Stefan mentioned in [1], also mention in the commit message that this
> means that the "git-upload-pack" invocation gains additional
> capabilities (for example, invoking a pager for --help).
And possibly respecting pager.upload-pack, which would violate our rule
that it is safe to run upload-pack in untrusted repositories.
(This actually doesn't work right now because pager.* is broken for
builtins that don't specify RUN_SETUP; but I think with the fixes last
year to the config code, we can now drop that restriction).
Obviously we can work around this with an extra RUN_NO_PAGER_CONFIG
flag. But I think it points to a general danger in making upload-pack a
builtin. I'm not sure what other features it would want to avoid (or
what might grow in the future).
> Having said that, the main purpose of this patch seems to be to libify
> upload-pack, and the move to builtin is just a way of putting the
> program somewhere - we could have easily renamed upload-pack.c and
> created a new upload-pack.c containing the main(), preserving the
> non-builtin-ness of upload-pack, while still gaining the benefits of
> libifying upload-pack.
Yeah, this seems like a better route to me.
-Peff
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 9:58 ` Jeff King@ 2018-02-22 18:07 ` Brandon Williams
2018-02-22 18:14 ` Jeff King0 siblings, 1 reply; 362+ messages in thread
From: Brandon Williams @ 2018-02-22 18:07 UTC (permalink / raw)
To: Jeff King
Cc: Jonathan Tan, git, sbeller, gitster, jrnieder, stolee, git, pclouds
On 02/22, Jeff King wrote:
> On Wed, Feb 21, 2018 at 01:44:22PM -0800, Jonathan Tan wrote:
>
> > On Tue, 6 Feb 2018 17:12:41 -0800
> > Brandon Williams <bmwill@google.com> wrote:
> >
> > > In order to allow for code sharing with the server-side of fetch in
> > > protocol-v2 convert upload-pack to be a builtin.
> > >
> > > Signed-off-by: Brandon Williams <bmwill@google.com>
> >
> > As Stefan mentioned in [1], also mention in the commit message that this
> > means that the "git-upload-pack" invocation gains additional
> > capabilities (for example, invoking a pager for --help).
>
> And possibly respecting pager.upload-pack, which would violate our rule
> that it is safe to run upload-pack in untrusted repositories.
And this isn't an issue with receive-pack because this same guarantee
doesn't exist?
>
> (This actually doesn't work right now because pager.* is broken for
> builtins that don't specify RUN_SETUP; but I think with the fixes last
> year to the config code, we can now drop that restriction).
>
> Obviously we can work around this with an extra RUN_NO_PAGER_CONFIG
> flag. But I think it points to a general danger in making upload-pack a
> builtin. I'm not sure what other features it would want to avoid (or
> what might grow in the future).
>
> > Having said that, the main purpose of this patch seems to be to libify
> > upload-pack, and the move to builtin is just a way of putting the
> > program somewhere - we could have easily renamed upload-pack.c and
> > created a new upload-pack.c containing the main(), preserving the
> > non-builtin-ness of upload-pack, while still gaining the benefits of
> > libifying upload-pack.
>
> Yeah, this seems like a better route to me.
>
> -Peff
--
Brandon Williams
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 18:14 ` Jeff King@ 2018-02-22 19:38 ` Jonathan Nieder
2018-02-22 20:19 ` Jeff King0 siblings, 1 reply; 362+ messages in thread
From: Jonathan Nieder @ 2018-02-22 19:38 UTC (permalink / raw)
To: Jeff King
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
Hi,
Jeff King wrote:
> On Thu, Feb 22, 2018 at 10:07:15AM -0800, Brandon Williams wrote:
>> On 02/22, Jeff King wrote:
>>> On Wed, Feb 21, 2018 at 01:44:22PM -0800, Jonathan Tan wrote:
>>>> On Tue, 6 Feb 2018 17:12:41 -0800
>>>> Brandon Williams <bmwill@google.com> wrote:
>>>>> In order to allow for code sharing with the server-side of fetch in
>>>>> protocol-v2 convert upload-pack to be a builtin.
[...]
>>>> As Stefan mentioned in [1], also mention in the commit message that this
>>>> means that the "git-upload-pack" invocation gains additional
>>>> capabilities (for example, invoking a pager for --help).
>>>
>>> And possibly respecting pager.upload-pack, which would violate our rule
>>> that it is safe to run upload-pack in untrusted repositories.
>>
>> And this isn't an issue with receive-pack because this same guarantee
>> doesn't exist?
>
> Yes, exactly (which is confusing and weird, yes, but that's how it is).
To be clear, which of the following are you (most) worried about?
1. being invoked with --help and spawning a pager
2. receiving and acting on options between 'git' and 'upload-pack'
3. repository discovery
4. pager config
5. alias discovery
6. increased code surface / unknown threats
For (1), "--help" has to be the first argument. "git daemon" passes
--strict so it doesn't happen there. "git http-backend" passes
--stateless-rpc so it doesn't happen there. "git shell" sanitizes
input to avoid it happening there.
A custom setup could provide their own entry point that doesn't do
such sanitization. I think that in some sense it's out of our hands,
but it would be nice to harden as described upthread.
For (2), I am having trouble imagining a setup where it would happen.
upload-pack doesn't have the RUN_SETUP or RUN_SETUP_GENTLY flag set,
so (3) doesn't apply.
Although in most setups the user does not control the config files on
a server, item (4) looks like a real issue worth solving. I think we
should introduce a flag to skip looking for pager config. We could
use it for receive-pack, too.
Builtins are handled before aliases, so (5) doesn't apply.
(6) is a real issue: it is why "git shell" is not a builtin, for
example. But I would rather that we use appropriate sanitization
before upload-pack is invoked than live in fear. git upload-pack is
sufficiently complicated that I don't think the git.c wrapper
increases the complexity by a significant amount.
That leaves me with a personal answer of only being worried about (4)
and not the rest. What do you think? Is one of the other items I
listed above worrisome, or is there another item I missed?
Thanks,
Jonathan
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 19:38 ` Jonathan Nieder@ 2018-02-22 20:19 ` Jeff King
2018-02-22 20:21 ` Jeff King
2018-02-22 21:24 ` Jonathan Nieder0 siblings, 2 replies; 362+ messages in thread
From: Jeff King @ 2018-02-22 20:19 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
On Thu, Feb 22, 2018 at 11:38:14AM -0800, Jonathan Nieder wrote:
> >>> And possibly respecting pager.upload-pack, which would violate our rule
> >>> that it is safe to run upload-pack in untrusted repositories.
> >>
> >> And this isn't an issue with receive-pack because this same guarantee
> >> doesn't exist?
> >
> > Yes, exactly (which is confusing and weird, yes, but that's how it is).
>
> To be clear, which of the following are you (most) worried about?
>
> 1. being invoked with --help and spawning a pager
> 2. receiving and acting on options between 'git' and 'upload-pack'
> 3. repository discovery
> 4. pager config
> 5. alias discovery
> 6. increased code surface / unknown threats
My immediate concern is (4). But my greater concern is that people who
work on git.c should not have to worry about accidentally violating this
principle when they add a new feature or config option.
In other words, it seems like an accident waiting to happen. I'd be more
amenable to it if there was some compelling reason for it to be a
builtin, but I don't see one listed in the commit message. I see only
"let's make it easier to share the code", which AFAICT is equally served
by just lib-ifying the code and calling it from the standalone
upload-pack.c.
> Although in most setups the user does not control the config files on
> a server, item (4) looks like a real issue worth solving. I think we
> should introduce a flag to skip looking for pager config. We could
> use it for receive-pack, too.
There's not much point for receive-pack. It respects hooks, so any
security ship has already sailed there.
-Peff
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 20:19 ` Jeff King@ 2018-02-22 20:21 ` Jeff King
2018-02-22 21:26 ` Jonathan Nieder
2018-02-23 21:09 ` Brandon Williams
2018-02-22 21:24 ` Jonathan Nieder1 sibling, 2 replies; 362+ messages in thread
From: Jeff King @ 2018-02-22 20:21 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
On Thu, Feb 22, 2018 at 03:19:40PM -0500, Jeff King wrote:
> > To be clear, which of the following are you (most) worried about?
> >
> > 1. being invoked with --help and spawning a pager
> > 2. receiving and acting on options between 'git' and 'upload-pack'
> > 3. repository discovery
> > 4. pager config
> > 5. alias discovery
> > 6. increased code surface / unknown threats
>
> My immediate concern is (4). But my greater concern is that people who
> work on git.c should not have to worry about accidentally violating this
> principle when they add a new feature or config option.
>
> In other words, it seems like an accident waiting to happen. I'd be more
> amenable to it if there was some compelling reason for it to be a
> builtin, but I don't see one listed in the commit message. I see only
> "let's make it easier to share the code", which AFAICT is equally served
> by just lib-ifying the code and calling it from the standalone
> upload-pack.c.
By the way, any decision here would presumably need to be extended to
git-serve, etc. The current property is that it's safe to fetch from an
untrusted repository, even over ssh. If we're keeping that for protocol
v1, we'd want it to apply to protocol v2, as well.
-Peff
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 20:21 ` Jeff King@ 2018-02-22 21:26 ` Jonathan Nieder
2018-02-22 21:44 ` Jeff King
2018-02-23 21:09 ` Brandon Williams1 sibling, 1 reply; 362+ messages in thread
From: Jonathan Nieder @ 2018-02-22 21:26 UTC (permalink / raw)
To: Jeff King
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
Jeff King wrote:
> The current property is that it's safe to fetch from an
> untrusted repository, even over ssh. If we're keeping that for protocol
> v1, we'd want it to apply to protocol v2, as well.
Ah, this is what I had been missing (the non-ssh case).
I see your point. I think we need to fix the pager config issue and add
some clarifying documentation to git.c so that people know what to look
out for.
Keep in mind that git upload-archive (a read-only command, just like
git upload-pack) also already has the same issues.
Thanks,
Jonathan
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-03-12 22:43 ` Jonathan Nieder@ 2018-03-12 23:28 ` Jeff King
2018-03-12 23:37 ` Jonathan Nieder0 siblings, 1 reply; 362+ messages in thread
From: Jeff King @ 2018-03-12 23:28 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
On Mon, Mar 12, 2018 at 03:43:55PM -0700, Jonathan Nieder wrote:
> Hi,
>
> Jeff King wrote:
> > On Thu, Feb 22, 2018 at 01:26:34PM -0800, Jonathan Nieder wrote:
>
> >> Keep in mind that git upload-archive (a read-only command, just like
> >> git upload-pack) also already has the same issues.
> >
> > Yuck. I don't think we've ever made a historical promise about that. But
> > then, I don't think the promise about upload-pack has ever really been
> > documented, except in mailing list discussions.
>
> Sorry to revive this old side-thread. Good news: for a dashed command
> like git-upload-archive, the pager selection code only runs for
> commands with RUN_SETUP or RUN_SETUP_GENTLY:
>
> if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
> !(p->option & DELAY_PAGER_CONFIG))
> use_pager = check_pager_config(p->cmd);
>
> None of upload-pack, receive-pack,git-serve, or upload-archive set
> those flags, so we (narrowly) escape trouble here.
Right, I saw that earlier. But I actually think that is stale from the
days when it wasn't safe to call check_pager_config() too early. So I
could very well see somebody removing it and causing a spooky
vulnerability at a distance.
> Later today I should be able to send a cleanup to make the behavior
> more obvious.
Thanks. I'm still on the fence over the whole builtin concept, but
certainly a "don't ever turn on a pager" flag seems like a reasonable
thing to have.
An alternative approach is some kind of global for "don't trust the
local repo" flag. That could be respected from very low-level code
(e.g., where we read and/or respect the pager command, but also in other
places like hooks, other config that runs arbitrary commands, etc). And
then upload-pack would set that to "do not trust", and other programs
would default to "trust".
We could even give it an environment variable, which would allow
something like:
tar xf maybe-evil.git.tar
cd maybe-evil
export GIT_TRUST_REPO=false
git log
without worrying about pager.log config, etc. My two concerns with this
approach would be:
1. We have to manually annotate any "dangerous" code to act more
safely when it sees the flag. Which means it's highly likely to
a spot, or to add a new feature which doesn't respect it. And
suddenly that's a security hole. So I'm concerned it may create a
false sense of security and actually make things worse.
2. As a global, I'm not sure how it would interact with multi-repo
processes like submodules. In theory it ought to go into the
repository struct, but it would often need to be set globally
before we've even discovered the repo.
That might be fine, though. It's really more about context than
about a specific repo (so you may say "don't trust this repo", and
that extends to any submodules you happen to access, too).
I dunno. I think (2) is probably OK, but (1) really gives me pause.
-Peff
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-03-12 23:28 ` Jeff King@ 2018-03-12 23:37 ` Jonathan Nieder
2018-03-12 23:52 ` Jeff King0 siblings, 1 reply; 362+ messages in thread
From: Jonathan Nieder @ 2018-03-12 23:37 UTC (permalink / raw)
To: Jeff King
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
Jeff King wrote:
> We could even give it an environment variable, which would allow
> something like:
>
> tar xf maybe-evil.git.tar
> cd maybe-evil
> export GIT_TRUST_REPO=false
> git log
Interesting idea. Putting it in an envvar means it gets inherited by
child processes, which if I understand you correctly is a good thing.
[...]
> 1. We have to manually annotate any "dangerous" code to act more
> safely when it sees the flag. Which means it's highly likely to
> a spot, or to add a new feature which doesn't respect it. And
> suddenly that's a security hole. So I'm concerned it may create a
> false sense of security and actually make things worse.
As an internal implementation detail, this is so obviously fragile
that it wouldn't give me any feeling of security. ;-) So it should be
strictly an improvement.
As a public-facing feature, I suspect it's a bad idea for exactly that
reason.
FWIW for pager specifically I am going for a whitelisting approach:
new commands would have to explicitly set ALLOW_PAGER if they want to
respect pager config. That doesn't guarantee people think about it
again as things evolve but it should at least help with getting the
right setting for new plumbing.
Thanks,
Jonathan
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-03-12 23:37 ` Jonathan Nieder@ 2018-03-12 23:52 ` Jeff King0 siblings, 0 replies; 362+ messages in thread
From: Jeff King @ 2018-03-12 23:52 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
On Mon, Mar 12, 2018 at 04:37:47PM -0700, Jonathan Nieder wrote:
> Jeff King wrote:
>
> > We could even give it an environment variable, which would allow
> > something like:
> >
> > tar xf maybe-evil.git.tar
> > cd maybe-evil
> > export GIT_TRUST_REPO=false
> > git log
> [...]
> As an internal implementation detail, this is so obviously fragile
> that it wouldn't give me any feeling of security. ;-) So it should be
> strictly an improvement.
>
> As a public-facing feature, I suspect it's a bad idea for exactly that
> reason.
So that pretty much kills off the GIT_TRUST_REPO idea, I guess.
> FWIW for pager specifically I am going for a whitelisting approach:
> new commands would have to explicitly set ALLOW_PAGER if they want to
> respect pager config. That doesn't guarantee people think about it
> again as things evolve but it should at least help with getting the
> right setting for new plumbing.
I suspect we'd be about as well off with the "don't trust the repo"
internal flag. Touching the ALLOW_PAGER setup code is about as likely to
set off red flags for the developers (or reviewers) as code that checks
the "trust" flag.
Forcing a whitelist on ALLOW_PAGER _is_ more likely to catch people
adding new commands. But I don't think we actually want to add more
commands to the "safe to run in a malicious repo" list. It's already a
slightly sketchy concept. This is really all about upload-pack and its
existing promises.
But ALLOW_PAGER would _just_ fix the pager issue. When we inevitably
find another problem spot, it won't help us there. But a global "trust"
flag might.
I dunno. I guess I'm OK with either approach, but it seems like the
global trust flag has more room to grow.
-Peff
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 20:21 ` Jeff King
2018-02-22 21:26 ` Jonathan Nieder@ 2018-02-23 21:09 ` Brandon Williams
2018-03-03 4:24 ` Jeff King1 sibling, 1 reply; 362+ messages in thread
From: Brandon Williams @ 2018-02-23 21:09 UTC (permalink / raw)
To: Jeff King
Cc: Jonathan Nieder, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
On 02/22, Jeff King wrote:
> On Thu, Feb 22, 2018 at 03:19:40PM -0500, Jeff King wrote:
>
> > > To be clear, which of the following are you (most) worried about?
> > >
> > > 1. being invoked with --help and spawning a pager
> > > 2. receiving and acting on options between 'git' and 'upload-pack'
> > > 3. repository discovery
> > > 4. pager config
> > > 5. alias discovery
> > > 6. increased code surface / unknown threats
> >
> > My immediate concern is (4). But my greater concern is that people who
> > work on git.c should not have to worry about accidentally violating this
> > principle when they add a new feature or config option.
> >
> > In other words, it seems like an accident waiting to happen. I'd be more
> > amenable to it if there was some compelling reason for it to be a
> > builtin, but I don't see one listed in the commit message. I see only
> > "let's make it easier to share the code", which AFAICT is equally served
> > by just lib-ifying the code and calling it from the standalone
> > upload-pack.c.
>
> By the way, any decision here would presumably need to be extended to
> git-serve, etc. The current property is that it's safe to fetch from an
> untrusted repository, even over ssh. If we're keeping that for protocol
> v1, we'd want it to apply to protocol v2, as well.
>
> -Peff
This may be more complicated. Right now (for backward compatibility)
all fetches for v2 are issued to the upload-pack endpoint. So even
though I've introduced git-serve it doesn't have requests issued to it
and no requests can be issued to it currently (support isn't added to
http-backend or git-daemon). This just means that the command already
exists to make it easy for testing specific v2 stuff and if we want to
expose it as an endpoint (like when we have a brand new server command
that is completely incompatible with v1) its already there and support
just needs to be plumbed in.
This whole notion of treating upload-pack differently from receive-pack
has bad consequences for v2 though. The idea for v2 is to be able to
run any number of commands via the same endpoint, so at the end of the
day the endpoint you used is irrelevant. So you could issue both fetch
and push commands via the same endpoint in v2 whether its git-serve,
receive-pack, or upload-pack. So really, like Jonathan has said
elsewhere, we need to figure out how to be ok with having receive-pack
and upload-pack builtins, or having neither of them builtins, because it
doesn't make much sense for v2 to straddle that line. I mean you could
do some complicated advertising of commands based on the endpoint you
hit, but then what does that mean if you're hitting the git-serve
endpoint where you should presumably be able to do any operation.
--
Brandon Williams
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-23 21:09 ` Brandon Williams@ 2018-03-03 4:24 ` Jeff King0 siblings, 0 replies; 362+ messages in thread
From: Jeff King @ 2018-03-03 4:24 UTC (permalink / raw)
To: Brandon Williams
Cc: Jonathan Nieder, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
On Fri, Feb 23, 2018 at 01:09:04PM -0800, Brandon Williams wrote:
> > By the way, any decision here would presumably need to be extended to
> > git-serve, etc. The current property is that it's safe to fetch from an
> > untrusted repository, even over ssh. If we're keeping that for protocol
> > v1, we'd want it to apply to protocol v2, as well.
>
> This may be more complicated. Right now (for backward compatibility)
> all fetches for v2 are issued to the upload-pack endpoint. So even
> though I've introduced git-serve it doesn't have requests issued to it
> and no requests can be issued to it currently (support isn't added to
> http-backend or git-daemon). This just means that the command already
> exists to make it easy for testing specific v2 stuff and if we want to
> expose it as an endpoint (like when we have a brand new server command
> that is completely incompatible with v1) its already there and support
> just needs to be plumbed in.
>
> This whole notion of treating upload-pack differently from receive-pack
> has bad consequences for v2 though. The idea for v2 is to be able to
> run any number of commands via the same endpoint, so at the end of the
> day the endpoint you used is irrelevant. So you could issue both fetch
> and push commands via the same endpoint in v2 whether its git-serve,
> receive-pack, or upload-pack. So really, like Jonathan has said
> elsewhere, we need to figure out how to be ok with having receive-pack
> and upload-pack builtins, or having neither of them builtins, because it
> doesn't make much sense for v2 to straddle that line.
It seems like it would be OK if the whole code path of git-serve
invoking upload-pack happened without being a builtin, even if it would
be possible to run a builtin receive-pack from that same (non-builtin)
git-serve.
Remember that the client is driving the whole operation here, and we can
assume that git-serve is operating on the client's behalf. So a client
who chooses not to trigger receive-pack would be fine.
-Peff
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 20:19 ` Jeff King
2018-02-22 20:21 ` Jeff King@ 2018-02-22 21:24 ` Jonathan Nieder
2018-02-22 21:44 ` Jeff King1 sibling, 1 reply; 362+ messages in thread
From: Jonathan Nieder @ 2018-02-22 21:24 UTC (permalink / raw)
To: Jeff King
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
Hi,
Jeff King wrote:
> On Thu, Feb 22, 2018 at 11:38:14AM -0800, Jonathan Nieder wrote:
>> To be clear, which of the following are you (most) worried about?
>>
>> 1. being invoked with --help and spawning a pager
>> 2. receiving and acting on options between 'git' and 'upload-pack'
>> 3. repository discovery
>> 4. pager config
>> 5. alias discovery
>> 6. increased code surface / unknown threats
>
> My immediate concern is (4).
Thanks for clarifying.
> But my greater concern is that people who
> work on git.c should not have to worry about accidentally violating this
> principle when they add a new feature or config option.
That sounds like a combination of (6) and insufficient documentation
or tests. Ideas for how we can help prevent such accidents?
> In other words, it seems like an accident waiting to happen. I'd be more
> amenable to it if there was some compelling reason for it to be a
> builtin, but I don't see one listed in the commit message. I see only
> "let's make it easier to share the code", which AFAICT is equally served
> by just lib-ifying the code and calling it from the standalone
> upload-pack.c.
If we have so little control of the common code used by git commands
that could be invoked by a remote user, I think we're in trouble
already. I don't think being a builtin vs not makes that
significantly different, since there are plenty of builtins that can
be triggered by remote users. Further, if we have so little control
over the security properties of git.c, what hope do we have of making
the rest of libgit.a usable in secure code?
In other words, having to pay more attention to the git wrapper from a
security pov actually feels to me like a *good* thing. The git
wrapper is the entry point to almost all git commands. If it is an
accident waiting to happen, then anything that calls git commands is
already an accident waiting to happen. So how can we make it not an
accident waiting to happen? :)
>> Although in most setups the user does not control the config files on
>> a server, item (4) looks like a real issue worth solving. I think we
>> should introduce a flag to skip looking for pager config. We could
>> use it for receive-pack, too.
>
> There's not much point for receive-pack. It respects hooks, so any
> security ship has already sailed there.
Yet there are plenty of cases where people who can push are not
supposed to have root privilege. I am not worried about hooks
specifically (although the changes described at [1] might help and I
still plan to work on those) but I am worried about e.g. commandline
injection issues. I don't think we can treat receive-pack as out of
scope.
And to be clear, I don't think you were saying receive-pack *is* out
of scope. But you seem to be trying to draw some boundary, where I
only see something fuzzier (e.g. if a bug only applies to
receive-pack, then that certainly decreases its impact, but it doesn't
make the impact go away).
Thanks,
Jonathan
[1] https://public-inbox.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 21:24 ` Jonathan Nieder@ 2018-02-22 21:44 ` Jeff King
2018-02-22 22:21 ` Jeff King0 siblings, 1 reply; 362+ messages in thread
From: Jeff King @ 2018-02-22 21:44 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
On Thu, Feb 22, 2018 at 01:24:02PM -0800, Jonathan Nieder wrote:
> > But my greater concern is that people who
> > work on git.c should not have to worry about accidentally violating this
> > principle when they add a new feature or config option.
>
> That sounds like a combination of (6) and insufficient documentation
> or tests. Ideas for how we can help prevent such accidents?
I don't think it's insufficient tests. How can we test against some
problem in the untrusted repository when the feature that would trigger
it has not been written yet?
E.g., imagine we begin to allow alias.* to override command names in
git.c. Now suddenly setting "alias.upload-pack" is a vulnerability.
Should we add a test for that _now_ as a precaution? I don't think so,
because we can't guess what new features are going to be added. So we'd
be lucky if such a test actually did anything useful.
A comment could help, but it seems quite likely that whatever feature
somebody is adding might not be right next to that comment (and thus not
seen). I think we're mostly relying on institutional knowledge during
review to catch these things. Which is not great, but I'm not sure where
we'd document that knowledge that people would actually see it at the
right time.
> > In other words, it seems like an accident waiting to happen. I'd be more
> > amenable to it if there was some compelling reason for it to be a
> > builtin, but I don't see one listed in the commit message. I see only
> > "let's make it easier to share the code", which AFAICT is equally served
> > by just lib-ifying the code and calling it from the standalone
> > upload-pack.c.
>
> If we have so little control of the common code used by git commands
> that could be invoked by a remote user, I think we're in trouble
> already. I don't think being a builtin vs not makes that
> significantly different, since there are plenty of builtins that can
> be triggered by remote users. Further, if we have so little control
> over the security properties of git.c, what hope do we have of making
> the rest of libgit.a usable in secure code?
I agree that the situation is already pretty dicey. But I also think
that using the git wrapper is more risky than the rest of libgit.a.
There's tons of dangerous code in libgit.a, but upload-pack is smart
enough not to call it. And people modifying upload-pack have a greater
chance of thinking about the security implications, because they know
they're working with upload-pack. Whereas people are likely to touch
git.c without considering upload-pack at all.
The big danger in libgit.a is from modifying some low-level code called
by upload-pack in a way that trusts the on-disk contents more than it
should. My gut says that's less likely, though certainly not impossible
(a likely candidate would perhaps be a ref backend config that opens up
holes; e.g., if you could point a database backend at some random path).
> In other words, having to pay more attention to the git wrapper from a
> security pov actually feels to me like a *good* thing. The git
> wrapper is the entry point to almost all git commands. If it is an
> accident waiting to happen, then anything that calls git commands is
> already an accident waiting to happen. So how can we make it not an
> accident waiting to happen? :)
But I don't think it _is_ an accident waiting to happen for the rest of
the commands. upload-pack is special. The point is that people may touch
git.c thinking they are adding a nice new feature (like pager config, or
aliases, or default options, or whatever). And it _would_ be a nice new
feature for most commands, but not for upload-pack, because its
requirements are different.
So thinking about security in the git wrapper is just a burden for those
other commands.
> > There's not much point for receive-pack. It respects hooks, so any
> > security ship has already sailed there.
>
> Yet there are plenty of cases where people who can push are not
> supposed to have root privilege. I am not worried about hooks
> specifically (although the changes described at [1] might help and I
> still plan to work on those) but I am worried about e.g. commandline
> injection issues. I don't think we can treat receive-pack as out of
> scope.
>
> And to be clear, I don't think you were saying receive-pack *is* out
> of scope. But you seem to be trying to draw some boundary, where I
> only see something fuzzier (e.g. if a bug only applies to
> receive-pack, then that certainly decreases its impact, but it doesn't
> make the impact go away).
Right, I think command-line injection is a separate issue. My concern is
_just_ about "can we be run against on-disk repo contents". And nothing
matters for receive-pack there, because you can already execute
arbitrary code with hooks.
-Peff
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 21:44 ` Jeff King@ 2018-02-22 22:21 ` Jeff King
2018-02-22 22:42 ` Jonathan Nieder0 siblings, 1 reply; 362+ messages in thread
From: Jeff King @ 2018-02-22 22:21 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
On Thu, Feb 22, 2018 at 04:44:02PM -0500, Jeff King wrote:
> But I don't think it _is_ an accident waiting to happen for the rest of
> the commands. upload-pack is special. The point is that people may touch
> git.c thinking they are adding a nice new feature (like pager config, or
> aliases, or default options, or whatever). And it _would_ be a nice new
> feature for most commands, but not for upload-pack, because its
> requirements are different.
>
> So thinking about security in the git wrapper is just a burden for those
> other commands.
All of that said, I think the current code is quite dangerous already,
and maybe even broken. upload-pack may run sub-commands like rev-list
or pack-objects, which are themselves builtins.
For example:
git init --bare evil.git
git -C evil.git --work-tree=. commit --allow-empty -m foo
git -C evil.git config pager.pack-objects 'echo >&2 oops'
git clone --no-local evil.git victim
That doesn't _quite_ work, because we route pack-objects' stderr into a
pipe, which suppresses the pager. But we don't for rev-list, which we
call when checking reachability. It's a bit tricky to get a client to
trigger those for a vanilla fetch, though. Here's the best I could come
up with:
git init --bare evil.git
git -C evil.git --work-tree=. commit --allow-empty -m one
git -C evil.git config pager.rev-list 'echo >&2 oops'
git init super
(
cd super
# obviously use host:path if you're attacking somebody over ssh
git submodule add ../evil.git evil
git commit -am 'add evil submodule'
)
git -C evil.git config uploadpack.allowReachableSHA1InWant true
git -C evil.git update-ref -d refs/heads/master
git clone --recurse-submodules super victim
I couldn't quite get it to work, but I think it's because I'm doing
something wrong with the submodules. But I also think this attack would
_have_ to be done over ssh, because on a local system the submodule
clone would a hard-link rather than a real fetch.
-Peff
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 22:21 ` Jeff King@ 2018-02-22 22:42 ` Jonathan Nieder
2018-02-22 23:05 ` Jeff King0 siblings, 1 reply; 362+ messages in thread
From: Jonathan Nieder @ 2018-02-22 22:42 UTC (permalink / raw)
To: Jeff King
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
Jeff King wrote:
> All of that said, I think the current code is quite dangerous already,
> and maybe even broken. upload-pack may run sub-commands like rev-list
> or pack-objects, which are themselves builtins.
Sounds like more commands to set the IGNORE_PAGER_CONFIG flag for in
git.c.
Thanks for looking this over thoughtfully.
[...]
> I couldn't quite get it to work, but I think it's because I'm doing
> something wrong with the submodules. But I also think this attack would
> _have_ to be done over ssh, because on a local system the submodule
> clone would a hard-link rather than a real fetch.
What happens if the submodule URL starts with file://?
Thanks,
Jonathan
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 22:42 ` Jonathan Nieder@ 2018-02-22 23:05 ` Jeff King
2018-02-22 23:23 ` Jeff King0 siblings, 1 reply; 362+ messages in thread
From: Jeff King @ 2018-02-22 23:05 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
On Thu, Feb 22, 2018 at 02:42:35PM -0800, Jonathan Nieder wrote:
> > I couldn't quite get it to work, but I think it's because I'm doing
> > something wrong with the submodules. But I also think this attack would
> > _have_ to be done over ssh, because on a local system the submodule
> > clone would a hard-link rather than a real fetch.
>
> What happens if the submodule URL starts with file://?
Ah, that would do it. Or I guess any follow-up fetch.
I'm still having trouble convincing submodules to fetch _just_ the
desired sha1, though. It always just fetches everything. I know there's
a way that this kicks in (that's why we have things like
allowReachableSHA1InWant), but I'm not sufficiently well-versed in
submodules to know how to trigger it.
-Peff
^permalinkrawreply [flat|nested] 362+ messages in thread

*Re: [PATCH v3 04/35] upload-pack: convert to a builtin
2018-02-22 23:05 ` Jeff King@ 2018-02-22 23:23 ` Jeff King0 siblings, 0 replies; 362+ messages in thread
From: Jeff King @ 2018-02-22 23:23 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Brandon Williams, Jonathan Tan, git, sbeller, gitster, stolee,
git, pclouds
On Thu, Feb 22, 2018 at 06:05:15PM -0500, Jeff King wrote:
> On Thu, Feb 22, 2018 at 02:42:35PM -0800, Jonathan Nieder wrote:
>
> > > I couldn't quite get it to work, but I think it's because I'm doing
> > > something wrong with the submodules. But I also think this attack would
> > > _have_ to be done over ssh, because on a local system the submodule
> > > clone would a hard-link rather than a real fetch.
> >
> > What happens if the submodule URL starts with file://?
>
> Ah, that would do it. Or I guess any follow-up fetch.
>
> I'm still having trouble convincing submodules to fetch _just_ the
> desired sha1, though. It always just fetches everything. I know there's
> a way that this kicks in (that's why we have things like
> allowReachableSHA1InWant), but I'm not sufficiently well-versed in
> submodules to know how to trigger it.
<facepalm> This won't work anyway. I was right when I said that we don't
redirect stderr for rev-list, but of course it's stdout that determines
the pager behavior. So I don't think you could get rev-list to trigger a
pager here.
I don't think there's currently any vulnerability, but it's more to do
with luck than any amount of carefulness on our part.
-Peff
^permalinkrawreply [flat|nested] 362+ messages in thread