Richard Barnes: Discuss [2014-10-15 20:56]:
As with draft-ietf-oauth-assertions, the requirement for an <Audience> element seems entirely unnecessary. Holding this DISCUSS point pending that discussion and its reflection in this document.
'Assertions that do not identify the Authorization Server as an intended audience MUST be rejected.' -- What does it mean for an assertion to 'identify the Authorization Server'? Does the specified <Audience> need to match the entire URL of the relevant OAuth endpoint? Just the origin? Just the domain? Does the URL need to be canonicalized?
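As an illustration of the alternatives asked about above (a sketch added for clarity, not part of the ballot text): the three comparison rules below are hypothetical candidates, none of them is specified by the draft, and the function names and example URLs are assumptions chosen for the example.

```python
from urllib.parse import urlsplit

# Default ports used when a URL omits the port (assumption: only http/https matter here).
DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str):
    """Return (scheme, host, port) with case and default ports normalized."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def match_full_url(audience: str, endpoint: str) -> bool:
    """Strictest reading: exact string comparison, no canonicalization."""
    return audience == endpoint


def match_origin(audience: str, endpoint: str) -> bool:
    """Middle reading: scheme, host and port must agree; the path is ignored."""
    return _origin(audience) == _origin(endpoint)


def match_domain(audience: str, endpoint: str) -> bool:
    """Loosest reading: only the host name must agree."""
    return urlsplit(audience).hostname == urlsplit(endpoint).hostname


endpoint = "https://as.example.com/token"  # hypothetical token endpoint
for audience in (
    "https://as.example.com/token",
    "https://AS.example.com:443/token",
    "https://as.example.com/authorize",
    "http://as.example.com/token",
):
    print(audience,
          match_full_url(audience, endpoint),
          match_origin(audience, endpoint),
          match_domain(audience, endpoint))
```

The same assertion is accepted or rejected depending on which rule the Authorization Server applies, which is why the question of canonicalization matters for interoperability.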

Richard Barnes: Discuss [2014-10-15 21:01]:
As with draft-ietf-oauth-assertions, the requirement for an 'aud' claim seems entirely unnecessary. Holding this DISCUSS point pending that discussion and its reflection in this document.
'Assertions that do not identify the Authorization Server as an intended audience MUST be rejected.' -- What does it mean for an assertion to 'identify the Authorization Server'? Does the specified <Audience> need to match the entire URL of the relevant OAuth endpoint? Just the origin? Just the domain? Does the URL need to be canonicalized?

Stephen Farrell: Discuss [2014-10-16 05:05]:
The security considerations here say: 'Note that [RFC5925] will not help in keeping MPLS labels private -- knowing the labels, one can eavesdrop on EVPN traffic. However, this requires access to the data path within an SP network, which is assumed to be composed of trusted nodes/links.' The last clause there seems to me to be one that reality has trumped, e.g. with Belgacom/GCHQ. I think a direct consequence is that TCP-AO isn't really sufficient here for the P in VPN to be meaningful. Now, I'm not really expecting that you'll all suddenly agree with me about that, so I'm not expecting that you'll immediately want to encrypt all traffic, but I'd like to better understand what security considerations apply here when not all infrastructure nodes can be trusted, as perhaps there's at least a bit more documentation needed to cover that. (We can chat separately about a plan to try to fix this longer term, but while I'd be delighted to have that chat, I'm not trying to require it as part of this discuss.)

Pete Resnick: Comment [2014-10-16 00:09]:
I gave the document a *very* quick read. Nothing jumping out at me as either apps related or worrisome. No objection unless someone thinks I should go through this more thoroughly.

Barry Leiba: Discuss [2014-10-15 22:29]:
Just a couple of easy points:
-- Section 5 --
"In general, an Ethernet segment SHOULD have a non-reserved ESI that is unique network wide (i.e., across all EVPN instances on all the PEs)."
Doesn't this SHOULD contradict the MUST in the definition of ESI in Section 3?
-- Section 23 (References) --
RFC2119 needs to be a normative reference, as it's required in order to understand the meaning of the MUST, SHOULD, and MAY key words.

Kathleen Moriarty: Discuss [2014-10-15 17:20]:
The draft is well written and I do support it moving forward.
This discuss will be cleared with the next update, thanks.
I don't see a discussion on privacy and would like to figure out if it needs to mention it or not. Although the draft provides a way to represent certainty and confidence information on geolocation data, wouldn't the location coordinates be sensitive if combined with other information such as event types or people at the locations identified? I think it would be good to mention that this data in and of itself is not privacy sensitive (we can pinpoint where the Sydney Opera House is located), but when combined with other information, it may become privacy sensitive information (a crime took place at the Sydney Opera House and it has not been announced yet and the people involved have not been identified). You may not want news cameras on the scene of certain events - rape victim still present.
Alissa suggested adding a reference to BCP 160 since it applies whether or not location is certain, which works for me to resolve this question.
Thanks.

Barry Leiba: Discuss [2014-10-15 22:27]:
An easy point, quickly fixed:
At the top of the document, you say that operator precedence is specified by parentheses, and give no other precedence mechanism. In the formulae in Sections 5.1.1.1, 5.1.1.2, 5.2, and elsewhere, you seem to be using the standard implicit precedence of multiplication over addition, or perhaps using spacing to indicate precedence. I think you need to either say (up in Section 1.1) that you're doing that, or add more parentheses to the formulae.
It seems to me that it'd be best to say more about operator precedence in 1.1, rather than to clutter the formulae throughout the document.

Adrian Farrel: Discuss [2014-10-15 10:26]:
I support the idea of this document. It could provide useful guidance, especially to newcomers to BGP operations. However, I have some issues I would like to see resolved or at least discussed before the document advances.
The Routing Directorate review from Geoff Huston received a somewhat peremptory response from the authors, more concerned with the nature and timing of the review than with the technical issues raised. The authors specifically asked for ADs to tell them how to proceed and, since the review came after the end of IETF last call, I am adopting those issues that I consider important as part of this Discuss (although I would be very happy if you addressed them all).
---
Section 5.1 talks about GTSM, but does not discuss what to do when there is more than one IP hop between BGP speakers. It would be perfectly fine to explicitly state that this mechanism can only apply to single-hop BGP sessions such as those between adjacent ASBRs.
Section 5.1. also talks about IPSEC, but as Geoff Huston observed, while the use of IPSEC has been documented as a possible BGP transport there is very little deployment experience and reasons have been suggested why this would expose the router to further forms of denial of service attack because of the workload in decrypting incoming IPSEC packets. Maybe the thing to do is either strike the sentence or add a caveat that further analysis might be needed.
---
Unless I missed it, the document doesn't talk about compromised routers and bad actors (perhaps some slight discussion in the SIDR section?). We normally talk about compromised IGP routers and how they are hard to protect against, but the issues are somewhat different in BGP speakers because of what they can do across the whole Internet, and how the compromise can be in something like a Route Reflector that may be a server rather than a dedicated hardware router. Furthermore, the actions of a bad actor can be intended to do far more than simply break things.
I don't believe this would be a hard topic to address, but it also has knock-on effects on the efficacy of some of the security mechanisms suggested and (maybe) makes SIDR more pressing.
---
Section 6.1.4
"A network SHOULD filter its own prefixes on peerings with all its peers (inbound direction)."
Geoff notes
'This requires a lot more thought, particularly relating to multi-homed networks that do not use a dedicated ASN. One party's leak is another party's form of traffic engineering.'
I don't think this needs a lot of work, just a qualification of the type of peering where this recommendation applies.
---
In Section 11
"In particular do not (generally) remove the no-export community as it is usually announced by your peer for a certain purpose."
As Geoff says, this seems in conflict with the normal processing rules for a No-Export community.
---
And two final Discuss points from me...
I have no objection to the use of RFC 2119 language in a BCP and I think it is OK to pitch this document as a BCP, but I am confused as to the use of 'MUST' in conjunction with the text in Section 2
"Nature of the Internet is such that Autonomous Systems can always agree on exceptions for relevant local needs, and therefore configure rules which may differ from the recommendations provided in this document."
So I think that the document is making recommendations, and that you need to limit yourself to 'SHOULD' although it would be more in keeping to use 'RECOMMENDED'.
Although in section 9 you have 'This section is listing rules that apply to BGP AS-paths' followed by some uses of 'SHOULD'. Perhaps, 'This section lists the RECOMMENDED practices when processing BGP AS-paths'
---
6.1.2.4 makes the apparent statement that RFC 6480 includes BGPsec in its infrastructure. I think it is fine to include BGPsec in this section (or maybe a closely-related section), but you probably shouldn't say it is directly derived from 6480.

Ted Lemon: Discuss [2014-10-16 06:07]:
This has probably already been considered and addressed by the working group, but coming into this as a neophyte it seems like a glaring omission that the security considerations of bearer assertions are not discussed here. Isn't it the case that the use of bearer assertions requires a trust relationship between the client and relying party such that the client can be assured that the relying party will not misuse the assertion to authenticate with some other entity? I realize that this sort of assertion will likely only be used in cases where the assertion only works to authenticate to a specific relying party, but I think this bears mentioning in the security considerations.

Stephen Farrell: Discuss [2014-10-16 04:22]:
Putting one discuss here rather than one on each of the other docs. We can fix that as appropriate after we chat. Where are the MTI signature and mac algs for these specified? If those can be tracked back via the SAML and jose docs that's fine, but I'm not sure if they are.

Richard Barnes: Discuss [2014-10-15 20:47]:
'The assertion MUST contain an Audience that identifies the Authorization Server as the intended audience. Assertions that do not identify the Authorization Server as an intended audience MUST be rejected.'
Could you please identify the threat model within which this 'MUST' is required? This requirement doesn't follow from any of the threats elaborated in Section 8.
The Audience is only necessary if the Issuer wishes to constrain the set of Authorization Servers with which an assertion may be used. So ISTM that this should be 'MAY contain...'

Adrian Farrel: Discuss [2014-10-16 03:30]:
I welcome this document and think it is a useful addition to the canon. However, John Scudder did a Routing Directorate review during the IETF last call period and emailed his comments to the authors and to the GROW mailing list. I have seen no response to this directly or on the GROW list.
Therefore, from a process point of view, I adopt all of John's comments as a Discuss even though many of the points are small and would normally be just Comments.
- Throughout the document, various terms are used to describe what RFC 4271 calls a 'route'. The definition given in RFC 4271 is:
"Route: A unit of information that pairs a set of destinations with the attributes of a path to those destinations. The set of destinations are systems whose IP addresses are contained in one IP address prefix carried in the Network Layer Reachability Information (NLRI) field of an UPDATE message. The path is the information reported in the path attributes field of the same UPDATE message."
That is, one NLRI plus its path attributes, as carried in an UPDATE, is a 'route'. I would suggest adopting this term, or 'BGP route' if you prefer, instead of terms such as 'NLRI UPDATE message', 'NLRI message', 'prefix UPDATE message', and even just plain 'NLRI' and 'message'. Also some, but not all, of the uses of 'prefix'. I think doing so will make the document clearer, more readable, and more technically accurate. A simple search for the terms I've called out should show most of them so I won't enumerate them here unless you ask me to (feel free, if you want).
- Reference [RS-ARCH] is a dead link. I found a live copy at http://www.cs.usc.edu/assets/003/83191.pdf. It might be worth checking with the authors of RS-ARCH to ask what a good archival reference is.
- S. 4.2 talks about scaling. I'm trying to make sense of the analysis:
"Regardless of any Loc-RIB optimization technique is implemented, the route server's control plane bandwidth requirements will scale according to O(P * N), where P is the total number of unique paths received by the route server and N is the total number of route server clients."
So far so good. (Except nit: there seems to be a word missing, such as 'whether' as in 'Regardless of whether any Loc-RIB...')
"In the case where P_avg (the arithmetic mean number of unique paths received per route server client) remains roughly constant even as the number of connected clients increases, this relationship can be rewritten as O((P_avg * N) * N) or O(N^2)."
I don't see where the second factor of N comes from. You're basically expanding the P in the first expression as P_avg * N -- but why? I think this would only apply if add-path all-paths was chosen as the path hiding mitigation strategy -- but this is not touched on in route-server-operations, only in ix-bgp-route-server, and besides that the beginning of the paragraph implies you're analyzing the multiple Loc-RIB strategy, so I don't guess all-path is what you were thinking of. If you're not doing all-path, the O(N^2) analysis is wrong AFAICT. To see this, consider that the inbound routes require O(P_avg * N) which is just O(N), but the number of routes you're going to advertise is bounded by the size of the Internet routing table, which is a constant for purposes of this analysis, so also O(N). In and out are summed, not multiplied, so the whole thing works out to be O(N), not O(N^2).
So I think this needs to either be corrected, or the assumptions need to be better explained. Moving on:
"This quadratic upper bound on the network traffic requirements indicates that the route server model will not scale to arbitrarily large sizes."
If you continue to think this sentence is warranted, I think it should be better quantified. Of course nothing can scale to *arbitrarily* large sizes, but that still leaves a lot to the imagination. I would think it would be beneficial for an IX operator reading this document to be able to have some idea of how practical the limitation is. Since the analysis in question is looking at control traffic bandwidth consumption, it wouldn't be too onerous to throw some simple assumptions up against it -- for example, 'if we suppose a RS receives on average 100,000 routes from each client with a rate of change of 10 routes/second, sends on average 1,000,000 routes to each client with a rate of change of 100 routes/second, and that each route consumes on average 50 bytes in a BGP UPDATE message, simple arithmetic shows that a GigE connection to that RS will be fully saturated by the time the number of clients reaches 25,000.' (Which does not seem like a very practical limitation, the RS will hit a CPU or memory bottleneck first.)
Anyway, maybe you will decide on reconsideration of the big-O analysis that this bit is not needed at all, which would be OK with me.
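The arithmetic behind the 25,000-client figure can be reproduced with a short sketch (added for illustration, not part of the review text). It uses the review's own numbers and adds three assumptions that the review does not state: the link is full duplex, only steady-state churn is counted (not the initial table transfer), and per-message overhead is ignored.

```python
from typing import Tuple

GIGE_BITS_PER_SECOND = 1_000_000_000
BYTES_PER_ROUTE = 50   # average size of a route in an UPDATE (review's figure)
INBOUND_CHURN = 10     # routes/second received from each client (review's figure)
OUTBOUND_CHURN = 100   # routes/second sent to each client (review's figure)


def steady_state_bps(clients: int) -> Tuple[int, int]:
    """Return (inbound, outbound) control-plane load in bits per second."""
    inbound = clients * INBOUND_CHURN * BYTES_PER_ROUTE * 8
    outbound = clients * OUTBOUND_CHURN * BYTES_PER_ROUTE * 8
    return inbound, outbound


def saturation_clients() -> int:
    """Number of clients at which the busier (outbound) direction fills the link."""
    per_client_outbound = OUTBOUND_CHURN * BYTES_PER_ROUTE * 8
    return GIGE_BITS_PER_SECOND // per_client_outbound


print(saturation_clients())              # clients needed to fill GigE outbound
print(steady_state_bps(1_000))           # load with 1,000 clients
print(steady_state_bps(2_000))           # doubling the clients doubles the load
```

Under these assumptions each direction grows linearly with the number of clients, consistent with the O(N) argument made in the review when the advertised table size is treated as a constant.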
- S 4.2.2.1,
"If the route server operator has prior knowledge of interconnection relationships between route server clients, then the operator may configure separate Loc-RIBs only for route server clients with unique outbound routing policies."
It wasn't obvious to me what 'outbound' applies to -- the client? The RS? -- and for that matter why an inbound policy (on the RS) might not apply. Possibly this could be remedied by simply dropping the adjective 'outbound'.
- S. 4.2.1.2,
"destination splitting would require significant co-ordination between the route server operator and each route server client"
It's not clear to me why it would 'require significant co-ordination', depending on what resource you're trying to conserve. Two examples of how you could avoid coordination while still getting benefit: You could have clients send all their routes to all the RSes, but have RSes filter out the prefixes they don't care about. This gives the RS most of the CPU benefit it would have gotten had the client done the filtering (prefix filtering is cheap), almost all the memory benefit (the filtered routes need not be retained in the Adj-RIB-In), and around half the control traffic bandwidth benefit. The client incurs cost to send duplicate routes that are going to be discarded by the RS, but the client is presumably not the bottleneck resource. Or better still, the RS could use ORF towards the clients to control what routes the clients will send.
- S. 4.6.1,
OLD: Prefixes sent to the route server are tagged with specific [RFC1997] or [RFC4360] BGP community attributes
I don't think the naked references scan well as adjectives in this context. I suggest
NEW: Prefixes sent to the route server are tagged with specific standard [RFC1997] or extended [RFC4360] BGP community attributes
- Also in S. 4.6.1,
OLD: As both standard and extended BGP communities values are restricted to 6 octets
Actually standard communities are restricted to less than that. Perhaps reword as
NEW: As both standard and extended BGP communities values are restricted to 6 octets or fewer
- Also in S. 4.6.1,
"route server operator should take care to ensure that the predefined BGP community values mechanism used on their route server is compatible with [RFC4893] 4-octet autonomous system numbers."
I suspect an RS operator reading this might be left scratching his or her head and asking 'what does it mean for me to be compatible with RFC4893 in this context'? It would be kind to offer them some guidance, since after all this is a guidance document.
- S. 4.7: Where you say 'non-commutative' I think you mean 'non-transitive'.
- S. 4.7:
"Problems of this form can be dealt with using [RFC5881] bidirectional forwarding detection."
It's not clear to me how certain non-transitive forwarding failures can be dealt with using BFD. To take an example, suppose clients A, B and C peer with RS. The IX fabric has a failure such that A and B can both reach RS, but not each other. C has connectivity to everyone. Prefix X is advertised to RS by both B and C. For whatever reason, RS selects X via B to advertise to A. Even if A runs BFD towards B, at best A can determine that the route from RS can't be used. A isn't able to fail over to C's route as it would in the full-mesh case, since it's not aware of it. Depending on A's other connectivity, this may result in sub-optimal routing towards X, or complete loss of connectivity to X.
It's beyond the scope of the draft to solve this problem, but the text could be made more accurate. A minimal fix would be
"Problems of this form can be partially mitigated using [RFC5881] bidirectional forwarding detection."
although you might want to go on a bit longer to explain what problems can't be mitigated.
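The A/B/C example above can be written down as a toy model (an illustrative sketch, not part of the review text). It assumes a route server that advertises only its single selected path per prefix, with no path hiding mitigation, and it treats BFD simply as a per-next-hop liveness check; the variable and function names are hypothetical.

```python
# Data-plane reachability between clients across the IX fabric.
# The A-B pair has failed; C can still reach everyone.
reachable = {frozenset(pair) for pair in [("A", "C"), ("B", "C")]}


def usable_next_hops(client, learned_next_hops):
    """Next hops that pass a BFD-style liveness check run from `client`."""
    return [nh for nh in learned_next_hops
            if frozenset((client, nh)) in reachable]


advertisers_of_x = ["B", "C"]   # both B and C announce prefix X
rs_best_for_x = "B"             # the route server's single selected path for X

# Via the route server, A only ever learns the path through B.
via_route_server = usable_next_hops("A", [rs_best_for_x])

# In a full mesh, A learns X directly from both B and C.
via_full_mesh = usable_next_hops("A", advertisers_of_x)

print(via_route_server)  # no usable next hop remains for X
print(via_full_mesh)     # A can fall back to C
```

BFD lets A discover that the path through B is unusable, but it cannot supply the alternative path through C that the route server never advertised, which is why "partially mitigated" is the more accurate wording.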
- S. 4.8:
"This problem is not specific to route servers and it can also be implemented using bilateral peering sessions. However, the potential damage is amplified by route servers because a single BGP session can be used to affect many networks simultaneously."
This is true, but there is a more severe way RSes aggravate the problem: In a full mesh, a router can (and usually does) directly enforce a 'no third-party next hops' policy against its peers. An RS peer by definition cannot enforce this policy against the RS, so the RS is the only place it can be enforced.
- S. 4.8:
"Route server operators SHOULD check that the BGP NEXT_HOP attribute for NLRIs received from a route server client matches the interface address of the client. If the route server receives an NLRI where these addresses are different"
so far so good (modulo my first comment about the use of 'NLRI', of course), but:
"and where the announcing route server client is in a different autonomous system to the route server client which uses the next hop address,"
Is the RS sincerely expected to enforce the above? I suppose it could be implemented automatically although imperfectly, by noticing that multiple clients are in the same neighbor AS and noticing when they use each other as third-party next hops, but AFAIK people generally don't try to figure this out, they just do what you've said in the preceding sentence -- make sure the NH matches the interface address. If you really do propose that the RS should allow third-party next hops but only from clients in a common AS, I think you should talk about it specifically and in more detail. If you didn't really mean that, then I suggest you drop the clause.
- S. 5:
"On route server installations which do not employ path hiding mitigation techniques, the path hiding problem outlined in section Section 4.1 can be used in certain circumstances to proactively block third party prefix announcements from other route server clients."
I don't understand what this means. Specifically, I don't know what it means to 'proactively block third party prefix announcements' or for that matter, even what you mean by 'third party prefix announcements' in this context. (As a term of art, I normally understand 'third party announcement' in a BGP context to mean announcing a third-party next hop as you discuss in S. 4.8). I also don't know what the 'certain circumstances' are, quite likely these should be given at least a little color if not entirely spelled out.
Also, a nit -- the xref expansion has put 'section section' into your text.
- S. 7:
"BIRD, OpenBGPD and Quagga, whose open source BGP implementations include route server capabilities"
Great, cool, but:
"which are compliant with this document."
I'm not sure what it actually means to be 'compliant' with a document that 'describes operational considerations'. Perhaps just drop the phrase?

Kathleen Moriarty: Discuss [2014-10-13 14:19]:
Thank you for your work on this draft! It looks great, but I do have two items I'd like to discuss and see if we can add text to address these concerns/attacks before switching to a yes.
1. I think it would be very helpful to include what techniques are used by forensic tools that enable access to decrypt TLS sessions and how to respond/prevent that access. I see you have certificate attacks listed in 2.7, which is what the common forensic tools leverage. However, these tools also require access to the private key, and it would be helpful to mention the importance of protecting the private key, preventing exportability, etc.
In 2.7 I think it would be helpful to explicitly state that commonly used forensic tools such as wireshark require access to the private key as well as use of RSA. Another option might be to add the recommendation to protect the private key in 2.13. If you can't export the key (from an implementation perspective), that could go a long way to helping to reduce this method of exposure.
I've included a few links for additional information on the list of tools and explicit details of the attack used in case this is helpful.
Forensic tools that rely on a MiTM attack to decrypt TLS and DTLS session: http://forensicswiki.org/wiki/SSL_forensics
Wireshark requires you have the key: http://support.citrix.com/article/CTX116557
Mitigations are to protect the private key and to not use RSA:
http://wirewatcher.wordpress.com/2010/07/20/decrypting-ssl-traffic-with-wireshark-and-ways-to-prevent-it/
http://wiki.wireshark.org/DTLS?highlight=%28tls%29
If you would like text suggestions, let me know and I'll help.
2. I realize this draft covers explicit attacks against TLS; however, since pervasive monitoring is considered an attack, it could be helpful for this draft to also cover techniques used by middle boxes to intercept TLS streams (proxy firewalls, load balancers, etc.). Although these are more 'attacks' on the user than on TLS, it could be a short addition to have this documented.
The TLS session is intercepted, or in the case of a load balancer it might be terminated, with a second TLS session initiated to the destination allowing traffic to pass in the clear on the middlebox. The user is alerted and warned to accept an untrusted certificate in this process, and many do as a result of corporate restrictions (they have no choice if they want to go to that site).
Perhaps implementation recommendations could assist here to improve the warnings to the user, letting them know their traffic may be passing in the clear and they may not want to continue with their session. Some may choose to avoid certain transactions from the work place as a result.
If you have already discussed these items and decided they were out of scope, please let me know. I support this work and just wanted to make sure we covered all bases or put some out of scope. Thank you.

Jari: two steps: first discuss feedback; quite active discussion/clarification; was it too short-term? procedural concerns; some support/some concerns; no challenge to IESG organizing its work

Pete: a lot of tangents; don't think anyone opposed doing anything like this, but questioned whether not replacing APPS is right tactic

Adrian: when we met in Newark, we knew no candidates for APPS, some rug-pulling for folks who convinced their mgmt

Barry: I disagree that we misled anyone about APPS position

Adrian: I disagree that correlation is causation

Richard: I'm not super-concerned about making folks feel bad

Brian: I agree we're within our rights; I don't share Adrian's level of concern; flexibility will help us find the right way to reorganize; likely to add back 15th AD with different skill-set

Richard: I've only skimmed the IETF list: I think this is an experiment worth doing

Kathleen: I think it's OK to move forward; rather not put someone in a poorly-defined position

Alia: sensitive to concerns about the problems of redefining a position for an incumbent; could be quite useful to have second APP AD's input, but worried that it might get in the way

Martin: OK to go forward, we've been stuck with some variation of the problem we're addressing for many years without finishing any changes

Spencer: I agree with Martin, it's better to take a shot at doing something, because the problem isn't going to go away. We often talk as if there's a problem with a specific area, but usually realize after a while that we could have this problem in any area - no area is immune. Now that we've started to do something, it's important to finish what we've started

Jari: we can't guarantee there will be interest in work we describe; we also have to do other things; but silly if we don't try to match structure to work

Ted: I agree with Kathleen; holding off is better

Joel: reorganization of work is a coordination exercise; creation/removal of areas is necessary and perilous

Russ: if you wait a year, it's going to be worse, not better

Pete: two things going on here: we want to be careful not to say "We can do what we want" -- there's nothing which forbids this: we should act in consultation, but consensus isn't required; we don't want to appear to say who the NomCom picks; text to NomCom saying "it's up to you, but we don't think it's necessary"

Jari: my impression is we're part "mixed"

Barry: I'm fine; message that "IESG suggests not filling: if they fill it, make sure candidates are aware"

Jari: any other thoughts?

Adrian: whatever message is sent, remember the pain Russ went through with a perception he was telling the NomCom what to do

Barry: this must come from me as liaison

Jari: we must also say it openly to the community

Pete: the form of the message is correct as is

Jari: does anyone have major concerns?

Benoit: what language exactly?

Jari: "we think this is best, but NomCom must decide"; Barry do you have time to look it over and send exact text to IESG list

Barry: I shall do so before this call is even over

Jari: minutes to show IESG approved language "it would be easier if you didn't fill this"

7. Agenda Working Group News

(none)

Amy: please be aware that Daylight Savings time does _not_ end before our next telechat October 30

Amy: Alexa has been in contact with WebEx about storage issue for storing recordings; you can try moving the recordings off the WebEx server: it might help

Alia: more interims coming up

Amy: strange we went suddenly from 50% space free to over-full; helpful for folks that "own" recordings to remove them... we're working on it

1250 EDT Adjourned

Appendix: Snapshot of discusses/comments

2.1/2.2 - This paragraph shows why I don't like haphazard use of 2119. The
first "MUST be" is obviously silly and should simply be "is". But the second
one buries what *might* be a proper and important use of MUST (you MUST NOT try
to stick in two SAML Assertions) with a simple definitional one. (And that
assumes that it's even plausible to try to use more than one SAML Assertion. If
you simply can't, it's just s/MUST contain/contains.) The base64url encoding
MUST is fine, because you don't want people sticking in raw XML, but the SHOULD
NOTs for line wrapping and pad I am curious about: Isn't a parser going to have
to check for line wrapping and pad anyway and undo it (because it's not a MUST
NOT), and therefore this SHOULD NOT really isn't about interoperability so much
as neatness (in which case the SHOULD NOTs are not appropriate)?
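To make the line-wrapping and padding question concrete, here is a minimal sketch (added for illustration, not part of the ballot text). The function names are hypothetical, and the tolerant decoder reflects the assumption in the comment above that a receiver would have to undo wrapping and padding anyway; the draft does not mandate this behavior.

```python
import base64


def encode_assertion(xml: bytes) -> str:
    """base64url-encode with no line wrapping and no '=' padding."""
    return base64.urlsafe_b64encode(xml).rstrip(b"=").decode("ascii")


def decode_assertion(value: str) -> bytes:
    """Decode base64url input, tolerating line wrapping and optional padding."""
    compact = "".join(value.split())             # remove CR/LF and other whitespace
    compact = compact.rstrip("=")                # drop any padding that was sent
    padded = compact + "=" * (-len(compact) % 4)  # restore padding for the decoder
    return base64.urlsafe_b64decode(padded)


payload = b"<a/>"                                 # stand-in for a SAML Assertion
sent = encode_assertion(payload)
wrapped_and_padded = base64.urlsafe_b64encode(payload).decode("ascii")
wrapped_and_padded = wrapped_and_padded[:4] + "\r\n" + wrapped_and_padded[4:]

print(decode_assertion(sent) == payload)
print(decode_assertion(wrapped_and_padded) == payload)
```

A receiver written this way accepts both forms, so the SHOULD NOTs constrain only what the sender emits.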
3 - Subpoint 2: Just for clarification, I like the non-passive sentence better:
"The Authorization Server MUST reject any assertion that does not contain its
own identity as the intended audience."
Subpoint 5:
OLD
The <SubjectConfirmation> element MUST contain a
<SubjectConfirmationData> element, unless the Assertion has a
suitable NotOnOrAfter attribute on the <Conditions> element, in
which case the <SubjectConfirmationData> element MAY be omitted.
That one's sure to get misquoted somewhere and confuse someone. Instead:
NEW
If the Assertion does not have a suitable NotOnOrAfter attribute
on the <Conditions> element, the <SubjectConfirmation> element
MUST contain a <SubjectConfirmationData> element.
Subpoint 6:
OLD
The authorization server MUST verify that the NotOnOrAfter
instant has not passed, subject to allowable clock skew between
systems. An invalid NotOnOrAfter instant on the <Conditions>
element invalidates the entire Assertion. An invalid
NotOnOrAfter instant on a <SubjectConfirmationData> element only
invalidates the individual <SubjectConfirmation>.
NEW
The authorization server MUST reject the entire Assertion if
the NotOnOrAfter instant on the <Conditions> element has passed
(subject to allowable clock skew between systems). The
authorization server MUST reject the <SubjectConfirmation> (but
MAY still use the rest of the Assertion) if the NotOnOrAfter
instant on the <SubjectConfirmationData> has passed (subject to
allowable clock skew).
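The two-level rule in the NEW text can be sketched as follows (an illustration added for clarity, not part of the ballot text). The five-minute skew is an assumed value, since the draft does not fix one, and the function names and the list-of-instants representation are hypothetical simplifications of a parsed Assertion.

```python
from datetime import datetime, timedelta, timezone

ALLOWED_SKEW = timedelta(minutes=5)  # assumed value; not specified by the draft


def has_passed(not_on_or_after, now, skew=ALLOWED_SKEW):
    """True if the NotOnOrAfter instant has passed, allowing for clock skew."""
    return now >= not_on_or_after + skew


def evaluate(conditions_noa, confirmation_noas, now):
    """Return indices of usable SubjectConfirmations, or None to reject everything.

    conditions_noa:    NotOnOrAfter on the <Conditions> element
    confirmation_noas: NotOnOrAfter on each <SubjectConfirmationData> element
    """
    if has_passed(conditions_noa, now):
        return None  # reject the entire Assertion
    return [i for i, noa in enumerate(confirmation_noas)
            if not has_passed(noa, now)]  # reject only the expired confirmations


now = datetime(2014, 10, 16, 12, 0, tzinfo=timezone.utc)
confirmations = [now - timedelta(minutes=10),   # expired beyond the skew
                 now + timedelta(minutes=10),   # still valid
                 now - timedelta(minutes=2)]    # expired, but within the skew

print(evaluate(now + timedelta(hours=1), confirmations, now))
print(evaluate(now - timedelta(hours=1), confirmations, now))
```

Placing the MUST on the rejecting action, as the NEW text does, maps directly onto the two return paths in `evaluate`.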
Subpoint 7: Are you sure those SHOULDs and SHOULD NOTs are not conflicting? Can
you not have an authenticated subject with an autonomously acting client?
Subpoint 9: As I asked in the -assertions document, is this really a
requirement?
Subpoint 11: Again, it would be better to put the MUST on the action (e.g.,
"MUST reject") to make it clear who is doing what.
3.1/3.2 - s/MUST construct/constructs
4 - s/Though non-normative//
9 - Seems like OASIS.saml-deleg-cs and OASIS.saml-sec-consider-2.0-os are
Normative, not Informative.

- intro para2: might be nice (no more) to add some refs to
other protocols that use SAML.
- 2.2: What are "padding bits" in 4648? I don't recall such.
(But may be misremembering.)
- section 3, list item 2: This doesn't quite say that the
token endpoint URL MUST (in the absence of another profile) be
in an Audience element. Why not? The text seems to me to allow
for the AS to map the token endpoint URL to any value in an
Audience element that the AS finds ok. I suspect that might be
unwise, but it at least needs to be clear. Is that the text
being ambiguous or me being paranoid/wrong? Same point seems
to apply elsewhere too:
= in item 3.A where it says "typically identifies" but
does not say how.
= in item 5 "or an acceptable alias"
- section 3, item 7: How might an AS know that "the Assertion
was issued with the intention that the client act autonomously
on behalf of the subject"?

As with draft-ietf-oauth-assertions, the requirement for an <Audience> element
seems entirely unnecessary. Holding this DISCUSS point pending that discussion
and its reflection in this document.
"Assertions that do not identify the Authorization Server as an intended
audience MUST be rejected." -- What does it mean for an assertion to "identify
the Authorization Server"? Does the specified <Audience> need to match the
entire URL of the relevant OAuth endpoint? Just the origin? Just the domain?
Does the URL need to be canonicalized?
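
To make the ambiguity concrete, the candidate readings give different answers for the same pair of values. A minimal sketch, assuming a hypothetical token endpoint URL and using only Python's standard urllib.parse; none of these comparison functions is defined by the draft.

```python
from urllib.parse import urlsplit

def same_url(audience, endpoint):
    # Strictest reading: exact string comparison, no canonicalization.
    return audience == endpoint

def same_origin(audience, endpoint):
    # Origin reading: scheme, host, and port must agree.
    a, e = urlsplit(audience), urlsplit(endpoint)
    return (a.scheme, a.hostname, a.port) == (e.scheme, e.hostname, e.port)

def same_host(audience, endpoint):
    # Loosest reading: only the host name must agree.
    return urlsplit(audience).hostname == urlsplit(endpoint).hostname

endpoint = "https://as.example.com/token"   # hypothetical token endpoint
audience = "https://AS.example.com/"        # hypothetical <Audience> value

print(same_url(audience, endpoint),
      same_origin(audience, endpoint),
      same_host(audience, endpoint))
```

For these hypothetical values the exact comparison fails while the origin and host comparisons succeed, so two implementations could disagree on whether the assertion "identifies the Authorization Server".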

I cleared my DISCUSS on the basis that RFC 6755 will be moved to an informative
reference in response to this process issue: IDnits complains of a normative
reference to Informational document RFC 6755, which was not noted in the Last
Call announcement.
Editorial Nits:
S2.2: The paragraph before the actual example uses terminology inconsistent
with RFC 6749:
s/authorization code grant/authorization grant/

I'm not going to repeat stuff that is identical to
draft-ietf-oauth-saml2-bearer (and I did find that using
<https://www.ietf.org/rfcdiff?url1=draft-ietf-oauth-saml2-bearer-21&difftype=--html&submit=Go%21&url2=draft-ietf-oauth-jwt-bearer-10> was very helpful). Please refer to my comments on that document.

As with draft-ietf-oauth-assertions, the requirement for an "aud" claim
seems entirely unnecessary. Holding this DISCUSS point pending that discussion
and its reflection in this document.
"Assertions that do not identify the Authorization Server as an intended
audience MUST be rejected." -- What does it mean for an assertion to "identify
the Authorization Server"? Does the specified <Audience> need to match the
entire URL of the relevant OAuth endpoint? Just the origin? Just the domain?
Does the URL need to be canonicalized?

The security considerations here say: "Note that [RFC5925]
will not help in keeping MPLS labels private -- knowing the
labels, one can eavesdrop on EVPN traffic. However, this
requires access to the data path within an SP network, which
is assumed to be composed of trusted nodes/links." The last
clause there seems to me to be one that reality has trumped,
e.g. with Belgacom/GCHQ. I think a direct consequence is that
TCP-AO isn't really sufficient here for the P in VPN to be
meaningful. Now, I'm not really expecting that you'll all
suddenly agree with me about that, so I'm not expecting that
you'll immediately want to encrypt all traffic, but I'd like
to better understand what security considerations apply here
when not all infrastructure nodes can be trusted, as perhaps
there's at least a bit more documentation needed to cover
that. (We can chat separately about a plan to try to fix this
longer term, but while I'd be delighted to have that chat, I'm
not trying to require it as part of this discuss.)

Just a couple of easy points:
-- Section 5 --
In general, an Ethernet segment SHOULD have a non-reserved ESI that
is unique network wide (i.e., across all EVPN instances on all the
PEs).
Doesn't this SHOULD contradict the MUST in the definition of ESI in Section 3?
-- Section 23 (References) --
RFC2119 needs to be a normative reference, as it's required in order to
understand the meaning of the MUST, SHOULD, and MAY key words.

-- Section 1 --
The procedures described here are intended to meet the
requirements specified in [RFC7209].
But do they? By the time we get to IESG approval, I should hope that it's more
than an intent, but an assertion. Please assert it.
-- Section 3 --
Ethernet Segment Identifier (ESI): If a CE is multi-homed to two or
more PEs, the set of Ethernet links that attaches the CE to the PEs
is an 'Ethernet segment'. Ethernet segments MUST have a unique non-
zero identifier, the 'Ethernet Segment Identifier'.
Aren't you defining two things here: Ethernet Segment and Ethernet Segment
Identifier? Shouldn't you make separate definitions for each?
Presuming that the segments don't all share the same ESI, the second sentence
should say:
NEW
Each Ethernet segment MUST have a unique non-
zero identifier, the 'Ethernet Segment Identifier'.
END
Single-Active Redundancy Mode: When only a single PE, among a group
of PEs attached to an Ethernet segment, is allowed to forward traffic
to/from that Ethernet Segment
I think the intent here is to refer to only one PE among *all* of the PEs
attached... correct? Or do I have that wrong? As written, it doesn't say that.
-- Section 4 --
An EVPN instance comprises
CEs that are connected to PEs that form the edge of the MPLS
infrastructure.
THANK YOU for a rare correct use of "comprises"!
-- Section 11.1 --
The Originating Router's IP address MUST be set to an IP address of
the PE. This address SHOULD be common for all the EVIs on the PE
Why the SHOULD? What are the effects to the protocol if the address is not
common? What are the considerations that have to be made in order to decide
whether it's acceptable not to use a common address?

Thanks for producing this draft. It’s important.
I did have one question.
In this text, in 3.5.1. Default algorithm:
TAU=4*T is a reasonable compromise between burst size and throttled
rate adaptation at low offered rates.
will this always be true for SIP (so, “is”), or is there an appropriate
qualifier that could be included?
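
For readers weighing that question, the role of TAU can be seen in a small sketch. It assumes the default algorithm is a leaky-bucket limiter with interval T = 1/target rate and tolerance TAU, as the quoted text suggests; the function names, the state layout, and the exact update rule are illustrative and should be checked against the draft.

```python
def make_bucket(target_rate, tau_factor=4.0):
    """State for a leaky-bucket limiter: T = 1/target_rate, TAU = tau_factor * T."""
    t = 1.0 / target_rate
    return {"T": t, "TAU": tau_factor * t, "X": 0.0, "LCT": 0.0}

def admit(bucket, ta):
    """Decide whether a request arriving at time `ta` (seconds) may be sent."""
    # Drain the bucket by the time elapsed since the last admitted request.
    x_prime = bucket["X"] - (ta - bucket["LCT"])
    if x_prime <= bucket["TAU"]:
        # Admit: add one interval's worth to the bucket and record the time.
        bucket["X"] = max(0.0, x_prime) + bucket["T"]
        bucket["LCT"] = ta
        return True
    # Reject: the bucket is above the tolerance, so state is left unchanged.
    return False

bucket = make_bucket(target_rate=4)          # T = 0.25 s, TAU = 1.0 s
burst = [admit(bucket, 0.0) for _ in range(7)]
```

In this sketch, with a target rate of 4 requests per second and TAU = 4*T, a back-to-back burst of five requests is admitted before rejections start. A larger factor admits longer bursts and a smaller one holds closer to the target rate, which is the compromise the quoted sentence describes.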

3.3 - Isn't the second paragraph completely redundant with the first? Why not
remove it?
3.4 - What is the first sentence of the second paragraph trying to tell the
implementer? That it can't lower the rate based on something other than
overload state? That it has to do it "periodically", whatever that means? I
don't get it.
In other words, when multiple clients are being controlled by an
overloaded server, at any given time some clients may receive
requests at a rate below their target (maximum) SIP request rate
while others above that target rate.
The *server* receives the request, not the client, right? I don't understand
this sentence.
The second to last paragraph also seems redundant. Can it be removed?
3.5.1/3.5.2/3.5.3 - The language of "admitted/rejected" had me confused for a
bit because it's talking about the client, which I think of as
"sending/not-sending" requests; the server is the one doing the admitting
(accepting) vs. rejecting. If SIP folks are used to this language, I guess it's
fine, but it did take a reset when I first read it.

I remain completely spooked by the Geopriv work. I understand that I
am "in the rough" with my views and I understand that there are
implementations, etc., etc. But I still think that the privacy issues
of Geopriv remain poorly addressed.
On that basis, I will not block this work, but I will also not support
it.
---
Abstract
The key concepts of uncertainty and confidence as they pertain to
location information are defined. Methods for the manipulation of
location estimates that include uncertainty information are outlined.
Are those general statements, or are they intended to refer to this
document?

2 - This section (and some of the longer explanations in other sections) made
me curious who the target audience for this document is. I'm no stats guy, but
I found the information in this section pretty straightforward, and thought
that a simple pointer to a reference or just a list of definitions would
probably have been enough. This document does seem to go on at length about
some pretty basic topics. But maybe I'm not the average reader.
3.1 - "infinitesimally larger"?
4.1 - I'm not clear on the treatment of a confidence of "unknown". How does
this affect implementations (as against a missing confidence)?

- I agree with Kathleen's point that a discussion of privacy
would be good. Perhaps if you could cover how
privacy-(un)friendliness might vary with uncertainty and
confidence that'd be good. Presumably privacy goes "up" as
uncertainty increases and "down" as confidence increases, at
least in some sense? Or if not, explaining why would be
good. I'd say a sentence or two in the security considerations
might be enough for that, perhaps with a warning that
it's easy to go wrong when looking for "more" privacy.
- 3.1: This section just wasn't very clear to me. Could that
just be safely deleted? (Or the last para at least.)
- section 5, 1st bullet - does this really belong here? It's
fine to have it here, but I wondered if it'd really be
better somewhere else. (Not suggesting you re-open something
else but just wondered.)
- p15: ECEF is used without expansion
- 5.5: "In the absence of specific recommendations, this
document suggests that the probability be greater than 50%
before a decision is made." That's not very clear to me. I
think you just mean that the default is to say yes, it's in
the area of interest if the probability of that is >50%.

The draft is well written and I do support it moving forward.
This discuss will be cleared with the next update, thanks.
I don't see a discussion on privacy and would like to figure out if it needs to
mention it or not. Although the draft provides a way to represent certainty
and confidence information on geolocation data, wouldn't the location
coordinates be sensitive if combined with other information such as event types
or people at the locations identified? I think it would be good to mention
that this data in and of itself is not privacy sensitive (we can pinpoint where
the Sydney Opera House is located), but when combined with other information,
it may become privacy sensitive information (a crime took place at the Sydney
Opera House and it has not been announced yet and the people involved have not
been identified). You may not want news camera on the scene of certain events
- rape victim still present.
Alissa suggested adding a reference to BCP 160 since it applies whether or not
location is certain, which works for me to resolve this question.
Thanks.

An easy point, quickly fixed:
At the top of the document, you say that operator precedence is specified by
parentheses, and give no other precedence mechanism. In the formulae in
Sections 5.1.1.1, 5.1.1.2, 5.2, and elsewhere, you seem to be using the standard
implicit precedence of multiplication over addition, or perhaps using spacing
to indicate precedence. I think you need to either say (up in Section 1.1)
that you're doing that, or add more parentheses to the formulae.
It seems to me that it'd be best to say more about operator precedence in 1.1,
rather than to clutter the formulae throughout the document.

This document is really well written, and was interesting to read; thanks.
I've only a small quibble in Section 4.2:
Location generators SHOULD attempt to ensure that confidence is equal
in each dimension when generating location information. This
restriction, while not always practical, allows for more accurate
scaling, if scaling is necessary.
Thanks for that: this is how "SHOULD" ought always be specified. I might
remember this to use as an example.
A confidence element MUST be included with all location information
that includes uncertainty (that is, all forms other than a point). A
special "unknown" MAY be used if confidence is not known.
Here, on the other hand, I don't see how the "MAY" makes sense. You MUST
include confidence, even when you don't know what to include. So if you don't
know... what *else* can you use but "unknown"? So I think it's not a "MAY be
used", but an "is used".

I support the idea of this document. It could provide useful guidance,
especially to newcomers to BGP operations. However, I have some issues
I would like to see resolved or at least discussed before the document
advances.
The Routing Directorate review from Geoff Huston received a somewhat
peremptory response from the authors, more concerned with the nature and
timing of the review than with the technical issues raised. The authors
specifically asked for ADs to tell them how to proceed and, since the
review came after the end of IETF last call, I am adopting those
issues that I consider important as part of this Discuss (although I
would be very happy if you addressed them all).
---
Section 5.1 talks about GTSM, but does not discuss what to do when there
is more than one IP hop between BGP speakers. It would be perfectly fine
to explicitly state that this mechanism can only apply to single-hop BGP
sessions such as those between adjacent ASBRs.
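
The single-hop limitation follows from how the check works. A minimal sketch, assuming the sender transmits with TTL 255 and each router on the path decrements it by one; the function name and the max_hops parameter are illustrative, not taken from the document.

```python
MAX_TTL = 255

def gtsm_accept(received_ttl, max_hops=1):
    """Accept a packet only if it cannot have crossed more than `max_hops` hops.

    With max_hops=1 (directly connected peers) only TTL 255 is accepted.
    Raising max_hops widens the set of senders that pass the check.
    """
    return received_ttl >= MAX_TTL - (max_hops - 1)

print(gtsm_accept(255), gtsm_accept(254), gtsm_accept(254, max_hops=2))
```

Once the threshold is relaxed for a multi-hop session, any host within that many hops passes the same check, which is why stating the single-hop scope explicitly would help.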
Section 5.1 also talks about IPsec, but as Geoff Huston observed, while
the use of IPsec has been documented as a possible BGP transport, there
is very little deployment experience, and reasons have been suggested why
this would expose the router to further forms of denial-of-service
attack because of the workload in decrypting incoming IPsec packets.
Maybe the thing to do is either strike the sentence or add a caveat that
further analysis might be needed.
---
Unless I missed it, the document doesn't talk about compromised routers
and bad actors (perhaps some slight discussion in the SIDR section?).
We normally talk about compromised IGP routers and how they are hard to
protect against, but the issues are somewhat different in BGP speakers
because of what they can do across the whole Internet, and how the
compromise can be in something like a Route Reflector that may be a
server rather than a dedicated hardware router. Furthermore, the actions
of a bad actor can be intended to do far more than simply break things.
I don't believe this would be a hard topic to address, but it also has
knock-on effects on the efficacy of some of the security mechanisms
suggested and (maybe) makes SIDR more pressing.
---
Section 6.1.4
A network SHOULD filter its own prefixes on peerings with all its
peers (inbound direction).
Geoff notes
This requires a lot more
thought, particularly relating to multi-homed
networks that do not use a dedicated ASN. One
party's leak is another party's form of traffic
engineering.
I don't think this needs a lot of work, just a qualification of the type
of peering where this recommendation applies.
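
For reference, the recommendation as written amounts to a check like the one below. This is a sketch under stated assumptions: the local address space is a hypothetical pair of documentation prefixes, and the function name is invented for illustration.

```python
import ipaddress

# Hypothetical local address space (documentation prefixes).
OWN_PREFIXES = [ipaddress.ip_network("192.0.2.0/24"),
                ipaddress.ip_network("2001:db8::/32")]

def reject_inbound(prefix):
    """True if a prefix received from a peer falls inside our own space."""
    net = ipaddress.ip_network(prefix)
    # Compare only networks of the same IP version, then test containment.
    return any(net.version == own.version and net.subnet_of(own)
               for own in OWN_PREFIXES)

print(reject_inbound("192.0.2.128/25"), reject_inbound("198.51.100.0/24"))
```

A multi-homed network that shares its prefixes without a dedicated ASN would need exceptions to such a filter, which is the case the qualification should cover.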
---
In Section 11
In particular do not
(generally) remove the no-export community as it is usually
announced by your peer for a certain purpose.
As Geoff says, this seems in conflict with the normal processing rules
for a No-Export community.
---
And two final Discuss points from me...
I have no objection to the use of RFC 2119 language in a BCP and I think
it is OK to pitch this document as a BCP, but I am confused as to the
use of "MUST" in conjunction with the text in Section 2
Nature of the Internet is such that Autonomous Systems
can always agree on exceptions for relevant local needs, and
therefore configure rules which may differ from the recommendations
provided in this document.
So I think that the document is making recommendations, and that you
need to limit yourself to "SHOULD" although it would be more in keeping
to use "RECOMMENDED".
Although in section 9 you have "This section is listing rules that apply
to BGP AS-paths" followed by some uses of "SHOULD". Perhaps, "This section
lists the RECOMMENDED practices when processing BGP AS-paths"
---
6.1.2.4 makes the apparent statement that RFC 6480 includes BGPsec in
its infrastructure. I think it is fine to include BGPsec in this section
(or maybe a closely-related section), but you probably shouldn't say it
is directly derived from 6480.

In addition to my Discuss, I have a number of Comments that I think the
authors should look to before publication.
---
idnits shows a significant number of unused references and an
obsolete reference. This may be intentional as noted in the shepherd
write-up, but that doesn't get the authors off the hook. They need to
write text that points to the references, even if it is as simple as
"Additional background information can be found in [a], [b], ..."
---
No need to expand "BGP" as it is already listed at
http://www.rfc-editor.org/rfc-style-guide/abbrev.expansion.txt
---
Rather than stating intentions be definitive...
OLD
This document
intends to both summarize common existing rules and help network
administrators apply coherent BGP policies.
NEW
This document
summarizes common existing rules and helps network
administrators apply coherent BGP policies.
END
---
In Section 2
If this is perfectly acceptable, one
should note that every configured exception has an impact on the
complete BGP security policy and requires special attention before
implementation.
The English here is confusing. I suppose you mean "If an agreement is
made between two ASes to apply an exception..." Even then, we should
note Geoff Huston's comment that...
The
correct statement [is] that every BGP peer session
has an impact on the inter-domain routing
environment, and that all BGP session
configurations should be managed with care
and attention.
That is a supportive edit to your text, I think.
---
Section 4
The BGP router needs to be protected from stray packets.
This is an odd way to put it. Do you mean in general? Probably not
because the router's job is to, erm, route. So I think you mean a bit
more...
The "packets" are probably "BGP packets". And "stray" sounds like
"randomly adrift in the sea of the Internet" but you probably mean
something far more specific.
The final paragraph of this section gives some clues, so I suggest
expanding the first sentence into a paragraph that explains the meaning
and then the rest of the section can focus on BGP as the text currently
does.
Furthermore, Geoff Huston's comment is valid:
At present an incoming
"stray" packet addresses to port 179 on the
local BGP speaker would be discarded by the
TCP control process on the BGP speaking
module as there is no active session matching
the TCP 5-tuple of the incoming packet. The
need to offload this discard function to an ACL
is not motivated here, and the reviewer is left
wondering why this is stated as a “need”. The
security risk is incoming packets that use TCP
with the same TCP 5-tuple as an active session
is left to the next session.
Additionally...
This
protection should be achieved by an access control list (ACL) which
would discard all packets directed to TCP port 179 on the local
device and sourced from an address not known or permitted to become a
BGP neighbor
...Why is this a lowercase "should"?
There are a number of similar case issues surrounding advisory language.
And lastly in this section, you should try to define "rate limit" as
Geoff commented...
in particular “rate limiting” is not defined. If
what is meant here is conventional TCP window
control where, when the receiving BGP process
cannot process the incoming data at the same
rate as the sender in sending data then the
conventional TCP response is advertise a
window size of 0 and only reopen the
advertised window once the receiving BGP
process has processed additional data and
opened space in the receive window buffer.
From this respect, given that TCP is a rate
controlled protocol in the first place This
paragraph
---
Section 5.1
The drawback of TCP session protection is additional configuration
and management overhead for authentication information (ex: MD5
password) maintenance. Protection of TCP sessions used by BGP is
thus RECOMMENDED when peerings are established over shared networks
where spoofing can be done (like IXPs).
There does not appear to be a connection between these two sentences and
the use of "thus" is confusing.
---
6.1.1.1
Only prefixes with value
"False" in column "Global" MUST be discarded on Internet BGP
peerings.
The use of "Only" is confusing me. I think you have a MUST here, as...
On Internet BGP peerings prefixes with value "False" in column
"Global" MUST be discarded.
Do you also have a MUST NOT?
Other prefixes MUST NOT be discarded.
similar text in 6.1.1.2
---
6.1.1.2
At the time of the writing of this document, the list of IPv6
prefixes that MUST NOT cross network boundaries can be simplified as
IANA allocates at the time being prefixes to RIR's only in 2000::/3
prefix [35].
This "MUST NOT" is different from guidance to operators, isn't it? It
is a statement of fact, but not a specific direction to the operator.
---
6.1.2
s/One SHOULD probably NOT/One probably SHOULD NOT/
---
6.1.2.4 has "INVALID". Not a 2119 word.
---
s/internet/Internet/
---
6.2
It is RECOMMENDED that each autonomous system configures
rules for advertised and received routes at all its borders as this
will protect the network and its peer even in case of
misconfiguration.
I *think* you mean "the same rule at each of its border routers".
This is not what you have said (you allow a different rule, so long as
*some* rule exists).
---
Barry has already observed instances of stating what the authors think
(e.g., section 7, "Authors of this document propose.."). You need to
tighten this up as it is an IETF consensus BCP, not just a statement of
your opinions.
---
Section 8
Following rules are generally RECOMMENDED:
Recommended or not?
---
Section 9 has two bullets on private AS numbers where the reference is
to "customers". This possibly stretches a point, and Geoff's suggestion
to include "BGP peers that are party to the agreement" seems far more
sensible.
---
Geoff Huston also provided a raft of minor editorial comments that would
significantly improve the readability of the document.

- 5.1, last para: sorry but I don't see how that conclusion
follows from what's stated. Don't you need to assume that
all hosts that can talk to iBGP speakers are trusted as well?
- 6.1.1.2, 2nd para: Is that wise? Why is the simplified
current list so beneficial that it's worth risking that someone
hard-codes it?
- 6.1.2.4 - there was a recent ANRP prize winner who had a
paper on some downsides of partial deployments of RPKI,
wouldn't it be good to reference that? And are there
no conclusions to be drawn from that? (Sorry in a rush,
can find ref later for ya if needed.)

- Section 4, paragraph 3:
In addition to strict filtering, rate-limiting MAY be configured for
accepted BGP traffic. This protects the BGP router control plane in
case the amount of BGP traffic overcomes platform capabilities.
You use MAY, but paragraphs 1 and 2 use "should". This is not consistent.
- Section 5.2 BGP TTL security (GTSM)
OLD:
BGP sessions can be made harder to spoof with the Generalized TTL
Security Mechanisms (aka TTL security)
NEW:
BGP sessions can be made harder to spoof with the Generalized TTL
Security Mechanisms (GTSM, aka TTL security)
-
You SHOULD block spoofed packets (packets with a source IP address
belonging to your IP address space) at all edges
versus
Network administrators
SHOULD implement TTL security on directly connected BGP peerings.
Be consistent between "you" and "network administrators".
Same remark for section 9 btw.
- Section 6.1.2.1
Therefore there is
no reason why one would keep checking prefixes are in the IANA
allocated IPv4 address space [38].
Missing "that" after checking?
-
To
partially mitigate this risk, administrators would need to make sure
BGP advertisements correspond to information located in the existing
registries.
SHOULD?
That's a generic comment on this draft.
I'm not sure the RFC 2119 keywords are used consistently.
For example, I don't understand if SIDR (section 6.1.2.4) is a MAY/SHOULD/MUST
in this BCP. It says: "If route origin validation is implemented". This
document is a BCP, so I'm expecting instructions... which I don't find in some
sections.
-
Let's take as an example an IXP in the RIPE region for IPv4. It
would be allocated a /22 by RIPE NCC (X.Y.0.0/22 in our example) and
use a /23 of this /22 for the IXP LAN (let say X.Y.0.0/23).
See http://tools.ietf.org/html/rfc5737.
-
There are also some text improvements exchanged between Lionel Morand (OPS DIR)
and the authors. Not copied here, as I understand from the email exchange that
the changes will be integrated in the next draft version.

The draft looks good and matched my previous operational security practices for
BGP. I just found some nits and the RFC editor pass should catch more, so I'll
just list a few in an effort to be helpful.
Nit: Second sentence of the introduction is awkward.
Section 6: the abbreviation "AS" is used, but the earlier expansion of the term
didn't include the abbreviation. I think it was in the introduction.
Traffic is spelled trafic in one place.
Security considerations - the last sentence should match the tense of the
previous sentence.
Suggest changing from: "It will not detail...."
To: "It does not detail…."
Thanks for your work on this draft!

-- Section 2 --
If this is perfectly acceptable, one
should note that every configured exception has an impact on the
complete BGP security policy and requires special attention before
implementation.
I don't understand what "If this is perfectly acceptable" is meant to say. Can
you re-phrase this sentence so that what you mean is clearer? Maybe, "While it
is acceptable to accommodate local needs, ..." ?
-- Sections 6.1.1.1 and 6.1.1.2 --
The English in these sections is a bit off, but it's mostly missing articles
and will be fixed by the RFC Editor. But...
Only prefixes with value
"False" in column "Global" MUST be discarded on Internet BGP
peerings.
The "only" here makes this read very oddly, and opens up an uncertainty. I
think you are stating, as a best practice, that all prefixes with "False" in
the "Global" column MUST be discarded. But is anything being stated about
prefixes that do not have "False" in the "Global" column? Which is correct
about those prefixes?:
1. They MUST NOT be discarded.
2. They MAY be discarded.
3. Nothing at all is being said about them.
It's funny how adding a single word, "only", raises this issue, but I think it
does.
-- Section 6.1.2 --
One SHOULD probably NOT consider solutions described
in this section if they are not capable of maintaining updated prefix
filters: the damage would probably be worse than the intended
security policy.
This is very poor use of 2119 language. I don't know whether you mean "SHOULD"
or "SHOULD NOT". I don't know what "SHOULD probably" means, from a 2119
standpoint. I suspect the best way out of this would be to just give a
recommendation and a reason, without using 2119 key words, although Brian's
comment might have the right fix. But whatever you decide, this really needs
to be fixed in some way.
-- Sections 6.1.2.1, 6.1.2.3, 6.1.2.4, 7 --
In several places in these sections, you talk about what "authors recommend".
A small point, but this is a working group document, with IETF consensus. The
recommendation is from the IETF, not from the authors. It would be nice if
this was changed.

I have no problems with the publication of this document, but do have some
comments for consideration...
1. I am surprised that the first 2 paragraphs of section 4 do not use
capitalized 2119 keywords like the rest of the recommendations in this document.
2. In section 6.1.1.2, do you want to include filtering out ::1/128?
3. The 2119 keyword construct in section 6.1.2 ("One SHOULD probably NOT...").
I think "One SHOULD NOT consider solutions..." is what is meant.
4. I think it would be useful to point out that the mechanisms described in
6.1.2.3 and 6.1.2.4 will have data duplicated between them. That duplicate
data needs to be kept consistent since some operators will only do IRR and
others will only do SIDR.
5. The guidance in section 7 is ill-worded. I would suggest the following
change:
OLD:
Authors of this document propose to
follow IETF and RIPE recommendations and only use BGP route flap
dampening with adjusted configured thresholds.
NEW:
This document RECOMMENDS following IETF and RIPE recommendations
and only using BGP route flap dampening with the adjusted configured
thresholds.

I wondered if it'd be worthwhile asking that the
designated expert try to ensure that the security and privacy
consequences of new entries also be documented? That's
assuming there are cases where the header field is likely
to transit between ADMDs. I'm not sure if that's really
needed though, but 7001 does have a fairly significant set
of security considerations, so presumably new entries
might also deserve a similar level of documentation. OTOH,
I could buy that experience with 7001 means that this
isn't really needed or that demanding that level of
documentation might be counterproductive.

This has probably already been considered and addressed by the working group,
but coming into this as a neophyte it seems like a glaring omission that the
security considerations of bearer assertions are not discussed here. Isn't it
the case that the use of bearer assertions requires a trust relationship
between the client and relying party such that the client can be assured that
the relying party will not misuse the assertion to authenticate with some other
entity? I realize that this sort of assertion will likely only be used in
cases where the assertion only works to authenticate to a specific relying
party, but I think this bears mentioning in the security considerations.

3 -
Assertions used in the protocol exchanges defined by this
specification MUST always be protected against tampering using a
digital signature or a keyed message digest applied by the issuer.
Why is that? Aren't you using assertions over a protected channel (as required
by the spec) and therefore don't need to sign the assertions? Indeed, why would a
self-issued Bearer Assertion need to be signed at all? Does that even make
sense?
4.1 -
grant_type
REQUIRED. The format of the assertion as defined by the
authorization server. The value MUST be an absolute URI.
That MUST is unnecessary. It's just definitional from 6749, 4.5 (which, as it
happens, doesn't use 2119 language for this). s/MUST/will
assertion
REQUIRED. The assertion being used as an authorization grant.
Specific serialization of the assertion is defined by profile
documents. The serialization MUST be encoded for transport within
HTTP forms. It is RECOMMENDED that base64url be used.
The MUST seems weird here. Are you saying that the profile could not possibly
have a serialization for an assertion that did not require further encoding?
But the RECOMMENDED seems downright wrong: Either an implementer *needs* to
know the encoding independently of the profile, and therefore this needs to be
a MUST, or the profile is going to describe the encoding, which could be
base64url or could be something else, and the implementation will do whatever
the profile says. If you really want to say something here, I suggest replacing
the last two sentences with:
If the assertion is going to be transported within HTTP forms, the
profile document needs to describe what (if any) encoding will be
needed to do so. The base64url encoding is widely implemented and
therefore suggested.
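
As an illustration of what a profile would be describing, base64url encoding followed by form encoding might look like this. The grant type URN and the assertion bytes are hypothetical placeholders; only Python standard library calls are used.

```python
import base64
from urllib.parse import urlencode, parse_qs

def encode_assertion(raw):
    """base64url-encode assertion bytes (RFC 4648, Section 5), keeping '=' padding."""
    return base64.urlsafe_b64encode(raw).decode("ascii")

def build_form_body(grant_type, raw_assertion):
    """Build an application/x-www-form-urlencoded body for the token request."""
    return urlencode({"grant_type": grant_type,
                      "assertion": encode_assertion(raw_assertion)})

# Hypothetical values for illustration only.
raw = b"<Assertion>...</Assertion>"
body = build_form_body("urn:example:grant-type:demo", raw)
print(body)
```

Whether the padding is kept or stripped is exactly the kind of detail that has to come from the profile, since the receiver needs to know it to decode.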
scope
[...]
As such, the
requested scope MUST be equal or lesser than the scope originally
granted to the authorized accessor.
s/MUST/will (unless you explain whether it's the server or the client that's
supposed to be obeying that MUST, and for what reason)
If the scope parameter and/or
value are omitted, the scope MUST be treated as equal to the scope
originally granted to the authorized accessor. The Authorization
Server MUST limit the scope of the issued access token to be equal
or lesser than the scope originally granted to the authorized
accessor.
In the first sentence, is this MUST for the server? (That is, shouldn't it be,
"If the scope parameter and/or value are omitted, the server MUST use the value
of the scope originally granted to the authorized accessor."?) And anyway,
don't these two sentences contradict 6749, which says:
The authorization server MAY fully or partially ignore the scope
requested by the client, based on the authorization server policy or
the resource owner's instructions.
[...]
If the client omits the scope parameter when requesting
authorization, the authorization server MUST either process the
request using a pre-defined default value or fail the request
indicating an invalid scope.
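
One server-side reading that is consistent with both texts can be sketched as follows. It is an assumption, not the draft's stated behavior: the function name is invented, and falling back to the granted scope when the parameter is omitted is the interpretation proposed in the question above.

```python
def issued_scope(requested, granted):
    """Scope to place on the access token, never exceeding `granted`.

    `requested` is None when the client omitted the scope parameter; this
    sketch then uses the originally granted scope as the default value.
    Scopes are space-delimited strings, as in RFC 6749.
    """
    granted_set = set(granted.split())
    if requested is None:
        return " ".join(sorted(granted_set))
    # Keep only the requested values that were originally granted.
    return " ".join(sorted(set(requested.split()) & granted_set))

print(issued_scope("read admin", "read write"))
```

Here the server partially ignores an over-broad request rather than failing it. RFC 6749 also permits failing the request with an invalid scope error, so the draft should say which behavior it intends.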
Here and throughout: s/non-normative example/example (As far as I know, there
are no other kinds in IETF documents.)
4.1.1 - s/MUST construct/constructs
4.2, client_assertion_type and client_assertion: See comments from 4.1
regarding grant_type and assertion.
4.2.1 - s/MUST construct/constructs
5.2 -
s/MUST identify/identifies
For clarification:
OLD
Assertions that do
not identify the Authorization Server as an intended audience MUST
be rejected.
NEW
The Authorization Server MUST reject any assertion that does not
contain its own identity as the intended audience.
END
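As a sketch of what the NEW text would ask of an implementation: the check below uses exact string comparison against a single configured identifier. That comparison rule is an assumption; the document would still need to say whether a full URL, an origin, or a domain is what has to match. The identifier and function names are hypothetical.

```python
def audience_identifies_as(audiences, as_identifier):
    """Return True if as_identifier appears verbatim among the audience values."""
    return any(value == as_identifier for value in audiences)


def check_audience(audiences, as_identifier):
    """Raise if the Authorization Server is not named as an intended audience."""
    if not audience_identifies_as(audiences, as_identifier):
        raise ValueError("assertion does not name this Authorization Server")


# Hypothetical identifier, for illustration only.
AS_ID = "https://as.example.com/token"
check_audience(["https://rs.example.com", AS_ID], AS_ID)  # accepted
# An origin-only value such as "https://as.example.com" would be rejected
# under exact comparison, which is one of the ambiguities to resolve.
```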

Putting one discuss here rather than one on each of the other
docs. We can fix that as appropriate after we chat. Where are
the MTI signature and mac algs for these specified? If those
can be tracked back via the SAML and jose docs that's fine,
but I'm not sure if they are.

- general: What prevents/detects conflicts between the oauth
scope parameter and the saml or jwt equivalent? Are there
other bits of replicated data that could be the basis for a
vulnerability?
(The comment below applies for both saml and jwt so
putting it here.)
- The no replay protection issue was debated in the
WG wasn't it? (I think I recall it, not sure.) Seems like a
bad plan to me to not require at least implementation of
replay protection in the AS so that it can be turned on. Can
you point me at where that was discussed on the list?

"The assertion MUST contain an Audience that identifies the Authorization
Server as the intended audience. Assertions that do not identify the
Authorization Server as an intended audience MUST be rejected."
Could you please identify the threat model within which this "MUST" is
required? This requirement doesn't follow from any of the threats elaborated
in Section 8.
The Audience is only necessary if the Issuer wishes to constrain the set of
Authorization Servers with which an assertion may be used. So ISTM that this
should be "MAY contain..."

"keyed message digest" -> "Message Authentication Code"
That's the proper terminology [RFC4949], especially since there are MACs that
are not based on digests.
"This mechanism provides additional security properties." -- Please delete this
or elaborate on what security properties it provides.
Section 8.2 should note that "Holder-of-Key Assertions" are also a mitigation
for this risk.

Pete did a nice job on the 2119 key words, so I have nothing to add there.
-- Section 6.1 --
The example in Section 4.2 that shows a client authenticating using
an assertion during an Access Token Request.
Is "that" an extra word that should be removed? (Also in Section 6.3.)

Completely non-blocking comment...
I agree with Adrian that there could be some confusion introduced with
classification of flows as described in this document. It is exacerbated by
the categorization illustrated in Figure 1. I will note that there is research
that shows that a multitude of short-lived flows (regardless of their size) may
cause more problems than the long-lived large flows focused on in this document.

Thank you for writing a very readable document on a difficult subject. Glad to
see the document move forward!
However, the status code fields in Sections 5.3 and 5.4 would in my opinion
require some text in the IANA considerations section, to create a new namespace
for the values, and indicate the rules for making new allocations in those
spaces.

An easy read; thanks.
I have only a couple of very minor comments, both about some 2119 key words:
-- Section 4.1.2 --
A PAR MUST advertise its support for multicast by setting the M-bit
in the Proxy Router Advertisement (PrRtAdv) message, as specified in
Section 5.1 of this document.
The "MUST" seems out of place here. It seems that a PAR that doesn't support
multicast doesn't do this, so that PAR violates this "MUST". A PAR that does
support multicast does this, as part of its having multicast support. So this
should probably say, "A PAR that supports multicast advertises that support by
setting ..."
Similar comment about the NAR in Section 4.1.3.
-- Section 4.2.2 --
After the departure of the MN and on the reception of a LEAVE
message, it is RECOMMENDED that the PMAG terminates forwarding of the
specified groups and updates its multicast forwarding database.
What are the consequences of not terminating the forwarding? Why might a PMAG
continue the forwarding, despite the "SHOULD" here, and what will happen if it
does?

I welcome this document and think it is a useful addition to the canon.
However, John Scudder did a Routing Directorate review during the IETF
last call period and emailed his comments to the authors and to the
GROW mailing list. I have seen no response to this directly or on the
GROW list.
Therefore, from a process point of view, I adopt all of John's comments
as a Discuss even though many of the points are small and would normally
be just Comments.
- Throughout the document, various terms are used to describe what RFC
4271 calls a "route". The definition given in RFC 4271 is:
Route
A unit of information that pairs a set of destinations with the
attributes of a path to those destinations. The set of
destinations are systems whose IP addresses are contained in one
IP address prefix carried in the Network Layer Reachability
Information (NLRI) field of an UPDATE message. The path is the
information reported in the path attributes field of the same
UPDATE message.
That is, one NLRI plus its path attributes, as carried in an UPDATE,
is a "route". I would suggest adopting this term, or "BGP route" if
you prefer, instead of terms such as "NLRI UPDATE message", "NLRI
message", "prefix UPDATE message", and even just plain "NLRI" and
"message". Also some, but not all, of the uses of "prefix". I think
doing so will make the document clearer, more readable, and more
technically accurate. A simple search for the terms I've called out
should show most of them so I won't enumerate them here unless you
ask me to (feel free, if you want).
- Reference [RS-ARCH] is a dead link. I found a live copy at
http://www.cs.usc.edu/assets/003/83191.pdf. It might be worth
checking with the authors of RS-ARCH to ask what a good archival
reference is.
- S. 4.2 talks about scaling. I'm trying to make sense of the analysis:
Regardless of any Loc-RIB optimization technique is implemented, the
route server's control plane bandwidth requirements will scale
according to O(P * N), where P is the total number of unique paths
received by the route server and N is the total number of route
server clients.
So far so good. (Except nit: there seems to be a word missing, such
as "whether" as in "Regardless of whether any Loc-RIB...")
In the case where P_avg (the arithmetic mean number
of unique paths received per route server client) remains roughly
constant even as the number of connected clients increases, this
relationship can be rewritten as O((P_avg * N) * N) or O(N^2).
I don't see where the second factor of N comes from. You're basically
expanding the P in the first expression as P_avg * N -- but why? I
think this would only apply if add-path all-paths was chosen as the
path hiding mitigation strategy -- but this is not touched on in
route-server-operations, only in ix-bgp-route-server, and besides that
the beginning of the paragraph implies you're analyzing the multiple
Loc-RIB strategy, so I don't guess all-path is what you were thinking
of. If you're not doing all-path, the O(N^2) analysis is wrong AFAICT.
To see this, consider that the inbound routes require O(P_avg * N)
which is just O(N), but the number of routes you're going to advertise
is bounded by the size of the Internet routing table, which is a
constant for purposes of this analysis, so also O(N). In and out are
summed, not multiplied, so the whole thing works out to be O(N), not
O(N^2).
So I think this needs to either be corrected, or the assumptions need
to be better explained. Moving on:
This
quadratic upper bound on the network traffic requirements indicates
that the route server model will not scale to arbitrarily large
sizes.
If you continue to think this sentence is warranted, I think it should
be better quantified. Of course nothing can scale to *arbitrarily*
large sizes, but that still leaves a lot to the imagination. I would
think it would be beneficial for an IX operator reading this document
to be able to have some idea of how practical the limitation is. Since
the analysis in question is looking at control traffic bandwidth
consumption, it wouldn't be too onerous to throw some simple
assumptions up against it -- for example, "if we suppose a RS receives
on average 100,000 routes from each client with a rate of change of 10
routes/second, sends on average 1,000,000 routes to each client with a
rate of change of 100 routes/second, and that each route consumes on
average 50 bytes in a BGP UPDATE message, simple arithmetic shows that
a GigE connection to that RS will be fully saturated by the time the
number of clients reaches 25,000." (Which does not seem like a very
practical limitation, the RS will hit a CPU or memory bottleneck
first.)
Anyway, maybe you will decide on reconsideration of the big-O analysis
that this bit is not needed at all, which would be OK with me.
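The arithmetic behind that example can be checked with a few lines. This is only a sketch under the assumptions quoted above (50 bytes per route, steady-state churn only); it ignores initial table transfer and TCP/IP overhead, and the function name is hypothetical.

```python
GIGE_BPS = 1_000_000_000  # 1 Gbit/s link


def clients_at_saturation(link_bps, out_routes_per_s, bytes_per_route,
                          in_routes_per_s=0):
    """Number of clients at which steady-state UPDATE churn fills the link."""
    per_client_bps = (in_routes_per_s + out_routes_per_s) * bytes_per_route * 8
    return link_bps // per_client_bps


# Outbound churn only: 100 routes/s * 50 bytes * 8 bits = 40,000 bit/s per client.
outbound_only = clients_at_saturation(GIGE_BPS, 100, 50)
# Summing inbound (10 routes/s) and outbound churn onto one budget.
both_directions = clients_at_saturation(GIGE_BPS, 100, 50, in_routes_per_s=10)
print(outbound_only, both_directions)  # 25000 22727
```

Counting only the outbound direction reproduces the 25,000 figure; adding the inbound churn to the same budget lowers it to roughly 22,700. Either way the per-client cost is constant, so the total grows linearly with the number of clients.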
- S 4.2.2.1,
If the route server
operator has prior knowledge of interconnection relationships between
route server clients, then the operator may configure separate Loc-
RIBs only for route server clients with unique outbound routing
policies.
It wasn't obvious to me what "outbound" applies to -- the client? The
RS? -- and for that matter why an inbound policy (on the RS) might not
apply. Possibly this could be remedied by simply dropping the
adjective "outbound".
- S. 4.2.1.2,
destination splitting would require significant co-ordination
between the route server operator and each route server client
It's not clear to me why it would "require significant co-ordination",
depending on what resource you're trying to conserve. Two examples of
how you could avoid coordination while still getting benefit: You
could have clients send all their routes to all the RSes, but have
RSes filter out the prefixes they don't care about. This gives the RS
most of the CPU benefit it would have gotten had the client done the
filtering (prefix filtering is cheap), almost all the memory benefit
(the filtered routes need not be retained in the Adj-RIB-In), and
around half the control traffic bandwidth benefit. The client incurs
cost to send duplicate routes that are going to be discarded by the
RS, but the client is presumably not the bottleneck resource. Or
better still, the RS could use ORF towards the clients to control
what routes the clients will send.
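The first alternative, where each RS simply drops prefixes outside its share, might look like the sketch below. The split of the IPv4 space into two halves and the function name are hypothetical; a real RS would do this in its import policy rather than in Python.

```python
import ipaddress


def accept_route(prefix, assigned_blocks):
    """Keep a route only if its prefix falls inside one of this RS's blocks."""
    net = ipaddress.ip_network(prefix)
    return any(
        net.version == block.version and net.subnet_of(block)
        for block in assigned_blocks
    )


# Hypothetical split: this RS handles the lower half of the IPv4 space.
LOWER_HALF = [ipaddress.ip_network("0.0.0.0/1")]

received = ["10.0.0.0/8", "192.0.2.0/24", "100.64.0.0/10"]
kept = [p for p in received if accept_route(p, LOWER_HALF)]
print(kept)  # ['10.0.0.0/8', '100.64.0.0/10']
```

No client-side change is needed in this variant, which is why the "significant co-ordination" claim seems too strong.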
- S. 4.6.1,
OLD:
Prefixes sent to the route server are tagged with specific [RFC1997]
or [RFC4360] BGP community attributes
I don't think the naked references scan well as adjectives in this
context. I suggest
NEW:
Prefixes sent to the route server are tagged with specific standard
[RFC1997] or extended [RFC4360] BGP community attributes
- Also in S. 4.6.1,
OLD:
As both standard and extended BGP communities values are restricted
to 6 octets
Actually standard communities are restricted to less than that.
Perhaps reword as
NEW:
As both standard and extended BGP communities values are restricted
to 6 octets or fewer
- Also in S. 4.6.1,
route server operator should take care to ensure
that the predefined BGP community values mechanism used on their
route server is compatible with [RFC4893] 4-octet autonomous system
numbers.
I suspect an RS operator reading this might be left scratching his or
her head and asking "what does it mean for me to be compatible with
RFC4893 in this context"? It would be kind to offer them some
guidance, since after all this is a guidance document.
- S. 4.7: Where you say "non-commutative" I think you mean "non-
transitive".
- S. 4.7:
Problems of this form can be dealt with using [RFC5881] bidirectional
forwarding detection.
It's not clear to me how certain non-transitive forwarding failures
can be dealt with using BFD. To take an example, suppose clients A, B
and C peer with RS. The IX fabric has a failure such that A and B can
both reach RS, but not each other. C has connectivity to everyone.
Prefix X is advertised to RS by both B and C. For whatever reason, RS
selects X via B to advertise to A. Even if A runs BFD towards B, at
best A can determine that the route from RS can't be used. A isn't
able to fail over to C's route as it would in the full-mesh case,
since it's not aware of it. Depending on A's other connectivity, this
may result in sub-optimal routing towards X, or complete loss of
connectivity to X.
It's beyond the scope of the draft to solve this problem, but the text
could be made more accurate. A minimal fix would be
Problems of this form can be partially mitigated using [RFC5881]
bidirectional forwarding detection.
although you might want to go on a bit longer to explain what problems
can't be mitigated.
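The A/B/C example can be restated as a toy model, purely to show where BFD helps and where it does not. The names and the single helper are hypothetical, and the model ignores everything about BGP except which next hops a client has learned for prefix X.

```python
def usable_next_hops(learned, reachable):
    """Next hops for X that survive once BFD removes unreachable peers."""
    return [next_hop for next_hop in learned if next_hop in reachable]


# Fabric failure: A can still reach RS and C, but not B.
reachable_from_a = {"RS", "C"}

# Via the route server, A only learned the path the RS selected (via B).
via_route_server = ["B"]
# In a full mesh, A would have learned X from both B and C directly.
via_full_mesh = ["B", "C"]

print(usable_next_hops(via_route_server, reachable_from_a))  # []
print(usable_next_hops(via_full_mesh, reachable_from_a))     # ['C']
```

BFD lets A detect that the route via B is unusable, but in the route server case there is nothing left to fail over to.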
- S. 4.8:
This problem is not specific to route servers and it can also be
implemented using bilateral peering sessions. However, the potential
damage is amplified by route servers because a single BGP session can
be used to affect many networks simultaneously.
This is true, but there is a more severe way RSes aggravate the
problem: In a full mesh, a router can (and usually does) directly
enforce a "no third-party next hops" policy against its peers. An RS
peer by definition cannot enforce this policy against the RS, so the
RS is the only place it can be enforced.
- S. 4.8:
Route server operators SHOULD check that the BGP NEXT_HOP attribute
for NLRIs received from a route server client matches the interface
address of the client. If the route server receives an NLRI where
these addresses are different
so far so good (modulo my first comment about the use of "NLRI", of
course), but:
and where the announcing route server
client is in a different autonomous system to the route server client
which uses the next hop address,
Is the RS sincerely expected to enforce the above? I suppose it could
be implemented automatically although imperfectly, by noticing that
multiple clients are in the same neighbor AS and noticing when they
use each other as third-party next hops, but AFAIK people generally
don't try to figure this out, they just do what you've said in the
preceding sentence -- make sure the NH matches the interface address.
If you really do propose that the RS should allow third-party next
hops but only from clients in a common AS, I think you should talk
about it specifically and in more detail. If you didn't really mean
that, then I suggest you drop the clause.
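The simpler check, which is what I believe people actually deploy, is sketched below: compare NEXT_HOP with the client's interface address and nothing more. The addresses are documentation examples and the function name is hypothetical.

```python
import ipaddress


def next_hop_matches_client(next_hop, client_address):
    """True if the received NEXT_HOP equals the client's interface address."""
    return ipaddress.ip_address(next_hop) == ipaddress.ip_address(client_address)


# Client peers from 192.0.2.10 on the IX LAN.
print(next_hop_matches_client("192.0.2.10", "192.0.2.10"))  # True
print(next_hop_matches_client("192.0.2.20", "192.0.2.10"))  # False: third-party next hop
```

Permitting third-party next hops between clients in a common AS would need extra state (which clients share an AS) on top of this, which is why it deserves its own discussion if it is really intended.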
- S. 5:
On route server installations which do not employ path hiding
mitigation techniques, the path hiding problem outlined in section
Section 4.1 can be used in certain circumstances to proactively block
third party prefix announcements from other route server clients.
I don't understand what this means. Specifically, I don't know what it
means to "proactively block third party prefix announcements" or for
that matter, even what you mean by "third party prefix announcements"
in this context. (As a term of art, I normally understand "third party
announcement" in a BGP context to mean announcing a third-party next
hop as you discuss in S. 4.8). I also don't know what the "certain
circumstances" are, quite likely these should be given at least a
little color if not entirely spelled out.
Also, a nit -- the xref expansion has put "section section" into your
text.
- S. 7:
BIRD, OpenBGPD and Quagga, whose open source BGP implementations
include route server capabilities
Great, cool, but:
which are compliant with this
document.
I'm not sure what it actually means to be "compliant" with a document
that "describes operational considerations". Perhaps just drop the
phrase?

Nits also taken from John's review:
- In S. 2,
OLD:
BGP sessions between each participant router
NEW:
BGP sessions between each pair of participant routers
- In S. 4.2.1.1,
OLD:
In
this situation, the multiple Loc-RIB views required by each client
are merged into a single view.
As written, this implies that each client requires multiple Loc-RIB
views, which I don't think is what was intended. I suggest:
NEW:
In
this situation, multiple Loc-RIB views
are merged into a single view.
- I personally am strongly put off by the neologism "granular" to mean
"fine-grained" and suggest the latter instead. I realize it's not an
unusual usage so by all means disregard if you feel strongly about
it.
- S. 4.6.2:
OLD:
server operators to implement construct per-client routing policies.
NEW:
server operators to construct per-client routing policies.

With all of the RECOMMENDations and things you SHOULD do in this document in
order to prevent "bad things" from happening, I'd have thought this to be
BCPish. But I am made to understand that RSs are (somewhat) controversial
beasts, and therefore making it a BCP would have been controversial as well.
You might want to mention that fact in the document, but it's up to you.

Thank you for responding to the SecDir review. I see some wording suggestions
were made in response to this review, I do think they would be helpful and
would like to see the updates made to the draft:
https://www.ietf.org/mail-archive/web/secdir/current/msg05065.html
From my review:
This is a non-blocking comment for consideration only. In figure 2, I see the
point of the connections from each router into the route server, demonstrating
the point made in the section where you just need n connections to the route
server instead of n*(n-1)/2. What do the dots on the outside connecting each
router represent? I'm asking because the draft later describes use of a shared
media like Ethernet, but this diagram looks like the routers are directly
connected and it appears to require passing through other routers to route
traffic. If that's correct, you could probably just remove the dots on the
outside edge connecting the routers since the point of this section is the
connections to the route server. Otherwise, if they have some sort of meaning,
it might help to explain what the dots represent, I'm guessing it's not for the
actual exchanges of data.
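For reference, the session counts that the figure is meant to contrast can be computed as follows; the function names are hypothetical.

```python
def full_mesh_sessions(n):
    """BGP sessions needed when every pair of n routers peers directly."""
    return n * (n - 1) // 2


def route_server_sessions(n):
    """BGP sessions needed when each of n routers peers only with the RS."""
    return n


for n in (4, 10, 100):
    print(n, full_mesh_sessions(n), route_server_sessions(n))
```

With 100 routers that is 4950 bilateral sessions against 100 sessions to the route server.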
Section 4.2, The first sentence of paragraph 2 is missing a word making it a
bit difficult to read.

Given the content of the document as indicated by the Abstract, it
would be really helpful if the document title also included "failure."
Something like "Analysis of Failure Cases in IPv6 Roaming Scenarios"

It seems a bit odd to publish this in the IETF instead of 3GPP or the like, but
presuming we're not stepping on any toes and the WG thought this was good
information to put out there, I don't see any reason to object.

I have a number of nits. Please treat them as such.
- Is "current" correct in the name for an RFC? Perhaps
"known" is better.
- intro, para 2: s/is tasked/was tasked/
- s2, 2nd para: "a more systemic solution" is left hanging
- do you mean TLS1.3? If so, maybe say so?
- 2.5, 2nd para: draft-ietf-tls-prohibiting-rc4 does not
itself provide those details, maybe say "see the
references in" that draft?
- 2.6: should the RFC editor wait on the official
allocation of the BEAST CVE number? I don't think that's
happened already has it?
- 2.7, is Bleichenbacher really a certificate attack? I
think it's not, but is a PKCS#1 encryption attack. (It
would apply just as well to OOB keys in TLS.) I'm not sure
if Klima is or is not the same in that respect. Also the
timing attacks in the 2nd para, don't seem to be
certificate related are they? So perhaps only the last
para is really certificate related?
- 2.8: I forget if this has been discussed - should there
be a reference to draft-ietf-tls-negotiated-ff-dhe?
- 2.10: isn't TRIPLE-HS published yet?
- 2.12: A reference would be good here if we have one,
esp. for the "It is known" point.
- 2.13, para1: "fully specified" isn't really correct
there, I think you mean 'properly specified, so that
implementations should be "secure"' but maybe some other
wording would be better.
- 2.13: Doesn't that paper also blame hard-to-use APIs as
well as the IETF protocols and their complexity? Worth a
mention?

It might be useful to note that SSL stripping is a flavor of downgrade attack.
Likewise, it could be worth noting in this section or Section 2.2 that STARTTLS
is very vulnerable to downgrade without some sort of HSTS-like mechanism. For
example, there's some recent evidence of downgrade attacks on mail protocols.
Downgrade in general could use more attention. The IETF can fix things in
newer versions of the protocol, but if the client and server can't negotiate
that version, it's all for naught.
https://www.techdirt.com/blog/netneutrality/articles/20141012/06344928801/revealed-isps-already-violating-net-neutrality-to-block-encryption-make-everyone-less-safe-online.shtml
Given the news about POODLE this week, I would suggest changing Section 2.4 to
be "Padding Oracle Attacks", and adding POODLE there.
I'm surprised not to see some mention of Heartbleed in Section 2.13.

Thank you for your work on this draft! It looks great, but I do have two items
I'd like to discuss and see if we can add text to address these
concerns/attacks before switching to a yes.
1. I think it would be very helpful to include what techniques are used by
forensic tools that enable access to decrypt TLS sessions and how to
respond/prevent that access. I see you have certificate attacks listed in 2.7,
which is what the common forensic tools leverage. However, these tools also
require access to the private key, and it would be helpful to mention the
importance of protecting the private key, preventing exportability, etc..
In 2.7 I think it would be helpful to explicitly state that commonly used
forensic tools such as wireshark require access to the private key as well as
use of RSA. Another option might be to add the recommendation to protect the
private key in 2.13. If you can't export the key (from an implementation
perspective), that could go a long way to helping to reduce this method of
exposure.
I've included a few links for additional information on the list of tools and
explicit details of the attack used in case this is helpful.
Forensic tools that rely on a MiTM attack to decrypt TLS and DTLS sessions:
http://forensicswiki.org/wiki/SSL_forensics
Wireshark requires you have the key:
http://support.citrix.com/article/CTX116557
Mitigations are to protect the private key and to not use RSA:
http://wirewatcher.wordpress.com/2010/07/20/decrypting-ssl-traffic-with-wireshark-and-ways-to-prevent-it/
http://wiki.wireshark.org/DTLS?highlight=%28tls%29
If you would like text suggestions, let me know and I'll help.
2. I realize this draft covers explicit attacks against TLS, however since
pervasive monitoring is considered an attack, it could be helpful for this
draft to also cover techniques used by middle boxes to intercept TLS streams
(proxy firewalls, load balancers, etc.). Although these are more of 'attacks'
on the user, than of TLS, it could be a short addition to have this documented.
The TLS session is intercepted, or in the case of a load balancer it might be
terminated, with a second TLS session initiated to the destination allowing
traffic to pass in the clear on the middlebox. The user is alerted and warned
to accept an untrusted certificate in this process, and many do as a result of
corporate restrictions (they have no choice if they want to go to that site).
Perhaps implementation recommendations could assist here to improve the
warnings to the user, letting them know their traffic may be passing in the
clear and they may not want to continue with their session. Some may choose to
avoid certain transactions from the work place as a result.
If you have already discussed these items and decided they were out of scope,
please let me know. I support this work and just wanted to make sure we
covered all bases or put some out of scope. Thank you.

A tiny thing in the Introduction: "suffice it to say" may sound cute, but if it
were really sufficient, the document could stop there. I suggest replacing
"suffice it to say" with "a quick summary is". But take this or leave it as
you please.
My other comment is more significant:
-- Section 2.13 --
o Implementations may not validate the server identity. This
validation typically amounts to matching the protocol-level server
name with the certificate's Subject Alternative Name field. Note:
historically, although incorrect, this information is also often
found in the Common Name part of the Distinguished Name instead.
I had to read the "note" a few times before I followed it. It's not the
information that's incorrect, and the "also ... instead" bit is confusing. But
the biggest problem is that it's unclear what the incorrect thing IS: is it
that the information is put in the CN and shouldn't be? Or is it that
validators retrieve it from there instead of from the SAN? Maybe this (correct
it as necessary)?:
NEW
o Implementations might not validate the server identity. This
validation typically amounts to matching the protocol-level server
name with the certificate's Subject Alternative Name field. Note:
this same information is often in the Common Name part of the
Distinguished Name also, and some validators incorrectly retrieve
it from there instead of from the Subject Alternative Name.
(That also changes the "may" to "might", to avoid accidentally conveying a
sense of permission.)
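To illustrate the behaviour the NEW text describes as correct, here is a minimal sketch that matches the protocol-level server name against Subject Alternative Name entries only and never consults the Common Name. It assumes the dNSName values have already been extracted from a validated certificate, does exact case-insensitive matching, and leaves out wildcard handling; the function name is hypothetical.

```python
def server_identity_matches(hostname, san_dns_names):
    """Compare the server name with SAN dNSName entries; the CN is not used."""
    wanted = hostname.rstrip(".").lower()
    return any(wanted == name.rstrip(".").lower() for name in san_dns_names)


print(server_identity_matches("www.example.com", ["example.com", "WWW.example.com"]))  # True
print(server_identity_matches("www.example.com", []))  # False, even if the CN would have matched
```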

[Comment updated to include review comments from Nobo Akiya]
[Comment updated further to include review by Julien Meuric]
In addition to my 5742 conflict review, I have also reviewed this
document. My comments are for the IRTF and authors in the hope they
may be useful for improving the document. They do not need discussion
with me unless that will be of value to the authors, and the comments
can freely be ignored or discarded.
I have sent the document to the chairs of the I2RS, PCE, and BFD
working groups to ask them to check the details in sections 4.4, 4.6,
and 4.7 respectively. I don't expect their responses to be blocking on
publication, but I do feel that their review is important.
---
Fundamentally missing in all of the descriptions of "planes" is the
concept of free, inter-entity communication within a plane. The focus
in the document is, of course, on the north/south communication between
entities in different planes, but this tends to give the impression of
isolation of entities within any one plane.
Maybe the concept of "planes" is so well known that this should be
obvious to the reader, but I feel that explaining the concept of planes
would be helpful in preventing stove-pipe thinking. This understanding
(which would be false) is only exacerbated by Figure 1 which (owing in
some part to the essentials of ASCII Art and in some part to your
intention of highlighting the north/south interactions) gives a strong
impression of a single entity in a plane doing stuff in isolation
within its own plane.
I don't propose a change to the figure, and I think a few words on the
nature of a "plane" would address the whole issue.
FWIW, the "issue" persists in the main body of text with some smudging
of terminology. Thus, in section 3.2, ...
Each network device has
both a Forwarding Plane and an Operational Plane.
...suggests that there are multiple instances of a plane each
instantiated in a different device. Where, I think, it's really the
other way around: each device has a presence in the various planes.
Another example is in section 3.3...
The control plane is usually distributed and is responsible mainly
for the configuration of the forwarding plane using a Control Plane
Southbound Interface (CPSI) with DAL as a point of reference. CP is
responsible for instructing FP about how to handle network packets.
Communication between control planes, colloquially referred to as the
"east-west" interface, is usually implemented through gateway
protocols such as BGP [RFC4271] or other protocols such as PCEP
[RFC5440].
...The communication between control planes is an interesting concept
compared to the communication within a control plane (which is usually
distributed as you say). I guess, from your example of BGP, that you
are talking about communications in the control plane between islands of
control plane elements that communicate amongst themselves.
------
On page 10 you have
Further, traditionally, the control plane has been
tightly coupled with the network device.
Yet this is in contradiction to the text on page 3
Further, the concept
of separating the control and data planes, which is prominent in SDN,
has been extensively discussed even prior to 1998
And, indeed, the "traditional" tight coupling will not be recognized by
people coming from the older (and therefore more traditional?) world of
transport networking.
I think you probably just want to relax your "tightly coupled" to be
talking about the distribution of control plane function and its
implementation on network devices "in many networks especially in
Internet routers and Ethernet switches" (or some such wording).
---
Section 3.2 has
Examples of Forwarding Plane abstraction
models are ForCES [RFC5812] and OpenFlow [OpenFlow]. Examples of the
Operational Plane abstraction model include the ForCES model
[RFC5812], the YANG model [RFC6020], and SNMP MIBs [RFC3418].
This gives the impression that YANG models and MIB modules are not
examples of Forwarding Plane abstraction, but I think they are (or can
be).
---
In Section 3.3 I think you are trying to make a sideways statement about
the novelty of SDN, but you are overstating the facts.
Communication between control planes, colloquially referred to as the
"east-west" interface, is usually implemented through gateway
protocols such as BGP [RFC4271] or other protocols such as PCEP
[RFC5440]. However, the corresponding protocol messages are in fact
exchanged in-band and subsequently redirected by the forwarding plane
to the control plane for further processing.
Of course, it depends what you mean! Consider a device receiving
OpenFlow messages. Those messages somehow arrive through the forwarding
plane and are extracted for local processing (i.e., they are not
forwarded).
But my main beef is with you saying that control plane messages are
exchanged in-band. This will be a surprise to implementers of optical
equipment where this is completely impossible.
---
I think that, notwithstanding your explanations, your readers will have
some problems understanding the distinction between control and
management planes as described in this document, especially sections 3.3
and 3.4. This is notwithstanding section 3.5.
People have been accustomed to considering "management plane" to be the
communication of functions in a north-south mode between a centralised
management station (under human or software control) and network
devices. Thus, an NMS programming forwarding instructions into a network
device would be considered (historically) to be interacting in the
management plane.
You are somewhat redefining this (which is fine - you can define what
you need for your own framework) so that you call these programming
instructions "CPSI", and you call the NMS that is making the programming
decisions "control plane."
I am not suggesting that you change your terminology (unless you
suddenly decide you want to!), but I am pointing out that your readers
may be stumbling (as I did) over this difference. Thus, you may want to
make your definitions of the functionality of the different planes far
clearer and possibly include a statement of the differences compared to
previous ways of discussing the topic.
---
CORBA in section 3.6 might benefit from a reference.
---
Given the content, and contrasting with section 4.1, section 4.2 should
probably be labelled
4.2. NETCONF / YANG
---
In Section 4.4 you say...
Essentially, I2RS aims to make the routing information
base (RIB) programmable thus enabling new kinds of network
provisioning and operation.
While this is *an* aim and indeed a critical deliverable, your phrasing
seems to imply that this is *the* aim of I2RS.
Maybe your point is that, in the context of the control of forwarding
systems, this is the function most relevant to SDN that I2RS is working
on.
---
The discussion of PCEP in section 4.6 is OK as far as it goes. But more
attention should be paid to the work in the PCE working group related to
stateful and active PCEs.
https://datatracker.ietf.org/doc/draft-ietf-pce-stateful-pce/
https://datatracker.ietf.org/doc/draft-ietf-pce-stateful-pce-app/
There is good background discussion of this in
https://datatracker.ietf.org/doc/draft-ietf-pce-questions/
The active PCE can be used to prod the network to establish LSPs and so
has a different place in the architecture you are describing.
Furthermore, there has been established discussion of the use of PCEP in
an even more proactive interaction with the network as discussed in
https://datatracker.ietf.org/doc/draft-farrkingel-pce-abno-architecture/
(which is on its way toward publication as an AD-sponsored IETF consensus
informational RFC).
===================
Nobo Akiya (IETF BFD WG co-chair) provided the following additional comments.
Please consider them.
Hi Authors,
I have a few comments on Section 4.7 of draft-irtf-sdnrg-layer-terminology-02.
> 4.7. BFD
>
> Bidirectional Forwarding Detection (BFD) [RFC5880], is an IETF-
> standardized network protocol designed for detecting communication
> failures between two adjacent forwarding elements.
This description is fairly good, but a couple of comments.
1) Usage of "path failures" instead of "communication failures" would be more
appropriate.
2) It might also be helpful to then briefly describe what "path" can include.
Take a look at the following documents to get ideas:
- RFC5881 - single-hop BFD
- RFC5883 - multi-hop BFD
- RFC5884 - BFD MPLS
- RFC7130 - BFD on LAG
If both comments are considered, the result might look more like the BFD
charter text. If so, that's probably the right result.
http://datatracker.ietf.org/wg/bfd/charter/
> It is intended to
> be implemented in some component of the forwarding engine of a
> system, in cases where the forwarding and control engines are
> separated.
"It is intended" might be too strong.
- Yes, it is true that BFD was carefully designed so that it can be
implemented in hardware more easily, and there are some such implementations.
- It is also true that some implementations place the SW BFD module "close to"
the forwarding engine within the system.
- However, it is also true that there are some implementations that place the
SW BFD module fairly "far away" from the forwarding engine within the system.
It is therefore, IMO, too strong to say "It is intended to be implemented ..."
but probably reasonable to say "It is often implemented ...".
> BFD provides a low-overhead solution for (end-to-end)
> detection of failures, even over technologies that have no or limited
> support to do so, such as virtual circuits, various L3/L4 tunnels and
> MPLS LSPs.
A couple of things in the above threw me off.
- (end-to-end)
- technologies that have no or limited support to do so.
Similar to the first comment at the top of this email, my suggestion is to
model this text closer to what is in the BFD charter.
> With respect to Figure 1, a BFD agent could be a control plane
> service or application that would use the CPSI towards the forwarding
> plane to send/receive BFD packets. Better, as it was intended for, a
> BFD agent can run on the device as an application and use the
> forwarding plane to send/receive BFD packets and update the
> operational plane resources accordingly.
This is the paragraph that we want to get right. Using the terminology from
your document:
[snip]
Forwarding Plane (FP) - The collection of resources across all
network devices responsible for forwarding traffic.
Operational Plane (OP) - The collection of resources responsible
for managing the overall operation of individual network devices.
Control plane (CP) - The collection of functions responsible for
controlling one or more network devices. CP instructs network
devices with respect to how to process and forward packets. The
control plane interacts primarily with the forwarding plane and to
a lesser extent with the operational plane.
Management plane (MP) - The collection of functions responsible
for monitoring, configuring and maintaining one or more network
devices or parts of network devices. The management plane is
mostly related with the operational plane and less with the
forwarding plane.
[snip]
It is clear that the BFD module will need to reside in each network device,
somewhere under the DAL. The first thing that is important to figure out is which
element under the DAL does the BFD belong in: Forwarding Plane, App,
Operational Plane. The Forwarding Plane, as described above, is responsible for
forwarding traffic. That's definitely not the job of BFD. Thus I would argue
that the BFD belongs in the App or Operational Plane. Either way, because the
BFD is not in the Forwarding Plane, it sounds like it is through the Management
Plane that BFD operations come into the DAL.
What gets more interesting is that often BFD alone is not sufficient. What I
mean is that BFD is used by "something" so that "something" can quickly be
notified of the path failure and react accordingly. In the SDN Layer
Architecture:
1. "something" may be something residing in the Application Plane, so that it
can influence the Forwarding Plane through the Control Plane.
2. "something" may be something residing in the Control Plane, so that it can
influence the Forwarding Plane.
If you envision (1) to be the only case, then Figure 1 looks correct.
If you envision (2) to also be a valid case, then Figure 1 may need an
interface (line) between the Control Plane and the Management Plane.
Something to think about.
Thanks!
-Nobo
=============
Julien Meuric, co-chair of the IETF's PCE working group, provided the following
comments. Please consider them.
I'm rather puzzled by the 2nd paragraph in section 3.3:
"or other protocols such as PCEP [RFC5440]. However, the corresponding
protocol messages are in fact exchanged in-band..."
2 comments:
- even if not only related to PCEP, I don't understand the use of
"however" here (feels like "in-band messaging is necessarily bad");
- AFAIK, it isn't mandatory to convey these messages in-band, and I feel
uncomfortable with a text that turns a typical use-case into a drawback
of a protocol itself.

I feel that this document will be a foundational building block for the entire
IETF and not only the SDN research community, as explained in the abstract:
This document, a product of the IRTF Software-Defined Networking Research
Group (SDNRG), addresses these questions and provides a concise reference
for the SDN research community based on relevant peer-reviewed literature,
the RFC series, and relevant documents by other standards organizations.
Therefore, (and maybe due to the OPS & SDN relationship), I wanted to review
it. Yes, I know, the IESG is supposed to only evaluate the conflict review (RFC
5742). Disclaimer: I have not been following the sdnrg discussions.
- I'm not sure what the following statement adds:
This document has been extensively reviewed, discussed, and commented by
the vast majority of SDNRG members, a community which certainly exceeds 100
individuals. It is the consensus of SDNRG that this document should be
published in the IRTF Stream RFC Series [RFC5743].
- I'm always surprised by the multiplication of planes these days: data,
control, management, and now operational and application.
First, there is a definition for forwarding plane, but I see "data plane" being
used in "Further, the concept of separating the control and data planes ..".
Second, there are already some definitions in
http://tools.ietf.org/html/rfc7276#section-2.2.4. It is a pity that those
documents are not aligned.
Third, the term operational plane is new to me. I've been spending some time
trying to understand the link with the management plane... My personal view is
that adding the operational and application planes is confusing, most probably
because I don't understand (any longer maybe, because I thought I did) what a
"plane" is: an interface, a set of protocols and mechanisms, or a termination
point (as in the statement "The operational plane is usually the termination
point for management plane services and applications"). This document would
benefit from an explanation (and this would be a perfect topic for SDNRG IMO).
I believe that this feedback is in line with Adrian's.
- a key aspect of SDN is the notion of "northbound" and "southbound"
interfaces. If you know about controller/SDN, you know that it means northbound
or southbound of a controller. This message is not clear in the document.
- I'm surprised not to see the notion of a controller in Figure 1.
Based on my previous point, and this text ..
The SDN northbound interface is implemented in the Network Services
Abstraction Layer of Figure 1.
..., does it mean that the 2 middle boxes are "controllers"? I don't think so.
- Minor point.
If you define the management plane for monitoring as well (and not only
configuration), then you could add IPFIX (and potentially syslog) to
If the Management Plane is not embedded in the
network device, the MPSI is certainly a protocol. Examples of MPSIs
are ForCES [RFC5810], NETCONF [RFC6241], OVSDB [RFC7047] and SNMP
[RFC3411].
Why? Because there seems to be a tendency in SDN to think only of configuration
for the management plane. For example:
In
contrast, the management plane reacts generally at longer time
frames, i.e. minutes, hours or even days, and thus wire-efficiency is
not always a critical concern. A good example of this is the case of
changing the configuration state of the device.
If you think about IPFIX, the export rate might be X/sec where X > 1.
- I'm surprised to see BFD in the draft. Why an OAM protocol in SDN? If you want
to mention OAM, then why only one?
To summarize: I had a lot of hope for this document. I was hoping that it could
be referred to by many other documents, WGs and BoFs (SUPA comes to mind first,
then ACTN), and I'm a little bit disappointed. Were it a WG document, I
would file a DISCUSS. However, as an IESG member, I'm ONLY doing the conflict
review (RFC 5742). This document doesn't conflict, so "no objection", but it
leaves me with more questions than answers I'm afraid.