Commit Message

On Fri, 2014-01-10 at 08:27 -0800, Tom Herbert wrote:
> On Thu, Jan 9, 2014 at 9:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:> > On Thu, 2014-01-09 at 20:54 -0800, Tom Herbert wrote:> >> When initializing a gro_list for a packet, first check the rxhash of> >> the incoming skb against that of the skb's in the list. This should be> >> a very strong inidicator of whether the flow is going to be matched,> >> and potentially allows a lot of other checks to be short circuited.> >>> >> > Hmm... this idea was discussed in the past. I used it when attempting to> > use a hash table instead of a gro_list last year.> >> > Unfortunately this added lot of cycles when rxhash is not provided by> > hardware, and my tests found it was not a win.> >> > Remember : in most cases, gro_list contains one flow, so this test does> > nothing special but adds overhead.> > I don't understand what your basis is that gro_list in most cases> contains one flow, but assuming that were true, maybe we should make> the it only contain one flow eliminating the complexity of multiple> flows (same_flow logic is convoluted and layers of encapsulation is> not going to simplify things).>
The complexity of GRO is the aggregation itself. You wont avoid it.
> If we are doing RPS or RFS, rxhash will be computed anyway, so the> case your optimizing is pretty narrow: no RPS, no RFS, no hardware> hash, and a single flow in gro_list. Nevertheless, if this is really> an important concern, we can make the check directly against> skb->rxhash so to be opportunistic and avoid the possibility of> computation.
We'll compute rxhash once per GRO packet, containing up to 40 MSS
packets.
Thats a big difference.
If your patch was doing this, I would have no complain.
(No new conditional branch, and skb->rxhash, if provided by the NIC,
can give a hint.)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Comments

On Fri, Jan 10, 2014 at 9:45 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2014-01-10 at 08:27 -0800, Tom Herbert wrote:>> On Thu, Jan 9, 2014 at 9:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:>> > On Thu, 2014-01-09 at 20:54 -0800, Tom Herbert wrote:>> >> When initializing a gro_list for a packet, first check the rxhash of>> >> the incoming skb against that of the skb's in the list. This should be>> >> a very strong inidicator of whether the flow is going to be matched,>> >> and potentially allows a lot of other checks to be short circuited.>> >>>> >>> > Hmm... this idea was discussed in the past. I used it when attempting to>> > use a hash table instead of a gro_list last year.>> >>> > Unfortunately this added lot of cycles when rxhash is not provided by>> > hardware, and my tests found it was not a win.>> >>> > Remember : in most cases, gro_list contains one flow, so this test does>> > nothing special but adds overhead.>>>> I don't understand what your basis is that gro_list in most cases>> contains one flow, but assuming that were true, maybe we should make>> the it only contain one flow eliminating the complexity of multiple>> flows (same_flow logic is convoluted and layers of encapsulation is>> not going to simplify things).>>>> The complexity of GRO is the aggregation itself. You wont avoid it.>>> If we are doing RPS or RFS, rxhash will be computed anyway, so the>> case your optimizing is pretty narrow: no RPS, no RFS, no hardware>> hash, and a single flow in gro_list. Nevertheless, if this is really>> an important concern, we can make the check directly against>> skb->rxhash so to be opportunistic and avoid the possibility of>> computation.>>> We'll compute rxhash once per GRO packet, containing up to 40 MSS> packets.>> Thats a big difference.>
The objective is to compute the rxhash at most once per packet.
> If your patch was doing this, I would have no complain.>> (No new conditional branch, and skb->rxhash, if provided by the NIC,> can give a hint.)>> diff --git a/net/core/dev.c b/net/core/dev.c> index ce01847793c0..c9f7a26d7ce7 100644> --- a/net/core/dev.c> +++ b/net/core/dev.c> @@ -3798,7 +3798,8 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)> for (p = napi->gro_list; p; p = p->next) {> unsigned long diffs;>> - diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;> + diffs = p->rxhash ^ skb->rxhash;> + diffs |= (unsigned long)p->dev ^ (unsigned long)skb->dev;> diffs |= p->vlan_tci ^ skb->vlan_tci;> if (maclen == ETH_HLEN)> diffs |= compare_ether_header(skb_mac_header(p),>
As I said in the commit log, the skb->rxhash should be a very strong
indicator of that flows will match (maybe >99% ?), so putting that
first potentially short circuits a lot of work even in just this
function.
>>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Fri, 2014-01-10 at 10:15 -0800, Tom Herbert wrote:
> The objective is to compute the rxhash at most once per packet.
Once per GRO packet ( up to 45 MSS) or once per incoming frame ( 1 MSS
) ?
Your patch computes rxhash for every incoming frame.
> > > If your patch was doing this, I would have no complain.> >> > (No new conditional branch, and skb->rxhash, if provided by the NIC,> > can give a hint.)> >> > diff --git a/net/core/dev.c b/net/core/dev.c> > index ce01847793c0..c9f7a26d7ce7 100644> > --- a/net/core/dev.c> > +++ b/net/core/dev.c> > @@ -3798,7 +3798,8 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)> > for (p = napi->gro_list; p; p = p->next) {> > unsigned long diffs;> >> > - diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;> > + diffs = p->rxhash ^ skb->rxhash;> > + diffs |= (unsigned long)p->dev ^ (unsigned long)skb->dev;> > diffs |= p->vlan_tci ^ skb->vlan_tci;> > if (maclen == ETH_HLEN)> > diffs |= compare_ether_header(skb_mac_header(p),> >> As I said in the commit log, the skb->rxhash should be a very strong> indicator of that flows will match (maybe >99% ?), so putting that> first potentially short circuits a lot of work even in just this> function.
Are you speaking of your "goto skip;" ?
compare_ether_header() is done with 10 instructions on x86_64
There is no point trying to avoid it.
Really, 99% of the time gro_list contains zero or one single slot, I
have hard data saying so.
If you want to optimize the case where list is fully populated (because
of yet another synthetic benchmark you use), you really need to build a
temporary list so that all layers do not even have to check
NAPI_GRO_CB(p)->same_flow
Each gro handler would remove non matching flow from this temp list.
-> when we finally reach tcp_gro_receive(), list would contain a single
element.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

> Really, 99% of the time gro_list contains zero or one single slot, I> have hard data saying so.>
Please provide the data.
> If you want to optimize the case where list is fully populated (because> of yet another synthetic benchmark you use), you really need to build a> temporary list so that all layers do not even have to check> NAPI_GRO_CB(p)->same_flow>
Well if you prefer, you can think of the "synthetic benchmark" as
emulating an obvious DOS attack by pumping MSS sized TCP segments with
random ports to a server. The stack needs to be resilient to such
things, an O(n*m) algorithm in the data path is a red flag.
> Each gro handler would remove non matching flow from this temp list.>> -> when we finally reach tcp_gro_receive(), list would contain a single> element.>>>> --> To unsubscribe from this list: send the line "unsubscribe netdev" in> the body of a message to majordomo@vger.kernel.org> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html