VLAN performance issues (oh, and a Xen question).

I just wanted to point out/ask about VLAN performance on the latest Shibby build.

I have noticed that my gigabit LAN speeds drop to 125 Mbit/s (in my case, on a WNR3500Lv2), from the expected usual ~930 Mbit/s. I assume(d) this is because the traffic is being handled by the CPU, in the IP stack, as opposed to being switched in hardware at layer 2.

Losing 85+% of the throughput is a drag, and it makes VLANs rather pointless for any gigabit scenario.
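For reference, the size of the drop can be worked out directly from the two figures quoted above. This is a minimal sketch, assuming an awk is available; `throughput_loss` is a hypothetical helper name, not something shipped with Tomato.

```shell
#!/bin/sh
# throughput_loss EXPECTED MEASURED
# Prints the percentage of throughput lost, to one decimal place.
# (Hypothetical helper, for illustration only.)
throughput_loss() {
    awk -v e="$1" -v m="$2" 'BEGIN { printf "%.1f\n", 100 * (e - m) / e }'
}

# Figures from this post: ~930 Mbit/s expected, 125 Mbit/s measured.
throughput_loss 930 125
```

With the numbers from this post it prints 86.6, which is where the "85+%" comes from.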

I just wanted to double-check, or have it confirmed, that this is the reason, and not just my model or an implementation issue?

I guess I can try to disable some IP filtering on the bridge or so (like in Xen), but I am not sure if this will make the system insecure/pointless. I haven't tested it yet, so this is just thrown out here.

On the positive side, I have started with Xen and am amazed (a Xen setup will take over the router's job).

However, I still wished to put a private domain guest server on a VLAN with the router (which will function as a 2nd firewall => LAN + wifi), or alternatively add the wifi to a forwarding bridge (on the Xen machine, in front of the router). I guess none of those will work apart from the VLAN, which will most likely create the same CPU overload/slowdown as mentioned above, right? :/

Would love to have someone join in a bit on the VLAN testing, just in case it *could* be resolved (disabling bridge netfilter, issues... other ideas).
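For anyone joining in on the testing, here is a minimal sketch for checking the bridge netfilter switches. It assumes the kernel was built with bridge netfilter support, in which case the knobs live under /proc/sys/net/bridge; whether the Tomato kernel on the WNR3500Lv2 exposes them is an assumption, not something verified here. `show_bridge_nf` is a made-up helper name.

```shell
#!/bin/sh
# show_bridge_nf [DIR]
# Lists every bridge-nf-call-* switch found in DIR together with its value.
# DIR defaults to the usual location on kernels built with bridge netfilter.
show_bridge_nf() {
    dir="${1:-/proc/sys/net/bridge}"
    for f in "$dir"/bridge-nf-call-*; do
        [ -r "$f" ] || continue      # skips the unexpanded pattern if nothing matches
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}

show_bridge_nf

# To stop bridged frames from being handed to iptables (as root):
#   echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables
```

Writing 0 means iptables rules no longer see traffic that is bridged rather than routed, so any filtering between ports on the same bridge stops applying. Routed traffic is still filtered as before.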

Finally, I will possibly use virtual bridges in Xen to do tagged VLANs, and would thus perhaps think of trunking to the Tomato router... Again, on the WNR3500Lv2 this is marked *experimental* by Victek, isn't it? However, when I try to load the 8021q kernel module, it does not exist on the router... so just how does that work on Tomato then?

Cheers,

lolziecat

(PS. I think the same issues happened on a locally compiled Tomato + the Victek version, so it's not just Shibby but our kernel overall, etc.)

Addendum: I see we use 801.11q (OK, still no module I can recognise); VLAN tag and trunk NVRAM variables exist, as I found by looking via advanced-vlan.asp and grepping nvram. Does that mean it has actual HW support, or rather that it was just compiled with those options? Sorry if it is a silly question.

When I am done with Xen, I will investigate. It is just that Xen is quite a project, and if it turns out it was all for nought, I will again need to redesign the entire network topology, which isn't entirely trivial.

For the record... I will use Open vSwitch on Xen, which does have some forwarding plane control. Not sure if I can somehow hack that together with my router though.

The answer is: it depends. The VLAN capabilities are partially hardware-driven, meaning the Broadcom 5-port switch IC used within the router does have native VLAN tagging/untagging/802.1Q support (not 801.11Q; there is no such thing). There is a binary blob driver called et.ko that handles the communication, if I remember correctly. I have no idea if the WNR3500Lv2 uses a switch that can make use of this capability.

However, remember that the wireless chip is a completely separate thing, and as I understand it any kind of VLAN segregation involving it (or any interface other than what's on the 5-port switch) will result in software-accomplished VLANs, thus you will take a performance hit.

The existence of NVRAM variables is not an indicator of "hardware-level support" by any means; think of NVRAM like battery-backed RAM on old Nintendo or Sega Genesis cartridges. It's just a bunch of RAM that can be used to store some data (key/value pairs). You could nvram set butts=ilikethem ; nvram commit and add your own variables; does it mean the hardware has support for butts?

Profiling ("benchmarking") performance of network packets within an embedded router is an extremely difficult task. It is not as "simple" as it would be on a desktop PC, solely because the kernel has to be built to be extremely small and thus removes any kind of profiling support (and even crash/debugging symbols) due to limited flash on routers. Plus all the tools are missing (Entware might provide such).

The best tool you have available is to describe your entire configuration in detail -- meaning write down every single configuration change you make to the router compared to stock defaults -- and provide those here, combined with running top -d 1 and looking at the X.XX%si (soft IRQ) field. I will not describe soft IRQ here; it's a Linux kernel thing. You can search the forum for my earlier posts on the topic. It has come up more than once in the past.
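The same soft IRQ figure can also be derived from /proc/stat, which is handy for logging it alongside a throughput test. This is a minimal sketch, assuming the standard Linux /proc/stat layout (the seventh counter on the aggregate cpu line is softirq) and an awk on the router; `softirq_share` is a hypothetical helper name.

```shell
#!/bin/sh
# softirq_share BEFORE AFTER
# Each argument is one aggregate "cpu ..." line copied from /proc/stat.
# Prints the percentage of CPU time spent in softirq between the two samples.
softirq_share() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= NF; i++) a[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= NF; i++) total += $i - a[i]
            sirq = $8 - a[8]      # field 8 is softirq (field 1 is the "cpu" label)
            if (total > 0) printf "%.1f\n", 100 * sirq / total
            else print "0.0"
        }'
}

# On the router, while a transfer is running:
#   before=$(grep '^cpu ' /proc/stat); sleep 1; after=$(grep '^cpu ' /proc/stat)
#   softirq_share "$before" "$after"
```

A value close to 100 during a transfer would match what top reports in the %si column.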

I recommend this to anyone: if you're planning on doing things that are expected from business-grade products (ex. VLANs), especially if Xen and VMs are involved (it almost sounds like you're creating a company/business of some sort), do yourself a favour and buy yourself a managed switch (ex. HP ProCurve; I only mention them because I've used VLANs on them before and they work flawlessly). Don't bother with consumer-grade routers; they are not workhorses. They are intended for simple/common residential use.

Yeah... I saw some of my info was erroneous. Anyway, to continue: I realise(d) that the throughput drop happens even when there is no other network activity, so it is not CPU-bound (though that could of course come on top of it).

Thanks for the tip on top -d 1; I will check it and see if I can produce some useful debugging info. It is a home office, yes. The Xen servers are there because one server runs non-open-source software which, despite being made by a huge company, was hacked in no time. Thus, I do not want such low-grade servers on my systems.

I had to virtualise it and harden it and voila, better now. I have not seen the buffer overflow attack since. Anyway, going back: Xen does a great job of handling VLANs on a Linux box (although I could use pfSense ahead of shorewall/iptables). The annoyance is that it is the router which has my gigabit LAN + wifi chip. Had that VLAN issue on the router not happened, I would have just put the LAN and Xen on two VLANs (and I am happy to create a wifi which isn't bridged to one of the VLANs, I guess, if need be).

Sorry, I am hungover and I think I am just rambling needlessly. OK, but I have verified it by flashing stock Shibby (or Victek, plus compiling locally as aforementioned) and then *only* putting a port on a 2nd VLAN (and bridging it to a second bridge), and voila... it happens.

It drops from 930 to 125 (people always think the gap between 930 and the nominal 1000 is [the minor] overhead, but it is actually marketing mega vs. proper mega, i.e. a factor of 1.024^-N, where N = 1, 2, 3 for K, M, G).

I don't have the chance to flash it atm, but will see if I can do some testing off hours and get back to you. Thanks.
I got some hope now actually, there might be a solution.

Heya koitsu.. well, disabling the wifi on the bridges, just to keep it out of the picture, does nothing. Even on stock, the same effect happens the second I put a port on a different VLAN (say vlan3). But indeed, you are right: CPU and everything else is still at low load, but sirq is at 100% (or 99, but yeah).

I will look through your posts regarding this and see what can be done. The low throughput happens even within the same VLAN, so it is indeed something to do with those sirqs.

Addendum: Right, so I think I came across this earlier when compiling (which was some weeks ago now).
cat /proc/sys/net/ipv4/netfilter/ip_conntrack_fastnat
0

It seems the Broadcom driver gets into trouble because fastnat is not enabled, or something like that?

I guess I will have to test by compiling a fastnat alternative and see if I can spot any differences (and whether the end result is a working solution for me, since fastnat doesn't track connections).

Addendum2: enabling it brought throughput up by about 50 Mbit/s to 180ish, which is still not good, and sirq still shot to 100. I am guessing that without open source Broadcom drivers, VLANs on the WNR3500Lv2 at least are just a pile of crap if one intends to have gigabit speeds. :/ So in effect, my lovely Tomato has become just a simple firewall and WAP for the LAN, sigh.
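A minimal sketch of the toggle described above, for anyone wanting to repeat the test. The /proc path is the one quoted earlier in this post; the `set_fastnat` helper and the `FASTNAT` override variable are made up for illustration. A value written this way does not survive a reboot.

```shell
#!/bin/sh
# Path as quoted in the post. FASTNAT can be overridden so the helper
# can be tried against a scratch file first.
FASTNAT="${FASTNAT:-/proc/sys/net/ipv4/netfilter/ip_conntrack_fastnat}"

# set_fastnat 0|1
# Writes the flag, then prints the value read back from the file.
set_fastnat() {
    case "$1" in
        0|1) ;;
        *) echo "usage: set_fastnat 0|1" >&2; return 2 ;;
    esac
    [ -w "$FASTNAT" ] || { echo "cannot write $FASTNAT" >&2; return 1; }
    echo "$1" > "$FASTNAT" && cat "$FASTNAT"
}

# On the router (as root):
#   set_fastnat 1    # enable, rerun the throughput test
#   set_fastnat 0    # back to the value shown in the post
```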

I can't remember if the stock firmware has VLAN options, but since they leave 2 ports open (making it equally easy for Netgear and/or the NSA to backdoor into one's private systems), that will never be an option.

OK.. well, looking into it, it is simply the device driver. Yeah... oh, woo at Broadcom... this router was even 'open source certified', or so it claimed, haha. Either way, splitting up the switch logically should have dropped it down to 400+ Mbit/s at worst, but instead, as koitsu pointed out, the network driver throws a bunch of fits (software interrupts) and it all goes tits up for the soft IRQ daemon that has to deal with them. It simply doesn't have the juice, or else this model would need to be test-compiled with lots of varying config options to make the driver not spew out so much. Until then, the WNR3500Lv2 can't be considered a VLAN-capable router. Even at 100 Mbit/s the sirqs surely mean you will lag the router out.

OK... I have decided not to give up entirely *yet*. I played with dd-wrt (I tried it because the dd-wrt version had an alternate kernel version), only to realise that not only is our GUI nicer and more up to date, it is more consistent too.

Anyway, I did come up with some ideas, since on dd-wrt I got the same issues, the only (trivial?) difference being that the 100% load showed up against the NIC rather than sirq. Again, I think this was really the same thing wearing two masks.

BUT... I am now trying to add VLANs unbridged, standalone, and not on the bridges (so ignoring brctl, STP etc. for now), although I have been testing that too. I wish to see if I can forward on layer 2 and layer 3 with iptables and possibly ebtables? Any input would help (especially from gurus like you, koitsu, say). I always mess myself up with bridges and VLANs, layer 2 and 3, and whatnot: netfilter, nf filter, {eb,ip}tables. If only we had a strong ES, like *BSD, but OK.

To answer some questions: the WNR3500Lv2 does seem to work OK with tags (although I haven't tested any trunking) when all ports are moved away from such apeland things as VLAN 0 (I also skip VLAN 1 to be consistent).

Even if I don't place the VLANs on the bridges, I need iptables to open comms, which I guess places it on the level I do not wish for atm. Hence, possibly try ebtables? Please bear with me; I am sure what I say is inconsistent and wrong. I am admittedly no network engineer-slash-guru by nature and I always get confused by it, *but* I tend to always reach my goal with enough fiddling. This router, though, has been driving me up the wall.
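To make the unbridged setup concrete, here is a minimal dry-run sketch that only prints the commands for a standalone VLAN interface with iptables forwarding to the LAN. The interface names (eth0 as the switch-facing interface, br0 as the LAN bridge), the subnet and the `plan_routed_vlan` helper are all assumptions for illustration; the firmware may already create the vlanN interface itself once a port is assigned in the GUI.

```shell
#!/bin/sh
# plan_routed_vlan PARENT VID ADDRESS NETMASK LAN_IF
# Prints (does not run) the commands for a routed, unbridged VLAN interface.
plan_routed_vlan() {
    parent="$1"; vid="$2"; addr="$3"; mask="$4"; lan="$5"
    vif="vlan$vid"    # matches the VLAN_PLUS_VID_NO_PAD naming set below
    cat <<EOF
vconfig set_name_type VLAN_PLUS_VID_NO_PAD
vconfig add $parent $vid
ifconfig $vif $addr netmask $mask up
iptables -I FORWARD -i $vif -o $lan -j ACCEPT
iptables -I FORWARD -i $lan -o $vif -j ACCEPT
EOF
}

# Example values are assumptions: VLAN 3 as mentioned in this thread,
# an arbitrary subnet, eth0 and br0 as described in the lead-in.
plan_routed_vlan eth0 3 192.168.3.1 255.255.255.0 br0
```

Since this routes between the VLAN and the LAN at layer 3, every packet crossing between them still goes through the CPU. ebtables only sees frames that cross a bridge, so it would not come into play for an interface that is kept off the bridges.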

Addendum:

I am thinking (software) bridges put the routing onto the CPU rather than the internal switch, hence my wish to avoid them? Also, I am wondering if static routes could be worth trying out.