The data is being copied across the network between volumes on the RAID5 array of each server so I/O should not really be a limiting factor - both hit a synthetic benchmark of around 225MB/s. Connectivity between the two is a single unmanaged SMB gigabit switch.

These are not purely a single or a few sequential reads and writes, but they are not all random either. It's image-based backup software, so it's transferring the data a couple of chunks at a time.

I'm averaging a network utilization of about 425Mbps = 53MB/s. Is this in line with expectations, taking into account the overhead introduced by the file system and network stack, or should I be seeing numbers closer to 60-70% of gigabit? iperf between the machines gives around 780Mbps.
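For reference, the unit math here is easy to sanity-check (a minimal sketch; 1000 Mbps is the nominal gigabit wire speed):

```python
# Sanity-check the throughput figures from the post above.
# Convention assumed: 1 byte = 8 bits, "gigabit" = 1000 Mbps nominal.

def mbps_to_mbytes(mbps):
    """Convert megabits/s to megabytes/s."""
    return mbps / 8.0

observed_mbps = 425
wire_speed_mbps = 1000

observed_mb_s = mbps_to_mbytes(observed_mbps)       # ~53 MB/s
utilization = observed_mbps / wire_speed_mbps       # fraction of wire speed

print(f"{observed_mbps} Mbps = {observed_mb_s:.1f} MB/s "
      f"({utilization:.1%} of wire speed)")
```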

780 for an extended test is not bad. It could be better, and may be limited by CPU/memory performance of the server, by other network traffic already running on those servers, or even by some weird driver tweak or network config. In reality though, for an extended throughput test, it will probably settle around 850 megabit, or maybe 900 if you are lucky.

The backup doing its transfer in small pieces is the likely issue. A large single file transfer will always be faster than many smaller transfers since each transfer is a new connection session and has setup and teardown overhead, authentication (if applicable), encryption negotiation (if applicable), plus the raw differences in writing a large file through the OS and hardware versus writing many smaller files. Unfortunately, if that is the way the software works, there is probably not much you can do about it unless it has options for 'pipelining' or using multiple sessions at the same time. That would increase CPU and memory use on the client and server but might give almost a 100% increase in performance.
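To illustrate that overhead effect, here is a toy model (not the backup software's actual behavior; the 50 ms per-chunk setup cost is a made-up figure) of how fixed per-transfer overhead eats into effective throughput:

```python
# Toy model: moving a fixed amount of data in chunks, where each chunk
# pays a fixed setup/teardown cost before any bytes move on the wire.

def effective_mbps(total_mb, chunk_mb, link_mbps, setup_s):
    """Effective throughput (Mbps) when total_mb is moved chunk_mb at a
    time over a link_mbps link, paying setup_s seconds per chunk."""
    chunks = total_mb / chunk_mb
    wire_time_s = (total_mb * 8) / link_mbps     # transfer time at wire speed
    return (total_mb * 8) / (wire_time_s + chunks * setup_s)

# Hypothetical: 10 GB over gigabit with 50 ms of overhead per chunk.
print(effective_mbps(10_000, 10_000, 1000, 0.05))  # one big transfer
print(effective_mbps(10_000, 4, 1000, 0.05))       # 4 MB chunks
```

With these made-up numbers, one large transfer stays near wire speed while small chunks lose more than half the bandwidth to setup overhead, which is why pipelining or parallel sessions can help so much.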

That's about what I got from a 2950. Have the RAID 5 arrays got write caching enabled?

The next-generation Nehalem-based Dell servers will saturate gig comfortably. You've presumably got SP2 on the 2K8 host and utterly up-to-date network drivers/BIOS/NIC firmware everywhere?

If this is a routine task and you want more performance then jumbo frames may be worth trying, and if the SMB switch doesn't support them, you may want to consider a second directly cabled connection between the two hosts. You've got 4 ports on each host so you may well be able to dedicate one.
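For a rough idea of what jumbo frames buy on paper (assuming plain TCP/IPv4 with no options and standard Ethernet framing; in practice the bigger win is usually reduced per-packet CPU load):

```python
# Back-of-the-envelope framing overhead for standard vs. jumbo frames.
# Assumes 20-byte IP + 20-byte TCP headers (no options) and 38 bytes of
# Ethernet overhead on the wire (header + FCS + preamble + inter-frame gap).

def tcp_goodput_fraction(mtu):
    payload = mtu - 40      # MTU minus IP + TCP headers
    on_wire = mtu + 38      # plus Ethernet framing overhead
    return payload / on_wire

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_goodput_fraction(mtu):.1%} of wire speed is payload")
```

The raw header savings are only a few percent (roughly 95% vs. 99% of wire speed), so jumbo frames mostly pay off by cutting interrupt and per-packet processing load on the hosts.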

Quote:

That's about what I got from a 2950. Have the RAID 5 arrays got write caching enabled?

Yes, they do now. We had one of the batteries go bad on one of the servers, and their on-site IT guys had no idea why performance had dropped off.

Quote:

If this is a routine task and you want more performance then jumbo frames may be worth trying, and if the SMB switch doesn't support them, you may want to consider a second directly cabled connection between the two hosts. You've got 4 ports on each host so you may well be able to dedicate one.

Well, it's a Gen II so we only have two on-board NICs, but I know what you mean. There are a total of 3 servers involved (2 being backed up and a management server/file vault), so a direct connection isn't really an option - but I could always set up a secondary network using the secondary on-board NIC and tie them together with a Dell 2808 or something similar that supports jumbo frames. I've done this before with about 5 servers using Backup Exec at another job. I'll have to figure out if it's worth the time.

The backup window is pretty decent so time shouldn't be an issue, I just wanted to make sure things were performing as they should be.

Quote:

These are not purely a single or a few sequential reads and writes but they are not all random either. (...) I'm averaging a network utilization of about 425Mbps = 53MB/s.

You might try a simple large sequential transfer (10-20GB, to ensure you aren't hitting RAM caching) and see what kind of speeds you can get. When you throw random file access into the mix, your limit typically isn't the network anymore but somewhere else. For large sequential file access on gigabit I would expect 95-115MB/sec - but I would also expect an iPerf score of 900-975 or so.
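If you run that test, a quick timing helper makes it easy to see whether the copy is network-bound (a trivial sketch; the rates are the observed and expected figures quoted in this thread):

```python
# How long should a large sequential copy take at a given sustained rate?
# Useful for eyeballing whether a 10-20 GB test copy is network-bound.

def copy_time_s(size_gb, rate_mb_s):
    """Seconds to move size_gb gigabytes at rate_mb_s megabytes/second."""
    return size_gb * 1000 / rate_mb_s

for rate in (53, 95, 115):   # observed rate vs. the expected gigabit range
    print(f"20 GB at {rate} MB/s: {copy_time_s(20, rate):.0f} s")
```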

When copying files between two Windows machines, what is the average real world performance throughput you see on a gigabit network taking overhead into account?

iperf between the machines gives around 780Mbps.

Note: I don't usually do Windows. I get 890Mb/s+ on desktops and laptops with iperf, back to back or on a non-blocking switch fabric. Usually Macs and Linux boxes to a VM on an ancient Dell server (probably the limiting factor). Just did 897Mb/s to the VM on the ancient Dell server from a MacBook Pro plugged in through two switch hops, where the link between the switches is only a 1 gig uplink, with 20 people on the edge switch and APs all over the place - pretty much non-optimal.

So I disagree that 780Mb/s is a good iperf result for server gear with one switch. I think there is room for improvement - even my desktop/VM test is weak compared with what I get in the lab with modern hardware back to back (990+). Anything from the last 5 years should be able to max out a gig connection with iperf.

How are you getting the network utilization? What granularity is that at? If this is a 5 min average, you might be getting the full bandwidth and suffering at the hands of CIFS and RAID5.

Real world throughput on RAID 5 with SMBv2? You are probably in the ballpark - 480Mb/s is what I've seen from some servers. I asked about the granularity because I've found CIFS, even SMBv2, to be terribly bursty and poor at maintaining throughput over time, but SMBv2 can do better than that on gig. RAID5 also has a similar problem depending on the specific implementation and caching. It is a nice reason to move to a storage platform like NetApp. You can usually max out gig from a NetApp, even with SMB or SMBv2.
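To illustrate why the granularity question matters, here is a synthetic example (made-up one-second samples, not measured data) showing how a bursty transfer can look like steady, mediocre throughput once it is averaged:

```python
# Synthetic illustration: a link that alternates between full wire speed
# and complete stalls still averages out to a middling number, so a
# coarse (e.g. 5-minute) utilization graph can hide the burstiness.

samples_mbps = [950, 0, 950, 0, 950, 0, 950, 0]  # hypothetical 1 s samples

average = sum(samples_mbps) / len(samples_mbps)
peak = max(samples_mbps)

print(f"average {average:.0f} Mbps, peak {peak} Mbps")
```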

53 megabytes/sec is as good as it will get. There's nothing wrong with your setup at all. RAID5 isn't the speediest RAID level. And realistically, getting drives to read/write faster than about 50MB/sec is REALLY hard to do.

Should double that easily with any moderately current drive - here's a sustained write to a WD 6400 AAKS, a definitely-not-current 7200rpm 640GB drive. You can see it sustains 105MB/s writes without issue (Local Area Connection 4 is the NAS connection, so it's not just an initial cache spike). (The point here being that the D (destination) drive is just a single old drive and can still supply 100MB/s reads.)

(Note that this is all for large sequential files, as random IO is a different issue)

Indeed. 100MB/s is easily attained by almost any modern HD. In RAID 5 it's even easier. In my R5 of WD greens I get well over 700MB/s for large files and 117MB/s over LAN without jumbo frames and with handmade CAT6 cables. Simple environment too.
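For rough expectations, large sequential RAID 5 reads scale with the number of data spindles. A back-of-the-envelope sketch, ignoring controller and cache effects (the 4-drive and 100MB/s figures below are hypothetical examples, not the hardware in this thread):

```python
# Rough RAID 5 large-sequential read expectation: one drive's worth of
# capacity per stripe goes to parity, so reads scale with (n - 1) spindles.
# Ignores controller limits, caching, and small/random I/O entirely.

def raid5_seq_read_mb_s(n_drives, per_drive_mb_s):
    """Idealized sequential read throughput for an n-drive RAID 5."""
    return (n_drives - 1) * per_drive_mb_s

# Hypothetical: 4 modern drives at ~100 MB/s each.
print(raid5_seq_read_mb_s(4, 100))
```

Even with conservative per-drive numbers, this says the arrays should comfortably out-run a gigabit link for sequential reads, which is why the network-side numbers in this thread are the suspicious part.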

Alright, I have some more information and there are a couple gems in here.

First off, drive performance of the RAID5 array with the same basic configuration on the three servers, for an I/O baseline (the OS is on a RAID1 array, but most data exists on the RAID5 array):

Spoiler: show

Kinda strange that the SATA RAID5 beats the SAS 10K RAID5 with the same number of spindles. Whatever.

The gist of this is that two servers are supposed to back up to a single backup server. Three servers (SHAREPOINT/PARKSERV/BACKUP), pretty much equal hardware (PE2950 Gen IIs) except the BACKUP server has SATA instead of 10K SAS drives. Gigabit NICs, all connected to the same unmanaged switch. Pretty straightforward.

iPerf performance between BACKUP (server) and SHAREPOINT (client):

Spoiler: show

I'd say 513Mbps is pretty paltry given the I/O.

Now we reach the pièce de résistance; prepare to have your minds blown.

Oh snap! ...and the iPerf benchmark is backed up by the ridiculously slow Windows file copy.

iPerf performance between SHAREPOINT (server) and PARKSERV (client):

Spoiler: show

Ok, so we get 132Mbps. Wait, what?

A couple things:

1) WTF
2) Any theories on why the numbers are so different between the two servers when only the client and server roles are swapped?
3) Any theories on why I'm seeing 5-freakin'-Mbps?
4) I assume the traffic from iPerf is written directly to RAM, so the I/O of the RAID1 vs RAID5 arrays would not make a difference. Is this correct, or is the performance of my RAID1 or RAID5 array affecting things in any way?

Any additional insights? I'll keep digging into this, but I wanted some thoughts before I broke out Wireshark. Thanks.

Drastically different speeds in sending vs. receiving can be a duplex mismatch; make sure your servers' NIC settings match those of the switch ports they are connected to (either speed and duplex set to auto on BOTH sides, or BOTH sides hard-set to 1000/full).

If directionality of transfer is a problem- check speed and duplex at both ends. Windows having problems with iperf? Check drivers and physical layer (cabling etc) on iperf challenged hosts. Do other hosts have the problem? Standard troubleshooting tactics apply.

Also check transport and backplane. Are there other congestion points? Uplinks between switches? Etc?

With recent desktop hardware back to back, good cables- I can get 990Mb/s and change with iperf. You should at least be in the 890+ even with sub-optimal hardware (the vm server in my iperf is a 5 or 6 year old Dell with Opterons and is using ESXi 4.x which also isn't as good at I/O as ESXi 5).
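One more thing worth ruling out when iperf comes in low: TCP throughput is capped by window size divided by round-trip time, regardless of link speed. A quick calculation (the window and RTT values below are hypothetical; older Windows stacks without window auto-tuning often defaulted to 64KB):

```python
# TCP throughput ceiling from the bandwidth-delay product:
# max throughput = window_size / round_trip_time.

def max_tcp_mbps(window_bytes, rtt_s):
    """Upper bound on TCP throughput (Mbps) for a given window and RTT."""
    return window_bytes * 8 / rtt_s / 1e6

# Hypothetical: a 64 KB window over a 1 ms LAN round trip.
print(f"{max_tcp_mbps(64 * 1024, 0.001):.0f} Mbps ceiling")
```

Notably, a 64KB window over a ~1ms RTT caps out around 524Mbps, which is in the same ballpark as the 513Mbps result reported earlier, so the iperf window size (`-w`) is worth checking.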

Given that neither client to server connection seems to be working properly you either need to isolate the machines and eliminate issues on each machine directly, or look for commonalities that could cause the problem (like network cabling or the switches).

If I were you, I would start with making sure you have consistent speed and duplex settings, and document the actual interface status for each device on the switches in question. If you have unmanaged switches and the machines are not set to auto-negotiate speed and duplex, you're going to have problems like this.

Then make sure the drivers for the network interfaces are updated, just to eliminate that. It's easy enough to start there.

If that doesn't help, you can move on to a direct cable between machines to eliminate actual local problems on the machines themselves (like some weird TCP stack tuning long since forgotten, a failing NIC, or whatever).

It might also help to take 2 other machines unrelated to this situation and plug them into the same switches in use here and see if they can do better with iperf.

Sorry, guys - I'll try to put some more information together soon, but I've been away for a while. I did check that the servers are all set to auto-negotiate, along with the clients. Both sides are negotiating 1000/Full.

The NIC drivers in the horribly problematic server (PARKSERV) are up-to-date according to Windows Update, so they are the latest but not necessarily the best - not sure if that's good enough or if I should download them from Broadcom's site. They are all on-board Broadcoms, and I have dealt with some finicky ones in the past, but not like this.

I'll do some more research later tonight and hopefully have more tomorrow. I almost wish the problem was only between client and server. The servers are on their own switch and most of the clients are on another; in that case the uplink between the two switches could be the problem. However, PARKSERV and BACKUP are on the same switch. I think. I will have to talk to their IT people and get physical access to verify how it's set up.

Between my new rig (i7 2700 with 16GB RAM, onboard Realtek, and a Samsung EcoDrive) and my storage box (8GB RAM and a RAID 5), I am now pushing about 810-840Mbps and spiking to about 890Mbps when moving large files. I seldom got more than about 650Mbps with my Core Duo rig and 4GB RAM. I was surprised when I saw Windows reporting 89% network utilization and 105MB/s during transfers.

Of course, it is less impressive since I've spent a lot of time with 10Gbps switches at work.