It won't produce a single unified number at the end that represents everything, but if you're serious about understanding storage performance, you'll know that no single number can tell you everything you need to know. Even Linus Torvalds thinks fio is good:

First, I would suggest using a more accurate and controllable tool to test performance. hdparm was designed to change IDE device parameters, and the test it does is quite basic. You also can't tell what is going on when using hdparm with compound devices such as LVM or iSCSI. Moreover, hdparm does not test write speed, which is unrelated to read speed, since the two are optimized differently (write-back caches, read-ahead and prefetching algorithms, etc.).

I prefer to use the good old dd command, which lets you finely control block sizes, the length of the test, and the use of the buffer cache. It also gives you a nice, short report on transfer rate. You can also choose to test buffer-cache performance.

Also, realize that there are several layers involved here, including the filesystem. hdparm only tests access to the raw device.

TEST COMMANDS

I suggest the following tests:

a) For raw devices, partitions, LVM volumes, software RAIDs, and iSCSI LUNs (initiator side). A block size of 1M is fine for testing bulk transfer speed on most modern devices. For TPS tests, use small block sizes like 4k. Adjust the count to make the test more realistic (I suggest a long test, so you measure sustained rate rather than transitory interference). The "direct" flag (O_DIRECT) bypasses the buffer cache, so the test results should be repeatable.
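
The actual command lines aren't reproduced above, but a minimal sketch of what they could look like follows; the device /dev/sdX and the counts are placeholders to adjust for your hardware, and the write test overwrites the device (see the warning below):

    # sequential write, bulk transfer (DESTRUCTIVE: overwrites /dev/sdX)
    dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct

    # sequential read, bulk transfer
    dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct

    # small-block read test (TPS-style), 4k blocks
    dd if=/dev/sdX of=/dev/null bs=4k count=100000 iflag=direct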

The WRITE test is DESTRUCTIVE! You should do it BEFORE CREATING A FILESYSTEM ON THE DEVICE! On raw devices, beware that the partition table will be erased. In that case you should force the kernel to reread the partition table to avoid problems (for example by rewriting it with fdisk). However, performance on the whole device and on a single partition should be the same.
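
As a sketch (with /dev/sdX again as a placeholder), two common ways to ask the kernel to re-scan the partition table after such a test, besides rewriting it with fdisk, are:

    # either of these asks the kernel to reread the partition table
    blockdev --rereadpt /dev/sdX
    partprobe /dev/sdX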

There should be a large performance gap between the layer responsible for the loss and the layer beneath it. But I don't think this is LVM; I suspect the filesystem layer.

Now some tips for possible problems:

a) You didn't say whether you defined a striped LVM volume on top of the iSCSI LUNs. Striping could create a bottleneck if synchronous writes are used on the iSCSI targets (see the atime issue below). Remember that the default iSCSI target behaviour is synchronous writes (no RAM caching).

b) You didn't describe the kind of access pattern to your files:
-Long sequential transfers of large amounts of data (100s of MB)?
-Sequences of small-block random accesses?
-Many small files?

I may be wrong, but I suspect that your system could be suffering from the effects of the "atime" issue. The atime issue is a consequence of "original ideas" about Linux kernel design, which we have suffered from in recent years because of people eager to participate in the design of an OS who are not familiar with performance and the implications of design decisions.

Just a few words: for almost 40 years, UNIX has updated the "last access time" of an inode each time a read or write operation is done on its file. The buffer cache holds data updates, which don't propagate to disk for a while. However, in the Linux design, each update to an inode's atime has to be written SYNCHRONOUSLY AND IMMEDIATELY to disk. Just realize the implications of interleaving synchronous transfers into a stream of operations on top of the iSCSI protocol.

To check whether this applies, just do this test:
-Read a long file (for at least 30 seconds) without using the cache. Of course with dd!!!
-At the same time, monitor the I/O with "iostat -k 5" (see the sketch below).
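
A minimal sketch of that check, assuming /path/to/largefile is an existing multi-gigabyte file on the filesystem in question:

    # terminal 1: long, uncached sequential read
    dd if=/path/to/largefile of=/dev/null bs=1M iflag=direct

    # terminal 2: watch for a steady trickle of writes while the read runs
    iostat -k 5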

If you observe a small but continuous flow of write operations while reading data, it could be the inode updates.

Solution: Things have become so weird with Linux that they have added a mount option to some filesystems (XFS, ext3, etc.) to disable the updating of atime. Of course, that makes the filesystem semantics differ from the POSIX standard. Some applications that observe the last access time of files could fail (mostly email readers and servers like pine, elm, Cyrus, etc.). Just remount your filesystem with the options "noatime,nodiratime". There is also a "relatime" option on recent distributions, which updates atime far less often.
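
For example, a live remount with those options might look like this (the mount point /data is just a placeholder):

    # disable atime and directory-atime updates on an already-mounted filesystem
    mount -o remount,noatime,nodiratime /data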

Please drop a note about the results of these tests and of your investigation.

It depends on the purpose of the disk. The fastest and simplest way is dd, as tmow mentioned, but I would additionally recommend iozone and orion.

IOzone, in my opinion, is more precise at filesystem benchmarking than bonnie++.
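
As a sketch, a basic IOzone run might look like this (the file size, record size, and test-file path are assumptions to adjust for your setup):

    # write/rewrite, read/reread and random read/write tests
    # on a 1 GB file with 4 kB records
    iozone -i 0 -i 1 -i 2 -s 1g -r 4k -f /mnt/test/iozone.tmp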

Orion ("ORacle IO Numbers" from Oracle) is very scalable and can benchmark properly even very large/powerful storage, and I find it very useful in scaling storage for databases. (I collect results of orion from different disk arrays, disk controllers and raid configurations, and then compere them)

Using SSH adds a CPU-bound component, so you can hit a bottleneck when testing on slow CPUs. Use netcat/socat instead and then you have next to no overhead.
– Marcin Jan 7 '11 at 14:22

Like the nice simple dd command - that's all I was after!
– Codek May 16 '12 at 10:26

Two problems here: unless your machine has less than 100 MB of free memory, your dd command will simply be writing straight to the cache held in RAM. The fsync/fdatasync/dsync options will ensure the data is written to disk, but it's still a very rough estimate and won't report IOPS as requested by the OP - bonnie++ will give better results. Secondly, as @Marcin stated, ssh will introduce a CPU bottleneck and an overhead that you're not measuring. Results will differ dramatically in each environment.
– Alastair McCormack Sep 28 '13 at 20:21

SSH is really not a benchmarking tool. The value returned doesn't really tell you anything meaningful.
– Zulu Sep 20 '14 at 10:13