Posts by Max 6

Due Process is Hard, Let's Go Shopping

Insufficient evidence typically means just that, and that the victim has provided exculpatory statements to the police about the nature of the offense. This is why we have due process. If, for example, she'll be proven a liar or worse on the stand, then of course the authorities are going to have a hard time pressing charges.

Pictures of people having group sex with a girl are simply pictures of people having group sex with a girl. Was it consensual at the time of the act? What was her level of intoxication at the time of the act? Unless it's video, you can't really tell if someone is passed out or not.

Looks like NetApp has been right all these years..

This is a smart and long overdue move by EMC. From where I sit, the mid-range mainstream arrays are the sweet spot for most customers, and it's actually a tricky problem to get right, as most vendors forget the NAS part :)

IP only? Or does this include transport...

I know for a fact that Google's transport network relies more on third-party transport (DWDM / SONET) than on what they actually "own" or light up directly (dark fiber). If this is simply comparing the amount of IP traffic on one network vs. another, it's easy to see why Google ranks high if you understand what they do.

1) Crawl the entire WWW and other protocols

2) Crunch and Reduce said content

3) Distribute across multiple data centers.

That's besides all the YouTube stuff they do and the amount of mail that goes through services like Postini.

Re: Don't de-dupe your primary! && moar FUD

AC:

Dedupe isn't for every data set, but it works great for many of them. With vmdks in particular, and other types of big ass files, you can see up to 75% savings in some cases; add block compression and you can squeeze out some impressive space savings. As for the performance penalty, it's just CPU, which is cheaper than memory. Since it runs as a low priority background process, it affects performance as much as disk scrubs do (they don't FWIW).

This means you can actually get more efficiency out of things like SSD, improving overall performance as deduped data sets fit better in memory. As you point out, there is a constant discrepancy between "Extra" space due to drive capacity and IOPS requirements. Dedupe + PAM (or just more memory netapp plz kthnx) helps balance the IOPS and capacity requirements against each other.
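For anyone who wants a rough feel for how much a data set might dedupe, here is a minimal sketch. It is not how any particular array does it: the fixed 4 KiB block size, the use of SHA-256 as the block fingerprint, and the function name `dedupe_savings` are all assumptions made for illustration.

```python
import hashlib


def dedupe_savings(data: bytes, block_size: int = 4096) -> float:
    """Return the fraction of fixed-size blocks that are duplicates.

    Assumption: fixed-size blocks fingerprinted with SHA-256. Real arrays
    use their own block sizes and fingerprinting schemes.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not data:
        return 0.0
    seen = set()
    total = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        seen.add(hashlib.sha256(block).digest())
        total += 1
    # Blocks that only need to be stored once are counted once in `seen`.
    return 1.0 - len(seen) / total


if __name__ == "__main__":
    # Hypothetical "image": three identical zero-filled blocks plus one other block.
    image = bytes(4096) * 3 + b"\x01" * 4096
    print(f"{dedupe_savings(image):.0%} of blocks are duplicates")
```

Point it at the bytes of a few real vmdks concatenated together to see why images cloned from the same template tend to dedupe so well. It only counts duplicate blocks and says nothing about the extra savings from compression.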

Geoff, Texas Todd, 3par and Compellent ....

You guys are awesome block heads (no offence meant, just pointing out that you only do block protocols) but you're missing a couple of things that are different in the HPC and to a certain extent the "cloud" space.

1) NFS 4.1 means the "clients" can do the tiering. They can frankly do it better than any array can. Heck, you can get clever with the automounter and NFS v3 to achieve similar results. This magic block tiering strategy is a head scratcher for these environments.

2) I wouldn't describe any FC block based system as massively parallel. Topping out at 8 or 16 controllers is not massively anything. I would argue that both Compellent and 3par products are impressive, but not *MASSIVE*. NTAP, Ibrix, and Isilon are larger, but aren't *MASSIVE* either. Well, maybe Ibrix but I digress.

3) Most virtual environments, even NetApp ones, keep virtual machines on different "tiers" of WAFL (a higher level abstraction than "raid group" or "disk"), and cache on cache architectures are much more effective for this than having the array decide which blocks are hot for random read workloads. Since NetApp arrays don't really have much in terms of write performance issues, this auto tiering doesn't do much for this corner case that might exist in a block environment.

For NetApp tiering, just get FC or SAS and configure those as caching volumes (you can do this in the same controller, or an outboard cache).

Next, for a VMDK farm of 8,000 VMs that live on SATA, simply prepopulate the cache tier by letting it warm up naturally or, if you're in a hurry, by running ls -la across the directory tree.

Let PAM handle the rest.
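If you'd rather script the "ls -la across the directory tree" step than type it, here is a rough sketch. The function name `warm_metadata` and the mount point you pass it are hypothetical. Note that, like ls -la, it only reads directory entries and file attributes, not file contents, so how much of the cache it actually warms depends on the caching setup; treat that part as an assumption.

```python
import os
import sys


def warm_metadata(root: str) -> int:
    """Walk `root` and lstat every entry, roughly what `ls -la` does per directory.

    Returns the number of entries that were successfully stat'ed.
    """
    touched = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
                touched += 1
            except OSError:
                # Entry vanished or is unreadable; skip it and keep going.
                continue
    return touched


if __name__ == "__main__":
    # Hypothetical usage: pass the NFS mount point of the VMDK farm.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"stat'ed {warm_metadata(root)} entries under {root}")
```

Run it once against the mount after the caching volumes are configured, then let normal traffic do the rest of the warming.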

2/3rds of the above technologies have been around since 2002 or prior; adding PAM just makes it go faster and adds an additional "tier". Granted, this assumes you're running VMware over NFS, but it makes more sense to me than having 25 different "tiers" of raid types and RPMs.

NetApp is doing "tiering" and scale out, but dedupe and other features are more important.

Seems NetApp is keen on the 8.x roadmap, which is pretty heavy on both scale out and tiering features... I think the difference is that they are doing so without having 20 different levels of drive technology in the mix.

I don't necessarily see huge improvements with write performance for SSD vs. SAS/SATA or FC, as write performance is generally not an issue, which is the primary reason SSD adoption is low. Using PCIe cache controllers addresses more important performance areas, esp. since NetApp's controllers are a little slim on the memory (HINT NTAP, HINT).

As for scale out, I'm guessing most customers aren't running a renderfarm or HPC environment; GX is and has been available to meet these customers' needs for some time. Also, the massively parallel HDFS/GFS type web architectures require a different product strategy and interfaces than NFS/CIFS/FCP/iSCSI type storage devices.

I find it ironic that while some vendors are forcing ZOMG SCALE OUT! and unnecessary tiering sizzle down customers' throats, they are ignoring the pressure in most shops these days to *Reduce* the number of boxes and complexity in their environments. This means fewer disks and fewer controllers (esp. disks).

I'm not saying that scale out doesn't have its place (the Ibrix guys work for NetApp IIRC) but it's worth less to more folks than technologies like deduplication and balancing out IOPS/capacity tradeoffs with PAM.