de-dupe

July 14, 2010

I wrote about the fact that we already had zero detect technology in our product, which is useful for the new Full Copy command because it allows customers to remove zeroed data from clones when they are created and return them to array free space.

The discussion became a bit confused when Chad interpreted what I was saying as pertaining to Block Zeroing.

Block Zeroing and Full Copy are different aspects of the VAAI API. The intent of Block Zeroing is to reduce the amount of CPU effort and storage traffic required to write zeroes across an entire EagerZeroThick (EZT) VMDK when it is created. The intent of Full Copy is to make clones of VMs quickly without consuming I/O bandwidth. Things get interesting when you start thinking about making a full copy of an EZT VMDK that was created using VAAI with block zeroing - but I'll discuss that later.

I also want to clarify what zero detection technology is. 3PAR T and F class arrays have zero detection technology, enabled by Thin Persistence software, that recognizes zeroed blocks as they come into the array and returns them to the array's free pool. Any read requests made to these block addresses will return a zero value. In essence it is dedupe for zeroes.
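To make the idea concrete, here is a toy model of a thin volume that drops all-zero blocks as they are written. It is only a sketch of the behavior described above, not 3PAR's implementation: the `ThinVolume` class, its methods and the 16 KiB `BLOCK_SIZE` are all assumptions made up for illustration.

```python
BLOCK_SIZE = 16 * 1024  # illustrative block size (an assumption)


class ThinVolume:
    """Toy thin volume with optional inline zero detection (hypothetical)."""

    def __init__(self, zero_detect=True):
        self.zero_detect = zero_detect
        self.blocks = {}  # block address -> stored bytes

    def write(self, addr, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        if self.zero_detect and not any(data):
            # All-zero block: release any existing mapping instead of storing it.
            self.blocks.pop(addr, None)
        else:
            self.blocks[addr] = bytes(data)

    def read(self, addr):
        # Unmapped addresses read back as zeroes.
        return self.blocks.get(addr, bytes(BLOCK_SIZE))

    def allocated_bytes(self):
        return len(self.blocks) * BLOCK_SIZE
```

Writing a zeroed block to this model consumes nothing, overwriting real data with zeroes gives the space back, and reads of either address still return zeroes.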

However, zero detection is not needed when an EZT VMDK is created using the VAAI plug-in because the array will recognize the intent of the command and not write the zeroes. In other words, the VMDK will only contain a very small amount of reserved space when it is created. Again, any attempts to read blocks in those ranges will return zero values. Zero detection is effectively bypassed during the creation of the EZT VMDK.

The exception to this behavior is when the EZT VMDK being created is written to a thick volume - in that case the array will write zeroes across the entire VMDK.

The remaining cases for the creation of EZT VMDKs on 3PAR arrays occur when VAAI is not used. For a thick volume, the entire VMDK has zeroes written to it. Thin volumes not using zero detect also have zeroes written over the entire VMDK. Thin volumes with zero detect will not have zeroes written to them and will contain only a small amount of reserved space.
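The cases above boil down to a small truth table. The function below is just a summary of what this post describes; `ezt_zeroes_written` and its parameters are hypothetical names for illustration and are not part of any 3PAR or VMware API.

```python
def ezt_zeroes_written(thin, vaai, zero_detect):
    """Return True if creating an EZT VMDK physically writes zeroes.

    thin        -- target is a thin volume (False means thick)
    vaai        -- the VAAI block zeroing path is used
    zero_detect -- zero detection is enabled on the volume
    """
    if not thin:
        # Thick volumes get zeroes across the whole VMDK, with or without VAAI.
        return True
    # Thin volumes avoid the zeroes if either VAAI or zero detect is in play.
    return not (vaai or zero_detect)


# Print every combination for a thin and a thick target.
for thin in (True, False):
    for vaai in (True, False):
        for zd in (True, False):
            print(f"thin={thin!s:5} vaai={vaai!s:5} zero_detect={zd!s:5} "
                  f"-> zeroes written: {ezt_zeroes_written(thin, vaai, zd)}")
```

The only thin-volume case that ends up full of zeroes is the one with neither VAAI nor zero detect.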

FWIW, the reserved space is used as instantly-available capacity that can be allocated on-demand when writes start coming into the volume. 3PAR arrays always "read ahead" free space to improve the performance of thin provisioning.

The next bit here could be a bit thorny, so clear your head. The matter of making a Full Copy of an EZT VMDK to a thinly provisioned volume was something Chad said was not allowed. My assumption here is that the type of thin provisioning used makes a big difference.

For instance, if you are using TP from VMware, I could see where they would not allow a full copy to be made. The problem is that the full copy will return all the zero values for the source VMDK, whether or not those zeroes were ever actually written - and write them to the target TP volume. In other words, the target could be much larger than the source. In the VMware TP scheme, this could make for problems in a hurry if you were making a bunch of clones this way.

In contrast, if you were using a 3PAR array with zero detection, the Full Copy of the source VMDK would return zeroes for the entire VMDK, but the zero detection would strip them out again as the target was being written. You could make as many clones as you wanted this way, knowing that the physical capacity they consume would be a multiple of the physical capacity consumed by the source VMDK. In other words, you wouldn't have to worry about virtual zero bloat making a mess of your VMFS volume.

One of the big differences between 3PAR's zero detection technology and other vendors' zero-reclaim technology is that 3PAR's process works in real time as data comes into the array, whereas zero-reclaim works in a post-processing fashion after the zeroes have already consumed disk space. This could be a significant difference in many cases because the post-processing method has the potential to create unexpected capacity-full conditions before the zero-reclamation process even has a chance to start.
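A rough way to see the difference is to track peak allocation while a stream of blocks is ingested. This is a simplified sketch with made-up numbers, assuming post-processing only runs once the whole stream has landed; the `ingest` function is hypothetical and does not model any vendor's product.

```python
def ingest(blocks, inline_zero_detect):
    """Return (peak, final) allocated block counts for a stream of writes.

    blocks is a sequence of booleans, True meaning an all-zero block.
    """
    allocated = 0
    peak = 0
    zero_blocks_stored = 0
    for is_zero in blocks:
        if is_zero and inline_zero_detect:
            continue  # stripped on the way in, never consumes space
        allocated += 1
        if is_zero:
            zero_blocks_stored += 1
        peak = max(peak, allocated)
    # Post-process reclaim (a no-op for inline) runs only after ingest ends.
    allocated -= zero_blocks_stored
    return peak, allocated


stream = [False] * 10 + [True] * 90  # 10 data blocks, 90 zeroed blocks
print("inline:      ", ingest(stream, inline_zero_detect=True))
print("post-process:", ingest(stream, inline_zero_detect=False))
```

Both approaches finish at the same allocation, but the post-process case has to find room for every zeroed block first, which is where the capacity-full surprise comes from.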

OMG! I wasn't sure who to call first, my agent, the paparazzi or Dr. Phil to talk this over with. Like, this was SOOOO predictable - like, you know - two people that start out as adversaries become attracted to each other out of financial necessity and then they find out, like, they don't really have that much in common, except that, like, they sort of need each other to use as an excuse when they are, like trying to get out of other situations that they don't, like, you know, like very much?

Will this relationship become the longest running technology soap opera of all time? The rumor mill has been pretty quiet lately concerning Dell's plans to acquire technology companies and create more leverage and generate higher margins. So how is a new deal with an old flame going to help them do that? I can only guess that the EqualLogic business at Dell has not panned out as well for Dell as they have been saying. Perhaps storage investigators Chris Mellor, Beth Pariseau or Simon Sharwood can dig up something about this?

As it is with so many sales relationships, it will only last as long as the tricks that the respective sales forces play on each other don't get too nasty. Somebody is going to make more money than the other and the one with the smaller surprise won't be very happy when they find out their party is a bring your own event.

Good luck to both for having done something they will probably both regret, but never admit to regretting.

July 09, 2009

Chris Fricke commented here, wondering what's in store for existing DDUP customers. Good question Chris, who gives a rip about you? Would it matter if you had EMC gear already, as opposed to being a card carrying EqualLogic customer? If you have anything to say about what happens to customers, let fly!

July 08, 2009

Will EMC pay too much for DDUP? That's the question everybody is going to be asking for years. It's an awful lot of money, but deduplication of backup and archive data is probably going to be a very big deal for years to come.

I like Steve Duplessie's take on it: he said Netapp should feel relieved because they would have been under immense pressure. EMC, on the other hand, has a larger business and more ways to leverage their DDUP investment - such as their channel relationship with Dell. Of course, that would mean that Dell would have to agree to be even more dependent on EMC for their storage business.

Netapp gets a consolation prize - the $57 million breakup fee, which will be paid by Data Domain. It's not a lot but I'm sure they can put it to good use. The larger problem for Netapp is how they are going to replace DDUP in their plans. Obviously there were things they wanted to do with DDUP that they aren't going to be able to do now. It will be interesting to see what their next move will be. There are other deduplication companies available if they want to grow their business that way.

June 18, 2009

Digital archivists struggle with the question of whether there will be equipment available to read data from media that was written many years before. It's not a simple matter of keeping devices for decades because all interfaces, including things like computer I/O buses, network interfaces and even power cables, need to be preserved.

Carter George at Ocarina has a good post today discussing what some of the implications of dedupe technology are for archival storage. He touches on software issues, metadata and legal topics that need to be considered. If you have an interest in using dedupe for long term archival you probably need to be paying attention to what Carter is talking about here.

You know you're hot when different government legal organizations are vying for position. What's next? A ruling and then lawsuits, or just a cascade of lawsuits in all directions? Whatever, this thing is going to go down as easy as the bull testicles on "get me out of here" (begins at the 5 minute mark in the video).

At this point EMC is screwed if they try to raise their offer as some have suggested if this deal comes down to a ruling about anti-trust. I have to say, the Netapp legal team seems to be doing a very good job of positioning what many observers believe is an inferior offer to EMC's cash-out option.

Then there is the VC angle. A really big win for Greylock and NEA. For everybody wondering what all those crazy VCs were thinking when they invested all over the place in the storage industry 8-12 years ago - this is exactly what they were hoping for. It's good money when you can get it.

There's a lot of change happening in the storage industry, but one thing doesn't change much - the competition is fierce. 3PAR is very happy to have an active role in the storage industry, pushing the envelope and making technology followers out of companies like EMC and HDS. We have a lot of storage in cages inside the world's largest cloud computing data centers.

(DDUP) "Yes papa, but the alternative isn't bad, he'll give you the 20 goats and because he's a beekeeper - we'll bring you sweet honey for the rest of your life!"

(stockholders) "You think I should take the promise of honey instead of a cow??!! Oy vey! I could do a lot with a cow."

(DDUP) "But if I go with him, you'll never see me again! It will be the death of me! Besides, what if old moneybags changes his mind? Nobody will want me - I could be a widow the rest of my life?"

(stockholders) "Don't worry my dear, there's always someone else. Besides, it's not you rejecting your beekeeper man, it's me. He'll still want you if he really loves you. But you know what they always say, a cow is a cow! Still, I hear the desperation in your voice and it breaks my heart. I need to think about this......"

June 12, 2009

3P got all excited this week when Joe Tucci from EMC made his public plea to Data Domain's employees in the San Jose Mercury News because he wanted to tell them something too. He didn't have the budget to spend on a full page in the paper, but he has other ways to get his message across - like this Rap Blog.

Thank goodness, the arcane world of storage is too obscure for most normal people. Even the word "dedupe" appears to be a tongue tying transgression waiting to happen. There might be some job security after all.

The Board of Directors of Data Domain (NASDAQ:DDUP) today commented on the unsolicited offer it has received from EMC Corporation (NYSE:EMC) to acquire all of the outstanding shares of Data Domain common stock for $30.00 per share in cash. Consistent with its fiduciary duties and in consultation with its financial and legal advisors, Data Domain’s Board is reviewing EMC’s offer. At this time, the Board is not making a recommendation with respect to the EMC offer. Data Domain requests that its stockholders defer making a determination whether to accept or reject EMC’s offer until Data Domain has communicated to stockholders its position regarding the tender offer from EMC. In accordance with Rule 14d-9 of the Securities Exchange Act of 1934, on or before June 16, 2009, Data Domain will communicate to stockholders its position regarding the tender offer from EMC. At this time, the Board is reaffirming the recommendation in favor of Data Domain’s merger with NetApp, Inc. (NASDAQ:NTAP) that is described in the Registration Statement on Form S-4 that NetApp has filed with the Securities Exchange Commission.

In this short video Cartoon Curtis Preston makes his debut with the League of Suspicious Avatars (LOSA):

June 03, 2009

OK - it looks like they really want to work for kind, friendly local Netapp people instead of joining the hostile, rough-and-tumble horde at EMC.

However, there is the matter of shareholder approval, which is likely to be thorny because the institutional investors who hold a lot of DDUP's stock might not be all that sympathetic to the employee shareholders' desires. Cash speaks to finance people and they are measured on their investment return and not DDUP employee job satisfaction.

The twittersphere has been abuzz about it, with most opinions (based on an informal and unscientific survey) indicating some dissatisfaction or disappointment that Netapp's bid was not higher. The twit consensus is that Netapp's counter offer was weak.

I don't interpret it that way. Netapp's counter looks more like an offer-matching coupon that covers not only EMC's offer, but any that Dell might make as well - and possibly more.

June 02, 2009

The tweetosphere has been lively with chatter about Data Domain, Netapp and EMC. Some of these discussions have suggested that EMC's hostile takeover attempt for Data Domain is intended to financially hurt Netapp. It's been suggested that Netapp is now in a no-win situation: If they lose Data Domain they will have a difficult time competing against the product as sold by EMC. If they outbid EMC, they will hurt themselves financially by stretching to make the deal.

What happens when companies are put between a rock and a hard place? Sometimes the underdog goes to the courts seeking an injunction in order to stop too much damage from being inflicted too fast. One of the gambits is to bring an antitrust suit against the aggressor - which I'm sure is being discussed within Netapp's (and probably EMC's) legal circles.

If this happens, the big loser in all of this, ironically, could be Data Domain. Confusion and uncertainty in the storage business is a bad thing, as customers tend to put off purchases until they know where and how products will be supported in the future. That could be hard on Data Domain's business, which could hurt its valuation - with cascading legal measures, this time from DDUP, intended to address whatever wrongs could be assigned. It would be freaking brutal, but that's the storage business for you. Brutal, cutthroat. Dog eat dog. You need to know who you are dealing with in this business.

June 01, 2009

This industry just keeps getting more interesting all the time. Today, EMC offered to acquire Data Domain in an all cash offer, beating Netapp's proposal by 20%.

Notes from the EMC call - posted live - during the call. (My comments are in Red and Italics)

Joe Tucci trying to explain how deduplication works. Not very smooth. He sounded like he was trying to say there were advantages of combining both Source and Target dedupe.

It's not likely that there would be any advantage from deduping data that has already been deduped.

Welcome Data Domain Employees! Here's your 5% paycut!

* * * * * *

First question: Why now? Isn't this expensive?

Tucci says EMC wants to own both Target and Source dedupe. Figures the company that can do both and integrate them will have an advantage.

I'd say that tech-wise this probably doesn't make sense, but market-wise it probably does. He didn't really answer the questions.

* * * * * *

Q2: what about Avamar - how does this reflect on them?

Tucci: The use cases for DDUP and Avamar are highly complementary.

Maybe this also says that there are customers that don't want Avamar, but want another solution instead.

* * * * * *

missed 3rd Q & A

* * * * * *

Q4: Would the DL4000 be end of life?

Tucci: We think the DL4000 would continue to sell and there is a new version coming.

Q4b: How would sales force integration work?

Tucci: Need DDUP sales force and their expertise. Lots of ex-EMC at DDUP. Familiar with the EMC way and culture.

Some probably wouldn't go back to EMC.

* * * * * *

Q5: Use cases - and will there be primary storage dedupe (I think that was the question)

Tucci: Thinks there will be a way to get primary dedupe - answer wandered a bit.

* * * * * *

Q6: Why didn't EMC make a play sooner? What about employee reaction?

Tucci: Our track record in acquisitions is better than Netapp's. "Rhetoric is rhetoric." Our track record in acquisitions speaks for itself.

That makes it very clear - this is a hostile move directed at Netapp.

* * * * * *

Q7: Have you spoken to Slootman at DDUP recently?

Tucci: No, not recently, their acquisition agreement with Netapp precludes that.

* * * * * *

Q8: missed question.

Tucci: Sees a big opportunity - a 10 billion market. EMC could have built it, but time to market was more important.

I don't think EMC could have built it.

* * * * * *

Q9: was this offensive or defensive in regards to Netapp? (Don't want Netapp to have DDUP)

Tucci: This is an offensive move.

* * * * * *

Q10: How much overlap is there today between Avamar and DDUP?

Tucci: Some have chosen Avamar and others DDUP - more have chosen DDUP. Avamar has been one of our top 3 acquisitions

* * * * * *

Q11: What was the run rate for Avamar versus DL products? DDUP has had an outsourced model using outsourced vendors - will you continue this or utilize Clariion?

Tucci: DDUP makes a gateway as well as their whole product. EMC can use both.

* * * * * *

My Call Summary

EMC says this is an offensive move, but it sure seems like a defensive move to me, made in order to keep Netapp from cornering the dedupe market. The way these deals tend to go is that multiple companies are in on the bidding (in private), and my guess is that EMC and Netapp were both involved up until a little more than a week or so ago. Netapp probably prevailed and announced their "win" last week.

EMC has now had a chance to regroup after EMC World and has loser's remorse - so bad that they are willing to pay an enormous premium for a company that was already offered a premium price by Netapp. Good for DDUP and more power to 'em! My guess is that EMC looked at their own dedupe products and realized that they could get killed by the Netapp/DDUP combination. Apparently EMC views dedupe as strategic and they saw a failure in their strategy passing before their eyes. They really should have taken care of business earlier rather than playing chicken with Netapp and DDUP.