This is something very new. I have not had much time to look at it, but we have agreed to explore it much further. What these folks have is an appliance that sits in-band between the backup server and the back-end tape device. They pass the data through to the device without changing it, but in the process they tear the backup stream apart and index the content of the data. In other words, they crack the packets open, index them, and put them back together again before sending them downstream to the tape device. They claim to work with NetBackup, Legato, and TSM (maybe others, but I don’t recall and won’t know for a little while). Once they have all of this indexed, it becomes searchable and “auditable” through their appliance. It’s an interesting concept, so I’ll make sure to explore this further with them. I’m concerned about scalability, index sizes (although they claim huge savings in this), and versioning issues (e.g. Legato changes OpenTape and they now become the gateway for an upgrade).

This is a new CDP product that has just come out of stealth mode. They work only in a Windows world, but they appear to have a pretty comprehensive solution for that space. The way they work is by inserting a small driver (splinter driver) into the kernel that splits I/Os between the “real” storage and their device. The I/Os that come into their device are time stamped and cataloged. What’s really interesting is that they have agents that work with Exchange, SQL Server, and the file system. They claim that with these agents it is not necessary to bring the database to a consistent point in time to do full recoveries. They also have the ability to do single message or mailbox restores in Exchange from these continuous captures. In other words, there is no data loss. Interesting to say the least, but, again, I am interested in seeing the scalability and their roadmap. More to follow on these guys.
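To make the “time stamped and cataloged” idea concrete, here is a minimal sketch of a write journal that can rebuild a volume image as of any point in time. It is only an illustration of the general CDP concept, not this vendor’s design: the `WriteJournal` class, its method names, and the integer timestamps are all hypothetical, and it assumes writes arrive in timestamp order.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class WriteJournal:
    """Toy stand-in for the appliance side of a split-I/O CDP setup (hypothetical)."""
    # Each entry is (timestamp, block_number, data), kept in arrival order.
    entries: list = field(default_factory=list)

    def record(self, timestamp, block, data):
        # Assumption: the splitter delivers writes in non-decreasing timestamp order.
        self.entries.append((timestamp, block, data))

    def image_at(self, timestamp):
        """Rebuild the block map as it looked at `timestamp` (inclusive)."""
        times = [t for t, _, _ in self.entries]
        cutoff = bisect_right(times, timestamp)
        image = {}
        for _, block, data in self.entries[:cutoff]:
            image[block] = data  # later writes overwrite earlier ones
        return image


journal = WriteJournal()
journal.record(100, 0, b"AAAA")
journal.record(105, 1, b"BBBB")
journal.record(110, 0, b"CCCC")  # overwrites block 0

before_overwrite = journal.image_at(107)
after_overwrite = journal.image_at(110)
```

Replaying the journal up to a chosen timestamp is what lets you recover to any moment, but it says nothing about whether that moment is application consistent, which is presumably where their Exchange and SQL Server agents come in.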

So we all know now about encryption on the host level (filesystem, application level, column level in databases, etc.). Most of us also know about the new encryption appliances that work at the block or file level protocols (SAN/NAS/iSCSI). The big players here are Decru and NeoScale. What all of these fail to do is set a finer level of granularity of control over who sees the data. What these tools do is, in essence, protect against unauthorized access from users that are not authenticated by the system. For example, if we are using a Decru appliance to encrypt disk data (block level), users on the SAN that gain the ability to map LUNs will not be able to gain access to the data even if they remap the LUN to another host. The only access is through the host that has the encryption policy permissions to see the LUN in cleartext. But that’s where the problem lies. Anyone with root level access on that server can see ALL of the data on that device. So, the way people protect against that today is by implementing a software layer of encryption. In essence, they do dual layer encryption: one layer to bulk protect against LUN level access and the other at something like the column level within the database, so that key information is not visible to users with root/administrator level access to the system or database. This is where Vormetric comes in. Their product is a combination of a software driver and an appliance that provides a finer level of granularity of access while also encrypting the information on these systems. The best way to think about this tool set is as a way to give root and administrator level users access to only the data they need in order to do their job: things like /etc directories in UNIX or the registry in Windows. The sensitive application data, however, is completely encrypted from them, while the right users, and even the right applications, still have full access to the information.
So, the question is now, how does this scale, how does this tie into the bulk encryption guys, and how does this work in DR/backup/etc environments. Once again, a meeting has been set with these guys to figure this one out.

Speaking of security and encryption, here’s a thought: How do you prove that what was originally stored and what is stored now is the same content? Sure, you can encrypt it the way Decru and Vormetric do, but a sufficiently skilled or authorized user could still change the content of the data. All that has happened is that the data is in an encrypted format as far as non-authorized or unskilled attackers are concerned. How do you prove in litigation that you really are presenting the data that was originally there? Well, this little company thinks that they have an answer. They were not a presenter or even an exhibitor at the conference, but I happened to sit in a spot that they conveniently happened to migrate to. They were making their pitch to a Symantec person to see if they could include this in their technology. I was certainly intrigued and I suspect that this is going to become much, much more important shortly. Something to watch for.
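I don’t know what this company actually does under the hood, but one common building block for this kind of proof is a cryptographic digest recorded at the time the data is stored. A minimal sketch, assuming SHA-256 and assuming the recorded digest is kept somewhere the administrator of the data cannot alter it; the `fingerprint` and `is_unchanged` names are mine:

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the content's current digest with the one recorded at store time."""
    return hmac.compare_digest(fingerprint(data), recorded_digest)


original = b"quarterly report v1"
recorded = fingerprint(original)  # stored separately when the data is written

tampered = b"quarterly report v2"
```

Encryption answers “who can read this?”, while a digest like this answers “is this still what was written?”. The hard part, and presumably what a product has to solve, is proving that the recorded digest itself has not been replaced.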

These guys clearly had great visibility at this conference. They hosted an end-user session and had those end users clearly articulate that their message is loud and clear: simple, simple, simple. The toaster approach is working.

Copan
These folks are spending a whole lot of time re-doing their strategy. Their basic entry into the market was with their introduction of the MAID (Massive Array of Idle Disks) technology. Their basic concept is that for tier 3 storage (archival storage), there is a need for very low cost devices but with near instantaneous access. So, what they developed was a way to house a huge number of SATA disk drives (900+) in a single frame. With the current disk drive sizes, they have 3/4 of a Petabyte of storage in a single rack! Their key insight was that most of this data will not be accessed, so there is little need to keep all of the drives spinning at the same time. They have some very sophisticated technology to figure out which drives are required to spin and which ones are not. Additionally, they have some disk management and exercise technology that allows them to spin up and verify disks and their long term viability. Their measured (and claimed) result from this is that the lifetime of SATA drives is extended by a factor of four. This puts that drive technology in the ballpark of the reliability of the much more expensive SCSI drives. However, the cost of the drives, the cost of the power and cooling, and the cost of the management is much lower. Their initial introduction of this technology was as a VTL tape device. This didn’t work so well. The MAID stuff is cool, but so what? What’s really interesting is that they are now re-positioning themselves as a platform for long term storage technologies. They have divided their system into three levels of access. In my terms: 1) presentation/personality – SCSI/FCP, iSCSI, NFS/CIFS, VTL, etc., 2) API/Intelligence – a set of API tools that allow greater access (i.e. indexing, content aging, migration, protocol/API emulations). If and when this platform approach is deployed and a reality, this system becomes much more interesting. 750GB drives are out, 1TB drives are close, and soon even bigger drives will be available.
So, if their platform is upgradable to take advantage of these higher densities and it’s also an open platform for storage, then this becomes a much more realistic thing. As with all of these, more questions remain and further investigation will need to be made.
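The capacity claim and the density argument are easy to sanity check with back-of-the-envelope arithmetic. A minimal sketch, assuming decimal units (1 PB = 1,000,000 GB), raw capacity with no RAID or spare overhead, and a round 1,000 drives standing in for the “900+” figure; the function name is mine:

```python
def raw_capacity_pb(drive_count: int, drive_size_gb: float) -> float:
    """Raw capacity in petabytes, using decimal units and ignoring RAID overhead."""
    return drive_count * drive_size_gb / 1_000_000


# Assumed drive count; the only number given above is "900+".
drives = 1000

today_pb = raw_capacity_pb(drives, 750)       # 750GB drives
upgraded_pb = raw_capacity_pb(drives, 1000)   # 1TB drives in the same frame
```

With those assumptions, the same frame goes from roughly three quarters of a petabyte to a full petabyte just by swapping drive sizes, which is why the upgradability question matters so much.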

Enough of that stuff. I did manage to attend some of the sessions today:

Enrique Salem Keynote

Consumer level threats

Consumer level technology has historically moved to the enterprise (Gartner says that between 2007 and 2012 the majority of technologies that enterprises adopt will come from consumer technologies – .8 probability in their words)

Consumers are losing confidence in the online business model. Symantec is going to focus on increasing the level of confidence

Project Voyager: proactive protection against phishing attacks

Big to-do about Project Genesis – the integration of the security and optimization tools on the desktop (Norton tools)

NetBackup 6.5 will have a PureDisk gateway concept (NetBackup will now have a PureDisk storage unit): stage NBU backups from disk to PureDisk for SIS/replication, use NBU to write recovery tapes for PureDisk clients

unification of reporting (NOM will handle management of data protection, CC-Service will handle business of data protection) (CC-Service is NBAR on steroids – optimized for trending, planning, analytics; designed for outbound reporting (NOM is for administrator reporting), measuring costs, assessing risk and exposure, verifying compliance)

sso for disk (shared disk pool for all platforms) will also do de-dup and replicate – (allocation happens on a volume/volume basis – think of each volume like we think of tape on SSO today) (will allow restores from a volume while others are writing to it – will leverage snapshots to do this)

SAN clients in a sso agent will move data through a media server to sso volumes

cdp – (application level you have to get a snapshot of the transaction logs) (fs you need to get a snapshot at a consistent point) (volume level – block level index store – must be mapped back to fs or application level in order to get a consistent state)

(cross server snapshot for horizontal application consistency is going to be considered – was a question from the audience)

in esx 3.0 (snapshot of a vm is mounted on another host, backup occurs on alternate host)

bmr is going to be integrated with Windows PE (preinstallation environment) (gives bmr the “livestate touch”) (boot winpe from cd & run in RAM for additional speed, no multiple reboot) BMR with WinPE available in Summer with 6.0

The presenter, Rick Huebsch, did not talk about NBU and Microsoft, so I approached him. His comment was “Vista and all that will be in 6.5.” He did acknowledge that the lack of MS material was conspicuous, but said that was simply because he didn’t talk about it, not because there wasn’t a future in it. I fully believe that all the “right” MS things will be done with NetBackup. But, we’ll keep an eye on that.

Some more thoughts:

The theme of the show was clearly focused around the Symantec core strengths. They did not minimize the importance of the Veritas enterprise products, but they sure did emphasize the end-user and mid-range products (think Norton product lines, think Backup-Exec). I’m not sure that this indicates a shift in priorities but it is clearly something to watch. The “feel” of this Vision was much different than last year’s. Last year’s keynotes were much more enterprise focused. This year’s spoke of enterprise, but from the aspect of Windows and Security. The storage elements of the Veritas product lines were not the centerpiece. I wonder if Symantec should not have different days or a different session where they speak of this technology. It’s probably me being old fashioned (in the way a 10 year old industry can be old fashioned), but the storage stuff is just as hard and it’s getting harder. The bulk of the customers I saw were coming to see this space, not the Super Norton 3000++. The partner show was almost exclusively storage centric. There were a few policy engine type of people, Intel, and Dell – but that’s it. Weird.

So, the first person to announce the right answer is….. Parallels. Real virtualization for the Intel Macs. BootCamp is an answer, but it doesn’t let you do the real thing – keep the right OS running while you jump to play with the not so good one. Now all I need is for the 17″ dual chip / quad core MacBook Pro to come out. Then me, my bank account, and my mouse will fly to apple.com as fast as possible.

The much vaunted revamp of the Microsoft Office system includes a ton of new changes. One of the most important (as far as I can tell so far) is the complete revamp of the user interface. This link goes to a video where MS walks us through a high level overview of this change.

I’m excited about this, not for personal use, but because I might finally stop getting calls from everyone I know. Many of the features that make Word, Excel, and PowerPoint documents look good are very difficult to figure out. The learning curve for all of these products is extreme, to say the least. To illustrate this, look at the size of this book. This 1172 page tome attempts to cover the features of this set of products. BUT, the Word-only version is 912 pages by itself. Excel is 936 pages. No need to go on. What Office is missing is not features, but accessibility.

I hope that once we finally get our hands on this, the calls will stop (well actually, I expect a slew of calls when it first comes out because it has changed).

I ranted and raved before on Dvorak’s prediction. One of his big arguments was that Microsoft agreed to “only” a five year office extension. Well, I found this:

Listen to the RDF on this one. Not so much distortion.

One of the most interesting things about this is how Steve acted like a patient parent explaining to children (the audience) that we need to coexist in order to survive. I wonder how much of that feeling is still there. I’d imagine it’s quite a bit.

Recently, I had a customer ask for further clarification on a proposed storage assessment. They, wisely, had asked a third party (Gartner) to give them perspective on the value of doing a storage assessment. The expensive third-party consultancy came back with four major areas that should be addressed:

Proper provisioning of storage

Maximize ROI by devising Data Lifecycle tiering strategy

Capacity planning for future purchases

Validate disaster recovery strategy and intra-company SLA’s

The customer, again wisely, asked us and the two other bidders to explain how our proposals would address the above. My response was very targeted, but had some insight that I think should be thrown to the aether. I’m also expanding it a bit since the original response did not address all of the points (they were out of scope for what we were trying to do).

So without further ado, here are my thoughts on this:

1) Proper provisioning of storage

Gartner identifies this as an issue because most organizations do not have a good understanding of what storage they have and how it is allocated. In addition, most organizations allocate storage as a “knee jerk” reaction to demand. By that, I mean that most allocation is done either by satisfying the customer’s requests (“I need 400GB of disk for my SQL database”) or by including storage in the acquisition of servers. These types of allocations do not consider the true cost of data management or even the true storage requirements. Provisioning is also typically looked at as a one way function: storage allocation. However, there is a flip side to this: storage reclamation. As you well know, most users will over request storage because it’s easier to go to the well once. Very rarely, if ever, will they tell you “I asked for too much – you can take back 200GB.”

So, the first step in establishing a provisioning strategy is to understand what storage you have, how it’s allocated, and how well it’s being utilized. Once you have that understanding you can start making more informed strategic decisions on how your business should operate the storage infrastructure. With that in hand you can then start creating policies and procedures regarding your storage allocation and de-allocation. Only then will you be able to design a technology architecture to support your business requirements.
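As an illustration of what “how it’s allocated, and how well it’s being utilized” can look like in practice, here is a minimal sketch that flags allocations as reclamation candidates. Everything in it is hypothetical: the `Allocation` record, the 50% utilization threshold, the 25% headroom, and the sample numbers are assumptions for the example, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    owner: str
    allocated_gb: float
    used_gb: float

    @property
    def utilization(self) -> float:
        # Fraction of the allocation actually in use.
        return self.used_gb / self.allocated_gb if self.allocated_gb else 0.0


def reclaim_candidates(allocations, max_utilization=0.5, headroom=0.25):
    """Return (owner, reclaimable_gb) for allocations used below max_utilization.

    The owner keeps what is in use plus `headroom` on top of it; the rest
    is reported as reclaimable.
    """
    results = []
    for a in allocations:
        if a.utilization < max_utilization:
            keep_gb = a.used_gb * (1 + headroom)
            results.append((a.owner, a.allocated_gb - keep_gb))
    return results


allocations = [
    Allocation("sql-db", allocated_gb=400, used_gb=150),
    Allocation("file-share", allocated_gb=1000, used_gb=900),
]
candidates = reclaim_candidates(allocations)
```

In this made-up data the 400GB SQL request is only 37.5% used, so about 212GB could go back to the pool, which is the “you can take back 200GB” conversation that users rarely start on their own.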

A good start for an assessment, internal or external, should give you an understanding of your current policies, procedures, and infrastructure. Additionally, it should make some broad recommendations as to the direction to take for your next step. However, determining a complete storage provisioning and management policy should be a project in its own right.

2) Maximizing ROI by devising Data Life cycle tiering strategy

Similar to point #1, the first step in understanding your data life cycle is to map your current storage. Any strategy needs to consider the results of #1 and do exactly that for both your unstructured and semi-structured data (file systems and email). An analysis of the data should give you the ammunition necessary to determine what tiering structure makes sense for you. Careful consideration should be given to the results to match them to industry best practices. However, those best practices should only be a guide, as each business is different. The ultimate strategy will be a blend of best practices and targeted site specific practices.

3) Capacity planning for future purchases

This, again, ties to point #1. Capacity planning is part and parcel of a provisioning strategy. Because storage, systems, and growth in most companies vary drastically, a plan should be developed for the projected requirements for the subsequent 18 months. This will assist you in planning for the current, expected growth. However, as is the nature of any assessment-like engagement, the recommendations are created only with data that is identified during the engagement. If your business changes unexpectedly or grows faster than the projections created during the engagement, the recommendations will probably not be accurate. This is where you would need to have a capacity planning process that accommodates changes. This process would, by its very nature, need to be something that is on-going and self monitoring. Typically, it is outside the scope of an assessment to devise this capacity planning process. However, it is something that you should be able to devise, albeit with some minor help, after this type of engagement.
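For a feel of what an 18 month projection involves, here is a minimal sketch that assumes steady compound monthly growth. That assumption is exactly the weakness described above: if the business changes, the growth rate changes and the projection is wrong. The function names and the sample figures (100TB in use, 150TB installed, 5% monthly growth) are hypothetical.

```python
def project_capacity(current_tb, monthly_growth_rate, months=18):
    """Projected usage for month 0 through `months`, assuming compound growth."""
    return [current_tb * (1 + monthly_growth_rate) ** m for m in range(months + 1)]


def months_until_full(current_tb, installed_tb, monthly_growth_rate, months=18):
    """First month in which projected usage exceeds installed capacity, else None."""
    projection = project_capacity(current_tb, monthly_growth_rate, months)
    for month, needed_tb in enumerate(projection):
        if needed_tb > installed_tb:
            return month
    return None


projection = project_capacity(current_tb=100, monthly_growth_rate=0.05)
purchase_deadline = months_until_full(current_tb=100, installed_tb=150,
                                      monthly_growth_rate=0.05)
```

Re-running something like this every month against measured usage, rather than once during an engagement, is the simplest version of the on-going, self monitoring process.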

4) Validate disaster recovery strategy and intra-company SLA’s.

Storage provisioning, allocation, and capacity planning are part of a properly maintained DR strategy. However, many companies fall into the trap of believing that a data protection or data replication plan is the DR plan. They neglect to consider the people and non-IT processes that are required to implement disaster recovery. While it’s true that these data-based protection mechanisms can help in the case of minor or even major disasters, a DR plan should be primarily based on managing the business processes in the case of an “event.” A good storage protection strategy would be used to accelerate the recovery process, but not be the recovery process. Any assessment engagement that addresses this element should be focused on either how to implement a data protection methodology, or how the current or proposed protection systems map to the larger DR plan. The only way to drive these results is to create or validate SLA’s amongst all of the business units or stake-holders.

Speaking of which, that is the other most common failure amongst many of my customers. Data protection mechanisms are created based on perceived needs rather than any measured or clearly defined business requirements. As an example, it’s very common to encounter sites that use backup technologies to capture nightly incremental backups and once weekly full backups. These are typically implemented across the board without considering that some applications require more frequent, or even less frequent, backups. Often, secondary protection mechanisms are implemented by application groups, DBA’s, or even non-storage systems administrators. These secondary schemes are in place because the system wide protection mechanisms are perceived as either inadequate or not realistic for their needs. These are clear indications that the overall DR strategy is flawed and needs to be addressed.

UPDATE: I’ve posted an updated list here for those of you referencing this old posting.

There’s a zillion of these lists out there, but this is mine. A list of the essential, cool, and nice-to-have Mac apps. These are all my most important free or shareware products. The list of commercial stuff will be a topic for another day.

HandBrake – The easiest way to rip, transcode, and store DVD’s. Can be used for video iPods as well. (free)

Thoth – The best USENET news reader out there (there’s also Unison – actively being developed). Thoth is not actively being developed, so you have to … ahem…. find it on USENET. (free – kinda)

Vim – The VI clone with a GUI interface. Already comes in a CLI format built in. Vim.org has the GUI version. (free)

VLC – The open source video viewer. If this doesn’t play it, you can’t play it on a Mac. (free)

Flip4Mac – Microsoft has stopped supporting their video player and is now giving this as a Quicktime plugin instead. This works better than the media player ever did, but doesn’t work with DRM content. (free)

It just occurred to me that no-one in that slashdot thing actually asked me how much this stuff cost to get!

Now why would that be? Everyone got on the “bigger penis”, “electrical cost”, and “here’s my geek fest” themes. Not a single comment was made on how much that crap costs to acquire. Odd, very odd.

So, for the record, I should say this. Of all those things, I paid for only these: PowerBook, backup hard drive, UPS, and wireless routers 1 & 2. Everything else was donated, gifted, or salvaged. Now that I think about it, that’s pretty amazing. More juice than GaTech on basically a dime budget.

It’s been pretty cool to run a fairly large network at home as described in slashdot, so I thought I’d find out what other geeks like me like doing in their own home.

My expectation was that I would get some derision, some laughter, some “my penis is larger than your penis” comments (both good and bad), and also some insight into some people running larger, more sophisticated networks. Well, if you read through the article, you can see that there was a ton of all of that. However, I was surprised to find that most people, even the ones that I would nominally call my peeps, just don’t understand. There were a few that had comparable or even larger setups, and I think they get it. It’s not about the amount of crap I can put together, or “I can piss farther than you can,” but more about the “I can do this, so I will,” and the “wouldn’t it be cool if…”. It’s more like the car guys that want to figure out, can I make this thing go twice as fast by adding all these whatsits, or by injecting perclorofloroanalmoverzine. It’s not about saving the environment, but about just doing it.

Some of the responders also pointed out that the same kind of stuff could be done with much less. Very true. I just checked across the DCF and saw that I’m on average running a load of .0something. If you think about it, that’s very much to be expected. My home setup has more computing resources, by far, than the combined resources of my first admin job at Georgia Tech. Yes, my little, fun toy setup is much more capable and has more bandwidth, storage, cpu, and memory capacity than all of the systems at Tech did 20 years ago. By a large margin. That alone brings me some pleasure.

But, the biggest part of this all is that it’s also cool for me to stay sharp at doing system stuff, especially since my job’s technical requirements are now essentially limited to email, Excel, Visio, and PowerPoint. Not that I mind that as my job, I just enjoy the sysadmin stuff. My mind thinks that way, so I like to exercise it.

So – this is not an apology for what I have, it’s not even an explanation or justification for the things that I do. Like the thing itself – it just is.

Here’s some pictures of the DCF:

Home built rack system. You can see: main server, UPS, NetApp, drive shelves, printer, KVM, main switch, wireless switch 1&2, backup server, and print server. You cannot see: monitor, laptop, gig-e switch.

You can also see some of the boxes for my Amiga 2000, Amiga 1000, TRS-80 Model 1, and TI-99/4a. That’s part of another story.