SANs vs. local server storage vs. environmental conditions

This is a conversation I have frequently with system engineers, and one where I'm increasingly at odds with them. After the latest discussion turned fairly heated, I thought it would be interesting to get some responses here.

First disclaimer: I am not a mass/shared storage expert. Having been primarily a contractor for the past 15 years, bouncing from company to company, I get exposed to a lot of data centers, ranging from entire floors of skyscrapers with state-of-the-art redundant backup generators and AC units handling 50,000+ users, down to the basement shelf next to the dog food at the local mom-and-pop store.

One characteristic anomaly that has stood out over the past decade of my experience is the rather high failure rate I encounter with SAN units of all breeds. When I refer to 'failure' I mean anything greater than a single drive failure: controller failures, data corruption across multiple drives, power supply issues, etc. I can't give a list of models or brands, except to say it seems to be broad and across all types. Given that so many SAN units are re-branded versions of EMC etc., it's tough to point fingers.

The vast majority of severe SAN problems I've been around have followed the same scenario. Essentially, the room in which the SAN is housed gets too warm due to AC issues, or the dedicated AC unit isn't on the same circuit as the backup generator being tested on the weekend (very common), etc. If this happens on a weekend or holiday, it can be a few days before somebody enters the room and discovers the problem. At that point entire volumes start getting trashed and the CIO starts screaming. Rinse, repeat. As to why thermal alarms don't kick in, you tell me. Storage engineers tend to have their fingers in their ears on the issue, while other techs who move around like I do all report the same thing: SAN failures and mass data corruption tend to be more common than failures of local storage on bare-metal servers in the same environment.
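
For what it's worth, you don't need facilities-grade monitoring to get a thermal alarm: a cron job polling SMART temperatures will do. A minimal sketch, assuming Linux with smartmontools installed; the device list, threshold, and alerting mechanism are placeholders to adapt:

```python
#!/usr/bin/env python3
# Poor-man's thermal alarm: poll SMART drive temperatures and complain
# loudly before the room cooks the spindles. Run from cron every few minutes.
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]  # hypothetical device list - adapt to taste
THRESHOLD_C = 45                    # alert threshold; pick per drive spec

def drive_temp_c(dev):
    """Read SMART attribute 194 (Temperature_Celsius) via smartctl -A."""
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "Temperature_Celsius" in line:
            return int(line.split()[9])  # RAW_VALUE column on most drives
    return None

for dev in DEVICES:
    temp = drive_temp_c(dev)
    if temp is not None and temp >= THRESHOLD_C:
        # Swap print for the mail/SNMP/pager command of your choice
        print(f"ALERT: {dev} running at {temp}C (threshold {THRESHOLD_C}C)")
```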

A few days ago I encountered a brand new EMC VNX installed in the basement of a small company. No data center, environmental temps all over the place, and the prior 4U Dell SANs had given up the ghost. The unit was basically sitting on a bread rack in the office the janitor shares. What really irked me was that the company in question could have easily functioned with local storage on a small server. This prompted the initial discussion with the tech who set up the SANs, and me throwing my hands in the air when he wouldn't listen. Obviously the VNX, with its fancy 'blinkenlights' and high price tag, was superior to a lowly 3-4 drive SATA RAID config in a vanilla 8-core server. I bet the guy the standalone server would take a significantly higher ambient temperature than the SAN before sending terabytes off to bit-bucket land. Meanwhile, half a dozen EMC reps wearing bad suits would be swearing to me it can't happen.

My question is basically whether there is a class of SANs that those more experienced with shared storage are confident can handle the real world outside a cozy data center kept at 68F. Or, just offer a general comment on my experience.

We had a cooling failure about two months ago. Temps spiked to over 120F in the cold aisle. Since then, I've had five disk failures on an HP EVA unit with ~65 disks, and only about two on our XIV with 180 disks. The XIV hosts most of the load in our environment, so it's always stressed much harder than the EVA. Compared to the EVA, it's held up like a champ so far.

What's the age of the two units? Also, what XCS code is that EVA running, and what drives? For a long while, 1TB+ drives for the EVAs had high failure rates.

Not related to heat/environmental issues, but I did have one of my EVAs crap out 8 or so drives on a 36-drive system within a two-week span (I think it was still 36; it might have just been upgraded to 48). The thing was, I didn't lose data and had no data corruption. This is what we ask of SAN systems. Take that same percentage of drive failures to a non-virtualized array and you have likely lost or corrupted a lot of data. Thankfully I had enough free space to absorb that high failure rate; otherwise I would have lost data too, since there was not enough time between failures for the array to rebuild.

Drive failures due to temperature/environmental stress are not going to care whether the drives are in an EVA, an MSA, or a local server. Neither do controllers. Once the parts exceed their tolerance, all bets are off. The problems spoken about by the OP are not issues with SAN Arrays, they are issues with morons not adequately spec'ing out the rooms the array is going into. Round hole, square peg issues.

It is also unlikely that if a SAN is needed for space or performance, a 3-4 drive RAID array would have enough of either to cover the need. That said, many new servers can take 8-16 drives, which is effectively equivalent to an old MSA as far as spindles go. So if a small SAN is being spec'd out and there is no real need to grow the array over the next 3-5 years, it'd be a bit silly not to go internal or DAS, using SSD for performance needs. No one solution makes sense everywhere.
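
To put rough numbers on that spindle equivalence: the classic back-of-the-envelope math is just spindle count times per-drive random IOPS. A sketch using the usual rule-of-thumb figures (estimates, not vendor specs):

```python
# Rule-of-thumb random IOPS per spindle (rough estimates, not vendor specs)
IOPS_PER_DRIVE = {"7.2k SATA": 75, "10k SAS": 125, "15k SAS": 175}

def raw_iops(spindles, drive_type):
    """Aggregate back-end random IOPS, ignoring RAID penalty and cache."""
    return spindles * IOPS_PER_DRIVE[drive_type]

# A 16-bay server full of 15k drives really is in old-MSA territory:
print(raw_iops(16, "15k SAS"))    # 2800
print(raw_iops(24, "7.2k SATA"))  # 1800
```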

We had a cooling failure about two months ago. Temps spiked to over 120F in the cold aisle. Since then, I've had five disk failures on an HP EVA unit with ~65 disks, and only about two on our XIV with 180 disks. The XIV hosts most of the load in our environment, so is always being stressed much more than the EVA. Compared to the EVA, it's held up like a champ so far

Sounds like a known issue with a couple of hard drive firmware packages: once the drives exceeded the thermal threshold they would drop out. It's super important to keep drive and controller firmware updated in all storage arrays - actually, across the board with servers, etc. I can't tell you how many times firmware really does fix the issue, or at least getting drivers and firmware to matching levels helps a situation (*looking at you, Emulex*). The other thing about storage arrays: they don't gamble with your data. In the past, HP has gotten black eyes for being too conservative and proactively failing drives. Although with 1.2TB enterprise drives now out, I'm thinking RAID5 is totally dead anyway.
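
The back-of-the-envelope reason big drives kill RAID5 is the unrecoverable-read-error spec: at the consumer-class rate of one bad bit per 10^14 read, rebuilding a degraded array means reading enough data that hitting a second error becomes uncomfortably likely. A sketch with illustrative numbers:

```python
# Chance of hitting an unrecoverable read error (URE) during a RAID5
# rebuild, where every byte on the surviving drives must be read back.
URE_PER_BIT = 1e-14   # common consumer-class spec; enterprise is often 1e-15

def p_rebuild_hits_ure(drives, drive_tb):
    bits_read = (drives - 1) * drive_tb * 1e12 * 8  # full read of survivors
    return 1 - (1 - URE_PER_BIT) ** bits_read

# Six 1.2TB drives in RAID5, one failed:
print(f"{p_rebuild_hits_ure(6, 1.2):.0%}")  # ~38% at the 1e-14 spec
```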

The other thing: DAS is always faster than a SAN. Unless you need shared storage, I find very few workloads that don't perform the same or better on DAS - until you need to cluster something, and then that is a totally different ball of wax (I was a storage consultant for many years). SSDs make things a bit more interesting, but the reality is you almost can't beat dedicated resources for a lot of workloads. The availability just sorta blows...

My question is basically whether there is a class of SANs that those more experienced with shared storage are confident can handle the real world outside a cozy data center kept at 68F. Or, just offer a general comment on my experience.

I think that, in general, disk systems may be a bit more sensitive. But once you put lots of disks into the same basket, you'll also see a higher absolute number of failures - possibly the same failure rate as with smaller sets of disks, just more noticeable.
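
That's easy to quantify: with a fixed per-drive annual failure rate, the expected number of failures scales linearly with the drive count, so a big frame just makes the same rate visible. A toy calculation with an assumed AFR:

```python
AFR = 0.03  # assumed 3% annualized failure rate per drive

for disks in (4, 65, 180):
    print(f"{disks:3d} disks -> {disks * AFR:.1f} expected failures/year")

# 4 disks -> 0.1, 65 -> 2.0, 180 -> 5.4: same per-drive rate,
# very different number of red lights per year.
```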

Akro and WingMan, you two are spot on. Admittedly, the firmware is a little dated on both the controllers and some of the drives themselves. Unfortunately, getting approval for firmware upgrades from my org is like pulling teeth. During the cooling failure and subsequent emergency power-down, we did lose one LUN to corruption. Other than a few non-critical test systems, most of the data was recovered from backup. The disk failures since have been staggered enough that we haven't had any other issues. Also, to confirm your point, it's not specifically a SAN issue - we've also had a few other drives and various components (power supplies, RAM, etc.) go bad in the last two months.

The problems spoken about by the OP are not issues with SAN Arrays, they are issues with morons not adequately spec'ing out the rooms the array is going into.

I would mostly agree with the gist of that statement. Obviously a SAN, given its design of housing densely clustered arrays of rapidly spinning disks all being fed fairly significant amounts of DC current, has different thermal management requirements than a medium-size standalone server that's mostly an empty case with three drives in it. Drive and controller thermal tolerances are the same; housing them in denser, closer proximity requires additional design and environmental conditions. That's... common sense, guys.

By this definition, 75% of SAN/storage engineers would qualify as 'morons,' and given my conversations with many of them, I would not argue with that description. 'SANs do not fail - period. I don't care what you say - they do not fail, and this room will always be 68F, got it?'

/ sarcasm off

Also, the most fault-prone devices I encounter in data centers are AC units, or the power systems running them. Since it's clear we can't design a data center with better thermal management than a 1972 sci-fi movie computer lab, what gives? At this point I trust a $20 box fan and a door wedge to keep data safe more than I trust redundant AC units. I've never, ever, ever seen a disaster recovery plan submitted by a data center or storage engineer that includes AC failure, even though it's a huge cause of catastrophic storage failure.

The last time I went through this was about a year ago, with a couple of HP D2D units that started flagging bad disks after getting really warm over a weekend. 'Parity rot,' as I call it, took a few months to resolve even after pulling four drives and replacing them. Why we're still using RAID 5/6 in 2013 is beyond me... mirror-only topologies (RAID 10) have been orders of magnitude easier to recover from than RAID 5/6 in my experience. However, data centers will never get above 68F and AC units will never fail.

I can tell you, at least from the quickspecs, that the EVA is rated for a 95F inlet, and if I recall, all HP gear is designed for a 95F inlet - which sometimes causes us issues officially supporting certain cards in certain servers. I wish I had a more definitive guide than spec sheets, but there was a big push to get that info out so customers could save money by reducing the cooling in their data centers.

So from that perspective, at least with HP, there is no difference between server DAS and SAN. Now, with all disk drives going SAS, the trick is that tolerances really come down to the firmware on the drives and arrays, which decides when they spit drives out. I remember that a long time ago Hitachi would be very stringent about failing new disks until they had built up enough stats about the drive types to get better at predictive failure. Hence why warranty-covered drives are typically given back to the vendor, and why you need to keep up with firmware.

Yeah, the EVA is fairly aggressive at kicking out drives with issues, but based on the academic papers on correlated sector failures in failed/failing drives, I think that's a prudent approach: why continue to put data on a drive that is known to have problems and is likely to continue having them? I would hope it would never pre-fail a drive if doing so would invalidate a RAID/vdisk, but as long as there are spares available, using them to avoid potential problems is a GOOD thing, and it's something that really only makes sense when you have a large pile of disks behind a common controller (i.e., a SAN/NAS). As for failures from heat, that's going to be drive-dependent, not controller-dependent. Most of the controllers out there are commodity PCs, or at least largely made of commodity PC parts (3Par uses its own ASIC, for example, but the rest of the system is industry-standard parts), so they'll have the same heat tolerance as your typical rackmount server. My overall impression, based on experience, is that you need to fix the root cause of the environmental problems - and if you don't, you're not any worse off with a massively redundant design, and you potentially have some features that will save you.

By this definition, 75% of SAN/storage engineers would qualify as 'morons,' and given my conversations with many of them, I would not argue with that description. 'SANs do not fail - period. I don't care what you say - they do not fail, and this room will always be 68F, got it?'

Also, the most fault-prone devices I encounter in data centers are AC units, or the power systems running them. I've never, ever, ever seen a disaster recovery plan submitted by a data center or storage engineer that includes AC failure, even though it's a huge cause of catastrophic storage failure.

I'm sorry, but what you are describing is more an issue with the facilities guys or someone in management (or a contractor who thinks they know more than everyone else), not the SAN admin. There are plenty of guys in all lines of support who try to put in the wrong solutions, either because the salesman is really good at selling stuff or because the architect and managers are simply ignorant. Your storage engineer, unless he also wears the facilities hat, should never be the one designing the DC cooling. He should have input, but the design should not be in his hands unless he is also the person who supports all the other equipment in that data center. Designing any cooling setup solely around storage array requirements will give you the same problems.

Why we're still using RAID 5/6 in 2013 is beyond me... mirror-only topologies (RAID 10) have been orders of magnitude easier to recover from than RAID 5/6 in my experience. However, data centers will never get above 68F and AC units will never fail.

Because parity calculations allow you to resolve many cases of data corruption that mirror-based solutions do not. On a mirror pair, a background scrub can tell you that the two copies disagree, but unless you are storing some kind of parity information you don't know which copy is correct (some systems use 520-byte sectors for this, but those can only correct single-bit errors). I only use RAID10 for systems that perform parity checks and correction at a higher layer (i.e., Oracle with block checksums turned on).
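
A toy illustration of the difference, using RAID5-style XOR parity: lose or distrust any one block and the XOR of the rest reproduces it exactly, whereas a mirror scrub that finds two differing copies has nothing to vote with. (Real arrays obviously do this per stripe, with checksums layered on top.)

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR across equal-length blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# One stripe: three data blocks plus their XOR parity.
data = [b"\x10\x20", b"\x03\x04", b"\xa0\x0b"]
parity = xor_blocks(data)

# Drop (or distrust) data[1]; XOR of everything else recovers it exactly.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# A mirror scrub in the same spot only yields two disagreeing copies
# and no arbiter - which is exactly the point made above.
```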

As to why we're still using RAID 5/6 in 2013 gets me.....mirror only topologies (RAID 10) have been orders of magnitude easier to recover from than RAID 5/6 in my experience. However, data centers will never get above 68F and AC units will never fail.

Because parity calculations allow you to resolve many cases of data corruption that mirror based solutions do not. On a mirror pair a background scrub can tell you that the two copies disagree but unless you are storing some kind of parity information you don't know which copy is correct (some systems use 520 byte sectors for this but those can only correct single bit errors). I only use RAID10 for systems that perform parity checks and correction at a higher layer (ie Oracle with block checksums turned on).

Yeah, his statement had me going "this person should not be anywhere near critical storage." Take your bit of info along with the fact that in the SAN world, the big dogs are now virtualizing the storage layer, making the size of the drive much less of a factor in rebuild times. Somewhere else in this forum I posted the number of micro RAID sets that the 3PAR I have is running. Figuring 6+1 sets with 1GB slices of those drives, it doesn't take long for the parity calculations and rebuild to happen, even on those 1TB disks. I think it was several thousand for just my 30-odd LUN presentations. I saw one guy post that his had 80k micro RAID sets. Every type of storage has its place.
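
The counting behind those numbers is simple enough to sketch: carve every drive into 1GB chunklets and build lots of little 6+1 parity sets across them. (The spindle count below is a made-up round number for illustration; the chunklet size and set width are per the post.)

```python
# Wide striping via micro-RAID: 1GB chunklets, 6+1 parity sets.
drives = 240          # illustrative spindle count
drive_gb = 1000       # ~1TB drives
set_width = 7         # 6 data + 1 parity

chunklets = drives * drive_gb
micro_raid_sets = chunklets // set_width
print(micro_raid_sets)  # ~34,000 independent little RAID sets

# A failed 1TB drive is rebuilt as ~1,000 scattered 1GB chunklets,
# regenerated in parallel across the whole frame - hence the fast rebuilds.
```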

The other thing: DAS is always faster than a SAN. Unless you need shared storage, I find very few workloads that don't perform the same or better on DAS - until you need to cluster something, and then that is a totally different ball of wax (I was a storage consultant for many years). SSDs make things a bit more interesting, but the reality is you almost can't beat dedicated resources for a lot of workloads. The availability just sorta blows...

If you are referring to latency, maybe. Short of SSD DAS, you are not going to outstrip a v800 3PAR in overall performance with DAS. You will not get 1,920 drives into any sort of manageable DAS to produce several hundred thousand IOPS. And now you can get SSD in these big arrays and take away even that little advantage for DAS. You will, however, find that a FusionIO Octal card will browbeat the v800. Until fairly recently, it was not easy to run a critical infrastructure cluster with a DR setup or active/active data centers without SAN-managed replication at the block level. Sure, you could do it with things like Double-Take or CRR, etc., but in reality those were not tools that would secure your data to the tertiary sites at the block level. Mostly when I hear about how DAS is better than SAN, it's because either they don't need shared storage (like you pointed out) or they don't understand how to balance the shared storage between high-needs and low-needs hosts. SAN is a far different beast than it was 10 years ago. Gone are the days of it being just a bunch of DAS with a fibre controller, carved up like you would a local RAID array.

On price, no doubt, DAS wins every time.

Yes, I am very familiar with modern arrays (I work for HP and design/build lots of 3PAR solutions). I should not have said DAS is ALWAYS faster - speaking in absolutes is never a wise move and will normally come back to bite you. The problem is that too many people just assume a SAN is faster and better, even if I load it up with 7.2k 4TB drives. Both have their place, but the pool of folks who can afford a fully loaded v800 and actually make use of it is even smaller; top that with the shortage of qualified storage people, and SANs become a risky solution for shops that don't have the personnel. I fight questions like "Why can't I run 300 VMs on 24x4TB SATA drives? It's 96TB!" on a daily basis, so maybe I am just a little jaded in my view of the world. Of course, CxOs embraced virtualization as a way to save money and shrink footprint, but didn't really think through that while RAM and CPU were the most underutilized components in the data center, it would increase enterprise storage requirements dramatically - and storage is arguably the most expensive capital expenditure per floor tile in the modern data center. Makes the EMC purchase of VMware really come into focus, doesn't it?
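
The "96TB!" question answers itself the moment you divide IOPS instead of terabytes. A back-of-the-envelope version, with rule-of-thumb per-drive and per-VM figures (assumptions, not measurements):

```python
drives, iops_per_drive = 24, 75   # 7.2k SATA rule of thumb
vms, iops_per_vm = 300, 30        # modest per-VM demand assumption

supply = drives * iops_per_drive  # 1800 back-end IOPS, before RAID penalty
demand = vms * iops_per_vm        # 9000 IOPS wanted

print(f"supply ~{supply}, demand ~{demand}: short by {demand / supply:.0f}x")
# Plenty of terabytes, a 5x shortfall in IOPS. Capacity != performance.
```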

I've had the same experience with a SAN that I have with VM virtualization: pooling and abstracting resources saves money. Between my two sites I have over 350 VMs; if those were traditional rackmount servers with DAS, that would be over 700 spindles minimum (RAID1 per server), not to mention the piles of spindles for each database server, file server, email server, etc. But on my arrays I have just over 170 spindles (if you want to count my outgoing EVA, it's still under 400). That's a huge pile of savings no matter how you look at it; add in the operational efficiencies gained through the ability to migrate workloads and recover from faults, and a SAN is a no-brainer.

I go through the same fight with my boss every week, it seems. He has this ingrained faith that SAN storage (or any big-name storage) is somehow inherently 'faster' than having drives in a server. He cannot seem to comprehend that a storage array with four 1G network interfaces and 24 drives in a RAID5 (yeah) can be slower than a server with a two-drive RAID10 of a couple of cheap SATA III drives. I have shown him hard numbers and he's still like, 'There must be something wrong there.'

He just cannot seem to comprehend the differences in design, features and capacities that make SAN an important choice in the right situations. He just sees SAN==Faster in his head for some reason.

As for the original topic in this thread....

Yeah, take a server with four drives in it, probably used in a situation where it is not stressed, and expose it to less-than-ideal conditions, and it will run OK for quite a while, because that kind of server (the entry- to mid-level enterprise server) is designed for a range of conditions - especially if you are talking about a tower design.

A SAN is designed for a much more narrow range of conditions for installation, and is designed for performance and features and to be run in a situation where you have staff actually caring for it. Replacing a drive or two on a good SAN is no big deal in most environments because you are expected to have a support contract that will mean hot spares or cold spares on hand, support to help with any issues, and likely a good backup strategy in case of real problems. If you can afford an expensive SAN (and justify the expense) it is assumed the rest will be there too.

Additionally, the times you see a SAN in bad conditions, it is likely that it has been mistreated in other ways as well: improper mounting resulting in vibration and shock damage, temperature issues, even ESD or improper power supply. Improper config is also likely, or running it beyond spec and having performance issues. They are often left without support meaning issues go for a long time and become worse when they could have been easily fixed.

So no, there is no inherent difference in how 'reliable' a SAN is compared to DAS in a server, but just about everything else about the environment, design, support, and use of the devices is different. All of that will lead to different failure scenarios and patterns.

We have many DAS and FusionIO setups (where they made sense and there was little or no growth expected in data size) along with our SAN arrays. What I really love is when application owners come back and ask me, "Can you expand the drive on XXA server? We're going to be out of space sooner than we predicted."

It's fun to see the look on their faces when I have to reply: "We can, but we have to buy another disk shelf and more drives, create a new RAID set, and either give you another drive letter (unless the controller allows us to add the disks to the existing RAID set), mount it as a folder under the existing drive, or tack it onto the back of the other one. The last two will result in uneven performance across what looks like a single drive. It'll take a few weeks to get it approved and done."

Them: "Why so long? Can't you just give it more space like you did on server AXX? We're adding tables and we really need the space by next week."

Me: "Nope. Remember when you insisted that the SAN array couldn't provide the performance you needed so we had to go 24 disk RAID 10 DAS? Never-mind the SAN array is using only 1/4 of its spec'd performance of 20k IOPS at 5ms. Now we need to get a performance and space requirements doc from you to make sure we get the right equipment, wait for management approval to get the new equipment, see what the ship times are on the product, install, and create the new array or get it added into the existing if the controller allows for that."

Gotta sort out the environment first. Run too hot or too cold and you'll greatly shorten the MTBF of spinning disks, regardless of what they're housed in. Telling someone that their brand-new DC is too cold and is causing their drives to fail is always an interesting conversation.

Straight off, I have had no similar experiences, but then again I have always worked for organizations that understood the importance of environmentals - and as a bonus, for all but a few days a year, just pushing a bunch of outside air through the DC would provide some cooling (Northeast U.S.).

But when I read through your 'problem' what comes to mind is the current "You had one job..." meme.

If you thermally stress a server, it may fail, and when you try to power it back on it may be perfectly OK, or it may be dead. So it will either go back to running or it won't; its job is moving data around and processing.

If you stress out network or storage switches, they may stop passing packets. And when they come back up they will either be OK, or they will fail to pass packets. Again, that is their one job, so it is all they can really succeed or fail at.

If you stress out a storage unit, all of its modes of failure involve your data. That's its job, so that is the only type of failure you can reasonably expect.

Just like RAID isn't backup, SAN storage isn't backup either, and if your physical storage fails you may need to restore it. The last time anything actually did that to me, it was non-RAIDed physical disks on a mainframe. I have had a couple of double failures on RAID5 arrays, but only on intermediate backup targets which didn't require recovery - and frankly, only with 1TB SATA disks which were noted as having firmware issues pretty much across the board. The 'fix' for that was RAID6, and I have yet to have a RAID6 array degrade to the point of data loss.

Straight off, I have had no similar experiences, but then again I have always worked for organizations that understood the importance of environmentals - and as a bonus, for all but a few days a year, just pushing a bunch of outside air through the DC would provide some cooling (Northeast U.S.).

But when I read through your 'problem' what comes to mind is the current "You had one job..." meme.

This is dead nuts on.

An organization that isn't willing to provide the room for its million-dollar asset shouldn't have that asset.

A SAN is designed for a much more narrow range of conditions for installation, and is designed for performance and features and to be run in a situation where you have staff actually caring for it. Replacing a drive or two on a good SAN is no big deal in most environments because you are expected to have a support contract that will mean hot spares or cold spares on hand, support to help with any issues, and likely a good backup strategy in case of real problems.

You would be crazy surprised how many large companies (10,000+ employees) I deal with that, when I ask them about their backups, say they don't have any for some critical VM, or are just like, "Oh yeah, backups started failing a month ago. We haven't gotten around to fixing it."

Coming from being a sysadmin for a 200-person company and then an SMB consultant, where I took backups super seriously, it just blows my mind how such large companies aren't doing proper backups - or any at all.

Backups are hard. All enterprise backup software sucks. Finding one that sucks the least is the key...

We use TSM for all of our data. It has been such a JOY to use vs. everything else I have ever used that I want to just sing about it. We have a few servers which are not owned by our subsidiary but by corporate in Japan - they use Commvault, and it's the only backup software I've ever seen that actually crashes.

Are you high? I've been using TSM for six years and I was overjoyed to change to something that's not TSM. My new job uses Veeam. Though I suppose if you had a dedicated TSM admin it would be OK; I only got to dedicate about 20% of my time to keeping it operational.

There's a dedicated TSM admin. He seems to be of the same opinion, though, that it's a total godsend. (I'm sure there's some GUI-vs-CLI aspect for some people, of course.)

It was a godsend when I started using it in 2006. By 2010 I was less enamoured with it; having to build an entire extra server environment just to get backup reporting in TSM 6 thoroughly pissed me off. TSM 5.5 was actually really nice, and TSM for VE (Virtual Environments) I never could get to work at all.

Our backup requirements are mostly about getting the PostgreSQL backups and transaction logs off the servers. OS backups and normal data are all secondary to those. Getting the SAN backups is also really important, but the reality is that we have >2x redundancy on the SAN, so it's less of a big deal.
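
For the curious, getting PostgreSQL transaction logs off the box is mostly just WAL archiving. A minimal sketch of the relevant postgresql.conf lines (the archive directory is a placeholder; in practice the command would push to another host):

```
# postgresql.conf - minimal WAL archiving sketch (paths are placeholders)
wal_level = archive      # 'replica' on newer releases
archive_mode = on        # requires a server restart
# Copy each completed WAL segment off; must not overwrite existing files.
archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'
```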

Oh, if that's all, then yeah, it will work great. Just keep an eye on the TSM service on the guests, because I found they randomly just stop working.