because the chances that MS's DC is going to blow up are extremely small

And yet, it is what this thread is about ... exactly that happening.

Except that it's Amazon, not MS.

MS was US Central this year or late last.

MS was the whole world when their authentication mechanism went down; I think it was a year or so ago.

MS was Europe offline with VMs hosed and a recovery needed. Weeks.

MS has had plenty of trials by fire.

Not one of the hyper-scale folks is trouble-free.

Most of our clients have had 100% up-time across solution sets for years and in some cases we're coming up on decades. Cloud can't touch that. Period.

And no updates, correct? To have 100% up-time you must never do updates.

In a cluster setting, not too difficult. In this case, 100% up-time is defined as nary a user impacted by any service or app being offline when needed.

So, point of clarification conceded.

Yes, I know you could do a cluster, and that's how cloud providers give you that 99.9% up-time or SLA. Still, it is hard to believe no one has any issues: if cloud providers at large scale have issues, then smaller companies have them as well. That said, no cloud provider provides any backups for anyone unless you set them up, either through their offering or through your own company.
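To put the 99.9% figure in perspective, here is a rough sketch of the arithmetic. The function name and the other SLA levels in the loop are my own picks for illustration, not anything a provider publishes, and it assumes a plain 365-day year.

```python
# Convert an availability percentage into the downtime it still allows.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(sla_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted per period at the given availability."""
    return period_hours * (1 - sla_percent / 100)


if __name__ == "__main__":
    # 99.9 is the figure from the thread; the others are for comparison only.
    for sla in (99.9, 99.99, 100.0):
        hours = allowed_downtime_hours(sla)
        print(f"{sla}% up-time allows {hours:.2f} hours of downtime per year")
```

So a 99.9% SLA still leaves room for most of a working day of outage every year, which is why it and a "100% up-time" claim are not measuring the same thing.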

I'm quite proud of our record. It's a testament to the amount of time and money put into researching, proofing, and thrashing the solution sets we've sold over the years. We don't sell anything we don't proof first.

So you're using technology that is at least a decade old for every one of your customers, because by your own words you can't possibly have had the time to test anything from this year and sell it to a customer!

Not sure how that conclusion came about but far from it.

We've had plenty of NDAs over the years to proof with upcoming tech so that we're on the right page and current.

You've said you've tested everything that you sell. How could that possibly be true while also claiming decades' worth of up-time? Power supplies fail, switches die, disks die, motherboards die, sites lose power (and people still have jobs to do even when the lights are out...).

So you're still full of it. Not to mention that performing any update will eventually require a restart. Windows updates, file server migrations, etc. all require some downtime.

All of those things can fail, and that's fine as long as they have an HA solution that accounts for those failures.

As he said earlier - the customer has NEVER been impacted - that's the point of measurement.
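A back-of-the-envelope sketch of why that works: if the service stays up as long as at least one node is up, the individual failures stop mattering much. This assumes the nodes fail independently and failover is instant, which is the optimistic textbook case; the function name and the 99% per-node figure are made up for illustration.

```python
def redundant_availability(node_availability: float, nodes: int) -> float:
    """Availability of a service that survives while at least one of
    `nodes` identical, independently failing nodes is up."""
    if nodes < 1:
        raise ValueError("need at least one node")
    # The service is down only when every node is down at the same time.
    return 1 - (1 - node_availability) ** nodes


if __name__ == "__main__":
    # Hypothetical per-node availability of 99%.
    for n in (1, 2, 3):
        print(f"{n} node(s): {redundant_availability(0.99, n):.6f}")
```

The same reasoning covers patching: you update one node while the others carry the load, so users never see the restart.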

Yeah, and you can only fault yourself if you are in the one AZ that fails. Most serious deployments are in different regions as well.

As we have further investigated this event with our customers, we have discovered a few isolated cases where customers' applications running across multiple Availability Zones saw unexpected impact

At some point, you have to be willing to accept some risk by not using a different region. Generally the risk is VERY, VERY low, which is why many customers use AZs.

You have to do risk analysis, and see how often these events occur and how likely you would be to be one of the "few" that were impacted.
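For what it's worth, that risk analysis can be as simple as an expected-loss calculation. A minimal sketch follows; every number in it is a placeholder I invented, not data from any provider, so plug in your own estimates.

```python
def expected_annual_loss(events_per_year: float,
                         p_affected: float,
                         cost_per_outage: float) -> float:
    """Expected yearly cost of outages: how often they happen, times the
    chance you are among those hit, times what one outage costs you."""
    return events_per_year * p_affected * cost_per_outage


def worth_mitigating(expected_loss: float, mitigation_cost_per_year: float) -> bool:
    """True when the yearly mitigation cost is below the expected loss."""
    return mitigation_cost_per_year < expected_loss


if __name__ == "__main__":
    # Hypothetical inputs: one AZ-level event every two years, a 5% chance
    # of being one of the "few" impacted, 200,000 lost per outage.
    loss = expected_annual_loss(events_per_year=0.5,
                                p_affected=0.05,
                                cost_per_outage=200_000)
    print(f"expected annual loss: {loss:,.0f}")
    # Hypothetical yearly cost of running in a second region.
    print("second region pays off:", worth_mitigating(loss, 60_000))
```

With numbers like these the second region does not pay for itself, which is the "accept some risk" position; a business with a much higher cost per outage would land on the other side.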

You can dig in the weeds all you want, but across multiple regions this wouldn't have happened, which is true HA.

Well, different regions wouldn't be enough for true HA. You'd need different cloud providers as well.

Otherwise you have something called common-mode failure: for instance, the regions are running on the same architecture, maybe even the same hardware, and as such could be susceptible to a single problem that affects the entire cloud.
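A quick sketch of why a common-mode fault caps what extra regions can buy you. It assumes each replica fails on its own with some probability and that one shared fault takes out all replicas together; both probabilities and the function name are invented for illustration.

```python
def availability_with_common_mode(p_independent: float,
                                  p_common: float,
                                  replicas: int) -> float:
    """Probability the service is up when each replica fails independently
    with p_independent and a shared fault kills every replica with p_common."""
    all_replicas_down = p_independent ** replicas
    # Up only if the shared fault did not fire AND at least one replica survives.
    return (1 - p_common) * (1 - all_replicas_down)


if __name__ == "__main__":
    # Hypothetical: 1% independent failure per replica, 0.1% shared fault.
    for n in (1, 2, 3, 10):
        up = availability_with_common_mode(0.01, 0.001, n)
        print(f"{n:2d} replica(s) on one provider: {up:.6f}")
```

However many replicas you add, the result never climbs above 1 - p_common, so the only way past that ceiling is a replica that does not share the fault, such as one at a second provider.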

HA has to do with the uptime, not the amount of redundancy. Different regions from AWS are definitely way more than enough for HA by any standard. You can do HA with a single datacenter, just not an AWS datacenter. But lots of single datacenters provide HA at a facility level.

But your app has to support the multiple datacenter model. That's what is really hard for most people.

YES YES YES, SCREW AWS. They have this big marketing scheme for CEOs, which forces us to work for CEOs who believe everything is better in AWS and that the server won't work properly unless it's on AWS. Then when the bill comes we have to explain to them that we can never calculate the cost accurately, because it's Amazon AWS and they charge for IOPS, and there is no way I can calculate that shit. It's meant to be a billing sinkhole to pay for Bezos's divorce settlement.

I toyed with this idea, but it is a bit unlikely. That said, it is possible when you are vendor agnostic and have, say, a CentOS box in DigitalOcean and another one in Vultr.

The Great Firewall of Cloud Marketing has done a great job of suppressing the billing shock that cloud brings with it. It's also been great at suppressing the movement back on-premises where costs are fairly well established.

We have a client we work with that has a handsome cloud credit every month well into five figures. They did some testing for their application work in-cloud to see how it would work. They burned through that five figure credit in a matter of a few days much to their surprise. They put their workload into that cloud, get it up and running, and then the following year that credit disappears. So, they get a billing spike on top of the six figure count it would cost them to run entirely all-in. We have a high performance all-flash hyper-converged solution set just for them.