A software company delivers software and hosts the solution on its own servers.
Once, faulty hardware caused delays, random reboots, service downtime, and client disappointment. The backup servers were undersized, so they couldn't handle the load during the failover.

The company had already set up separate environments for Test, UAT, Preproduction, and Production. That works fine for software QA, but it was useless here because the fault was in the Production hardware.

I want to recommend that the Software QA team also take on quality control over the IT team and its infrastructure.
Where should I start?
Can you recommend some practices, guides, or a starting point?

It sounds like the failure was in the provisioning of (backup) infrastructure. How is QA supposed to help with that? That's not their core competency. Perhaps you should partner with a Managed Services Provider (e.g. IaaS provider).
– HTTP500, Aug 6 '12 at 16:28

2 Answers

Unsure why somebody voted down this question. Solid integration between infrastructure, ops, development, and QA teams can be a tough issue, particularly in larger organizations where each team reports to a different hierarchy.

One place you could start would be with the DevOps movement. A quick search turns up a bunch of good sources to begin researching.

Within your organization, you might start out with some basic internal networking, inviting members of the various teams to get together for brownbag sessions or (if you can get your manager to spring for it) pizza. A little unofficial team-building can go a long way, as can show-and-tell sessions for sharing knowledge.

In my experience, nobody likes it when their stack fails, even when the failure can be blamed on another team. Building some strength and depth across the various teams and their members helps a lot with planning and evolving the entire stack, and goes a long way toward convincing management when changes are needed.

It was most likely downvoted because of the "shopping" or "opinionated" slant of the question and the lack of research into potential concepts. Instead, it solicited debate and opinions, which goes against the FAQ.
– Brent Pabst, Aug 6 '12 at 16:42

This sounds like more of an application development and infrastructure coordination issue than strictly a software QA team issue. Meetings between senior application developers and senior infrastructure staff who understand the hardware requirements, policies, procedures, and timelines can prevent a lot of these problems. A few practices that helped us:

- Keep your environments exactly the same.
- Have the infrastructure team work with the development team to understand their applications and size the hardware appropriately for expected use and future growth.
- Get the infrastructure team thinking about potential problems between new applications and the existing environment, and about ways to prevent them.
- Educate the development team about infrastructure procedures, and have them consider their potential impact on infrastructure and what can be done to prevent issues.
- Let the development and QA teams know about infrastructure processes such as backups, batch jobs, and periods of high disk/network utilization, so they can factor those into their planning and testing.

The QA team will be naturally integrated into the process through thorough quality and load testing of the system (see the sketch below). You may already do a lot of this, but getting the individual groups to work as a cohesive team is what got our company past such issues. Once we got the teams thinking about how they would affect each other, problems started to solve themselves before they occurred, and when problems did come up, solving them became a group effort.
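As a concrete starting point for that load-testing angle, here is a minimal sketch of a failover smoke test the QA team could run against both the production and backup stacks. The endpoint URLs, concurrency level, and latency threshold are all hypothetical placeholders, not anything from the asker's setup; the point is simply that the same harness, pointed at each environment in turn, will surface an undersized backup before a real outage does.

    # Minimal failover smoke-test sketch (hypothetical endpoints and thresholds).
    # It hits each environment's health endpoint with concurrent clients and
    # flags any environment whose median latency exceeds the target.
    import time
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    ENDPOINTS = {
        "production": "https://prod.example.com/health",    # placeholder URL
        "backup":     "https://backup.example.com/health",  # placeholder URL
    }
    CONCURRENCY = 50           # simulated parallel clients (assumed figure)
    REQUESTS_PER_CLIENT = 20
    MAX_MEDIAN_LATENCY = 0.5   # seconds; tune to your own SLA

    def one_client(url):
        """One simulated client: sequential requests, returning observed latencies."""
        latencies = []
        for _ in range(REQUESTS_PER_CLIENT):
            start = time.monotonic()
            try:
                urlopen(url, timeout=5).read()
                latencies.append(time.monotonic() - start)
            except OSError:
                latencies.append(float("inf"))  # count failures as worst case
        return latencies

    def check(name, url):
        # Fan out CONCURRENCY clients in parallel, then pool all their samples.
        with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
            samples = [t for client in pool.map(one_client, [url] * CONCURRENCY)
                       for t in client]
        samples.sort()
        median = samples[len(samples) // 2]
        verdict = "OK" if median <= MAX_MEDIAN_LATENCY else "UNDERSIZED?"
        print(f"{name}: median latency {median:.3f}s over {len(samples)} requests [{verdict}]")

    if __name__ == "__main__":
        for name, url in ENDPOINTS.items():
            check(name, url)

Run unchanged against both environments, a check like this turns "the backup servers were undersized" from a post-mortem finding into a routine QA result.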