I’m gathering my thoughts as I organize and explore this topic. Suggestions on where to learn more are welcome.

I’ll be simplifying things initially. I plan to start by focusing on the pipeline and tools used in the DevOps culture. I expect that the deeper I dive into this topic, the more precise the next posts will be.

For simplicity, let’s assume we are developing a cloud-based web application with a DevOps toolchain that includes tools like Chef, Puppet, Jenkins, Docker, Packer, AWS, New Relic, Splunk… How do you test a deployment pipeline built on top of these?

I have to start somewhere. I know this: you can approach testing software by dividing the problem into separate areas, researching them, and executing any necessary actions, including finding and resolving issues. The result should hopefully be a high quality, or at least acceptable, product.

Let me try applying these areas to a DevOps toolchain, and list the questions/topics that emerge.

Functional

Is it working as expected? What does working as expected mean to you? To your stakeholders?

Do you have unit tests? Integration? End-to-end? How many is enough?

Do you need to do any manual testing after a pipeline step is executed?
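One way to reduce manual checking after a step is a small automated smoke check. Here is a minimal sketch, assuming the deployed service exposes some health URL. The names `smoke_check` and `make_fake_fetcher` and the URL are made up for illustration, and a real version would probably wait between attempts.

```python
from typing import Callable, Iterable


def smoke_check(fetch_status: Callable[[str], int], url: str,
                expected: int = 200, attempts: int = 3) -> bool:
    """Return True as soon as one attempt yields the expected status code."""
    for _ in range(attempts):
        try:
            if fetch_status(url) == expected:
                return True
        except OSError:
            # Treat connection problems as a failed attempt and try again.
            continue
    return False


def make_fake_fetcher(responses: Iterable[int]) -> Callable[[str], int]:
    """Stand-in for a real HTTP call, so the check itself can be tested."""
    remaining = iter(responses)
    return lambda url: next(remaining)


# Hypothetical health endpoint; the fake fetcher never contacts it.
HEALTH_URL = "https://example.invalid/health"

recovered = smoke_check(make_fake_fetcher([503, 503, 200]), HEALTH_URL)
still_down = smoke_check(make_fake_fetcher([503, 503, 503]), HEALTH_URL)
print(recovered, still_down)
```

Passing the fetcher in as a parameter is a design choice: the pipeline can plug in a real HTTP client, while the tests use a fake one.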

Automation / Automatability / Testability

Are you going to automate the testing? Why yes? Why not? How much?

If yes – which tools will you use? Are they free? What alternatives do you have?

Is the toolchain automation-friendly? Was it created with automation in mind?

Is it testing-friendly in general? Do you have hooks / breakpoints to make it easy to test?

UI/UX

Is there a certain User Experience your DevOps tools should deliver?

Is the pipeline error-prone? Can somebody deploy a test build to production by mistake? Can they destroy your current production stack by clicking on a badly described button?
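A guard like the one below is one possible answer to the "deploy a test build to production by mistake" question. It is only a sketch under my own assumptions: the build types, the environment names, and the rule that you must type the environment name to confirm are all hypothetical.

```python
class DeploymentBlocked(Exception):
    """Raised when a deployment request fails a safety check."""


def check_deployment(build_type: str, target_env: str, confirmation: str = "") -> None:
    """Raise DeploymentBlocked if the request looks like a mistake."""
    if target_env != "production":
        return  # Non-production targets are not guarded in this sketch.
    if build_type != "release":
        raise DeploymentBlocked(f"{build_type!r} builds may not go to production")
    if confirmation != target_env:
        raise DeploymentBlocked("type the environment name to confirm")


def is_allowed(build_type: str, target_env: str, confirmation: str = "") -> bool:
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        check_deployment(build_type, target_env, confirmation)
    except DeploymentBlocked:
        return False
    return True


print(is_allowed("test", "staging"))                      # allowed
print(is_allowed("test", "production", "production"))     # blocked: wrong build type
print(is_allowed("release", "production"))                # blocked: no confirmation
print(is_allowed("release", "production", "production"))  # allowed
```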

Do you need to support keyboard shortcuts? Arrow keys / tabs to navigate?

Does the UI support long/short inputs for build names, or components? High build numbers?

User Acceptance

Who is your customer? What acceptance do you need from them?

Would you do A/B testing for your pipeline?

Installation / Integration / System

What do you need to integrate with? For example – would you file JIRA tickets automatically if something goes wrong?

Do you need a database? Which version?

What operating system will your toolchain run on? What OS will you support for developing it?

When depending on a 3rd party, are you willing to rely on their uptime? What if a critical cloud-based tool goes down when you urgently need to deploy a hotfix?

Will the 3rd party let you know of planned downtime? Is the downtime in a timezone suitable for you?

Do you have a backup?

Compatibility

What platforms should you be compatible with? AWS? OpenStack? Azure? Are you going to test all of them?

If your pipeline is web-based, which browsers will you support? Can bad rendering on Safari cause an error? What about strict Firefox security? What if the users are running Chrome with a JS-blocking extension?

Any potential compatibility issues between your tools? Should you test every new version of a tool against the others?

Globalization

Do you have any dates or numbers showing up in the pipeline? 1.000 and 1,000 are not the same… same goes for 6/12/2016…
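To make the ambiguity concrete, here is a small sketch that parses the same strings under two conventions. The helper `parse_number` is my own, and it takes the separators explicitly instead of relying on whatever locale the machine happens to have.

```python
from datetime import datetime


def parse_number(text: str, decimal_sep: str, thousands_sep: str) -> float:
    """Parse a number using explicit separators, not the system locale."""
    cleaned = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


# "1.000" read with a decimal point vs. with a point as thousands separator.
point_is_decimal = parse_number("1.000", decimal_sep=".", thousands_sep=",")
point_is_thousands = parse_number("1.000", decimal_sep=",", thousands_sep=".")

# "6/12/2016" read as month/day/year vs. day/month/year.
month_first = datetime.strptime("6/12/2016", "%m/%d/%Y")
day_first = datetime.strptime("6/12/2016", "%d/%m/%Y")

print(point_is_decimal, point_is_thousands)    # 1.0 1000.0
print(month_first.date(), day_first.date())    # 2016-06-12 2016-12-06
```

The same text ends up a factor of 1000 apart, or almost six months apart, depending on which convention the reader assumes.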

Monday is not the first day of the week for everybody. Do you care?

If you have user input – does it support non-ASCII characters? Does it have to?
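One non-ASCII trap worth a test: the same visible text can be stored as different code point sequences. The sketch below, with a hypothetical `same_text` helper, compares strings after Unicode normalization. Whether your pipeline should normalize input at all is a decision I am assuming here, not a given.

```python
import unicodedata


def same_text(left: str, right: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)


composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

print(composed == decomposed)           # False: the raw strings differ
print(same_text(composed, decomposed))  # True: equal after normalization
```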

Any of your users need a localized UI?

If some of your resources are outside your country, would you support them? What if part of the deployment needs a phone number, but it’s in an unfamiliar format from another country?

Compliance

Are you required to meet certain requirements like SOX or HIPAA? Can your DevOps toolchain and code assure at least part of the compliance?

Any export regulations you might be violating with your DevOps code? What if a certain country requires that data is stored locally, but your tools deploy a server on a different continent?

Stress / Load / Performance

Can you deploy 10 servers simultaneously? What about 10000?

How long does it take to deploy the infrastructure? Is 1 hour acceptable? What if 10 minutes is too long?

Did anyone even define these requirements?

Do you track any of the performance metrics?

Security

Do you take any user input? Can a malicious user infect other users? Steal their passwords? Admin password?

Do you store sensitive data in your Jenkins jobs? Where do you store it securely?

How will you prevent users from committing their AWS credentials to public repositories?
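One partial answer is to scan text for credential-looking strings before it is committed. This is a rough sketch: the pattern is a common heuristic for long-term AWS access key IDs ("AKIA" followed by 16 uppercase letters or digits), not an official or complete rule, and `find_suspect_lines` is a name I made up. The sample key is the placeholder used in AWS documentation.

```python
import re
from typing import List

# Heuristic only: matches strings shaped like long-term AWS access key IDs.
ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_suspect_lines(text: str) -> List[int]:
    """Return 1-based line numbers that contain a key-shaped string."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if ACCESS_KEY_ID.search(line)
    ]


sample = (
    'region = "us-east-1"\n'
    'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
    'instance_type = "t2.micro"\n'
)

print(find_suspect_lines(sample))  # [2]
```

A check like this could run from a Git pre-commit hook. It will miss secret keys and other credential formats, so I would treat it as one layer rather than the whole answer.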

Do you remove all access when terminating employees?

Do you use access control? Do you audit user actions? Should you?

Who is really implementing security? Can a single engineer misconfigure the firewall on all your production servers?

Supportability

Do you have enough logging to know why something went wrong? Do all 3rd party tools have enough logging?

Where are your logs?

Do you have alerts / notifications in place?

Configuration

What are the configuration options for your jobs?

Documentation

What documentation do you need? Do you have enough if somebody decides to leave abruptly or falls under a bus?

Any public-facing documentation you want to / have to share?

Adoption / Metrics and Instrumentation

Any metrics you want to track?

Do you need to add instrumentation to the jobs to know where the bottlenecks are?
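As a starting point, instrumentation can be as simple as timing each step and looking at the slowest one. A minimal sketch follows. The step names are invented, and `time.sleep` stands in for the real work.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(step_name: str, results: Dict[str, float]) -> Iterator[None]:
    """Record how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[step_name] = time.perf_counter() - start


def slowest(results: Dict[str, float]) -> str:
    """Return the name of the step with the largest recorded duration."""
    return max(results, key=results.get)


durations: Dict[str, float] = {}

with timed("build", durations):
    time.sleep(0.01)  # placeholder for the real build step

with timed("deploy", durations):
    time.sleep(0.05)  # placeholder for the real deploy step

for name, seconds in durations.items():
    print(f"{name}: {seconds:.3f}s")
print("bottleneck:", slowest(durations))
```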

Upgrade / Rollback

How will you test new versions of the tools? Are you ready to roll them back? Will they work after rollback?

Rollout strategy

What is your must-have vs nice-to-have? What tools depend on each other?

Can you define phases of your DevOps toolchain deployment?

Resources

Have you identified all the resources you need for testing?

What about environmental resources, like the hardware and software you need?

What about licenses? Any legal review of these needed?

Are you well staffed? Any training your engineers need?

Deliverables

Documentation, artifacts… what else do you need to deliver?

Vendor / 3rd party

When working with a vendor on your DevOps implementation: how much would you want them to test vs you? What is their testing strategy? How much testing overlap should happen? What do they need to deliver?

Definition of done

When can you say you are happy with the testing of the DevOps toolchain?

Do you need to sign off? Who else signs off?

These are just some initial thoughts. What do you think of these? What’s missing?