DevOps at the UK Government

Anna Shipman explained to QCon London attendees how DevOps drives the UK's Government Digital Service (GDS). GDS aims to lead the digital transformation of the UK government, "mak[ing] digital services and information simpler, clearer and faster". Its best-known site is GOV.UK, which provides government information and services.

At GDS, developers have a lot of autonomy. They are responsible for their application's whole lifecycle: they deploy to production from their own laptops, support their own code (including on-call rotations), and make their own tech choices. Given the government context GDS operates in, all this autonomy prompted a lot of questions from the audience. Shipman explained that there was a shared understanding early on that DevOps was the best way to pursue GDS's mission. For instance, developers deploy to production from their own laptops for reasons of efficiency and so that changes can be rolled out quickly. Shipman gave the example of the Heartbleed bug, which was fixed within a couple of hours of its announcement.

Developers are on call on a weekly rotation. GDS has rigorous rules about which events should trigger PagerDuty alerts. Every incident that might trigger PagerDuty must have an entry in GDS's operations manual that clearly explains the steps needed to resolve it. Only really serious events, such as server crashes, trigger PagerDuty. GDS does not have 24x7 application support requirements, so it is possible to switch to static pages when needed. Shipman considers the operations manual the most important tool the teams have to support their activities. It is this manual that allows GDS to share responsibilities across all team members. GDS has a hard rule: whenever a flaw is found in the manual, the person who finds it must first resolve the incident at hand and then update the manual.
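The rule that every paging incident needs an entry in the operations manual lends itself to an automated check. The following is a minimal sketch of such a check, not GDS's actual tooling; the Alert type, the alerts_missing_runbook function, the alert names and the dictionary standing in for the manual are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert definition; not a PagerDuty or Icinga type."""
    name: str
    pages: bool  # True if the alert would page the on-call developer


def alerts_missing_runbook(alerts, runbook):
    """Return the names of paging alerts with no (or an empty) manual entry."""
    return sorted(
        alert.name
        for alert in alerts
        if alert.pages and not runbook.get(alert.name, "").strip()
    )


# Illustrative data only; the names and texts are invented for this example.
alerts = [
    Alert("server_crash", pages=True),
    Alert("disk_usage_warning", pages=False),
    Alert("database_unreachable", pages=True),
]
runbook = {
    "server_crash": "Placeholder for the documented resolution steps.",
}

missing = alerts_missing_runbook(alerts, runbook)
```

Run as part of a build, a non-empty result could fail the pipeline, so that a paging alert cannot be introduced without its manual entry. Non-paging alerts are deliberately ignored here, since the rule described above applies only to events that trigger a page.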

Serious incidents trigger blameless post-mortems. The people involved in the incident first write a report that is widely shared across the organization. Shortly thereafter, all the relevant stakeholders gather for a post-mortem meeting to work out how to prevent or mitigate that kind of incident in the future.

Scheduling deploys to production follows a simple procedure. Teams register their intention in the release plan, which is divided into 30-minute slots. When the time comes to deploy, the team must ensure it has the badger toy, which acts as an exclusive lock for deployment.
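The procedure combines two ideas: a booking in a 30-minute slot and a single token that only one team can hold at a time. The sketch below models both in Python purely as an illustration. GDS uses a physical toy, not software, for the lock; the ReleasePlan class, its method names and the team names are assumptions made for this example.

```python
from datetime import datetime

SLOT_MINUTES = 30  # slot length taken from the release plan described above


class ReleasePlan:
    """Hypothetical model of a release plan plus a single deployment token."""

    def __init__(self):
        self._slots = {}          # slot start time -> team that booked it
        self._token_holder = None  # team currently holding the token, if any

    @staticmethod
    def slot_start(when):
        """Round a time down to the start of its 30-minute slot."""
        minute = when.minute - when.minute % SLOT_MINUTES
        return when.replace(minute=minute, second=0, microsecond=0)

    def register(self, team, when):
        """Book the slot containing `when`; fail if it is already taken."""
        start = self.slot_start(when)
        if start in self._slots:
            raise ValueError(f"slot {start} already booked by {self._slots[start]}")
        self._slots[start] = team
        return start

    def take_token(self, team):
        """Take the token if it is free or already held by this team."""
        if self._token_holder not in (None, team):
            return False
        self._token_holder = team
        return True

    def release_token(self, team):
        """Hand the token back; only the current holder can do so."""
        if self._token_holder == team:
            self._token_holder = None

    def can_deploy(self, team, now):
        """A team may deploy only in its own slot and while holding the token."""
        booked = self._slots.get(self.slot_start(now)) == team
        return booked and self._token_holder == team


plan = ReleasePlan()
booked_start = plan.register("team-a", datetime(2014, 3, 6, 10, 40))
```

Keeping the booking and the token separate mirrors the description: the release plan says who intends to deploy and when, while the token guarantees that only one deployment is in progress at any moment.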

Each team has the freedom to make its own technological choices. Those choices are discussed with the architects and the infrastructure team. Shipman, an architect at GDS, told the audience that she felt her role was more about listening to the teams and helping them with their choices than about decreeing governance rules. Shipman cited the case of a team that decided PostgreSQL was a better fit than MySQL for their scenario. They joined forces with the infrastructure team and performed the migration over several weeks.

GDS uses a lot of open source tools. Among others, they use Jenkins as a CI server, Puppet for IT automation, syslog and Logstash for logging, Cucumber for acceptance testing and Icinga for monitoring. GDS also develops most of its tools and applications in the open. AlphaGov hosts the tools that are open source but not supported in any way. GDS Operations hosts tools that have a higher level of commitment, such as vCloud Tools.

How about dragging all government services forward to use open source methodologies?

Great to hear that GDS are successfully pioneering progressive and modern methodologies. I would love to see the presentation if you have a video link to it.

However, I must point out the elephant in the room, namely the NHS. The NHS has had a number of high-profile, expensive IT project failures. Why aren't we hearing any success stories about the NHS IT systems?

Perhaps the GDS staff could get involved with some NHS projects as well.