Probably FineAlex Wilson's Tech Ramblingshttps://blog.probablyfine.co.uk
Notes from the Week #18<h2 id="monday">Monday</h2>
<p>Monday was pretty meeting-focused.</p>
<p>We had a huddle on <strong>deriving a set of SLOs</strong> from our initial graphite SLIs. The outcome of the session was that our metrics needed further refinement - what we actually want is to have a response time bound for well-formed requests and a different threshold for number of queries that time out or are invalid rather than overall request latency.</p>
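<p>As a rough illustration of the two indicators described above, here is a minimal Python sketch. The <code class="highlighter-rouge">Request</code> record shape, the outcome labels, and the 500ms bound used in the usage note are all assumptions made for the example, not what we actually measure in graphite.</p>

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One query against the metrics API (hypothetical record shape)."""
    latency_ms: float
    outcome: str  # assumed labels: "ok", "timeout" or "invalid"


def latency_sli(requests, bound_ms):
    """Fraction of well-formed, completed requests answered within bound_ms."""
    ok = [r for r in requests if r.outcome == "ok"]
    if not ok:
        return 1.0  # no eligible requests, so nothing violated the bound
    return sum(r.latency_ms <= bound_ms for r in ok) / len(ok)


def failure_sli(requests):
    """Fraction of all requests that timed out or were invalid."""
    if not requests:
        return 0.0
    return sum(r.outcome != "ok" for r in requests) / len(requests)
```

<p>Calling <code class="highlighter-rouge">latency_sli(requests, 500)</code> and <code class="highlighter-rouge">failure_sli(requests)</code> separately keeps slow-but-valid requests from being mixed up with requests that never had a chance of succeeding.</p>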
<p>Our second session was a <strong>retrospective on how we handle ‘walk-ups’</strong> - Shift is pretty lucky that we’re surrounded by our customers, which keeps feedback loops tight, but we become overloaded by questions and distractions. We used index cards to keep track of the number of walk-ups broken down by subject, and decided to productise it into a Google Form for longer-term storage and analytics.</p>
<h2 id="tuesday">Tuesday</h2>
<p>I paired with Narayan on Tuesday to make some <strong>efficiency improvements to our generated firewall configurations</strong>. We’ve been less than judicious with some of the templated rule-sets and this was an opportunity to smooth out our global Puppet runtimes. We did this by putting feature flags on our security-related puppet classes and started to turn off parts that weren’t being used.</p>
<p>I also went along to my first <strong>weeknotes meetup</strong>! :D</p>
<p>The first venue that <a href="https://twitter.com/stevenjmesser">Steve</a> suggested turned out to be mostly booked by a speed-dating event so we ended up decamping to a cocktail bar nearby. I always like meeting new people (big shout-out to <a href="https://twitter.com/dasbarrett">Dan</a> and <a href="https://twitter.com/puntofisso">Giuseppe</a>) and it’s a weird sensation to hang out for the first time with people who you only know from the Twitter-sphere, but good times were had by all even if I did forget to actually eat and had a delicious two-pint dinner instead.</p>
<h2 id="wednesday">Wednesday</h2>
<p>As tradition dictates, it was 20% time day - I finally got around to releasing my <a href="https://github.com/mrwilson/terraform-aws-protonmail-dns">ProtonMail DNS terraform module</a> and pushing it to the terraform registry.</p>
<p>I set aside a bit of time for <strong>attempting an upgrade</strong> of part of our Puppet systems and it turns out that it’s going to be a fair bit more work than I thought - we’re using the open-source version and we’ve architected it in a way that worked when we first brought it up but makes it harder to incrementally scale.</p>
<h2 id="thursday-and-friday">Thursday and Friday</h2>
<p>I did a fair bit of pairing this week, in total!</p>
<p>I paired with Petrut on AWS optimisations and with <a href="https://medium.com/@sengjea">Seng</a> on improving the state of our SSL certificates, both of which required doing a fair bit of Terraform-ing (I swear this is 90% of my development time now, the rest is Python). I miss TDD’d app development. :(</p>
<p>Speaking of app development, <a href="https://twitter.com/stephen_gb_">Stephen</a> made an initial <strong>release of a small puppet-token Slack app</strong> that builds on the data that we started piping into DynamoDB from our Puppet runs.</p>
<p>We have a monolithic shared puppet codebase and because we practice trunk-based development we also use a physical mutex to make sure only one team is committing/deploying at a time. This token, an android plush, often requires developers to go and “search” for it to acquire the lock (a practice that worked when we were small but Shift is working to make more scale-appropriate).</p>
<p>We can type <code class="highlighter-rouge">/puppet-token</code> into our Slack and it will tell you where it thinks the token is!</p>
<div style="text-align: center;">
<img src="https://blog.probablyfine.co.uk/assets/puppet-token.png" title="Beep boop puppet-token bot" style="object-fit: cover;" />
</div>
<p>An improvement to our process, but the next step will be using these events (<code class="highlighter-rouge">START</code>, <code class="highlighter-rouge">FINISH</code>) to determine whether a run is in progress and use that as a mutex instead of our token. I’m excited for more of these small human-centric improvements.</p>
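<p>A minimal sketch of how that check might look, assuming the events arrive as chronologically ordered <code class="highlighter-rouge">(session_uuid, event)</code> pairs. The function name and the input shape are hypothetical, and this is not the Slack app's actual code.</p>

```python
def run_in_progress(events):
    """Return True if any session has a START without a matching FINISH.

    `events` is a list of (session_uuid, event) tuples in chronological order.
    """
    open_sessions = set()
    for session_uuid, event in events:
        if event == "START":
            open_sessions.add(session_uuid)
        elif event == "FINISH":
            # discard() is a no-op if we never saw the START for this session
            open_sessions.discard(session_uuid)
    return bool(open_sessions)
```

<p>One caveat with this approach: an abandoned run never emits <code class="highlighter-rouge">FINISH</code>, so a real lock would probably need some kind of expiry as well.</p>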
<hr />
<p>This weekend, I ordered a copy of Shoshana Zuboff’s <a href="https://profilebooks.com/surveillance">The Age of Surveillance Capitalism</a> - probably required reading for working in ad-tech? - and I’m looking forward to eating through it. I’m also going to regenerate my GPG keys, and then quite probably bake a Lemon Drizzle cake (which will require me taking at least <em>some</em> of it into work).</p>
Sat, 09 Feb 2019 00:00:00 +0000https://blog.probablyfine.co.uk/2019/02/09/notes-from-the-week-18.html
https://blog.probablyfine.co.uk/2019/02/09/notes-from-the-week-18.htmlNotes from the Week #17<p>Oh, it’s a long one. I’m trying <em>another</em> new format, breaking down by day - I often forget highlights in trying to limit myself to 2 or 3 things.</p>
<h2 id="monday">Monday</h2>
<h3 id="going-faster">Going faster</h3>
<p>I had a two hour session with the rest of the Team Leads about ways to help us go faster, within the constraints of keeping the Unruly culture that makes us unique and not over-egging the process pudding (so to speak).</p>
<p>It feels a lot like a linear/combinatorial optimisation problem that I learned about in school and uni respectively (I’m shuddering at the memory of manually running the <a href="https://en.wikipedia.org/wiki/Simplex_algorithm">Simplex algorithm</a> during exams). There are a bunch of different levers we are able to pull but every action has an effect on everything else.</p>
<p>This <em>probably</em> falls somewhere in what Cynefin calls the <a href="https://en.wikipedia.org/wiki/Cynefin_framework#Complex">complex</a> domain, and we improve by Probing -&gt; Sensing -&gt; Responding.</p>
<h3 id="demo-time">Demo time</h3>
<p>Shift have also started doing huddles to demonstrate our little pet projects since we’ve been trialling a flexible working system. I’m a self-identified morning person and I like to tinker when I’m in the office on my own.</p>
<p>This week I demo’d a custom Terraform module to wrap up different resources between GitHub, AWS, and other sources. Modules support multiple providers being passed in as attributes since <a href="https://www.terraform.io/upgrade-guides/0-11.html#interactions-between-providers-and-modules">0.11</a>, so this is no longer problematic to wire up.</p>
<p>Stephen demo’d a spike of a Slack app to administer resources on AWS, as well as communicate to users when they have stale assets, building on his 20% from <a href="https://blog.probablyfine.co.uk/2019/01/28/notes-from-the-week-16.html#four">last week</a>.</p>
<h2 id="tuesday">Tuesday</h2>
<h3 id="yesyesno">Yes/Yes/No</h3>
<p>I’m deliberately very free-and-easy with our team’s process - when someone wants to try something new, like a facilitation technique or a way of doing things, I try my best to take a leaf from <a href="https://www.fastcompany.com/3042080/yes-and-5-more-lessons-in-improv-ing-collaboration-and-creativity-from-second-city">performance improv</a> and respond with “Yes, and …” (unless the idea would potentially have drastic negative consequences, of course).</p>
<p>The idea came from Stephen listening to the podcast <a href="https://www.gimletmedia.com/reply-all">Reply All</a> which has a segment called “Yes/Yes/No”. The podcasters look at a particular tweet, and answer “Yes” or “No” to whether they understand what it means or not, before finally explaining it to each-other so that everyone can answer Yes.</p>
<p>We tried using this format to really dig into our network security model, the technologies we use, the way we provision it and team feedback was a unanimous “let’s do this again”.</p>
<p>My gut reaction to why this worked for us was that actively engaging with the content and explaining bits of it to <em>each-other</em> rather than someone doing a demo made our brains work differently and absorb the information better.</p>
<h3 id="1-to-1s">1-to-1s</h3>
<p>My team and I have 1-to-1s every other Tuesday after lunch. I genuinely love these because I’m able to talk with my team members on a different level to when we’re in a group, and I get a lot of pleasure from engaging with their thoughts and ideas.</p>
<p>It’s a great time to give and receive feedback, outside of our fortnightly “feedback and cake” sessions which focus more on group feedback, so 1-to-1 feedback tends to be a lot more personal.</p>
<h2 id="wednesday">Wednesday</h2>
<h3 id="coffee-with-steve">Coffee with Steve</h3>
<p>Sometimes I’m a bit rubbish with times, but now that I’m in the office early every morning I no longer have an excuse for being late for my Wednesday chat with Steve!</p>
<p>This week we talked about the differences between the Unruly and the GDS models for infrastructure and SREs. Our SREs have a team of their own but also act as enablers and pseudo-coaches to raise ProDev’s skill level across the board.</p>
<p>The world of SRE-ness is one that Shift has been dipping its feet into a lot over the last couple of months, so it’s great to hear how places other than e.g. Google, Facebook do SRE stuff.</p>
<h3 id="20-time">20% Time</h3>
<p>Wednesday is traditionally (but not always) the day I take my 20% time. It’s normally the day with fewest meetings and it breaks the week up quite nicely. I spiked the <a href="https://blog.probablyfine.co.uk/2019/01/28/notes-from-the-week-16.html#two">terraform module</a> for managing ProtonMail DNS records that I spoke about in my last weeknotes, which will be released on GitHub and the Terraform Registry very soon.</p>
<p>(Is it obvious yet that I really like Terraform? I want to build a custom provider next, probably for some obscure web service)</p>
<h2 id="thursday">Thursday</h2>
<h3 id="diy-team-lunch">DIY Team Lunch</h3>
<p>Sarah had the great idea to hold a spontaneous team lunch in our office - she wrote about it in her own <a href="https://sarahseewhy.github.io/2019/02/01/reflection-week-5.html#team-lunches">weekly reflections</a>. She brought a picnic blanket to one of our meeting rooms, we all brought along our own lunches, and we shared a bottle of Appletizer.</p>
<p>I hadn’t tasted Appletizer for years, and I felt undeniably classier drinking it out of champagne glasses.</p>
<p>There was plenty of non-shop talk but we had a brilliant idea over the course of the hour - we provide a bit of tooling to deploy and run our puppet code, but we still use a manual mutex (in the form of a slightly grubby android plush).</p>
<p>Could we steal a leaf from Hashicorp’s book and replace our manual lock with something like <a href="https://www.terraform.io/docs/state/locking.html">Terraform’s state locking</a>?</p>
<p>This discussion escalated into how we could push events to DynamoDB and profile the workflow much like we would any other system, and show our slightly unkempt Python script a bit of love.</p>
<h3 id="tech-talks">Tech talks</h3>
<p>Ina and I presented the findings and progress that Shift had made towards implementing SLx and Error Budgets for the first of our mission critical systems, <a href="https://graphiteapp.org/">graphite</a>.</p>
<p>We got some really great feedback on both the presentation and the content, and there were some great questions about how we might be using our <a href="https://landing.google.com/sre/sre-book/chapters/embracing-risk/#id-na2u1S2SKi1-marker">Error Budget</a> when we’ve finished making the calculations.</p>
<h3 id="farewell-to-jahed">Farewell to Jahed</h3>
<p>We said goodbye to <a href="https://twitter.com/hedjahead">Jahed</a> this week who is, or … was :(, one of our developers - he’s been here long enough to feel part of the furniture and we’ll certainly miss what he brought to Unruly and ProDev as a whole.</p>
<p>He’s a big supporter of open source like myself so I personally will miss his voice in the blogging and open source group.</p>
<h2 id="friday">Friday</h2>
<h3 id="team-lunch-followup">Team lunch followup</h3>
<p>As the last Shift member in the office, I took 15 minutes and spiked a quick-and-dirty attempt at event pushing functionality in our Puppet workflow.</p>
<ul>
<li>Terraform’d a DynamoDB instance on AWS.</li>
<li>Python’d pushing <code class="highlighter-rouge">{ session_uuid, date, commit_hash, event, hostname }</code> at specific points in the workflow.</li>
</ul>
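<p>For illustration, the event record could be assembled along these lines. This is a sketch rather than the actual script: the helper names are made up, and the step that writes the record to DynamoDB is deliberately left out.</p>

```python
import socket
import uuid
from datetime import datetime, timezone


def new_session_id():
    """Generate one identifier to share across all events of a single run."""
    return str(uuid.uuid4())


def build_event(session_uuid, commit_hash, event, hostname=None):
    """Build the record pushed at a given point in the workflow."""
    return {
        "session_uuid": session_uuid,
        "date": datetime.now(timezone.utc).isoformat(),
        "commit_hash": commit_hash,
        "event": event,
        # fall back to the local machine name when none is supplied
        "hostname": hostname or socket.gethostname(),
    }
```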
<p>I’m excited for this to start accreting data over the next week - we’ll have better insight into how many runs start but are abandoned, a better look at how long runs take, and much more.</p>
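<p>Once the data is there, the abandoned-run and duration questions could be answered with something like the sketch below. It assumes each session emits a <code class="highlighter-rouge">START</code> and (if it completes) a <code class="highlighter-rouge">FINISH</code> event with an ISO 8601 <code class="highlighter-rouge">date</code>; the function name is hypothetical.</p>

```python
from datetime import datetime


def summarise_runs(events):
    """Group events by session and report durations and abandoned runs.

    Each event is a dict with "session_uuid", "date" (ISO 8601) and "event".
    Returns (durations_in_seconds_by_session, abandoned_session_ids).
    """
    starts, finishes = {}, {}
    for e in events:
        when = datetime.fromisoformat(e["date"])
        if e["event"] == "START":
            starts[e["session_uuid"]] = when
        elif e["event"] == "FINISH":
            finishes[e["session_uuid"]] = when
    durations = {
        s: (finishes[s] - starts[s]).total_seconds()
        for s in starts
        if s in finishes
    }
    # a run that started but never finished counts as abandoned
    abandoned = sorted(s for s in starts if s not in finishes)
    return durations, abandoned
```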
<hr />
<p>This weekend I baked some easy <a href="https://thehappyfoodie.co.uk/recipes/fork-biscuits">Fork Biscuits</a> as I slowly build up my baking rep, and I’ve not made biscuits for absolutely YONKS.</p>
<p>I’ve also been ill, but this felt more like a <em>physical</em> cold than the last one I had which seriously affected my usual levels of reasoning. Grotty, but mostly still able to operate at normal capacity.</p>
Sun, 03 Feb 2019 00:00:00 +0000https://blog.probablyfine.co.uk/2019/02/03/notes-from-the-week-17.html
https://blog.probablyfine.co.uk/2019/02/03/notes-from-the-week-17.htmlDebugging an outage without an internet connection<p>The Monday of this week, I was drafted in to help resolve a production incident on a system that I had helped build before I moved teams. What makes this unusual is that I had <em>no way to actually debug it</em> at the time. So here’s a small experience post about what I did and what I learned.</p>
<p><em>NB: These are my personal conclusions, YMMV</em></p>
<h3 id="a-small-amount-of-scene-setting">A small amount of scene-setting</h3>
<ol>
<li>Unruly practices <a href="https://twitter.com/mipsytipsy/status/962151928741285888">developers-on-call</a> because we believe it makes us build better services.</li>
<li>We practice primary, secondary, team-lead levels of escalation, but engineers with relevant experience can be drafted in if they are available.</li>
<li>For REASONS, I had my laptop but no internet access, so I couldn’t see anything that was going on.</li>
<li>I was on a phone-call with the lead of the team that owns the service, and <em>he</em> had all the usual tools at his disposal</li>
</ol>
<p>For the next half hour, I was asking him questions and helping to debug remotely given what I knew of the system.</p>
<h2 id="lesson-1-be-absolutely-clear-with-your-questions-requests-and-points">Lesson 1: Be absolutely clear with your questions, requests, and points.</h2>
<p><em>He couldn’t read my mind.</em></p>
<p>If I was debugging alone, I would likely be jumping back and forth between my hunches, trying to cross them off as quickly as possible. The nature of this new dynamic required a much slower and more measured approach.</p>
<blockquote>
<p><strong>Don’t say</strong> “Can you read out the logs from 3 o’clock to 5 o’clock?”</p>
</blockquote>
<p>We’re trying to identify possible causes of the issues, and these (caveat: in my opinion) are better phrased as questions rather than requests</p>
<blockquote>
<p><strong>Do say</strong> “There was a deploy at 3 o’ clock today, is there anything unusual in the app log?”</p>
</blockquote>
<p>If you have the source code available to you, referring to files and line numbers is <em>really</em> helpful — this enabled us to identify particular lines of configuration that might be causing the issue.</p>
<h2 id="lesson-2-say-why-youre-asking-the-question">Lesson 2: Say why you’re asking the question</h2>
<p>I’m not going to treat my colleague like he’s just a pair of hands for me, and I felt it was <em>really</em> important to clarify why I was asking the question. It gives an <strong>opportunity to short-circuit</strong> the query if it’s something that’s already been eliminated.</p>
<blockquote>
<p><strong>Don’t say</strong> “Can you search the logs for AWS request errors?”</p>
</blockquote>
<p>Bonus points: ask about the output, not the act. I don’t really mind how we get to the answer, more that we eliminate or prove a potential cause.</p>
<blockquote>
<p><strong>Do say</strong> “I think that the problem might be being caused by a lack of correct permissions for the AWS credentials. Are there AWS permission denied errors in the app log?”</p>
</blockquote>
<h2 id="lesson-3-they-are-the-system-owners-and-the-experts-treat-them-as-such">Lesson 3: They are the system owners, and the experts. Treat them as such.</h2>
<p>I was pulled in because I’d worked on the system before but well over a year ago. The system might have changed in ways that I don’t know about, so it was important for me to recognise that my mental model might be out of date and I needed to tailor my questions as such.</p>
<blockquote>
<p><strong>Do say</strong> “I remember it behaving like X due to Y. Is this still true?”</p>
</blockquote>
<p>In this scenario, there were a number of things I could <em>probably</em> eliminate based on my previous knowledge of what failure causes would look like for network issues, datastore connections, etc, but I didn’t want to assume anything.</p>
<h2 id="lesson-4-on-call-outages-are-inherently-stressful-dont-make-it-worse">Lesson 4: On-call outages are inherently stressful. Don’t make it worse.</h2>
<p>Feedback when debugging yourself is of the order of seconds.</p>
<p>The thought -&gt; question -&gt; action -&gt; reply loop is <em>significantly</em> longer.</p>
<p>Given that we’re trying to solve the issue in the shortest timeframe available, this process can make things even <em>more</em> stressful if things get out of hand.</p>
<p>There are several things you can do but they depend on how the other person works — for example, are they someone who likes to talk a lot, or more measured with the way they speak?</p>
<p>In the former, I would try to keep a conversation going with my thought process to normalise the conversation, but I wouldn’t do this in the latter scenario.</p>
<hr />
<p>I hope this will be a useful read for people who do on-call and who might encounter something like this - <strong>tl;dr</strong> try your best to help, be clear and up-front with questions, and show empathy and care to not make things more stressful.</p>
Thu, 31 Jan 2019 00:00:00 +0000https://blog.probablyfine.co.uk/2019/01/31/debugging-an-outage-without-an-internet-connection.html
https://blog.probablyfine.co.uk/2019/01/31/debugging-an-outage-without-an-internet-connection.htmlNotes from the Week #16<h1 id="four-things-that-happened">Four Things That Happened</h1>
<h2 id="one">One</h2>
<p>Ina is one of the developers on my team, and a while ago she set up Lunch-and-Learn sessions on Friday afternoons. This is a session where we watch a recorded conference talk and have a discussion afterwards. The programme has a good balance of technological talks (on topics like infrastructure) and talks about the equally important people-oriented side of our industry.</p>
<p>Last Friday’s talk was <a href="https://www.youtube.com/watch?v=5cr2Yn_MrKg">Tanya Reilly talking about being glue</a> - it’s a pretty heavy and hard-hitting talk about the dichotomy of what’s seen as “promotable” work and what work is actually necessary for a team to perform well. The latter is “glue work”, such as making sure the right people talk to each other, making sure documentation is done properly, and frequently talking to customers.</p>
<p>While absolutely essential, glue work is rarely seen as promotable, and this has implications for the diversity balance within software development.</p>
<p>I won’t give more of a summary than that, and I highly recommend you watch the talk.</p>
<h2 id="two">Two</h2>
<p>I’ve finally taken the jump to move away from Google mail for personal stuff and signed up for <a href="https://protonmail.com/">ProtonMail</a>. The lowest non-free tier is 4€ / month which is well within a sensible budget, and I can finally get my email behind <code class="highlighter-rouge">&lt;user&gt; at probablyfine dot co dot uk</code></p>
<p>The transition was close to painless, the longest part of the process was me failing to set up some TXT records in AWS Route 53 properly.</p>
<p>I’ve set aside a bit of time for myself to turn these <a href="https://protonmail.com/support/knowledge-base/anti-spoofing/">DNS records</a> into an open-source Terraform module in the near future.</p>
<h2 id="three">Three</h2>
<p>I attempted to extract some interesting dimensions from some of work’s production git repositories during my 20% time, one of our shared code bases and a handful of larger, team-specific code bases.</p>
<ol>
<li>Log <code class="highlighter-rouge">commit_hash</code>,<code class="highlighter-rouge">commit_date</code>,<code class="highlighter-rouge">commit_message</code> and additions/deletions for each commit.</li>
<li>Process <code class="highlighter-rouge">commit_message</code> using <a href="https://www.conventionalcommits.org/en/v1.0.0-beta.2/">conventional commit</a> guidelines into three new columns (type, scope, raw)</li>
<li>Join (1) and (2) into a nice big CSV</li>
</ol>
<p>I wrote a small Java app for (1) and (2) which munges the output of <code class="highlighter-rouge">git log</code>, parses out the commit message using my <a href="https://github.com/mrwilson/conventional-commit">conventional-commit library</a>, and dumps it all out as a CSV.</p>
<p>Using a publicly-visible <a href="https://github.com/unruly/unruly-puppet/commit/6afa2c43451224e276a2d88415ee883ee75d9ea9">commit</a> as an example, it turns</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>6afa2c4,2019-01-18,chore(base-nrpe): Improve lint-and-test.sh output.
6 3 lint-and-test.sh
</code></pre></div></div>
<p>into a CSV containing:</p>
<table>
<thead>
<tr>
<th>Header</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><code class="highlighter-rouge">commit_date</code></td>
<td>2019-01-18</td>
</tr>
<tr>
<td><code class="highlighter-rouge">commit_hash</code></td>
<td>6afa2c4</td>
</tr>
<tr>
<td><code class="highlighter-rouge">project_name</code></td>
<td>unruly-puppet</td>
</tr>
<tr>
<td><code class="highlighter-rouge">commit_type</code></td>
<td>chore</td>
</tr>
<tr>
<td><code class="highlighter-rouge">commit_scope</code></td>
<td>base-nrpe</td>
</tr>
<tr>
<td><code class="highlighter-rouge">commit_description</code></td>
<td>Improve lint-and-test.sh output.</td>
</tr>
<tr>
<td><code class="highlighter-rouge">commit_raw</code></td>
<td>chore(base-nrpe): Improve lint-and-test.sh output.</td>
</tr>
<tr>
<td><code class="highlighter-rouge">filename</code></td>
<td>lint-and-test.sh</td>
</tr>
<tr>
<td><code class="highlighter-rouge">additions</code></td>
<td>6</td>
</tr>
<tr>
<td><code class="highlighter-rouge">deletions</code></td>
<td>3</td>
</tr>
</tbody>
</table>
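<p>The real app is written in Java and uses my conventional-commit library, but the parsing step can be sketched in a few lines of Python. The regex below is a simplified stand-in that only handles the <code class="highlighter-rouge">type(scope): description</code> header, and the function name is made up for the example.</p>

```python
import re

# type, optional "(scope)", then ": " and the description.
HEADER = re.compile(r"^(?P<type>\w+)(?:\((?P<scope>[^)]+)\))?: (?P<description>.+)$")


def parse_commit_line(line):
    """Split 'hash,date,message' and parse the message header.

    Returns a dict of columns, with empty type/scope when the message
    does not follow the conventional-commit shape.
    """
    # maxsplit=2 keeps any commas inside the commit message intact
    commit_hash, commit_date, message = line.split(",", 2)
    match = HEADER.match(message)
    if match:
        fields = match.groupdict()
    else:
        fields = {"type": "", "scope": "", "description": message}
    return {
        "commit_hash": commit_hash,
        "commit_date": commit_date,
        "commit_type": fields["type"],
        "commit_scope": fields["scope"] or "",
        "commit_description": fields["description"],
        "commit_raw": message,
    }
```

<p>Feeding it the first line of the example above yields <code class="highlighter-rouge">chore</code> as the type and <code class="highlighter-rouge">base-nrpe</code> as the scope; the per-file additions/deletions line would need to be joined on separately.</p>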
<p>Running this app over 6 code bases, on commits from 2017-01-01 until today gives me a dataset with around 55k entries. The dataset is limited to 2017 onwards because that was when we first started using conventional commits at Unruly and that gives us two more dimensions to break down by.</p>
<p>I’ve already done a few quick calculations which I’ll be writing about in the near future.</p>
<h2 id="four">Four</h2>
<p>Stephen, another developer on my team, has been working on building Slack bots in his 20% time. He’s spiked and demoed a bot that bridges our Slack instance and AWS users, using a DynamoDB backend for lookups.</p>
<p>This is particularly interesting because there’s not a 1:1 mapping between username in AWS (enforced by Shift) and username in Slack (chosen by the user), so he’s built a layer of indirection to translate between the two when posting to the Slack API.</p>
<p>This is quite exciting because we can dispatch messages to both teams and individual users about the status of systems running in AWS.</p>
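<p>I haven't reproduced Stephen's code here, but the shape of that translation layer is roughly the following. This is a hypothetical sketch with an in-memory dict standing in for the DynamoDB lookup, and all the names are invented for the example.</p>

```python
class UserDirectory:
    """Maps AWS usernames to Slack usernames (in-memory stand-in for a lookup table)."""

    def __init__(self, aws_to_slack):
        self._aws_to_slack = dict(aws_to_slack)

    def slack_name(self, aws_username):
        """Return the Slack username for an AWS user, or None if unmapped."""
        return self._aws_to_slack.get(aws_username)

    def notify_targets(self, aws_usernames):
        """Split AWS users into Slack names we can message and AWS names with no mapping."""
        found, missing = [], []
        for name in aws_usernames:
            slack = self.slack_name(name)
            if slack:
                found.append(slack)
            else:
                missing.append(name)
        return found, missing
```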
<h1 id="other-stuff-that-happened">Other stuff that happened</h1>
<p>I finally finished reading Stephen King’s The Shining - I’ve not seen the film version but I know enough of the plot points that I didn’t find the book <em>too</em> scary. In the future I’ll be looking for books to give me a good spook, so if you have any recommendations, tweet them at me!</p>
<div style="text-align: center;">
<iframe src="https://giphy.com/embed/l3vRhl6k5tb3oPGLK" width="480" height="270" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe>
</div>
<p>Baking is also back on the cards, as a way of relaxing at the weekends. I made a good old Victoria Sponge and scones in the last week. This was the first time in several years that I used an electric hand whisk rather than beating the cake batter by hand - and I’m never going back.</p>
Mon, 28 Jan 2019 00:00:00 +0000https://blog.probablyfine.co.uk/2019/01/28/notes-from-the-week-16.html
https://blog.probablyfine.co.uk/2019/01/28/notes-from-the-week-16.htmlNotes from the Week #15<p>I’ve spent more time pairing this week than I have all year - granted that’s just over 3 weeks, but still, I often miss the tight collaboration of pairing when I have a fragmented week due to discussions and huddles across the business.</p>
<h1 id="four-things-that-happened">Four Things That Happened</h1>
<h2 id="wheres-o11y">Where’s O11y?</h2>
<p>This week’s main event was a kick-off session to make sure Product Development is aligned around one of our explicit tech goals for the next quarter: <strong>Observability</strong>. As Shift team-lead, and the owner of the observability strand of work within Shift, I’ve been quite knee-deep and hands-on with this saga.</p>
<p>I ran a quick non-scientific poll in our Slack channel to get a rough idea of how people currently felt about it - my suspicion was that people had <em>ideas</em> about what observability meant but those ideas are probably at best inconsistent or at worst contradictory.</p>
<div style="text-align: center;">
<img src="https://blog.probablyfine.co.uk/assets/poll.png" title="Polling ProDev about observability" style="object-fit: cover;" />
</div>
<p>A lot of the work to improve our observability foundations will be driven through the Shift team, so members of each ProDev team got together and worked in groups to come up with user stories (as developers) for features that would improve their ability to observe their systems in production.</p>
<p>I’m (internally) working from the definition of Observability below:</p>
<blockquote>
<p>… the ability to infer the internal state and behaviour of a system from its outputs</p>
</blockquote>
<p>In the context of software development, outputs could be metrics, logs, or actual system outputs - Shift will be sharing their recent experiences with <a href="https://blog.probablyfine.co.uk/2019/01/11/notes-from-the-week-14.html#new-year-new-me">SLx</a> and error budgets with the wider team.</p>
<h2 id="load-balancing-between-teams">Load balancing between teams</h2>
<p>We have two teams in ProDev that ostensibly have infrastructure as their primary concern:</p>
<ol>
<li>Shift, the Shared Infrastructure Team, led by me, accountable to the core development teams and the CTO.</li>
<li>SREs, the Site Reliability Engineers, led by the VP of Architecture, accountable to the team leads and the CTO.</li>
</ol>
<p>The work that we might be doing tends to overlap a lot - how do we define the boundaries between the two such that neither team is stepping on the other’s toes but we don’t let things “fall between the cracks”?</p>
<p>SREs help the core development teams achieve excellence in their team-specific infrastructure concerns - Shift doesn’t have the size or the bandwidth to be across <em>all</em> of these concerns. Our SREs are also <em>not</em> part of the on-call system, and are sources of expertise for ProDev.</p>
<p>Shift on the other hand is responsible for operational stability of shared systems and their explicit mission is to build a solid foundation for systems consumed by all teams such as metrics, alerting, and configuration management.</p>
<div style="text-align: center;">
<b><i>But we work together!</i></b>
</div>
<p>SREs are often prototyping ideas and technologies that Shift doesn’t have capacity to do, and we collaborate on bringing them into the production environment in a state that Shift is happy to be on call for. On the flipside, Shift benefits from their expertise much in the same way as the core development teams.</p>
<p>The result is a quite harmonious relationship, and I’m eagerly awaiting upcoming advances that we’ll be collaborating on.</p>
<h2 id="shared-s3--too-much-responsibility">Shared S3 == Too Much Responsibility</h2>
<p>I paired a lot more than usual this week, with Petrut who is one of our SREs. We’ve been fine-tuning lifecycle policies on our S3 buckets and encouraging each team to create new buckets rather than use shared buckets.</p>
<p>When we were much smaller, having a single shared bucket for e.g. backups was fine, but as we’ve grown we’ve had to create numerous policies on the same bucket to cope with each separate prefix.</p>
<p>It’s hard to grok even for someone who’s been at Unruly for a while (like me) so we’re in the process of decommissioning our shared S3 buckets in favour of team-owned single-responsibility buckets.</p>
<div style="text-align: center;">
<iframe src="https://giphy.com/embed/MkmD2CQ02Rs0o" width="480" height="240" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe>
<p><small>Moving data into <a href="https://aws.amazon.com/glacier/">AWS Glacier</a> like ...</small></p>
</div>
<p>It feels a <em>lot</em> like refactoring a piece of code with too many responsibilities, or breaking apart a monolithic application into smaller services - doing a bulk data copy is neither fast nor cheap, and we want to avoid breaking changes during switch-overs too.</p>
<h2 id="counting-s3-vastness">Counting S3 Vastness</h2>
<p>We’ve been using <a href="https://aws.amazon.com/athena/">AWS Athena</a> at Unruly pretty much since it was released, for reporting aggregation and building our data lake, and I thoroughly enjoy the experience of using it - I get into a workflow of:</p>
<ol>
<li>Discover process that takes a long time or outputs a lot of data</li>
<li>Write a Python script to generate smaller CSV files from (1)</li>
<li>Upload the CSVs to S3</li>
<li>Query with Athena</li>
</ol>
<p>As part of the work I was doing with Petrut, we were investigating digital assets from ad campaigns that have long since concluded - videos, images, etc. There were quite a few of these, so we uploaded almost 10,000 CSV files to S3 containing metadata about these assets and could then run sub-5s queries over it all.</p>
<p>The more I do this kind of work, the more I realise the sheer utility of a simple standard format like CSV because it can go pretty much anywhere. If we wanted to do some more <em>fun</em> stuff with the data, we could write some ETL jobs in e.g. Python because it has CSV/JSON support out of the box.</p>
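<p>Step (2) of the workflow above is usually only a handful of lines. A minimal sketch using just the standard library is below; the function name and the chunking strategy are assumptions for the example rather than the script we actually ran.</p>

```python
import csv
import io


def split_csv(rows, header, chunk_size):
    """Yield CSV documents (as strings) of at most chunk_size rows each.

    Every chunk repeats the header so each file can be read on its own.
    """
    for start in range(0, len(rows), chunk_size):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + chunk_size])
        yield buffer.getvalue()
```

<p>Each yielded string can then be written to a file and uploaded to S3 for step (3).</p>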
<hr />
<p>I’m going to be leveraging the Athena learnings with my investigations into commit data using <a href="https://blog.probablyfine.co.uk/2018/10/17/ci-and-structured-changelogs.html">conventional commits</a> in the near future, so watch this space for a blogpost full of graphs and fun facts!</p>
Tue, 22 Jan 2019 00:00:00 +0000https://blog.probablyfine.co.uk/2019/01/22/notes-from-the-week-15.html
https://blog.probablyfine.co.uk/2019/01/22/notes-from-the-week-15.htmlNotes from the Week #14<p>Getting back into the habit of blogging after stopping for almost a month is a bit of a wrench, but it means there’s more juicy stuff for me to talk about!</p>
<h2 id="new-year-new-me">new year, new me</h2>
<p>I’ve been digging really hard into reliability and SLXs this year - we’re already well down the road to better understanding the existing reliability for our Graphite metrics-collection system, and are hoping to take these learnings into building SLXs and Error Budgets for our other core systems.</p>
<div style="text-align: center;">
<img src="https://blog.probablyfine.co.uk/assets/graphite.png" title="Wtf happened in the middle..." style="object-fit: cover;" />
</div>
<p>We’ve already gained a lot of insight into how our customers use the metrics collection API to build dashboards:</p>
<ul>
<li>Setting harder timeouts at the load balancer level has exposed just how many dashboards have panels with <em>too many</em> metrics (as Grafana bunches them into a single request)</li>
<li>Measuring our error rates highlighted the number of dashboards relying on metrics that are no longer being collected and cause <strong>5xx</strong> responses at the backend.</li>
</ul>
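<p>To make the Error Budget idea a bit more concrete, here is a minimal sketch of the arithmetic. The 99.5% target and the request counts are invented for illustration and are not our real SLO or traffic numbers.</p>

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, remaining_failures) for one SLO window."""
    # round() guards against floating-point noise in (1 - slo_target).
    allowed = round(total_requests * (1 - slo_target))
    return allowed, allowed - failed_requests


# Hypothetical numbers: a 99.5% success SLO over one million requests.
allowed, remaining = error_budget(0.995, 1_000_000, 3_200)
print(f"budget: {allowed} failures, remaining: {remaining}")
```

<p>A negative remainder would mean the budget for that window is already spent, which is the point at which the reliability conversation stops being abstract.</p>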
<p>I want Shift to set a good example for ProDev as a whole - as we scale out a service-oriented approach to architecture, being able to have concrete discussions about reliability is going to become paramount.</p>
<h2 id="keeping-records-of-our-decisions">Keeping records of our decisions</h2>
<p>Shift built out a new Puppet module under our <a href="https://github.com/unruly/unruly-puppet">open-source</a> repository. The <code class="highlighter-rouge">nrpe_custom_check</code> module wraps several different configuration files to provide a clean interface to build NRPE checks for production machines.</p>
<p>We designed it to have as few configurable parts as possible (indeed, it has only 3 inputs - name, content, and whether the script needs <code class="highlighter-rouge">sudo</code> privileges) but hit a major snag on an architectural point.</p>
<blockquote>
<p>There’s an existing module <code class="highlighter-rouge">base</code> which has some default NRPE plugins. Given <code class="highlighter-rouge">base</code> is designed to be completely standalone and “batteries included”, should it depend on and use the <code class="highlighter-rouge">nrpe_custom_check</code> module rather than using static files?</p>
</blockquote>
<p>There was a lot of back-and-forth debate about the merits of composing small, well-defined modules together versus jumping into an abstraction too quickly rather than letting the design evolve “naturally”.</p>
<p>In the end, we decided that the points made were too valuable to lose in the mists of time and resolved to adopt <a href="http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions">Architecture Decision Records</a>. The implementation we chose was the Nygard format supported by Nat Pryce’s excellent <a href="https://github.com/npryce/adr-tools">adr-tools</a> toolchain.</p>
<p>You can check out our record here: <a href="https://github.com/unruly/unruly-puppet/blob/master/architecture_decisions/0002-standalone-nrpe-custom-check-module.md">2. Standalone NRPE Custom Check module</a></p>
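<p>For anyone who hasn’t seen the Nygard format, the sketch below renders a record with its usual sections (Status, Context, Decision, Consequences). The <code class="highlighter-rouge">render_adr</code> helper, the date, and the body text are placeholders written for this example; it is not how adr-tools is implemented and not the content of our actual record.</p>

```python
from datetime import date


def render_adr(number, title, status, context, decision, consequences, on=None):
    """Render a decision record using the section headings of the Nygard format."""
    on = on or date.today()
    sections = [
        f"# {number}. {title}",
        f"Date: {on.isoformat()}",
        f"## Status\n\n{status}",
        f"## Context\n\n{context}",
        f"## Decision\n\n{decision}",
        f"## Consequences\n\n{consequences}",
    ]
    return "\n\n".join(sections) + "\n"


# Placeholder content, purely illustrative.
adr_text = render_adr(
    number=1,
    title="Record architecture decisions",
    status="Accepted",
    context="Decisions get made in conversations that nobody writes down.",
    decision="We will keep short decision records next to the code.",
    consequences="Future readers can see why a choice was made, not just what it was.",
    on=date(2019, 1, 11),
)
print(adr_text)
```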
<p>Historically, we haven’t been as good as we could be at recording not just the decision but the <em>context in which it was made</em>. A key part of Norm Kerth’s Prime Directive is:</p>
<blockquote>
<p>Regardless of what we discover, we understand and truly believe that everyone did the best job they could, <strong>given what they knew at the time, their skills and abilities, the resources available, and the situation at hand</strong>.</p>
</blockquote>
<p>Code, documentation, and domain models all succumb to rot over time and it’s important to remember that choices made are probably correct given the information available at the time.</p>
<p>I’m hopeful that we continue to adopt this, especially for shared projects like our monolithic codebases, so that knowledge about why we do certain things is not purely contained in the anecdotes of people who were there at the time.</p>
<h2 id="what-does-value-actually-mean">What does value actually mean?</h2>
<p>New year, and I’m having a bit of a new think on this topic. For a team like Shift whose role is primarily that of support:</p>
<ol>
<li>What does ‘value’ mean?</li>
<li>How do we measure it effectively?</li>
</ol>
<p>A (paraphrased) aphorism from our CTO has been rattling around in my head for a while:</p>
<blockquote>
<p>At any given point in time, there is a good argument to <em>not</em> do X (usually in favour of feature development). Not doing <em>X</em> causes more problems, and will take longer to heal, the longer we leave it.</p>
</blockquote>
<p>Where <em>X</em> can be anything from dependency upgrades to platform investments, or even refactoring.</p>
<p>Shift are fundamentally an <em>enabling</em> team - the systems we build support the other teams in ProDev and we are also a source of extra hands for doing work related to our area of expertise.</p>
<p>We frequently pair with other teams to share the knowledge and help guide them around pitfalls that we have experienced, all while improving the state of our own infrastructure by learning about the needs of our teammates.</p>
<p>We don’t often get the quick highs of rapid feature delivery like a focused product development team but we are steadily building out our own tooling and systems to aid ProDev.</p>
<p>2019 is a year full of potential and Shift are going to seize as much of it as possible.</p>
<div style="text-align: center;">
<iframe src="https://giphy.com/embed/3o6MbkKsfsBUHNyBhK" width="480" height="362" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe>
</div>
Fri, 11 Jan 2019 00:00:00 +0000
https://blog.probablyfine.co.uk/2019/01/11/notes-from-the-week-14.html
Notes from the Week #13 - Season Finale
<p>Well, this has been a fun trial! I initially had a crack at weeknotes as a way of getting my thoughts out of my head before the weekend, but it’s become much more than that - I finally have a way of sharing with the public, my team, and my future self, things that are Happening.</p>
<p>I am definitely going to carry on with this next year, so look out for Season 2 starting from the first week of January!</p>
<h2 id="some-things-that-happened-in-2018">Some Things That Happened in 2018</h2>
<h3 id="a-year-in-the-life-of-a-team-lead">A year in the life of a team lead</h3>
<p>We’re coming up on a year of my being at the helm of the Shared Infrastructure Team. It’s difficult to overstate how much of a changing experience it has been.</p>
<p>There have been plenty of problems that I wasn’t equipped to handle and that have required me to expand my toolset. Things like conflict resolution, care of the professional and pastoral varieties, and getting knee deep in legal and procurement discussions.</p>
<p>If I could say one thing to myself this time last year, it would probably be “<strong>You are absolutely capable of handling everything this throws at you, but you don’t know it yet</strong>”.</p>
<h3 id="getting-better-at-conflict">Getting better at conflict</h3>
<p>I’ve occasionally had to resolve conflicts, but I’ve also had to learn how to handle conflict myself. Shift has a lot of ambitions and strong opinions, and sometimes those bring us into conflict with other teams and their own goals. I’d like to think I’m much more able to handle conflict in an impersonal way than I was last year.</p>
<p>I certainly have much better tools at my disposal like Reflective Listening.</p>
<h3 id="making-friends-in-other-places">Making friends in other places</h3>
<p>Even though I started on it last year, Democracy Club has again been a source of joy in my year - I feel a deep sense of satisfaction working on things that I know are being used, and are providing a public good (from my point of view, at least).</p>
<p>I made friends with Steve Messer from GDS and have learned a butt-load from our Wednesday morning coffee and chats - it sounds like GDS are up to exciting stuff so best of luck to them in 2019 too!</p>
<h3 id="learning-new-stuff">Learning new stuff</h3>
<p>Shift’s domain, being quite infrastructure heavy, tends to employ languages like Shell, Ruby, and Python. Holding onto our XP values and practices, we’ve been trying heavily to bring our TDD attitude on board using tools like <a href="https://github.com/bats-core/bats-core">BATS</a>, RSpec, and pytest respectively.</p>
<p>We’ll be adopting Go in the new year, and we’re excited to adopt a new language and expand our skill sets!</p>
<p>I also got much better at critically evaluating software choices - Shift spent a lot of time this year establishing trials to run to decide on the best technology or platform to adopt.</p>
<h3 id="embrace-the-fun">Embrace the fun</h3>
<p>I’m lucky enough to lead a team of bright and good-humoured developers, and embracing the fun side of working together is a huge part of what I think makes our team what it is:</p>
<ul>
<li>We do crosswords together at lunch - this wasn’t set up by anyone, but something that just kind of happened.</li>
<li>Some of us have started playing a game called <a href="https://bigpotato.com/gb/games/dont-get-got">Don’t Get Got</a> which has been a source of much delight already.</li>
<li>We frequently joke, make awful puns, and generally express our happiness outwardly.</li>
<li>One of the team was off ill this week and decided to hold her own mini stand-up.</li>
</ul>
<div style="text-align: center;">
<img src="https://blog.probablyfine.co.uk/assets/standup.jpg" title="Waddling the wall with a remote standup" style="object-fit: cover; width: 40%; max-height: 40%;" />
</div>
<p>Portia Tung has a great talk called <a href="https://www.infoq.com/presentations/play-team-relations-creativity">The Power of Play - Making Good Teams Great</a> about how simple things like playing games are great for social bonding and make us a more cohesive team.</p>
<h2 id="what-id-like-to-do-in-2019">What I’d Like To Do In 2019</h2>
<p>Here’s some stuff I’d really like to do in my career next year - not resolutions, more <em>aspirations</em>, and I’m definitely not going to finish all of them!</p>
<ul>
<li>
<p><strong>More open-source stuff</strong>: I’ve been trying to drive development in the open and general open-source activities at Unruly, and I’d like to push this even further next year. It would be amazing if non-proprietary systems were all developed in the open so we can really show off the quality of our codebases.</p>
</li>
<li>
<p><strong>More open standards/data stuff</strong>: This is something I want to <em>start</em> doing. I have a lot of admiration for the work that people like <a href="https://twitter.com/edent">Terence Eden</a> are doing in the world of standards and open data. Definitely going to try and get to an <a href="https://www.odcamp.uk/">OpenData Camp</a> too.</p>
</li>
<li>
<p><strong>Get better at user-centric design</strong>: I have a couple of personal projects in progress that require me to better understand graphic design and building user-facing solutions. This is exciting but something out of my comfort zone so far.</p>
</li>
<li>
<p><strong>Get back into reading</strong>: I’ve slowly been slipping out of reading, so I’ll be trying to set myself a goal of reading a set number of both fiction and non-fiction books by the end of 2019 - it’s good to read things that aren’t related to work stuff and freshen up my head.</p>
</li>
</ul>
<hr />
<p>And that’s a wrap on 2018 - see you all in the New Year!</p>
Mon, 17 Dec 2018 00:00:00 +0000
https://blog.probablyfine.co.uk/2018/12/17/notes-from-the-week-13.html
Notes from the Week #12
<p>I was expecting a big parliamentary slap-fight today, but the Prime Minister had <a href="https://www.bbc.co.uk/news/uk-politics-46515743">other ideas</a>, so I’m going to write my week-notes instead - it’s a very short one because we’re about to start the new quarter, so most of my time is taken up with discussions that I can’t write about yet!</p>
<h2 id="two-things-that-happened">Two Things That Happened</h2>
<h3 id="one">One</h3>
<p>There’s an art to enabling yourself to do things, at work and at home - and as with anything, there is a balance.</p>
<p>On one hand, if you only ever do what <em>you</em> want then there’s likely going to be conflict with those around you. On the other hand, if you <em>never</em> do what you want and put everyone else’s needs above your own then this is equally destructive but far more insidious.</p>
<p>Eventually your self-censoring will bottle up your own needs inside you and possibly explode in a very unhelpful way, depending on how you handle stress.</p>
<p>It’s been pointed out to me this week that I have a habit of doing the latter, and getting inevitably frustrated when I don’t get to attend to <em>my</em> needs for an entirely arbitrary and self-imposed reason.</p>
<p>I’ll be trying to keep an eye out for this in future and making an effort to be kinder to myself.</p>
<h3 id="two">Two</h3>
<p>This week Shift have been working with other teams to improve our department-wide <a href="https://en.wikipedia.org/wiki/Business_continuity_planning">business continuity plans</a> - improving our answers to questions like</p>
<blockquote>
<p>“What do we do if the office is consumed in a tragic X-based accident?”</p>
</blockquote>
<p>Where X can be:</p>
<ul>
<li>Fire</li>
<li>Politics</li>
<li>Blancmange</li>
<li>Irony</li>
</ul>
<div style="text-align: center;">
<img src="https://blog.probablyfine.co.uk/assets/this_is_fine.jpg" title="Lean BCP in action" style="object-fit: cover;" />
</div>
<p>It’s really felt like having multiple pots boiling on the hob which is great for the part of me that loves context switching and dashing from problem to problem but <em>not</em> great for my ability to generally keep calm and collected.</p>
<p>On the flipside, it <em>was</em> a great opportunity for us (Shift) to dive into the nitty-gritty with other teams and show them what we’re able to do - I’ve written a bit about building social capital in previous week notes and this is exactly what we’ve been doing this week.</p>
Tue, 11 Dec 2018 00:00:00 +0000
https://blog.probablyfine.co.uk/2018/12/11/notes-from-the-week-12.html
Notes from the Week #11
<p>I’m trying a slightly different style this week, let’s see how it feels!</p>
<h2 id="four-things-that-happened">Four Things That Happened</h2>
<h3 id="one">One</h3>
<p>It’s the end of ProDev’s quarter, and we celebrated with a … science fair. We’ve done one of these before and I absolutely love the creativity that comes out of such a simple proposition. Each team prepares a “stall” that we take into our clubhouse meeting room and we can go around and see what each team has done during the last quarter.</p>
<p>We had:</p>
<ul>
<li>Super-detailed artwork on movable wipe-boards</li>
<li>Large monitors showing off new reporting capabilities</li>
<li>Kahoot quizzes about our data platform’s learnings</li>
<li>And more …</li>
</ul>
<div style="text-align: center;">
<img src="https://blog.probablyfine.co.uk/assets/science_fair.jpg" title="Shift's 3 Wishes" style="object-fit: cover; width: 40%; max-height: 40%;" />
</div>
<p>We themed ours on our Opsgenie integration and had “3 Wishes” that we’ve fulfilled during the last quarter.</p>
<p>These events are a great way to down tools (kind of) and create something to show off what we’ve been working on - a side-effect of Agile/XP that I’ve observed is that working in vanishingly thin slices and focusing on incremental delivery removes a lot of the sense of progress as we’re only taking small steps.</p>
<p>Events like these science fairs give us a way to take a step back and recognise an entire quarter’s worth of work.</p>
<h3 id="two">Two</h3>
<p>Shift have been informal adopters of the Occupy Hand Signals as a way of self-moderating group conversations and making sure everyone has opportunities to speak whilst avoiding the “loudest person wins” degenerate case.</p>
<p>We learned this technique from observing another larger team (their team lead wrote about it <a href="https://gidi.io/2018/09/29/occupy-hand-signals.html">here</a>) and found ourselves picking up the very basic signals like “I would like to speak” and “I have a direct response”. Over time we’ve gotten pretty good at the self-enforcement part, such as calling out “X first, then Y” when two people put their hands up to speak.</p>
<p>One of our working agreements in our last retrospective was that we would formally adopt this for meetings of more than two people - a largely ceremonial action but it’s now encoded within our working practices.</p>
<p>There’s also a great <a href="https://gds.blog.gov.uk/2016/10/07/platform-as-a-service-team-takes-even-handed-approach-to-meetings/">GDS</a> blog-post about using these signals.</p>
<p>Finally, I read a brief Twitter <a href="https://twitter.com/DRMacIver/status/1068084925633650688">thread</a> about conversational interactions, the seed of which was this <a href="https://sambleckley.com/writing/church-of-interruption.html">article</a> - I like to think that over the last few years I’ve become more a member of the <em>Church of Strong Civility</em> than the <em>Church of Interruption</em>.</p>
<blockquote>
<p>DOCTRINES OF THE CHURCH OF STRONG CIVILITY</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Thou shalt not interrupt.
Thou shalt speak briefly.
Thou shalt use physical cues to indicate your understanding and desire to speak.
</code></pre></div> </div>
</blockquote>
<p>Definite food for thought, and a reminder that conversations about how we have conversations are often vital preludes to making conversations themselves productive.</p>
<h3 id="three">Three</h3>
<p>We resumed work on improving our log-aggregation/query platform - our initial assessment of the AWS-hosted Elasticsearch was that it was missing key features for our use case, and that we would be trading off too much configurability for reduced operational overhead.</p>
<p>We’re now looking at Elastic’s own cloud offering and we repeated our investigation workflow that we used for our incident management application trials:</p>
<ul>
<li>Identify use cases and <strong>trials</strong> to run (with input from our stakeholders)</li>
<li>Get our ducks in a row to engage a free trial (i.e. set up infrastructure ready for it)</li>
<li>Commence the free trial and evaluate our criteria</li>
</ul>
<p>My use of the word <strong>trial</strong> rather than <strong>experiment</strong> is deliberate, and comes from our CTO’s most recent fortnightly ProDev all-hands session - paraphrasing <a href="https://2018.continuouslifecycle.london/sessions/keynote-linda-rising/">Linda Rising</a>, there’s no null hypothesis or statistical validation going on, so what we’re performing are trials and not experiments.</p>
<p>This is not a bad thing, but if we don’t have the resources to do proper experimental validation we should at least keep our trials short, effective, and <em>frugal</em>. Our free trial is temporally capped at 14 days, and has no revenue cost, so provided we’re <em>effective</em> with the criteria we evaluate this is shaping up to be a good trial.</p>
<h3 id="four">Four</h3>
<p>We now maintain a number of tools to orchestrate production systems that have historically been un-loved - a Shift developer, Stephen, took it upon himself to spend a bit of time to spike a cleaner version with more user-friendliness.</p>
<p>Shift are lucky: if we scrunched up a post-it and lobbed it a few feet, we’d hit one of our stakeholders.</p>
<p>The original XP book (Extreme Programming Explained) talks about the benefits of having an embedded customer for validating work quickly - we’ll go and user-research the things we’re building by embedding with the people who will be using them and watching them use the tool, whether it’s a senior developer, a new hire, or one of our experienced Site-Reliability Engineers.</p>
<p>The tool he spiked, to improve our puppet node management workflow, opens up a lot of opportunities for us to try new technologies. Now that we know there’s desire for the product, we’re umming-and-ahhing about whether we:</p>
<ol>
<li>Keep it in Bash</li>
<li>TDD it from scratch in Python, a language we’re familiar with</li>
<li>TDD it from scratch in Go/Rust, languages we’re <em>not</em> familiar with.</li>
</ol>
<p>We’ll be invoking the <em>Improve, No Change, Worsen</em> workflow (outlined in a <a href="https://blog.probablyfine.co.uk/2018/10/10/improve-no-change-and-worsen.html">previous blogpost</a>) to establish pros and cons of each approach - (2) and (3) are almost certainly slower, but (3) gives us the opportunity to broaden our horizons.</p>
<p>Will (3) be worth the overhead of learning a new language/toolchain?</p>
<p>Stay tuned to find out!</p>
<h2 id="reflections">Reflections</h2>
<p>I’m starting to meditate again, once a day for 15-20 minutes if I can. My mind is full of stuff right now, both at work and in my personal life so I am trying to get better at taking time for myself.</p>
<p>I’m basically being hit in the face with my own oft-repeated quote</p>
<blockquote>
<p>If you don’t take time, or make time, how can you ever <em>have</em> time?</p>
</blockquote>
<p>I finished <em>Killing Commendatore</em> and have downloaded an audio-book of <em>A Wild Sheep Chase</em>, another Murakami book. There’s something relaxing about his prose that takes me out of myself for a bit.</p>
Tue, 04 Dec 2018 00:00:00 +0000
https://blog.probablyfine.co.uk/2018/12/04/notes-from-the-week-11.html
Notes from the Week #10
<p>I’m a bit pleased that I’ve managed to keep up weeknoting for 10 weeks now - I’m usually rubbish at building and maintaining habits.</p>
<h3 id="hubble-hubble-toil-and-trouble">Hubble, Hubble, Toil, and Trouble</h3>
<p>Whilst I lead the Shift team, my “first team” in the <a href="https://www.tablegroup.com/blog/thoughts-from-the-field_-issue-9---what-is-your-first-team">Lencioni-sense</a> is the Team of Team Leads. Thanks to our highly-collaborative structure, we’re great at building rapport within our own teams by working closely together day-in-day-out.</p>
<p>The Team of Team Leads (or TTL), however, don’t spend that much time together, as we’ve got our own teams to be concerned with, but we’re trying to become a better team in the truest sense of the word - one way we’re doing that is to just spend time with each other and bond, to understand each other’s motivations, contexts, and problems.</p>
<p>On Tuesday the TTL took a day-trip to the Science Museum. There was no plan (not even for when we’d take lunch!), forcing us to self-organise.</p>
<ul>
<li>We ended up having a good ol’ wander around the exhibits</li>
<li>I learned some cool facts about foghorns</li>
<li>We took advantage of the Science Museum’s IMAX cinema to catch a short film about a <a href="https://www.sciencemuseum.org.uk/see-and-do/hubble-3d">Hubble repair mission</a> in 3D - absolutely gorgeous and fascinating. (Even in space, there’s no escape from “Just hit it with a hammer until it does the thing”)</li>
<li>We shared trivia about ourselves during lunch, including things like “What’s the strangest dream you’ve had recently?”</li>
<li>We <em>did</em> end up coalescing into separate conversations but they were rich, deep, and left me with lots to think about.</li>
</ul>
<p>All in all, a rewarding and fun day out!</p>
<h3 id="smokin-on-the-docker-of-the-bay">Smokin’ on the Dock(er) of the bay</h3>
<p>Last week I mentioned that we were going to try using containers to speed up the feedback loop of our developed-in-the-open Puppet code. In traditional Shift-style we kept a detailed log of the experiment and its objectives and I’m pleased with how it’s turned out.</p>
<p>We’ve tried dedicated testing solutions like Beaker and Kitchen before, but in the end all we needed was a very simple <a href="https://en.wikipedia.org/wiki/Smoke_testing_(software)">smoke test</a> evaluation loop:</p>
<ol>
<li>Build test image containing systemd and Puppet</li>
<li>Copy module code into container</li>
<li>Run <code class="highlighter-rouge">puppet apply</code> against a test manifest using the module</li>
<li>Verify that the exit code is 2</li>
</ol>
<p>The reason we test for exit code 2 rather than 0 is that <code class="highlighter-rouge">puppet apply</code>, when run with <code class="highlighter-rouge">--detailed-exitcodes</code>, reports more than plain success or failure - 2 means <em>changes were applied and there were no errors</em>, whereas 0 means <em>no changes applied</em>.</p>
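<p>The exit-code check at the end of that loop can be sketched as below, assuming the run uses <code class="highlighter-rouge">puppet apply --detailed-exitcodes</code> (which is where exit code 2 gets its meaning). The helper names are hypothetical, the no-op re-run check is an optional extra rather than something we do, and a Python one-liner stands in for the real command so the example runs without Docker or Puppet installed.</p>

```python
import subprocess
import sys

# Meanings of `puppet apply --detailed-exitcodes` results.
DETAILED_EXIT_CODES = {
    0: "no changes, no failures",
    1: "run failed",
    2: "changes applied, no failures",
    4: "failures during the run",
    6: "changes applied, but with failures",
}


def smoke_test_passed(exit_code, expect_changes=True):
    """A fresh container should see changes (2); a re-run should be a no-op (0)."""
    return exit_code == (2 if expect_changes else 0)


def run_smoke_test(command, expect_changes=True):
    """Run a command and judge its exit code the way the smoke test does."""
    result = subprocess.run(command, check=False)
    print("exit code", result.returncode, "-",
          DETAILED_EXIT_CODES.get(result.returncode, "unknown"))
    return smoke_test_passed(result.returncode, expect_changes)


# Stand-in for the real puppet invocation: a process that exits with code 2.
STAND_IN = [sys.executable, "-c", "raise SystemExit(2)"]
stand_in_result = run_smoke_test(STAND_IN)
print("smoke test passed:", stand_in_result)
```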
<p>Our aims, which we fulfilled, are below:</p>
<ul>
<li>Feedback loop for testing and implementing code is less than 5min.</li>
<li>Spin up/down of new container takes less than 1min.</li>
<li>Puppet can apply manifests cleanly.</li>
<li>We can develop in the open.</li>
<li>Provide a potential smoke-test environment for further developing production systems.</li>
<li>Timeboxed. If the working cost of setting up a Docker container for smoke tests was prohibitive we would abandon the experiment.</li>
</ul>
<p>The spin-up/spin-down of the container is actually fast enough that the run-time of the smoke tests is dominated by the application of the manifest (installing packages, etc).</p>
<p>We chose to develop this in the open because we’re being strict about separating configuration from data, and in this case <em>there’s no reason not to</em> - check it out on our <a href="https://github.com/unruly/unruly-puppet#testing">public GitHub!</a></p>
<h3 id="papers-please">Papers, Please</h3>
<p>This week was the paper review deadline for <a href="https://conf.researchr.org/home/icse-2019">ICSE 2019</a>.</p>
<div style="text-align: center;">
<img src="https://blog.probablyfine.co.uk/assets/sylvester.gif" title="Leaving things until the Last Responsible Moment" style="object-fit: cover; width: 60%; max-height: 60%;" />
</div>
<p>I’m honoured to be on the program committee for the Software Engineering in Practice track and while I can’t talk about the papers that are coming through the system, I will say that there are some absolute bangers this year and I’m looking forward to spending a week in Montréal next year at the conference.</p>
<p>P.S. if anyone has any good food recommendations for Montréal, please let me know!</p>
Mon, 26 Nov 2018 00:00:00 +0000
https://blog.probablyfine.co.uk/2018/11/26/notes-from-the-week-10.html