Pivotal Labs » Mark Rushakoff
http://pivotallabs.com
Agile Development (feed last updated Tue, 31 Mar 2015 15:53:43 +0000)

Barriers to deterministic, reproducible zip files
http://pivotallabs.com/barriers-deterministic-reproducible-zip-files/
Tue, 20 May 2014 23:24:51 +0000

Despite all my Google searching for an existing tool that would build an identical zip file every time for the same inputs, I came up empty-handed. I decided to dig as deep as necessary to figure out what prevents us from creating the same zip file every time for the same inputs. My particular use case was creating a zip file using the built-in zip command-line utility on OS X.

(Not sure why anyone would want reproducible zip files? There are several reasons to want reproducible builds in general, and a reproducible zip would eliminate one level of necessary trust for anything distributed as a zip file: everyone could take the same inputs and verify that they generate an identical archive. It could simplify testing, too, in some circumstances.)

One of the simplest tests to determine whether you are creating an identical zip is to look at the checksum of your generated zip file: e.g. zip -r /tmp/foo.zip /tmp/foo; md5 /tmp/foo.zip. If you run that command twice in immediate succession, you might see the same checksum; if you run it by hand twice in a row, you’ll almost certainly get two different checksums. This is a strong indication that the zip file has some kind of embedded timestamp.

If you carefully inspect the internal headers for a zip file generated with a plain zip command and cross-reference those with PKWare’s .ZIP file format specification, you’ll see that your zip file has an “extended timestamp” field. That sounds promising, but because it’s an “extra field” from third parties, the zip specification doesn’t go into detail about the content of that field. One Google search later and you’ll find an article on unzip that indicates the extended timestamp field includes an access timestamp! Since we have to access the file to read its content to be put into the zip archive, that extended timestamp field will change every time. The simplest workaround I’ve found is to pass the -X option to zip, which is supposed to exclude all extra fields.

The -X option will get you pretty far. You should be able to run zip -X -r foo1.zip foo; sleep 5; zip -X -r foo2.zip foo; cmp foo1.zip foo2.zip and see that foo1.zip and foo2.zip are identical.

However, zip also includes the modification times for all included files. This means that running zip -X -r foo1.zip foo; sleep 5; touch foo/*; zip -X -r foo2.zip foo; cmp foo1.zip foo2.zip will show you that foo1.zip and foo2.zip differ. If the modified times happened at two different minutes, unzip -l foo1.zip; unzip -l foo2.zip will show you the different modification times.

You can use a terrible hack to force all the files in the foo directory to have the same timestamp: find foo -exec touch -t 201401010000 {} + sets the modification times for everything in the foo directory to Jan 1 2014. The zip file seems to be reproducible at that point, but it’s entirely dependent on a known, static timestamp. It would probably be better if individual files kept individual, meaningful timestamps (a script for that is outside the scope of this post).

The last and probably most trivial barrier to reproducible zip files is the order of the contents of the zip file: entries are stored in the order you name them, so zip -X foo12.zip foo/1.txt foo/2.txt will always produce a different zip from zip -X foo21.zip foo/2.txt foo/1.txt.

Initially I was surprised at the lack of tooling for making reproducible zip files, but after looking into the issues in depth, it seems that the modified timestamp is perhaps the most difficult problem to solve in a generic way that suits most people’s needs. Also, my cursory search of zip libraries in both Ruby and Go did not turn up any that make it obviously easy to rewrite the internal headers in zip files. That being said, if you have particular needs for reproducible zip files, I hope this post helps you understand what modifications you need to apply to a zip file to address the “less deterministic” data in it. If you write an open source tool to accomplish this, please let us all know about it in the comments.

git rebase vs. git merge: an agile perspective
http://pivotallabs.com/git-rebase-vs-git-merge-an-agile-perspective/
Fri, 17 May 2013 20:10:59 +0000

At Pivotal Labs, we’ve been using Quandora for about 6 months as an easier way to archive and discover discussions about the hows and whys of consulting and software engineering here. Earlier this week, I asked my colleagues:

There are some git workflows that would have you regularly work in feature branches and then merge back into master only when the feature is ready for acceptance. However, on every project I’ve worked on at Pivotal, we have preferred to rebase and commit to master regularly.

The more code diverges, the more difficult it is to integrate. If you want continuous integration, it’s a lot easier to do on one branch than on many. It also drives out stories that are small, scoped to the smallest actionable piece of work, and easy to accept. This leads to tight feedback loops.

We place priority on CI and small stories. This makes it easier to work on master: it’s not a big deal if master is accidentally broken, because the breakage is easily fixed or reverted.

In my experience, we still use topic branches when appropriate: for example, in the middle of a story at the end of the day, or for a bigger feature that you don’t want to do in one commit but whose intermediate commits would break mainline (master). On my project, in this case, we usually run git merge --squash --no-commit on master, which collapses the multiple commits on the topic branch into one commit on master.
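For illustration, here is that workflow in a throwaway repository (the file and branch names are hypothetical):

```shell
git init -q demo && cd demo
git config user.email dev@example.com
git config user.name Dev

printf 'v1\n' > app.txt
git add app.txt
git commit -qm "initial commit"
base=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on git config

# Do the messy intermediate work on a topic branch.
git checkout -qb topic
printf 'v2\n' > app.txt && git commit -qam "wip: part 1"
printf 'v3\n' > app.txt && git commit -qam "wip: part 2"

# Squash the branch's net effect into a single staged change on mainline.
git checkout -q "$base"
git merge --squash --no-commit topic
git commit -qm "Add the feature as one commit"

git rev-list --count HEAD   # 2: the initial commit plus the squashed feature
```
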

Even in git, branching is still painful. The longer a topic branch lives, the harder it is to merge. Yes, you can rebase often, but that means you have to rewrite history to push the rebased changes to the server. This can cause confusion if the branch lives long and is worked on by multiple pairs or on multiple machines. So it’s usually less net effort to just work on master, because we have good tests and can trust CI to quickly tell us if there are any glaring logical conflicts that were missed. On teams without CI that rely solely on manual QA, this approach is much riskier and more expensive.

I’ve seen both patterns at Pivotal. Teams that rebase tend to be small and haven’t had a production release. Large teams have a quickly changing repo, and feature branches make the history more readable. And teams that have released use feature branches to relegate incomplete code to a future release. They can also (in theory if not in practice) easily roll back an entire feature branch if something goes wrong in production.

Of course, as Rasheed mentions, trying to wrangle large unreleased changes causes all kinds of problems. Better to get faster feedback with smaller releasable features. If your team still feels it needs to have incomplete features in master, or needs to disable misbehaving pieces of the app, read up about feature switches and look into a tool like rollout or flipper.

So, it seems that in general, we tend to prefer rebasing because it helps facilitate some core concepts of agile software development:

Continuous deployment

Tight feedback loops

Small deliverables

Thanks to Rasheed, Chad, and Jacob for helping provide some great content!

An attitude shift as we approach production
http://pivotallabs.com/an-attitude-shift-as-we-approach-production/
Mon, 29 Apr 2013 14:59:44 +0000

I had the good fortune of attending a workshop about responding to production incidents, led by the folks behind Blackrock 3. Over several posts, I plan to share what I learned with the community at large and to apply it within the Cloud Foundry team: we’re going live in the very near future, and we are taking incident management and response very seriously.

One of the most important things I learned from the workshop was the attitude and mindset necessary to even begin to handle an incident.

A running analogy in the workshop was that your job (if not your business!) actually operates in two “modes”: peacetime and wartime.

Peacetime operations are what most of us in software development are used to: business as usual, relatively low pressure and stress, ample time to make difficult decisions, etc.

When it comes to incident response, that business-as-usual attitude needs to change immediately. You will be working under pressure; you will have to make difficult decisions, choosing from non-optimal options. There isn’t time for panic, and there isn’t time to procrastinate. You’re at war now.

As the primary responder to an incident, you will carry a burden. It’s not just your immediate team that has entrusted you with the responsibility of managing this incident, but also your entire organization, your customers, and your company’s stakeholders. You will need to triage incidents quickly. If you can’t solve the problem on your own, you need to be able to immediately contact the right people to assist you. And you will need to follow your incident response plan.

You should have buy-in from the whole organization, from the top down, for your incident response plan. If any incidents are routine, you should be able to handle them with minimal effort. Your plan will not cover everything that will ever come up, but it should be flexible and robust enough to ensure that small problems don’t grow into catastrophes due to the way the incident was handled.

Incidents will occur, and not all of them will be handled smoothly. Your team and organization need a culture of trust. Responders must not be afraid of the consequences of full disclosure, and they must not be paralyzed in fear of consequences of making a poor decision during an incident response. You need to be able to objectively reflect on your process in handling an incident so that you can handle future incidents even better.

Not everyone will share my point of view, but I’m excited to fill this role on the Cloud Foundry team. I’ve been coding for something over a decade now, and while it’s always been fulfilling in its own way, the biggest adrenaline rush I’ve ever had on the clock was an unannounced fire drill. That’s going to change now; it feels like moving beyond just a desk job. Work is going to involve some excitement, fear, and panic, and I’m ready for it.

Use fold to wrap long lines for an easier diff
http://pivotallabs.com/use-fold-to-wrap-long-lines-for-an-easier-diff/
Fri, 01 Mar 2013 21:13:31 +0000

We had two versions of a not-really-intended-to-be-human-readable file that were only slightly different, and we wanted to know how they were different. The lines were several hundred characters long, so when we diffed the files, we saw a basically useless output of

@@ -1 +1 @@
- bf659d7a2e0e45223095367c561526f8a10311459433adf322f2590a4987c423e55afe6e88abb09f0bee39549f1dfbbdfba99e39c0b70a7a656d07061ee113676f0d6db25d88c8034db6d9d8beec42daaaf8818273d9436fca8f442cdb6245c285c33638
+ bf659d7a2e0e45223095367c561526f8a10311459433adf322f2590a4987c423e55afe6e88abb09f0bee39549f1dfbbdfba99e39c0b70a7a656d07061ee113676f0d6db25d88c8034db6d9d8beec42daaaf8818278d9436fca8f442cdb6245c285c33638
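The fix named in the title is to pass both files through fold, which wraps long lines at a fixed width, and then diff the wrapped copies so only the small chunk that actually changed stands out. The original snippet is not preserved here, but a reconstruction along those lines:

```shell
# Build two files that each contain one very long line differing by one character.
half=$(printf 'deadbeef%.0s' 1 2 3 4 5 6 7 8)   # 64 characters
printf '%s3%s\n' "$half" "$half" > old.txt
printf '%s8%s\n' "$half" "$half" > new.txt

diff old.txt new.txt || true      # one huge, unreadable hunk

# Wrap at 32 columns, then diff the wrapped copies.
fold -w 32 old.txt > old.folded
fold -w 32 new.txt > new.folded
diff old.folded new.folded || true   # now only the 32-character chunk that changed shows up
```
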

Building identical gems
http://pivotallabs.com/building-identical-gems/
Fri, 22 Feb 2013 21:18:17 +0000

We ran into a problem where we were running `gem build` on identical input files and the built gems had different checksums; that is to say, if you run `gem build` twice in a row, the resulting `foobar.gem` files will not be identical.

A .gem file is actually a tar file (not compressed) containing two gzipped files (metadata.gz and data.tar.gz). What’s happening, as far as we can tell, is that gzipping a file embeds a timestamp in the gzip header — here’s a gist of a Bash session that demonstrates just this idea:

So how do you build identical gems from the same input? As far as we could discover, that is not supported through any `gem` commands. To normalize a gem, you would have to untar the .gem file and then decompress the files inside; then you can do a full comparison of those contents against another .gem file that went through the same process.

How to simultaneously display and capture the output of an external command in Ruby
http://pivotallabs.com/how-to-simultaneously-display-and-capture-the-output-of-an-external-command-in-ruby/
Sun, 17 Feb 2013 19:12:09 +0000

There are many ways to run external commands in Ruby: surround the command with backticks, enclose it in %x{}, call Kernel#system…

None of those approaches let you display the output of the command in real time while simultaneously capturing the output. Here’s a gist showing how to use IO.popen to capture output, display output, and check exit status:
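The gist is no longer embedded; here is a minimal sketch of the IO.popen approach (the command being run is just an example):

```ruby
# Stream a command's output to the terminal while also keeping a copy,
# then check the exit status afterwards.
captured = ""

IO.popen("echo one; echo two") do |io|
  io.each_line do |line|
    print line        # display in real time
    captured << line  # ...and capture simultaneously
  end
end

status = $?  # Process::Status of the child, set once the block finishes
puts "exit status: #{status.exitstatus}"
```
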

What happened to stdout on CI?
http://pivotallabs.com/what-happened-to-stdout-on-ci/
Thu, 14 Feb 2013 16:39:43 +0000

We were struggling for a bit yesterday trying to figure out why the few puts statements in our tests weren’t being displayed in Jenkins’ console output.

It turns out the ci_reporter gem that we were using (so that Jenkins could parse our test results) swallows stdout and stderr by default — unless you set the CI_CAPTURE environment variable to the string "off", like it tells you to do in the readme.
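For reference, the relevant line of the Jenkins build step might look like this (a sketch: the ci:setup:rspec task comes from ci_reporter's setup, and your job's rake tasks will differ):

```shell
# Jenkins "Execute shell" build step, with stdout/stderr capturing disabled:
CI_CAPTURE=off bundle exec rake ci:setup:rspec spec
```
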

That was a confusing default setting. Since my pair and I weren’t involved in setting up Jenkins, and we didn’t install the ci_reporter gem, we assumed something was misconfigured in Jenkins before we started digging through the code.

Making life easier after your organization requires two-step authentication
http://pivotallabs.com/making-life-easier-after-your-organization-requires-two-step-authentication/
Fri, 25 Jan 2013 21:37:40 +0000

My normal workflow at home used to be one session of Chrome for ordinary browsing with my personal accounts, alongside an incognito session for work email and other work-related accounts. However, since Pivotal Labs now requires two-step authentication for our Google Accounts, every time I opened an incognito window to check my work email from home, I would have to wait for a text message on my phone and enter the code.

The first step in making this easier is to install the Google Authenticator app, available for iOS, Android, and BlackBerry (don’t forget to follow the setup directions in that post, too). Now, instead of waiting for a text message and then deleting it from my inbox (for some reason the message frequently comes from a different phone number), I can just launch the app on my phone and transcribe a six-digit code into the second authentication step.

The other step is to configure Chrome for multiple users. This lets you have multiple, isolated Chrome windows at the same time. So now I can still use one session for my personal accounts while simultaneously having an open session for my work accounts, and I can choose “remember this computer for 30 days” because the session won’t get blown away when I close the last window. Chrome’s settings include a screen where you can configure each user’s name and icon.

If you have used incognito mode, then you’ve most likely noticed that the incognito session windows have a spy-looking icon in one of the top corners.

Once you have set up multiple users in Chrome, your normal windows will have an icon of your choosing in the same corner, so you can quickly identify which session a window belongs to. Clicking on that icon displays a menu where you can choose to open a window belonging to another user.

I hope this helps make living with two-step authentication easier for you. If you have any other tips, let’s hear them in the comments!

Headphones in a pair programming environment
http://pivotallabs.com/headphones-in-a-pair-programming-environment/
Fri, 18 Jan 2013 21:16:13 +0000

We strive to always pair program here at Pivotal, but occasionally there will be an odd number of people on a team and one person will not be pairing. Sometimes, the solo developer will put on some headphones and listen to music while they code. I posed a question to my peers recently:

Is there any harm in letting the soloing team member wear headphones and listen to music?

It seems that the main argument in favor of listening to music as a solo developer is that it makes it easier to focus on what you’re working on. I, too, have found that as a solo developer, it’s very difficult to not pay attention to nearby conversations.

As it turns out, that’s only one part of the larger picture. We need to consider how this affects the whole team.

When you stop listening to nearby conversations, you are missing out on opportunities to help adjacent pairs. People seem to have a tendency to notice important words even when they aren’t paying full attention to a conversation — a nearby pair might be debating whether they should be using a FooWidget or a BarWidget, and perhaps you can answer all their questions since you were involved in writing both of the original implementations. If you can save that pair time by overhearing the conversation and answering their questions, then you are saving the client money and delivering better value to them.

If you’re working on something as a solo developer that requires you to focus so hard that you need to block out audible distractions, then perhaps that story is better suited to a pair than a solo developer, or maybe the scope of that story is too wide.

At Pivotal, we take pride in having an open workspace. To keep all lines of communication open, we don’t segregate people into cubicles or offices with closed doors. However, wearing headphones puts up a barrier that says “Don’t bother me.” The solo developer should be available to support the rest of the team however necessary, and support in a pair programming environment is not best accomplished from a distance.

I think that listening to music as a solo developer in a pair programming environment may offer some short-term psychological benefits to the solo developer, but overall it is detrimental to the team.

(Credit goes to Pivotal’s Adam Milligan for an original explanation of many of the reasons not to wear headphones while soloing.)

Dealing with issues in third-party libraries
http://pivotallabs.com/dealing-with-issues-in-third-party-libraries/
Fri, 09 Nov 2012 21:06:00 +0000

Sometimes, despite your best efforts and despite following the documentation, the third-party library you chose for a task just won’t cooperate. Maybe it crashes, or maybe it produces the wrong output, or maybe you didn’t check the license up-front and you’re just now finding out you can’t use that library. What can you do?

An ounce of prevention

Before you run into the problem of a failing library, one of the best ways to defend against unpredicted shortcomings in external libraries is to write your own abstraction layer around them. That way, if you do need to swap out a library, you only have to make small changes to your wrapper instead of doing a search-and-replace across your whole application.

Is it always appropriate to wrap every third-party library? No. For example, if you need to calculate a particular type of checksum in only one spot in your application, you probably don’t need to abstract that library. Likewise, most web applications wouldn’t gain anything by abstracting jQuery or Zepto — it is assumed that those libraries are a hard dependency in the app, like Rails might be on a backend project.

The intuition for when wrapping a library is appropriate will come with time and experience; but as a rule of thumb, the more places you call that library in your application, the more appropriate it is to abstract away the library’s API.
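As a concrete sketch of the wrapper idea (every name here is hypothetical, standing in for a real third-party library):

```ruby
# Stand-in for a third-party library with an awkward API we don't control.
module ThirdPartyGeo
  def self.lookup_lat_lng(address)
    # A real library would call out to an external service here.
    { "LAT" => 37.79, "LNG" => -122.39, "INPUT" => address }
  end
end

# The abstraction layer: the rest of the app talks only to Geocoder, so
# replacing ThirdPartyGeo later means rewriting this one class.
class Geocoder
  Location = Struct.new(:latitude, :longitude)

  def self.locate(address)
    raw = ThirdPartyGeo.lookup_lat_lng(address)
    Location.new(raw["LAT"], raw["LNG"])
  end
end

location = Geocoder.locate("875 Howard St, San Francisco")
puts location.latitude   # callers never see the library's odd hash keys
```
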

Find an alternative, equivalent library

Most libraries out there have at least a couple alternative implementations from different parties. Try searching for jquery calendar plugin, for instance. The current top result on Google is a list of 30 different calendar implementations.

If you’re on a spike, figuring out which library to use for a particular piece of functionality, swapping out a library is usually easy and obvious. However, if much of your application already depends on a particular library, swapping it out can be very painful if you haven’t already written an abstraction layer. And if you don’t yet know which alternative library is most appropriate, then reimplementing the internals of the wrapper a couple of times may well be less painful than going through multiple iterations of search-and-replace throughout your entire application.

Implement the library’s functionality yourself

Reimplementing a library yourself can be very expensive, depending on the library. However, you gain the advantage of having a library that does exactly what you need. And if your client agrees to it, you can even consider releasing your implementation into the wild as another open-source solution.

The middle ground: find a lower-level library

If you do need to reimplement a library, you don’t necessarily need to rewrite the whole implementation from scratch. The trick here is finding an appropriate lower-level library.

Here are a couple examples of what I mean by utilizing a lower-level library:

You’re using a JavaScript charting library that supports scatter plots and bar charts, but you need logarithmic scales and pie charts. D3.js gives you the tooling you need to draw practically any kind of chart.

You are using a framework that generates sprite sheets from separate images and appropriate CSS to use those sprite sheets, but it doesn’t update the CSS output, even when you force it to recompile. Drop down to a tool like sprite-factory where you precisely control the generated output as a function of sprite position and sprite sheet dimensions, etc.

It’s easy to look at those examples and say “Of course! I wouldn’t dream of implementing a way to generate sprite sheets without using an existing library.” However, it’s also very easy to end up writing large features from scratch without considering existing libraries that don’t completely solve your problem but only get you partway there.

Stay on your toes when it comes to integrating with other libraries. Working around an occasional shortcoming is well within “normal” usage of a third-party library, but if that library regularly gets in your way, hopefully you don’t find yourself completely locked in to that library.