A Good Looking Pipeline

21 June 2014

A friend told me this week that she had to give a presentation to a general audience and that she was planning to reuse the slides she previously built back in May. I think there are some situations where it makes sense, but most of the time, reusing slides is a bad idea. I cringe every time I look at my old slides. I can believe I missed all those flaws flailing their arms at me.

It’s perfectly natural to not be happy with your old slides—you did your best at the time and there’s no such thing as perfect slides, anyway. The problem comes when you forget that a month ago your understanding wasn’t as deep as it is now, your perspective wasn’t as broad, and your experiences weren’t as numerous as they are now. You have never been more qualified to improve your old slides as you are right now. It’s a shame to waste that opportunity.

This is how my friend originally illustrated her pipeline:

It’s easy to sketch a diagram like this one on paper—you should do this right after you’re done writing the script, and just before you translate your sketches into PowerPoint. Once you’re finished drawing, you are ready to ask which parts are important but understated, and which parts are unimportant but overstated.

These are the important points about the pipeline:

The input is a mixture of sequencing reads that come from a sample that contains genomes from multiple species (including human).

You can use sequence alignment to filter out reads that come from the human genome.

Reads that align to each database are combined and classified using Pathoscope (this is the premise of the talk).

The output is a list of species that we think were present in the sample.

These are the secondary points:

Human reads are detected by aligning them against the hg19 version of the human genome and a ribosomal RNA database.

The sequence aligner is Bowtie2

The individual non-human databases include archea, bacteria, fungi and viruses.

Bowtie2 outputs files called out.sam

out.sam files are combined into the merged.sam file

Pathoscope outputs a file called updated.sam`

samtools and genomeCoverageBed are used to calculate coverage for each species

My friend revisited the original slide by creating intuitive illustrations, splitting the pre-Pathoscope part into two slides, and adding useful headings for each one:

(I’m showing the final slide, but she added animations to make individual parts appear separately. Check out the accompanying SlideShare)

The revised version downplayed the secondary points, and highlighted the important ones. Even though it was designed for a general audience, these slides would work just as well for an expert audience (that’s one of the advantages of speaking to the newbies).

Before she went into the Pathoscope part of the pipeline, my friend wanted to explain that different strains from the same species of bacteria have very similar genomes. She summarized the information about two different species in a table:

She then decided to make the slide more visually intuitive and turn the table into an illustration:

The premise of this slide is that, for both species, the genomes of approximately 70 strains have very similar sequences. The title is written in a newspaper headline style that understates this point, so my friend did another iteration. She turned the heading into a full sentence and simplified the core/non-core legend:

Once we have seen the final version, it’s tempting to forget about the awkward intermediates. But these tentative steps are a crucial part of the design process. We can only improve what we’ve already created. Without an initial attempt, there’s nothing to fix.

The last part of the pipeline involves Pathoscope, a method that assigns ambiguously aligned reads to the strain that is most likely to have generated them. This is how my friend illustrated this process a month ago:

This slide overstates unimportant points (Pathogen A is labeled but there is no pathogen B, there is a box around the image, the grey arrow points down, the title doesn’t have context) and understates important ones (there are ambiguous and non-ambiguous alignments, Pathoscope disambiguates alignments by assigning them to the strain with the most unambiguous reads). This is my friend’s recent makeover:

She expanded the process into two slides with clear illustrations, and she added full sentence headings for each one. The after slides are a huge improvement compare to the before slides, but we can simplify them even further.

Pink and black aren’t related colors, but a white fill with a black border could be interpreted as an incomplete black fill. This association suggests a more intuitive way to represent an ambiguous alignment. Also, during the presentation we can just say that “there are two steps in the pipeline”, and use the heading to simply convey the premise of the slide: Align reads to strain sequences.

Showing the before and after in the same illustration makes sense for static figures, but in a presentation we can separate them in two individual slides.

Depending on how important the arrows are, we could also add an intermediate step between the previous slides:

Next time you have to present a pipeline, try this workflow: Sketch it on paper first. Notice what’s important and what isn’t. Break up the pipeline into individual slides. Make sure that each slide highlights the main points and ignores the uninteresting details. Iterate until you can’t recognize your initial idea.

It took longer for my friend to improve her original slides than if she had just reused them, but she also provided more value. The audience was able to understand her better and they will remember what she said for longer. Isn’t that why we give presentations in the first place?