It's complex

Monthly Archives: November 2017

The previous episode of the how-to-write-a-scientific-paper series can be found here; if you are new to the series and want to start from the beginning, click here. In this rather long one, we begin to move from theory to practice and talk about how to write the Introduction with the help of a four-paragraph template.

In terms of structure, the introduction of a scientific paper should follow an hourglass shape (broad-narrow-broad) but emphasize the context—the top of the hourglass—more than the resolution of the story.

A good introduction section begins with a paragraph that sets up the broad scientific context. This paragraph is important: it is the part of your paper that is most likely to be read, in addition to the abstract. In our now-familiar film script analogy, the role of the first paragraph is that of Setup. It introduces the world and the characters; it makes the reader familiar with the concepts and ideas that define the topic of the paper. The first paragraph also paves the way for the coming paragraphs: it is the first step on the path to the sentence where the exact research problem of the paper is stated. To get the reader interested, a well-written first paragraph should already point out a broader gap in knowledge that the paper’s results aim at filling.

After the broad context has been introduced in the first paragraph, the scope of the introduction should narrow down. The next one to two paragraphs should move to the Confrontation phase: they should frame and motivate the problem tackled by the paper. They should also cite relevant literature to provide context and connect the paper to the streams of thought that together form your field of science. Then, the exact research question addressed by the paper should be explicitly and clearly stated. This sentence that reveals the question is the climax of the introduction, its highest point of excitement; it is also the narrowest point of the top-heavy hourglass.

What happens next varies slightly, depending on the format.

For short papers and the letter format, the introduction has to be wrapped up rather quickly; it is common to summarise the main finding and explain how it was obtained in one or two paragraphs before moving on to detailed results and methods.

For longer papers, it is common to provide a mini-review of the state of the art, an account of what others have done in the general vicinity of your exact research question. This can be followed by a condensed account of your approach to the problem—your experiments, methods, or theoretical points of view—followed by a discussion of your main findings. Note, however, that there are “old-school” traditions of scientific writing where the results are not discussed in the introduction at all: the approach is, but the outcome is not. Whether such a spoiler-free introduction is mandatory, expected, or grudgingly allowed depends both on your field and your journal. Finally, the introduction of a long paper often finishes with a map of the paper, an outline of what is to come: “In Section X, we will discuss…” and so on.

If you pick some of your favourite scientific papers and analyse their introductions, taking the time to understand the role of each paragraph, you will see that they almost always follow variations of the above arc. It is even possible to try to cluster papers by their type of variation—how many paragraphs are there before the key problem is stated? One, two, three? I’ve seen PNAS papers that move from the broad context to the exact research question in one long paragraph, but this is an exception rather than the norm. I would also rather split such long paragraphs. In any case, there is something of a formula, whose exact details depend on the format and length of the paper and the writer. The paper’s topic and its familiarity to the readers of the journal also play a role: research questions that are obvious to the intended audience do not require lengthy explanations, but new points of view or unexpected questions might.

So because there is a formula, let us use it as a template to write against, or as a starting point. Following a good formula makes writing easier. To this end, I will cover a paragraph-level template of the introduction section of a scientific paper that I often start with.

The aim of the template is to help you to develop a paragraph-level outline for your introduction. Developing an outline is essential before writing entire sentences—as I keep repeating, it is better to plan first and write afterwards. After all, how can you write if you do not know what to write? So, use the template to plan your introduction. For each paragraph, make a note of the topic and the point of the paragraph. You can list a few citations if you can already think of them. Also, do consider how to begin and end the paragraph. While we are not writing entire sentences yet, think about the points that the first and last sentences make—and by all means, if you wish to sketch these sentences, do so. The first and last sentences are the power positions of the paragraph. They have the biggest impact. Use them wisely.

This template is well suited for letters and short papers. It proceeds to the exact research question and the main conclusion of the paper rather quickly—the whole introduction is just four paragraphs long. It shouldn’t, however, be too difficult to expand the template for longer papers: just use more than one paragraph for each topic, and add an outline of the paper to the end if you wish. It is also perfectly possible to squeeze this structure into two paragraphs: just merge the first two and last two. The first two paragraphs provide context and lead to stating the research question; the third paragraph elaborates on the question and how it was approached, and the fourth paragraph states the main conclusion of the paper. The narrowest point of the hourglass is immediately after the second paragraph.

The first paragraph of the introduction is the opening and the so-called lede (it is really spelt that way; more on ledes in the next post); it defines the research question in broad terms and triggers the curiosity of the reader. It provides background—what is already known—and perhaps a glimpse of the knowledge gap, the unknown. In terms of plot structure, the first paragraph deals with the Setup and often the Confrontation as well: it introduces the world, the key characters, and an open problem, and makes the reader want to know what happens to them in the paper. At the same time, the first paragraph identifies your intended audience: readers who are interested in this particular world and its inhabitants.

The first sentence determines the topic of the paragraph and sets expectations on the contents of the whole paper. Avoid the easiest way of entry: it is tempting to use the (too) common opening where you first state that your research topic has become important in recent years because of this and that. I’ve done this far too often and I promise to avoid it in the future. There are more exciting ways to begin your story! Say something powerful. Move directly closer to where the gaps in knowledge are; do not begin with a long account of what is well-known already. You can fill in the details later.

After a strong beginning, you can continue the paragraph by giving a short overview of the state of the art, of what has been discovered already. This involves citing a number of earlier works; the aim, however, is not to provide an account of everything in the field—that’s what review papers are for. Rather, you should choose a handful of citations that provide context for the research problem that you address, and that at the same time connect your work to the broader progress of your field. Do cite your own work too, if relevant to the problem. This mini-review of what is already known can fill the rest of the first paragraph. You can describe past research in chronological or in topical order; often, what works best is a funnel structure, where you move closer and closer to your actual problem with every sentence.

The first paragraph is made stronger if there is a contrast, a sentence that says “Despite all this, we do not yet fully understand X…” or “However, the role of Y remains an open question” or similar. This sentence can conclude the paragraph, or it can lead to one or more concluding sentence(s) that, say, discuss why it would be important to solve the problem or explain the problem in more detail. Note that the issue that provides contrast, the “not-understanding-X”, doesn’t have to be the exact research question that your paper deals with. It can be something bigger—the broader motivation for your question.

The last sentence of the first paragraph should lead to the second paragraph in a natural way. The above way of contrasting knowledge with the lack of knowledge provides an easy bridge. Your second paragraph can then begin by addressing whatever device you used for contrast—the knowledge gap, a lack of studies, or a lack of consensus on some matter. As an example, you can begin the second paragraph with a sentence that tells why X has remained an open problem. This is rather effective.

It is, however, also perfectly OK to structure the first paragraph simply so that each of its sentences just adds detail and depth to the point made by the first sentence. The first paragraph does not have to end with a cliffhanger. If your first paragraph follows this structure—framing the topic and then providing further details—the gap between the first and the second paragraph can be used to switch to a close-up point of view, to zoom in on your problem.

The second paragraph narrows the scope from the broad setup of the first paragraph and moves into the specific topic of the paper. In the second paragraph, the plot advances from Setup to Confrontation. Its aim is to get to the exact research question which is stated at the end of the paragraph. The second paragraph’s job is to point out the gap in knowledge that the paper aims at filling, through argumentation and illustration, and with the help of carefully chosen citations that point out the existence of the gap. These citations should emphasise what is not known over what is known—use the known to highlight the unknown. This gap in knowledge can be familiar to the scientists in your field—an open problem that most experts recognise—or it can be a hole in the knowledge that no-one has noticed yet. Except you.

The first sentence frames the topic of the paragraph, just like with the first paragraph; this is, by the way, true for all paragraphs, and we shall talk more about priming reader expectations later. This first sentence comes with the additional constraint that it has to fit seamlessly with whatever concluded the first paragraph, as discussed above. Here are some common devices that help to achieve this. If the first paragraph focuses on known things and does not pose a question, the second paragraph can begin with a contrasting statement or a question: “However, …” If the first paragraph concludes with a question, the second paragraph can directly continue from there: why is this question important, why hasn’t it been solved, or what approaches might be feasible for answering it. Again, it may be that this question is broader than the specific question that you have studied; in this case, use the second paragraph to move from the broad question to the specific question, motivating why answering it is important. If others have tried to tackle the question before you, the next sentences should tell how they have approached the problem, what they might have missed, and how your point of view relates to this existing body of knowledge.

Then, at the end of this paragraph, the research question addressed by the paper is explicitly and clearly stated.

At the beginning of the third paragraph of the introduction, the point of view moves from what others have done to what you have done. The third paragraph tells how you have approached the research question. In terms of the storyline, the third paragraph is about the action that takes place between Confrontation and Resolution. Now things finally start happening. The narrowest point of the hourglass has just been passed (it is exactly between paragraphs two and three).

The first sentence of the third paragraph tells what you have concretely done to answer the research question. It may begin with a more concise and focused formulation of the question. Examples: “To this end, we have carried out an experiment where…”, or “In this paper, we investigate the relationship between X and Y with the help of…” If you have formulated your research question as a hypothesis, state this hypothesis in the first sentence. A hypothesis is a strong beginning for the paragraph, so even if you haven’t formulated your problem as one, it might be useful to try to do it. While a lot of research is not about hypothesis testing, a clear hypothesis can be a powerful device for the narrative.

After reformulating the question, the rest of the third paragraph tells more about your approach. If you have designed and carried out an experiment to answer the research question that was made explicit in the second paragraph, describe this experiment. If you have figured out a new theoretical approach to the problem, explain this approach. If you have collected and studied tons of data with new computational approaches, describe the data and the methods. But stick to the point: we are writing a paragraph for the Introduction, not for Methods. There will be time to fill in the details later.

If needed—and if there is space—the third paragraph can be long; it can even be split into several paragraphs.

Finally, the fourth paragraph of the template moves from your approach to your findings. It (briefly) reveals the outcome of your work. As it is about the Resolution of the story, it is something of a spoiler—but everyone knows your ending already if they have read the abstract, so don’t worry.

I often keep the fourth paragraph short for maximum effect. It summarizes the key findings so that a busy reader can stop here, perhaps to return to the details later; yet it leaves enough unsaid to whet the reader’s appetite. Also, the brevity of the paragraph provides a nice contrast with the lengthy third paragraph; a short paragraph gives the impression of weight and importance.

I hope this introduction template is useful to you. Up next: a post on ledes and strong first sentences.

Recap: we are now at a stage where you have developed a storyline for your journal article, and this storyline has been condensed into the abstract of the paper. You have some figures and perhaps some schematics, categorized according to their role in the story (see the previous post). You have written draft versions of figure captions. Now it is time to start outlining the different sections of your paper. First, we will talk about how to write the introduction of a scientific paper.

Every story has a beginning and an end, and the Introduction is the beginning of the story that you are about to tell.

A good, well-written Introduction does several things: it introduces the reader to your problem and motivates the problem by reviewing relevant research. It introduces schemas and concepts used in the paper. It points out gaps in the existing knowledge that need to be filled for solving the problem. It defines the exact research problem that the paper addresses, and tells how your research has solved the problem or part of it. It shows how solving it contributes to the big picture. It identifies your reader—who should read the paper?—and makes her so curious that she cannot stop reading. It makes her excited about your work. It makes her want to know more.

In terms of our already-much-abused film script analogy, the Introduction takes care of both Setup and much or all of the Confrontation. This section already provides a glimpse of the Resolution of the story too. It introduces the world and the key characters of the story: the problem area and its important concepts. Remember: the reader will only want to read on if she cares about your world and its inhabitants and the problem that they are facing.

To entice the reader, the Introduction should emphasize the question, not the answer. It should not focus on what you have done, but on why you have done it and what follows from it. There should be an engine for the story, an important question, a need to know. This is what drives the story and whets the appetite of your reader. Curiosity is a strong emotion: trigger it with your introduction, and you have a reader.

Sidetrack: <nerdspeak>The Star Wars prequels failed because there was no big question! Everyone knew that Anakin Skywalker would become Darth Vader; how that would happen was mildly interesting at best. Boring! But at the time of writing this, I do not know what will become of Kylo Ren. And I want to know. </nerdspeak>

How to ask a strong question in the introduction? How to frame the gap in knowledge that needs to be filled? Of course, in a perfect world, your research has had a strong, clear question from the very beginning, and the knowledge gap is obvious. Then, just describe it. Perhaps you have chosen a research problem that everyone knows is important, say, how to solve X. You might even be the first one to have solved X—but this rarely happens because obvious problems are to researchers what a bowl of milk is to an alley full of cats. In any case, if your problem is well-known, you can be brief; if there have been earlier, not-so-successful attempts, or if there have been ideas floating around on how to tackle the problem, you can talk about these in the Introduction.

But, almost always, things are not this straightforward, and you need to think a bit harder about how to frame your question. It might even be that you are not entirely sure of what the question is. Perhaps you have started somewhere, but then along the way, you have noticed that the question you were asking was not the right one or the most important one. Then your research led you elsewhere, and now you are trying to figure out where. Perhaps the importance of your question is not obvious at all, except to you: then you need to tell the rest of the world why the question matters. Perhaps the question that you ask is something that no-one else has ever thought of—perhaps it is your question. If so, then this is a good thing: in my view, science is driven by questions instead of answers, and good questions are rarer than good answers.

When your research aims at answering a nontrivial question that requires a bit more motivating, a good strategy is to start the Introduction with something broad and more familiar, and then gradually move on to your new uncharted territory. All questions are related to bigger questions; begin with a big question and use it to frame the problem that you are solving.

In the next posts, I’ll talk a bit about the structure of the introduction (I’ll provide a template) as well as the importance of the first paragraph and, in particular, the first sentence.

At this point, we have covered establishing the focus of your scientific paper: you should already have a clear vision of what your paper is about, and the essence of this vision should be encapsulated in its abstract. You should also have the necessary ingredients at hand: the results to be presented in your paper together with ideas for schematic diagrams, organised into film-script categories according to their function and role in the story (Setup, Confrontation, Resolution, Epilogue).

The next step is to expand the storyline laid out by the abstract and to outline the different sections of your paper. This begins with choosing what sections make up your paper. Depending on your target journal, you may need to follow strict guidelines—the commonly used Introduction–Methods–Results–Discussion structure for instance—or to come up with a structure of your own. Even for short letter-format papers that may or may not have subheadings, it pays off to have a clear idea of what goes where. Usually, this is not a difficult task: all scientific papers begin with an introduction and end with a discussion, even if these span just a few paragraphs, and the results are sandwiched in between. Methods may be explained before the results, or after the discussion as an appendix of sorts (like in Nature and other glossy magazines).

What is more involved is choosing the order of presentation within the section structure. Here, a solid, tried-and-tested approach is to begin with the figures and their order of appearance. If you have followed the approach of this blog, your figures already come with handy labels (Setup, Confrontation, Resolution, Epilogue) and therefore you already have a good overall idea of their order. If your paper has to follow the standard structure, schematic diagrams and result figures will generally be placed in the Results section, so that figures of the Setup category come first and those of the Epilogue come last; schematic diagrams of the Setup category are an exception as they may belong to Methods or even Introduction. In any case, you’ve already done much of the work before you have even begun outlining the main text: the categories of the figures mostly determine their order. What remains to be done is to choose the order within each section in which the results and schematic diagrams are shown: what figure leads to the next?

The order of figures should tell a clear story so that each builds on the previous ones. You can use a multi-panel figure to tell a self-contained part of the story, a miniature story arc. You can combine, for example, a schematic diagram that explains your experiment (Setup), some basic statistics of your data (Setup), and a result plot or two that contains an unexpected finding (Confrontation). This mini-story multi-panel figure is a technique that I often employ in letter-format papers where the story has to move fast and get to the point quickly; it already brings the story close to its Resolution and the key result.

When you have chosen the figures, write a draft version of the caption for each figure. You may need to revise the captions later; at this stage, you may still be unsure of issues like notation and nomenclature, so don’t pay too much attention to details yet. However, try to write the figure captions so that they are self-contained enough for a hasty reader to understand most of your paper just by glancing through the figures. After all, this is exactly what a great many readers do; this is what the editor of your journal does too, before deciding whether your paper is worth a closer look or deserves to be rejected outright. How long and how self-contained the captions should be depends, again, on your journal; in some of the letter-format top-tier journals, captions tend to be very long, while in the lesser journals where we mere mortals publish, figures are discussed at length in the main text and therefore the captions can be shorter. In any case, please make sure that your caption tells what the reader should learn from looking at the figure. A caption that only tells that here we see Y plotted as a function of X is not enough; it is redundant if you have remembered to label your axes in the figure already. Always tell the reader what the message of the figure is.

Because much of the story will be told by your figures, let us talk about figure quality for a while. Figures are tremendously important; those who only skim through the paper won’t see much else. Figures make the first impression and first impressions matter. Clear, high-quality figures with a professional look tell that a lot of effort has been put into the paper, and the reader is more likely to trust its contents. Amateurish-looking figures with a colour scheme that looks like PowerPoint in the 1990s leave the reader wondering if the results are of the same dubious quality.

So do make sure that your figures look good. How to do this? First, learn the ropes of whatever program you use to generate your figures, whether it is a Python or R library, or a stand-alone piece of software (like Gnuplot that has been around since the dawn of man; it will probably outlast even cockroaches once mankind is no more). In particular, learn how to change fonts, how to increase or decrease font sizes, and how to use proper LaTeX-type fonts wherever appropriate. Learn how to choose and manipulate colours and colour schemes and symbols and shadings. Learn how to produce figures of chosen dimensions, so that you can later assemble them into multi-panel plots of your choice and combine them with schematic diagrams. Learn to match figure sizes to your target journal’s column width; not having to scale the figures takes some guesswork out of choosing font sizes (see below).
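As a minimal sketch of the column-width matching mentioned above, the helper below converts a column width in millimetres into figure dimensions in inches, the unit that many plotting libraries expect. The function name is hypothetical, and the 86 mm width and 4:3 aspect ratio are assumptions for illustration only; check your target journal's author guidelines for the real values.

```python
MM_PER_INCH = 25.4


def figure_size_inches(column_width_mm, aspect_ratio=4 / 3):
    """Return (width, height) in inches for a figure that fills one column.

    aspect_ratio is width divided by height; 4:3 is an arbitrary default.
    """
    width = column_width_mm / MM_PER_INCH
    height = width / aspect_ratio
    return width, height


# Assumed single-column width of 86 mm; replace with your journal's value.
width, height = figure_size_inches(86.0)
print(f"{width:.2f} in x {height:.2f} in")
```

In Matplotlib, for instance, the resulting pair can be passed as `figsize` when creating a figure, so that the saved file needs no scaling afterwards.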

Second, do learn to use a vector graphics editor to post-process your figures (and do learn the difference between bitmap images, which are made of pixels, and freely scalable vector images, made of lines and arcs and Bézier curves). At the time of writing, the industry standard (design industry, that is) would be Adobe Illustrator; there are also free alternatives such as Inkscape. With a vector graphics editor, it is easy to assemble multi-panel figures that contain schematic diagrams (drawn with the same editor) and result figures saved in vector formats (PDF, SVG). You can also add text, arrows, indicators, and so on, as well as retouch your result plots, changing line widths, colours of symbols, or their overall appearance. Often, this is much faster than trying to get everything right when producing the plots.

A few words on the layout: always align things. Nothing spells “I am being careless” more clearly than subplots and schematics that are not neatly lined up (it takes just a few seconds to do this). Use white space properly: leave enough of it that things can breathe, but not so much that the figures look barren.

Discussing data visualisation at length is beyond the scope of this blog post, but here are a few remarks. Pay attention to your colour schemes. For plot symbols, there are much nicer and much more informative schemes than the pure-RGB red, green, and blue symbols that some programs use as default; on top of that, your reader might be colour blind and have a hard time distinguishing between red and green. Always use different symbols AND colours for different curves for maximal clarity. If you want a personalised colour scheme, search for colour scheme generators (you have already learned how to set hexadecimal colour values in your program, right?). For heat maps and similar, pay attention to the neutrality of the colour map you use: make sure that it doesn’t artificially highlight some part of your range of values. In all cases, use colours consistently through your figures. If red and blue are categorical indicators of, say, two different data sets in a graph, do not use a heat map where red and blue indicate high and low values: reserve red and blue for the two data sets, and always use them this way. Likewise, if you use a colour map with a gradient from low to high values, reserve its colours for this purpose alone.
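One way to follow the "different symbols AND colours, used consistently" advice is to fix the style of each data set once and reuse it in every figure. The sketch below is only an illustration: the function name and the data set names are hypothetical, the hex values are taken from the Okabe-Ito colour-blind-friendly palette (my choice, not something the text above prescribes), and the marker codes follow Matplotlib's notation.

```python
# Hex colours from the Okabe-Ito palette (an assumed, colour-blind-friendly choice).
PALETTE = ["#0072B2", "#D55E00", "#009E73", "#E69F00", "#CC79A7", "#56B4E9"]
# Marker codes in Matplotlib's notation: circle, square, triangles, diamond.
MARKERS = ["o", "s", "^", "D", "v", "<"]


def assign_styles(dataset_names):
    """Give every data set its own colour AND its own marker."""
    if len(dataset_names) > min(len(PALETTE), len(MARKERS)):
        raise ValueError("more data sets than distinct styles available")
    return {
        name: {"color": colour, "marker": marker}
        for name, colour, marker in zip(dataset_names, PALETTE, MARKERS)
    }


# Hypothetical data set names; define this once and import it in every plotting script.
STYLES = assign_styles(["control", "treatment"])
print(STYLES["control"])
```

With Matplotlib, each dictionary can be unpacked into a plotting call, for example `ax.plot(x, y, **STYLES["control"])`, so a given data set looks the same in every panel.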

Then, labels and fonts. First, always label your axes. This is self-evident, but I still have to explicitly mention it; even though forgetting to label the axes of a plot should feel roughly like forgetting to get dressed when leaving for work in the morning, it still happens. So, I repeat: label your axes, period. At all stages of your work, even if the plot is just a draft for your eyes. And when labelling, please do make sure that the fonts you use are large enough when the figure is scaled to its intended size; if you have chosen the plot’s dimensions so that no scaling is required, use 10 or 12 pt. Not paying attention to font size is a very common beginner’s problem, and there are even many published papers where a magnifying glass is needed to understand what is going on in the figures. I suspect this has to do with the defaults of the commonly used software packages; default font sizes are almost always tiny. I’ve rarely (if ever) seen plots with annoyingly large fonts, so if in doubt, double your font size.
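The effect of scaling on font size is simple arithmetic, and it may help to see it spelled out. The sketch below assumes that the figure is scaled uniformly, so that text shrinks by the same factor as the figure width; the function names and the example widths are made up for illustration.

```python
def printed_font_size(font_pt, drawn_width, printed_width):
    """Font size on the page after the figure is scaled to printed_width."""
    return font_pt * printed_width / drawn_width


def required_font_size(target_pt, drawn_width, printed_width):
    """Font size to set when plotting so that text prints at target_pt."""
    return target_pt * drawn_width / printed_width


# Hypothetical case: a figure drawn 8 in wide, printed in a 3.4 in column.
print(printed_font_size(10, drawn_width=8.0, printed_width=3.4))
print(required_font_size(10, drawn_width=8.0, printed_width=3.4))
```

In this assumed case, a label set to 10 pt would print at a little over 4 pt, and you would need to set roughly 23.5 pt in the plotting program to end up with 10 pt on the page. Producing the figure at its final width avoids the calculation altogether.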

Figure 1: Do avoid these common problems!

Finally, a few words about “having an eye for design”. While coming up with beautiful and impressive figures seems to come more easily for some, every student can learn to produce good-looking visuals. I’ve many times heard someone say “I cannot draw, and therefore my figures look ugly” but, as with any skill, it just takes time and patience; you do not need to go to art school to learn the essentials. Just like learning how to look at things is the key to learning to draw well, the key to producing great-looking figures is knowing what they should look like, instead of stumbling blindly. This is best learned by imitation. So, next time, take one of your plots that you are not entirely satisfied with, and look up a similar figure in some journal article that you like. Look at the two figures side by side, and try to spot the differences in composition, colours, fonts, line widths, and so on. Then modify your figure and keep on modifying it until you are satisfied with the outcome. Next time, you might not even need a reference figure.