Jocelyn Paine

Dr. Dobb's Bloggers

gnuplot enthusiast Robert Billon has a nice collection of gnuplot screenshots. Here is a Folium of Descartes; here are two intersecting tori. Both are specified as trigonometric functions, but gnuplot can also plot data given as sets of points. In the original version of this posting, I demonstrated by linking to a gnuplot nude, but some readers found that unsuitable, so I've removed the link. It did make a serious point, however, because the plot was captioned "gnuplot can plot any curve from a suitable datafile". And that is one theme of my posting. Oh yes, and gnuplot can also do penguins.
Not only is gnuplot versatile; it is free, and easy to drive from a program. I am going to show you how to call it from Java, to generate graphs embedded in Web pages. I've been using it in work with the Oxford Pain Research Group, plotting graphs that summarise data from clinical trial spreadsheets about patients' responses to drugs. Our goal was to report on the drugs' efficacy, in units that doctors can easily understand and use when prescribing. As well as telling you about gnuplot, I'll explain about reading data from spreadsheets with Andrew Khan's free Java JExcelAPI library; and about structuring the reports as Web pages, making it easy to link related sections, and to link from summarised quantities back to the original data for the patients being summarised. But first, I have asked my friend Sebastian Straube to explain why one needs these measures of drug efficacy.


Clinical trials and the measurement of pain intensity

Clinical trials are a way of assessing how well a particular treatment Y works for a particular disease X. Such trials can compare treatment Y to another treatment, or to several others, or to a placebo, a sugar pill with no biological effect. For example, a pain trial may compare different doses of a pain killer to placebo. The most reliable results are produced by clinical trials that are randomised (patients allocated to treatment groups at random) and double blind (neither patient nor physician knows what treatment the patient is receiving).

For this blog posting, we have made up data from a fictitious pain trial. This is shown on this Pain Raw Data page. The important thing is the table, whose first three rows we show below:

Patient ID   1        2        3        4        5        6        7        8        Dose     Data errors?
89 001       0.137    0.098    0.197    0.284    0.276    0.179    0.278    0.341    0 mg
89 002       -0.023   -0.175   -0.022   0.346    0.221    0.22     0.232    0.223    0 mg     Row 9 for this ID, row 17, had a week number, 9, that is more than the number of weeks in the trial. I have ignored it.
89 003       0.114    0.187    0.136    0.376    0.231    0.142    0.142    0.241    10 mg    Row 9 for this ID, row 53, had a week number, 9, that is more than the number of weeks in the trial. I have ignored it.

This trial has four treatment groups. That is, we are comparing four dosages of some drug Y: 0 mg, 10 mg, 15 mg, and 20 mg. The 0 mg group is of course the placebo group. Patients would typically receive the drug every day, perhaps in divided doses. The trial has 21 patients, each allocated at random to a treatment group. The left-hand column of the table contains patient IDs; the columns on the right report dose, and whether we detected errors when reading the data from Excel. The body of the table contains weekly measurements of pain relief, which we shall explain after the next paragraph.

Pain relief is an important outcome from trials of pain medicines. One way to measure it is to measure pain intensity (there are several scales for this — and all are by their very nature subjective) at various points of the trial and see how this changes over time. For the sake of argument, let's say that pain intensity is measured daily and weekly averages are calculated from the daily measurements for each patient. Before the trial begins we also measure a baseline pain score so that we have something to judge the changes against.

I should say that in the table shown, the weekly entries for each patient are calculated as:

1 - (pain intensity for the week / baseline pain score)

The subtraction makes this a measure of pain improvement. We displayed the data as improvements rather than as intensities because it made our "responder analysis" calculations, which we shall explain in a later section, easier to check.
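As a minimal sketch, the formula above could be coded as follows. The class and method names are illustrative assumptions, not taken from the code used in the project, and the scores in main are invented.

```java
// Sketch only: names and sample scores are hypothetical.
final class PainImprovement {
    // Returns 1 - (weekly intensity / baseline). A positive result means
    // the pain has fallen relative to baseline; a negative one means it has risen.
    static double improvement(double weeklyIntensity, double baseline) {
        if (baseline <= 0) {
            throw new IllegalArgumentException("baseline pain score must be positive");
        }
        return 1.0 - weeklyIntensity / baseline;
    }

    public static void main(String[] args) {
        // Invented scores: baseline 8, weekly average 6.
        System.out.println(improvement(6.0, 8.0)); // prints 0.25
    }
}
```

A weekly average above the baseline gives a negative value, which is how entries such as -0.023 in the table arise.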

Averages as a measure of drug efficacy

Once we have the weekly pain intensity measurements for each patient, we can calculate a weekly average for each treatment group. And by comparing these average pain intensity changes over time across treatment groups, we can compare the different treatments. The results can be shown as graphs. So a report on our example trial would have a graph plotting the 0 mg group's average pain intensity against time, one plotting the 10 mg group's average against time, and so on. This is a common way to compare treatments.
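The averaging step might look something like the sketch below. It assumes each patient's weekly values are held in a double array; the class and method names are invented for illustration.

```java
import java.util.List;

// Sketch only: names are hypothetical.
final class GroupAverages {
    // Each double[] holds one patient's weekly values, week 1 first.
    // Returns the mean over all patients for each week.
    static double[] weeklyMeans(List<double[]> patients) {
        if (patients.isEmpty()) {
            throw new IllegalArgumentException("need at least one patient");
        }
        int weeks = patients.get(0).length;
        double[] means = new double[weeks];
        for (double[] patient : patients) {
            if (patient.length != weeks) {
                throw new IllegalArgumentException("patients must have the same number of weeks");
            }
            for (int w = 0; w < weeks; w++) {
                means[w] += patient[w];
            }
        }
        for (int w = 0; w < weeks; w++) {
            means[w] /= patients.size();
        }
        return means;
    }
}
```

Called once per treatment group, this gives one curve per dose, each with one point per week.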

Averages of data from a lot of patients can be really helpful, a great way of summarising data. There are limits to how useful such averages can be, however, because not everyone is average. Let's say, in our example of painful disease X treated by drug Y the pain gets somewhat better, on average. The problem with averages is that they don't always reflect individual experience. Suppose some people with our disease X respond very well to drug Y and others respond not very well at all. The overall average doesn't represent either. Such a pattern of responses actually happens quite frequently, especially with painful conditions.

Responder analysis

If averages are not the way forward, what is? One possible solution is a "responder analysis", that is to ask how many people with disease X achieve a clinically meaningful response with drug Y, say having their pain halved. We could even have several levels of response, such as 30% reduction in pain (already a meaningful response), 50% reduction in pain (a very good response), and 70% reduction in pain (an even better response). Such response levels need to be calculated from individual patient data. We can take a patient's initial baseline pain score as 100% and then determine the percentage change from baseline for different time points in the trial, for example weekly. We can then classify the patients by their response level, e.g. into the categories mentioned above.

We can even take things a step further, and ask how many people achieve our desired level of pain relief without experiencing adverse events such as nausea that make them discontinue the treatment.

In this table, each cell corresponds to a particular response level at a particular week. (Because our example is fictitious, and it takes a long time to make fictitious data realistic, we show the contents of only one such cell, the first.) The cell has one line for each treatment group. Consider the first line. This states the dose — 0 mg — and a pair of numbers 7/21. The second number is the total number of patients in the group, i.e. the total number given 0 mg of drug Y. The first number is the number who experienced any pain relief. (The final number on each line, the NNT, is something that we shall look at in the next section.)

As another example, had we filled in the cell below this one, it would contain similar lines. Suppose the first such line contained the numbers 5/21. This would mean that 5 out of 21 patients receiving 0 mg had experienced at least 15% pain relief. Perhaps only 4 out of 21 would experience at least 30% relief. And so on.
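The counting behind each cell can be sketched roughly as below. This assumes the data are held as improvements (fractions of baseline, as in the raw data table) and that "at least 30% relief" means an improvement of 0.30 or more; the names are illustrative only.

```java
// Sketch only: names are hypothetical.
final class ResponderCount {
    // improvements[p][w-1] is patient p's improvement in week w.
    // Counts the patients whose improvement in 'week' is at least 'threshold'.
    static int responders(double[][] improvements, int week, double threshold) {
        int count = 0;
        for (double[] patient : improvements) {
            if (patient[week - 1] >= threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Weeks 1 to 4 for the two 0 mg patients in the raw data table.
        double[][] group = {
            { 0.137,  0.098,  0.197, 0.284 },
            { -0.023, -0.175, -0.022, 0.346 }
        };
        System.out.println(responders(group, 4, 0.30) + "/" + group.length); // prints 1/2
    }
}
```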

We get a lot of information with this individual patient analysis that can be really useful for doctors and patients. The problem then arises of how to best represent all this information. A good way of doing so is in the form of graphs, such as the one shown here.

(Although this is fictitious data, the curves progress in the same way as in the graphs plotted during our research.) The proportion of patients achieving a desired level of pain response is shown on the y-axis, and time in the trial on the x-axis.

This graph provides all sorts of valuable information. Let me give you an example. About 20% of patients achieve 50% pain relief or more, a clinically very good level of pain response. We also see that after four to six weeks, the proportion of patients who achieve 50% pain relief does not rise further. This sort of information has immediate relevance to clinical practice: only a minority of patients with disease X achieve a very good pain response with drug Y, and we could hypothesize that those who don't achieve it by four to six weeks are unlikely to achieve it later. So, if a patient with disease X does not get adequate pain relief after 4-6 weeks with drug Y, it might be time to try another treatment.

You could look at this as a therapeutic failure, but it isn't the doctor's or the patient's fault. It is in the nature of disease X and how it responds to treatment with drug Y. Overall population averages don't give that sort of information, and graphs are a really good way of illustrating these relationships. If you have a lot of such data to look at, a tool that automatically generates such graphs and allows you an overview is rather useful. This is one of many examples where pain researchers and computer scientists can cooperate, and as with so much interdisciplinary work, the result is greater than the sum of its parts. Pain Research at the Nuffield Department of Anaesthetics (Oxford) has been pioneering this approach.

Numbers needed to treat

This brings us to the "NNT" number shown on all lines except the 0 mg ones. In addition to calculating the response levels we can compare the proportion of patients achieving a given response level with active treatment to the proportion of patients achieving that same response level with placebo. One way of performing this comparison is to calculate so-called "numbers needed to treat" (NNTs).

An NNT of 10 means that 10 patients need to be treated with a particular therapy for one to get better. The ideal drug has an NNT of 1; everyone treated gets better. This hardly ever happens in medical practice, though some scenarios (treatment of certain infections with antibiotics) can get close. Otherwise an NNT of, say, 10 actually is not bad for a lot of diseases and some commonly used medicines have higher NNTs for their desired effect.
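For a concrete feel, here is a small sketch using the standard definition of the NNT: the reciprocal of the difference between the proportion responding on active treatment and the proportion responding on placebo. The text above does not spell the formula out, so treat this as an assumption about how the reported NNTs were obtained; the names and numbers are invented.

```java
// Sketch only: names and sample counts are hypothetical.
final class Nnt {
    // NNT = 1 / (proportion responding on active - proportion responding on placebo).
    static double nnt(int activeResponders, int activeTotal,
                      int placeboResponders, int placeboTotal) {
        double active = (double) activeResponders / activeTotal;
        double placebo = (double) placeboResponders / placeboTotal;
        double difference = active - placebo;
        if (difference <= 0) {
            // The treatment did no better than placebo, so no finite NNT exists.
            throw new IllegalArgumentException("active treatment must outperform placebo");
        }
        return 1.0 / difference;
    }

    public static void main(String[] args) {
        // Invented counts: 10 of 20 respond on the drug, 5 of 20 on placebo.
        System.out.println(nnt(10, 20, 5, 20)); // prints 4.0
    }
}
```

The placebo group has no NNT of its own, which is why the 0 mg lines do not show one.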

There are more ways of applying the individual patient approach. One area I am currently looking at is the effect of painful conditions and pain therapy on outcomes relevant to occupational medicine: time off work because of pain, its effect on productivity and so on. The individual responder approach holds great promise in this area, too. But perhaps it's time I stop and hand back over to Jocelyn.

Graphs, gnuplot, and Java

Thanks, Sebastian. Let me now explain how I plotted graphs such as the one above. I'll start with some background on gnuplot.

About gnuplot

gnuplot is a free plotting program, downloadable from http://www.gnuplot.info/. The home page states that versions are available for Unix, OS/2, Windows, DOS, Macs, VMS, and Atari, amongst other platforms. Mention of Atari made me think that gnuplot must be old; in fact, says Wikipedia, it dates back to 1986. In that year, says author Thomas Williams in the FAQ:

I was taking a differential equation class and Colin was taking Electromagnetics, we both thought it'd be helpful to visualize the mathematics behind them. We were both working as sys admin for an EE VLSI lab, so we had the graphics terminals and the time to do some coding. The posting was better received than we expected, and prompted us to add some, albeit lame, support for file data.

Thomas Williams and Colin Kelley are two of the authors: the FAQ also acknowledges Russell Lang, Dave Kotz, John Campbell, Gershon Elber, and Alexander Woo.

"gnuplot", says the FAQ, should be spelled with a lower-case "g", even at the start of a sentence. gnuplot has nothing to do with GNU or the Free Software Foundation. It is also not covered by the General Public License, but it is free to use, though you are not allowed to give away modified versions.

So gnuplot is free. Is it reliable? The FAQ recycles a gratifyingly honest disclaimer from the README of a maths package by R. Freund:

For all intent and purpose, any description of what the codes are doing should be construed as being a note of what we thought the codes did on our machine on a particular Tuesday of last year. If you're really lucky, they might do the same for you someday. Then again, do you really feel *that* lucky?

I commend this disclaimer to Microsoft.

Although most users will run gnuplot interactively, it can easily be driven from a program, taking data points or expressions to be plotted from text .dat files, and plot commands — which do things like setting axis titles, legend styles, and line or point colours — from text .plt files. Your program can write these files, then invoke gnuplot to read them and save the graph it plots to a location specified in the .plt file.
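In outline, the write-then-invoke cycle might look like the sketch below. It is not the GraphToGnuplot code from the download: the class and method names are invented, only a single curve is plotted, and it assumes a gnuplot executable is on the PATH. The gnuplot commands used (set terminal, set output, set xlabel, plot ... using ... with linespoints) are ordinary gnuplot commands.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: names are hypothetical.
final class GnuplotSketch {
    // Text of a .dat file: one "week value" pair per line, weeks counted from 1.
    static String datText(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            sb.append(i + 1).append(' ').append(values[i]).append('\n');
        }
        return sb.toString();
    }

    // Text of a .plt file that plots the .dat file and saves the result as a PNG.
    static String pltText(String datName, String pngName, String title) {
        return "set terminal png\n"
             + "set output \"" + pngName + "\"\n"
             + "set xlabel \"Week\"\n"
             + "plot \"" + datName + "\" using 1:2 with linespoints title \"" + title + "\"\n";
    }

    // Writes base.dat and base.plt into dir, then runs "gnuplot base.plt" there.
    // Returns gnuplot's exit code. Assumes gnuplot is installed and on the PATH.
    static int plot(Path dir, String base, double[] values, String title)
            throws IOException, InterruptedException {
        Files.write(dir.resolve(base + ".dat"),
                    datText(values).getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve(base + ".plt"),
                    pltText(base + ".dat", base + ".png", title).getBytes(StandardCharsets.UTF_8));
        Process process = new ProcessBuilder("gnuplot", base + ".plt")
                .directory(dir.toFile())
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

Keeping the text-building methods separate from the method that runs gnuplot makes it possible to check the generated files without having gnuplot installed.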

A Java example

Let's now get down to details. This section will show three classes: Graph, which represents a graph; GraphToGnuplot, which is a module exporting the plotting routine; and Test, which demonstrates calling this to draw the "Drug trial A" graph shown earlier. You can download my example in this zip file.

The Graph class

Let's start with class Graph. An instance of Graph represents a graph of one or more clinical measurements against week number. In the version I'm posting here, I am saving paper by not defining access methods for the instance variables. GraphToGnuplot will therefore manipulate these directly, except for method Graph.addPoint.

package pain.dobbs;

public class Graph
{
  public int no_of_curves;
  // The graph displays this
  // number of curves on the
  // same axes.

  public String[] curve_title;
  // The i'th curve (counting from 1)
  // has name curve_title[ i-1 ].
  // This name will be displayed
  // against the curve or in a key.

  public int no_of_weeks;
  // The graph records data
  // for this number of weeks.

The GraphToGnuplot module

Now let's look at GraphToGnuplot. I'm using this class as a module: it exports one static method, graphToGnuplot(g). This writes the points in g to a gnuplot .dat file, writes out a corresponding .plt file of plotting commands, runs gnuplot on them, and returns the name of the resulting image file, wrapped in a chunk of HTML that refers to it and to its .dat file.

The .dat file reference, incidentally, is in case our users want to plot the data using other programs. We will publish some of the graphs in research papers, and journals are meticulous about the appearance of included figures. So our users might want to tweak this appearance using plotting programs with which they are familiar.
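The chunk of HTML that refers to the image and its .dat file could be as simple as the sketch below. The markup and names are assumptions for illustration; the real method may produce something different.

```java
// Sketch only: names and markup are hypothetical.
final class GraphHtml {
    // Returns HTML that displays the PNG and links to the points it was plotted from.
    static String imageWithDataLink(String pngName, String datName) {
        return "<p><img src=\"" + pngName + "\" alt=\"Graph\"><br>\n"
             + "<a href=\"" + datName + "\">Data points for this graph</a></p>\n";
    }
}
```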

Here, then, is the source for GraphToGnuplot. The real-life version called methods from some of my libraries, for reading and writing files and for replacing strings in templates. I've merged these into the code, so that you can run it without needing any other source.

  // As the method above, but takes plotting commands from
  // the gnuplot PLT file named by plt_template. These commands
  // may contain strings delimited by dollar signs. We will
  // replace these by values of g's instance variables:
  // see outputGraphToPltString.
  //
  private static String graphToGnuplot( Graph g, String plt_template )
  {
    String filebase = "dr_dobbs";
    // In real-life use, replace this by a method that
    // generates a different, unique, filename every time
    // it is called.

  // Writes a gnuplot PLT file to 'filename', generating it
  // from 'template', in which every substring delimited
  // by dollar signs has been replaced by the value of a
  // corresponding instance variable in g. Substrings
  // "$dat_name$" and "$png_name$" will be
  // replaced by the final two arguments, the names of the
  // gnuplot DAT file and the PNG image file to be
  // generated.
  //
  // Treats curve names specially. Replaces
  // "$curve_title1$" by g.curve_title[0],
  // "$curve_title2$" by g.curve_title[1],
  // up to the number of curves. However, the
  // PLT template has a fixed number of
  // curves written into its 'plot'
  // command. If the graph has a different number,
  // the PLT file may therefore not work.
  //
  private static void outputGraphToPltFile( String plt_template, String filename, Graph g, String dat_name, String png_name )
  {
    stringToFile( filename, graphToPltString(plt_template,g,dat_name,png_name) );
  }
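The dollar-sign substitution described in the comments above could be done along the lines of the sketch below. This is not the graphToPltString method from the download: it takes the replacement values from a Map rather than from a Graph's instance variables, and the class and method names are invented.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: names are hypothetical.
final class DollarTemplate {
    // Matches placeholders such as $png_name$ or $curve_title1$.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\w+)\\$");

    // Replaces each $name$ in the template by values.get("name").
    // Placeholders with no matching entry are left unchanged.
    static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            String replacement = (value != null) ? value : matcher.group();
            // quoteReplacement stops '$' and '\' in the value being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For example, filling the template line set output "$png_name$" with a map from png_name to dr_dobbs.png gives set output "dr_dobbs.png".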

Java, JExcelAPI, and Excel data

The data from the trials came as Excel spreadsheets, so I needed some way of reading it in Java. (If you ask me why I used Java, it's because I already have a lot of Java code for constructing Web pages, written for a project on Web-based market-research questionnaires.) I'd already come across JExcelAPI, and because it is free and the tutorial was easy to follow, I decided to use that.

About JExcelAPI

JExcelAPI was written by Andy Khan, who makes it available free under the GNU Lesser General Public License. It reads data from spreadsheets, writes and updates spreadsheets (including such things as cell colouring and formatting), and can write to any output stream, including disk, HTTP, databases, and sockets. You can download it via the JExcelAPI main page at http://jexcelapi.sourceforge.net/; and as already mentioned, Andy has written a tutorial, here. This has lots of sample code; with it, I was able to try JExcelAPI on our spreadsheets within half an hour of downloading it, and persuade myself it would do what we needed. (A useful argument for providing lots of sample code when you distribute software!)

A Java example

I used only a small part of JExcelAPI, the method for reading cells as strings. I'll show below how I did this. You can download the example and a spreadsheet to test it on from this zip file.

A note about classpaths

I often have trouble getting classpaths right, so it might be useful to show you how I handled JExcelAPI's. I downloaded the JExcelAPI jar file into c:\jexcel\jexcelapi\jxl.jar on my machine. For reasons that I can't now remember, I didn't want JExcelAPI on my default classpath — perhaps I was just imitating the tutorial — so I specified its classpath explicitly when compiling and running, as follows:

Excel data, HTML, and links to individuals

I want to finish by saying a bit about how I presented our reports. I decided to produce them as HTML because that made it possible to link different parts of the report. One example is the .dat file accompanying each graph. Because, as I mentioned earlier, users might want to replot the graphs using programs other than gnuplot, we gave them a link back to each graph's points as a text file.

A second example came up in another study we did, on dental pain relief. In this, we plotted histograms to show the numbers of patients falling within specified percentage ranges of pain relief. When Sebastian was checking my calculations, he needed to see the original data for all the patients in each range. So I set a link from each bin (i.e. bar on the histogram) to a section listing the patient IDs in the bin, and then a link from each ID to the corresponding raw data.
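The bin-to-patients-to-raw-data chain can be sketched with ordinary HTML anchors, as below. The id scheme (bin1, patient89001 and so on), the page name passed in, and the class and method names are all invented for illustration, and the labels are assumed not to need HTML escaping.

```java
import java.util.List;

// Sketch only: names and anchor scheme are hypothetical.
final class BinLinks {
    // Link to place on or beside a histogram bar, pointing at that bin's section.
    static String binLink(int bin, String label) {
        return "<a href=\"#bin" + bin + "\">" + label + "</a>";
    }

    // Section listing the patients in a bin. Each ID links to an anchor
    // for that patient on the raw data page.
    static String binSection(int bin, String label, List<String> patientIds,
                             String rawDataPage) {
        StringBuilder sb = new StringBuilder();
        sb.append("<h3 id=\"bin").append(bin).append("\">").append(label).append("</h3>\n");
        sb.append("<ul>\n");
        for (String id : patientIds) {
            // IDs such as "89 001" contain a space, so strip it for the anchor name.
            String anchor = "patient" + id.replace(" ", "");
            sb.append("<li><a href=\"").append(rawDataPage).append('#').append(anchor)
              .append("\">").append(id).append("</a></li>\n");
        }
        sb.append("</ul>\n");
        return sb.toString();
    }
}
```

The raw data page then needs a matching id on each patient's row for the second hop to land in the right place.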
