Multipage Adverse Event Reports Using PROC SGPLOT

I presented a paper, Multipage Adverse Event Reports Using PROC SGPLOT, at PharmaSUG this year along with my coauthor, Mary Beth Herring of Rho, Inc. All readers of Graphically Speaking know that PROC SGPLOT and the rest of ODS Graphics make great graphs. But did you know that you can make a graph extend across multiple pages?

The goal here is to display adverse events (AEs) within body systems among subjects in a clinical trial. There are hundreds of AEs, so reports must span multiple pages when they are sent to destinations such as RTF or Printer. Making a multipage graph poses no problem for ODS Graphics: you simply use a BY variable to create page breaks. Most of the work involves deciding where to break pages and properly labeling continuations of body systems.

Each graph is composed of Y-axis tables and scatter plots. Scroll to the bottom to see examples. Body systems and AEs are displayed in an axis table along with AE frequencies and percentages for each of two groups. The frequencies are also displayed in scatter plots, which enable researchers to easily spot trends and differences between the treatments. AE and body system names might be long, so the code supports split characters, which enable you to split text across two lines.

Data preparation is a multistep process. While enterprising programmers could perhaps do all of this in fewer steps, the code is simpler to write if you break the problem down into manageable steps. This example also uses an attribute map to display body systems in a bold font while AEs are displayed in a normal font. One version of the report uses PROC SGPLOT and displays reference lines between body systems. Another version uses PROC SGPLOT to write a template, a DATA step to modify that template to ensure uniform scatter plot width across pages, and PROC SGRENDER to make the plot. This post also illustrates using nonbreaking spaces to indent lines in the axis table.

Most of the details match the PharmaSUG paper. However, here I approach one aspect differently: Sanjay showed me how to use threshold options rather than scaled Y coordinates. Both approaches work, but the threshold options are easier. I changed the code in a few other places too, just to make it a little easier to understand. The paper provides much more detail about most other aspects of the coding for this report.

In the first lines, you set some macro variables. The reports display at most 62 lines in a page. The code goes to a new page rather than starting a new body system at the bottom of a page, so the actual number of lines in a page can vary. The code adds a split character (a tilde) between words when there is a word break after column 20. You could instead add split characters on an ad hoc basis (say, only in specific AEs).

When you do not use reference lines between body systems, you can control the width of each graph component by specifying column widths. Here the AE and body system column uses 34% of the space, the group A frequency uses 5%, the group A percentage uses 11%, the group B frequency uses 5%, the group B percentage uses 11%, and the scatter plot uses 34%, for a total of 100%. You might want different widths depending on how long your AEs are and how aggressively you split them. By default, PROC SGPLOT automatically picks the column widths for each page. The PROC SGPLOT results (not shown) look great, but by optionally editing the graph template that it writes, you can explicitly set the column widths and make them consistent across pages.
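As a rough illustration of the settings just described, the macro variables might look like the following. The macro variable names are assumptions made for this sketch (the names used in the paper may differ); only the values come from the text.

```sas
/* Hypothetical names; values are the ones described in the text */
%let maxlines  = 62;   /* maximum number of lines in a page              */
%let splitcol  = 20;   /* split at the first word break after this column */
%let splitchar = ~;    /* split character (a tilde)                       */

/* Column weights: AE/body system, n (A), % (A), n (B), % (B), scatter plot */
%let weights = 0.34 0.05 0.11 0.05 0.11 0.34;

%put NOTE: &=maxlines &=splitcol &=splitchar;
%put NOTE: &=weights;
```

Keeping these values in one place makes it easy to experiment with a different page length or different column widths without touching the later steps.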

data adsl; /* Subject-level data used to get ns */
input trtp $ @@;
datalines;
A A B B B A B B A B A A B B B B B A A B A B B A B A B B A A A A A B B A A A
A B A B A B A A A A A A B B B B B A B A B A A B A A A B A A B A B B A B B
;
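The ns mentioned in the comment are the group sizes, which serve as the denominators for the percentages. One possible way to get them from the adsl data set above is sketched here; the macro variable names nA and nB are assumptions for this example.

```sas
/* Store the number of subjects in each treatment group in macro variables */
proc sql noprint;
   select count(*) into :nA trimmed from adsl where trtp = 'A';
   select count(*) into :nB trimmed from adsl where trtp = 'B';
quit;

%put NOTE: &=nA &=nB;
```

These values can then be used both to compute percentages and to build column headers such as the group sizes.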

The second DATA step reads the body systems, the preferred terms for AEs, counts, and percentages. These data are preprocessed for display in a paper or blog. For an actual analysis, you would need to use a procedure such as PROC FREQ to compute counts and percentages and a procedure such as PROC SORT to order the AEs by descending maximum AE frequency within each body system.
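For readers who start from raw AE records, the following is a minimal sketch of that kind of preprocessing. Everything in it is hypothetical: the toy adae data, the variable names (usubjid, aebodsys, aedecod), and the denominators in denomA and denomB are made up for illustration and are not the data used in this post.

```sas
%let denomA = 10;   /* made-up group sizes used as denominators */
%let denomB = 10;

data adae;   /* toy AE records: one row per subject and event */
   infile datalines dsd;
   length usubjid $4 trtp $1 aebodsys $40 aedecod $40;
   input usubjid trtp aebodsys aedecod;
   datalines;
S001,A,Cardiac disorders,Palpitations
S001,A,Cardiac disorders,Palpitations
S002,B,Cardiac disorders,Tachycardia
S003,B,Nervous system disorders,Headache
S004,A,Nervous system disorders,Headache
S005,B,Nervous system disorders,Dizziness
;

/* Count each subject at most once for each preferred term */
proc sort data=adae out=uniq nodupkey;
   by aebodsys aedecod trtp usubjid;
run;

proc freq data=uniq noprint;
   tables aebodsys * aedecod * trtp / out=counts(drop=percent);
run;

/* One row per AE with a count for each treatment: nA and nB */
proc transpose data=counts out=wide(drop=_:) prefix=n;
   by aebodsys aedecod;
   id trtp;
   var count;
run;

data aes;
   set wide;
   nA   = coalesce(nA, 0);        /* no subjects means a count of zero */
   nB   = coalesce(nB, 0);
   pctA = 100 * nA / &denomA;
   pctB = 100 * nB / &denomB;
   maxn = max(nA, nB);            /* sort key within body system */
run;

proc sort data=aes;
   by aebodsys descending maxn aedecod;
run;
```

The NODUPKEY sort is one simple way to count subjects rather than events; a real analysis would follow the conventions in its statistical analysis plan.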

The attribute map displays the body systems (designated by Value='Head') using a bold font. The default font is used when data values do not match attribute map values. The Value variable is also used throughout the data processing to differentiate observation types, although only one type is mentioned in the attribute map. 'Head' indicates a header (body system) and later the only line of a one-line header; 'Head1' and 'Head2' indicate the first and second lines of a two-line header. 'Irow' (indented row) indicates an AE and later the only line of a one-line AE; 'Irow1' and 'Irow2' indicate the first and second lines of a two-line AE. 'Blank' indicates a blank line between body systems, and 'Pad' indicates blank padding on the last page. These values are all arbitrary as long as the body system observations consistently have Value='Head' in both the data set and the attribute map.
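A minimal sketch of such an attribute map, together with a tiny data set that uses it, might look like this. The data set names, the ID value 'Text', and the toy rows are assumptions for illustration; the axis table picks up the map through the TEXTGROUP= and TEXTGROUPID= options.

```sas
data attrmap;   /* one row: body system headers are bold */
   length ID $8 Value $8 TextWeight $8;
   ID = 'Text';  Value = 'Head';  TextWeight = 'Bold';
   output;
run;

data demo;      /* toy display rows */
   infile datalines dsd;
   length value $8 rowlab $30;
   input obsid value rowlab x;
   datalines;
1,Head,Cardiac disorders,.
2,Irow,Palpitations,3
3,Irow,Tachycardia,1
;

proc sgplot data=demo dattrmap=attrmap noautolegend;
   /* rows with Value='Head' match the map; other rows use the default font */
   yaxistable rowlab / y=obsid position=left
                       textgroup=value textgroupid=Text;
   scatter y=obsid x=x;
   yaxis reverse display=none;
run;
```

Because only 'Head' appears in the map, the 'Irow' rows fall back to the default text attributes, which is the behavior described above.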

This step starts rearranging the data for display. The first axis table displays the variable Rowlab, which contains the body systems, AEs, and blank lines. For each line that is read, one to three lines are written, which populate this and the other variables. Conditionally, a blank line is output before a new body system line, and a body system line is output for a new body system. Unconditionally, each AE is output. Nonbreaking spaces ('A0'x) provide indentation.
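The logic of this step can be sketched as follows. This is a simplified illustration under assumed names (aes, bodsys, ae, n1, p1, n2, p2) and toy values, not the code from the paper. Note that 'A0'x is the nonbreaking space in single-byte encodings such as Latin1; in a UTF-8 session the nonbreaking space is encoded as 'C2A0'x, so the constant would need to change.

```sas
data aes;   /* hypothetical preprocessed input: one row per AE */
   infile datalines dsd;
   length bodsys $40 ae $40;
   input bodsys ae nA pctA nB pctB;
   datalines;
Cardiac disorders,Palpitations,3,7.7,1,2.8
Cardiac disorders,Tachycardia,1,2.6,2,5.6
Nervous system disorders,Headache,5,12.8,4,11.1
;

data rows(keep=value rowlab n1 p1 n2 p2);
   set aes;
   by bodsys notsorted;
   length value $8 rowlab $60;
   if first.bodsys then do;
      call missing(n1, p1, n2, p2);   /* no statistics on blank or header lines */
      if _n_ > 1 then do;             /* blank line before every body system but the first */
         value = 'Blank';  rowlab = ' ';  output;
      end;
      value = 'Head';  rowlab = bodsys;  output;   /* body system header */
   end;
   value  = 'Irow';
   rowlab = 'A0A0A0'x || ae;          /* three nonbreaking spaces indent the AE */
   n1 = nA;  p1 = pctA;  n2 = nB;  p2 = pctB;
   output;                            /* the AE line is always written */
run;
```

Ordinary leading blanks would typically be trimmed when the text is rendered, which is why nonbreaking spaces are used for the indentation.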

This step processes split characters. When a split character is encountered, two lines are output in place of the original one. This step temporarily stores two-line headers, which are used to create continuation headers in the next step.
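Here is a minimal sketch of splitting a label at a tilde. The data and variable names are hypothetical, the sketch assumes the split character falls in the interior of the label, and placing the statistic on the first of the two lines is an arbitrary choice made for this example.

```sas
data rows;   /* toy input rows; ~ marks where a label should split */
   infile datalines dsd;
   length value $8 rowlab $60;
   input value rowlab n1;
   datalines;
Head,Nervous system disorders,.
Irow,Headache,5
Irow,Reversible posterior~leukoencephalopathy syndrome,1
;

data split(drop=k full base);
   set rows;
   length full $60 base $8;
   k = index(rowlab, '~');
   if k = 0 then output;                  /* no split character: pass through */
   else do;
      full = rowlab;
      base = value;
      value  = cats(base, '1');           /* Irow -> Irow1, Head -> Head1 */
      rowlab = substr(full, 1, k - 1);    /* text before the split character */
      output;
      value  = cats(base, '2');           /* Irow -> Irow2, Head -> Head2 */
      rowlab = substr(full, k + 1);       /* text after the split character */
      n1 = .;                             /* statistic appears on the first line only */
      output;
   end;
run;
```

The numeric suffixes let later steps recognize the two halves of a split line so that a page never breaks between them.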

This step adds a nondisplayed BY variable to make the different pages. The number of lines in each page is not constant to ensure nice header positions. The BY variable is BG, and the number of observations in the BY group is nInGrp. Continuation headers are added when a page breaks in the middle of a body system. Pages never break in the middle of a split line; nor do they break immediately after displaying a new body system. This step also adds blank lines to fill out the last page.
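The following is a deliberately simplified sketch of the paging idea, not the logic from the paper. It assigns the BY variable BG, adds a continuation header when a page breaks inside a body system, and moves to a new page rather than starting a body system on the last line of a page. It does not handle two-line labels, nInGrp, or padding, and the "(continued)" wording, the data, and the tiny page size are assumptions for illustration.

```sas
%let maxlines = 5;   /* tiny page size for the toy data; the report uses 62 */

data rows;   /* toy display rows */
   infile datalines dsd;
   length value $8 rowlab $60;
   input value rowlab;
   datalines;
Head,Cardiac disorders
Irow,Palpitations
Irow,Tachycardia
Irow,Angina pectoris
Irow,Bradycardia
Irow,Atrial fibrillation
Irow,Cardiac arrest
Blank,
Head,Nervous system disorders
Irow,Headache
;

data paged(keep=bg value rowlab);
   set rows(rename=(value=v rowlab=r));
   length value $8 rowlab $60 curhead $60;
   retain bg 1 line 0 curhead ' ';
   if v = 'Head' then curhead = r;          /* remember the current body system */
   /* new page when the page is full, or when a header would be the last line */
   if line >= &maxlines or (v = 'Head' and line >= &maxlines - 1) then do;
      bg + 1;
      line = 0;
      if v = 'Irow' then do;                /* break fell inside a body system */
         value  = 'Head';
         rowlab = catx(' ', curhead, '(continued)');
         output;
         line + 1;
      end;
   end;
   if v = 'Blank' and line = 0 then delete; /* no blank line at the top of a page */
   value = v;  rowlab = r;
   output;
   line + 1;
run;
```

With the toy page size of 5, the second page begins with a continuation header for the first body system, and the second body system starts on a third page instead of on the last line of the second.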

This step removes superfluous blank lines and removes numerals from the Value variable. This step also creates the Y axis variable ObsID, which is the row number. Null ('00'x) labels suppress axis table column headers. Macro variables provide some of the other headers. The variable Ref contains the coordinates for reference lines.
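A rough sketch of this cleanup step follows. The input data are hypothetical, and two choices are assumptions made for this example rather than facts from the post: ObsID restarts at 1 on each page, and Ref is set on the blank rows that separate body systems.

```sas
data paged;   /* toy input: page number, row type, and label */
   infile datalines dsd;
   length value $8 rowlab $40;
   input bg value rowlab;
   datalines;
1,Head1,Reproductive system
1,Head2,and breast disorders
1,Irow,Breast pain
1,Blank,
1,Head,Vascular disorders
1,Irow,Hypertension
1,Blank,
2,Head,Vascular disorders (continued)
2,Irow,Hypotension
;

data final;
   set paged;
   by bg;
   if first.bg then obsid = 0;
   /* drop blank lines at the top or bottom of a page */
   if value = 'Blank' and (first.bg or last.bg) then delete;
   obsid + 1;                             /* row number within the page */
   value = compress(value, , 'd');        /* Head1, Head2 -> Head; Irow1 -> Irow */
   if value = 'Blank' then ref = obsid;   /* assumed: reference line at each blank row */
   label rowlab = '00'x;                  /* null label suppresses the column header */
run;
```

Removing the numerals means that both lines of a two-line header have Value='Head' and therefore match the attribute map.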

This step writes a graph template by using the TMPLOUT= option in PROC SGPLOT. The Y-axis variable is a row number. The threshold options ensure that PROC SGPLOT does not add extra space along the Y axis to extend it to the next "nice" tick value, such as 70 (or another integer times a power of 10). Without these options, extra white space might appear at the end of the graph.
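A stripped-down sketch of such a step is shown here. The data set, variable names, and template file name are hypothetical, and the real report uses more axis table options than this. The template file is written to the WORK library's directory so that the example does not depend on the current directory.

```sas
data final;   /* toy display data: one row per report line */
   infile datalines dsd;
   length value $8 rowlab $40;
   input bg obsid value rowlab n1 n2;
   datalines;
1,1,Head,Cardiac disorders,.,.
1,2,Irow,Palpitations,3,1
1,3,Irow,Tachycardia,1,2
2,1,Head,Nervous system disorders,.,.
2,2,Irow,Headache,5,4
;

%let tfile = %sysfunc(pathname(work))/aetemplate.sas;   /* hypothetical file name */

proc sgplot data=final noautolegend tmplout="&tfile";
   by bg;                                   /* one page per BY group */
   yaxistable rowlab n1 n2 / y=obsid position=left;
   scatter y=obsid x=n1 / markerattrs=(symbol=circlefilled);
   scatter y=obsid x=n2 / markerattrs=(symbol=trianglefilled);
   /* thresholds of 0 keep the axis from extending past the data range */
   yaxis reverse display=none thresholdmin=0 thresholdmax=0;
   xaxis label='Frequency';
run;
```

After the step runs, the file named in TMPLOUT= contains the GTL template that PROC SGPLOT generated, which is the starting point for any template editing.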

This step modifies the graph template to use the specified column weights. It also changes the template name and adds processing for the macro variable x2max. As I mentioned above, this step and the PROC SGRENDER step are optional. They just provide some extra fine tuning beyond what PROC SGPLOT provides.

This step creates the report that displays reference lines between body systems. (It does not rely on the preceding PROC SGPLOT or template-editing DATA step.) PROC SGPLOT requires that you specify LOCATION=INSIDE for all of the Y-axis tables if you want reference lines to extend across the entire figure. It also requires you to display color bands in the Y axis. However, you can make them 100% transparent by specifying COLORBANDSATTRS=(TRANSPARENCY=1).
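The combination of options described above can be sketched as follows. The data are hypothetical, and placing the reference line on the blank row between body systems is an assumption for this example.

```sas
data final;   /* toy display data; ref is nonmissing where a line is wanted */
   infile datalines dsd;
   length value $8 rowlab $40;
   input bg obsid value rowlab n1 n2 ref;
   datalines;
1,1,Head,Cardiac disorders,.,.,.
1,2,Irow,Palpitations,3,1,.
1,3,Irow,Tachycardia,1,2,.
1,4,Blank,,.,.,4
1,5,Head,Nervous system disorders,.,.,.
1,6,Irow,Headache,5,4,.
;

proc sgplot data=final noautolegend;
   by bg;
   /* LOCATION=INSIDE lets the reference lines span the axis table as well */
   yaxistable rowlab n1 n2 / y=obsid location=inside position=left;
   refline ref / axis=y;
   scatter y=obsid x=n1;
   scatter y=obsid x=n2;
   /* color bands are required here, so make them fully transparent */
   yaxis reverse display=none colorbands=odd
         colorbandsattrs=(transparency=1);
run;
```

Missing values of the reference line variable are ignored, so only the rows where Ref is set produce a line.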

While there are many steps, when you break them down, none is complicated. In particular, ODS Graphics can easily make a multipage report by using a BY variable. The only thing that makes this more involved is that care must be taken to split and continue pages and long text strings. See the paper for more details and for more information about axis tables.

About Author

Warren F. Kuhfeld is a distinguished research statistician developer in SAS/STAT R&D. He received his PhD in psychometrics from UNC Chapel Hill in 1985 and joined SAS in 1987. He has used SAS since 1979 and has developed SAS procedures since 1984.
Warren wrote the SAS/STAT documentation chapters "Using the Output Delivery System," "Statistical Graphics Using ODS," "ODS Graphics Template Modification," and "Customizing the Kaplan-Meier Survival Plot." He also wrote the free web books Basic ODS Graphics Examples and Advanced ODS Graphics Examples.