Thursday, August 27, 2009

SAS macros are script files that execute SAS instructions, similar to other SAS programs. They are commonly used to automate tasks, since a macro can perform repeated tasks with parameters selected by the user. On the iPhone, these parameters appear as a list screen of selection choices, so the macro can be run on the server with the values selected on the iPhone. The SAS programmer who loads the macro on the server configures it with different types of parameter selection, including the following examples:

On Off Selector - These are similar to check boxes: each can be set to on (checked) or off (unchecked).

Short Text and Passwords - The single text entry allows users to enter any text value. A similar entry type is the password, which masks the entry so that the text is hidden.

Check List - This is a check list where the user can select one value from a list of valid values.

Text Auto Fill - This allows users to enter text, but provides a list of valid values to choose from when what the user has typed matches an entry in the list of valid values.

Date Time - These are preset valid date and time selections. This uses a spin control with presets for date and time.

Distinct Value Spin Control - The spinner control can combine values from multiple data sources to formulate one value. The date and time above illustrates three spinners, while this price range illustrates two. You can have it show one to four spinner controls.

Each parameter of a macro corresponds to a selection made with one of the controls above. The values offered by the control are populated from a dataset stored on the server.

Macro Selection

The macros are stored in libraries, which correspond to folders on the server. If you have been granted permission, you can select a library and then view all the macros in it.

Each macro has a short name, which is also the name of the file on the server, followed by a longer label. You can select a macro by tapping anywhere on its row. To select macros from a different library, select the library displayed at the top.

Once you select the library, you can return to the Macros screen by selecting the "Macros" navigation button at the top.

Executing Macros

Once you have selected a macro, you are presented with all of its parameter selections. You can scroll down if they extend beyond one screen.

After selecting all the parameters, tap the "Run" button on the upper right. This submits all selected values and executes the macro. If the macro has an error, the log is displayed.

If the macro runs successfully, the results are displayed. Results can be in one of the standard SAS output formats, including ASCII (LST), PDF, HTML, Excel (XLS), and MS Word (DOC). An example of a very simple SAS ASCII LST output is shown below.

Data Overview

SAS datasets are stored in libraries on the server, similar to how files are stored in folders. Each library is associated with a different folder on the server, which contains one or more datasets. You can select the library that contains the datasets you wish to view. The screen that lists the available datasets can be accessed through the "Dataset" navigation button at the bottom of your screen.

This lists all the available SAS datasets in the specified library. You can then view the contents of a dataset by tapping anywhere on the row of its displayed name.

Selecting Data

The main dataset screen displays all datasets that you have access to in the specified library. The list displays a short dataset name with the more descriptive dataset label below it. You can choose a different library by selecting the library name at the top of the dataset view, which displays the library selection screen as shown here:

Once you select the library, you can return to the main Dataset screen by selecting the "Datasets" navigation button at the top. From the list of datasets on the dataset screen, you can view the data by tapping anywhere on the row of the dataset name and label. This takes you to the dataset view screen, which displays the detailed values of the dataset.

The libraries, and user access to them, are similar to SAS libraries created with LIBNAME, but they are managed by an administrator in the library management tools.

Viewing Data

SAS datasets can be very large. BI Flash breaks the data down into viewable screens of 20 rows at a time. When you select a dataset from the main dataset screen, you are brought to the data viewer.

You can navigate to specific observations by selecting the next and back arrow buttons at the bottom of the viewer. The slider also allows you to jump quickly to a specific segment of the data. You can also tap on the current observation range (1-20) to be presented with a list of the data chunks. This allows you to navigate across larger sets of data.

The list of "chunks" of data can be scrolled through by swiping vertically. This allows you to scroll through the entire observation list and select the specific chunk of data to be viewed. Upon selection, the data viewer displays the selected observations.
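As a rough illustration of the paging described above, here is a minimal Python sketch of how observation numbers could be split into blocks of 20. This is not how BI Flash is actually implemented; the function names `chunk_ranges` and `chunk_for_observation` are hypothetical and exist only for this example.

```python
def chunk_ranges(total_obs, block_size=20):
    """Return 1-based, inclusive (first, last) observation ranges."""
    if total_obs < 0 or block_size < 1:
        raise ValueError("total_obs must be >= 0 and block_size >= 1")
    return [
        (first, min(first + block_size - 1, total_obs))
        for first in range(1, total_obs + 1, block_size)
    ]


def chunk_for_observation(obs, block_size=20):
    """Return the 0-based index of the chunk that holds a 1-based observation."""
    if obs < 1:
        raise ValueError("observation numbers start at 1")
    return (obs - 1) // block_size


# A dataset of 45 observations is shown as three screens.
ranges = chunk_ranges(45)
```

With 45 observations, the last chunk is shorter than the block size, which is why the upper bound is capped at the total number of observations.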

You can also configure how the variables are to be displayed. These options are available through the button shown here:

This button is a toggle which flips the screen between the data viewer and the configuration screen shown below:

The options include the following.

Formatted - This will apply any SAS formats, or user-defined formats with the associated format catalog, when the data is viewed.

Variable Attributes - This will display the variable label, length and related attributes similar to what you would see in a PROC CONTENTS.

Variable Names - The column header will display the variable name as part of the title.

Variable Label - The column header will display the variable label as part of the title.

Data Block - Defines the chunk size of the data to be viewed, to optimize the viewing experience for large datasets.

The FDA review of an electronic submission requires the merging of submitted data to confirm that the source produces the same aggregate results as the submitted summary analysis. This can only be accomplished if there are clearly defined keys between the datasets and the keys have standard attributes. A common error is that the lengths of the key fields are slightly different. For example, the study identifier (STUDYID) in one set of data is set to a length of 7 and in another is set to 10. When the two sets are merged, some of the variable values will be truncated, leading to errors. An evaluation of key field lengths is therefore crucial to standardizing them.
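To make the truncation problem concrete, here is a small Python sketch. It assumes the merged key ends up with the shorter length of 7; the STUDYID values and the helper `fit_to_length` are made up for illustration and do not come from a real study.

```python
def fit_to_length(value, length):
    """Mimic storing a character value in a fixed-length variable."""
    return value[:length]


# Hypothetical STUDYID values that need all 10 characters to be distinct.
keys_length_10 = ["ABC-001-02", "ABC-001-03"]

# The same values after being forced into a key of length 7.
keys_length_7 = [fit_to_length(key, 7) for key in keys_length_10]

distinct_before = len(set(keys_length_10))
distinct_after = len(set(keys_length_7))
```

Two different studies collapse into the same key once the last three characters are cut off, so records from one study would match records from the other.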

CDISC standards are very helpful in getting variable attributes such as names and labels standardized. They do not, however, enforce standard lengths, leaving it up to you to evaluate and come up with the correct length for each study submitted. The following steps are recommended to standardize your key field lengths and avoid truncation errors.

Step 1 - Identify all datasets, across all the studies being submitted, that contain the same variables. For example, suppose you are submitting three studies and each study has about 15 datasets. In this case, there are common variables such as STUDYID, DOMAIN and USUBJID that are in more than one dataset. Any variable that exists in more than one dataset should be included in this analysis.

Step 2 - Determine the longest character value of each variable across all datasets. For example, if your verbatim variable AETERM has a longest text value of 45 characters in one dataset and 59 characters in another, you would note the 59. The goal is to evaluate all the data values and determine the longest length across all your data.

Step 3 - Set the maximum length that will be the standard across all datasets. In the above example, you could set the maximum to 59, but it may be better to round the length up, so 60 would be a better standard. In this case, the AETERM variable across all your studies would be set to 60.
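The three steps above can be sketched in a few lines of Python. This is only an illustration of the logic, not the %varlen macro itself. The dataset names, the sample rows and the choice to round up to a multiple of 10 are my assumptions for the example.

```python
from collections import defaultdict


def shared_variables(datasets):
    """Step 1: names of variables that appear in more than one dataset."""
    found_in = defaultdict(set)
    for dataset_name, rows in datasets.items():
        for row in rows:
            for variable in row:
                found_in[variable].add(dataset_name)
    return {variable for variable, names in found_in.items() if len(names) > 1}


def longest_lengths(datasets, variables):
    """Step 2: longest character value of each variable across all datasets."""
    longest = {variable: 0 for variable in variables}
    for rows in datasets.values():
        for row in rows:
            for variable in variables:
                value = row.get(variable)
                if isinstance(value, str):
                    longest[variable] = max(longest[variable], len(value))
    return longest


def standard_length(longest, multiple=10):
    """Step 3: round the longest length up to the next multiple (59 -> 60)."""
    return max(1, -(-longest // multiple) * multiple)


# Made-up rows for two studies; real input would be read from the SAS datasets.
datasets = {
    "study1.ae": [{"STUDYID": "STUDY01", "AETERM": "A" * 45}],
    "study2.ae": [{"STUDYID": "STUDY02-EX", "AETERM": "B" * 59}],
}
common = shared_variables(datasets)
longest = longest_lengths(datasets, common)
standard = {variable: standard_length(n) for variable, n in longest.items()}
```

Here AETERM comes out at 60 and STUDYID at 10, which are the lengths you would then assign to those variables in every dataset.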

The example report shown above illustrates all the variables across three studies. This report is produced by a macro, %varlen, that automates the evaluation and assignment of the variable lengths. The standardization will prevent errors when merging on keys, and it can also significantly reduce the size of your SAS datasets. Without performing this evaluation, you might just set the variable to a maximum length of 200 characters. In that case, SAS allocates this space and creates very large datasets even though your data values never reach this length. Your standardization effort will result in smaller, more efficient datasets and will allow FDA reviewers to use tools such as JMP, among other software, without errors.

Clinical data that was originally captured from case report forms and then transformed into CDISC SDTM format requires rigorous verification and validation. This ensures that it meets the data structure in the guidelines and that the clinical data has not been altered during the transformation. I recommend a series of steps that use very basic SAS procedures, including PROC PRINT, PROC FREQ and PROC MEANS, to assist you in this validation endeavor.

Step 1 – Print Sample

You can create a PROC PRINT of a subset of just three subjects. This can then be visually reviewed to make sure there are no major changes.

Step 2 – Frequency Counts

For variables that have a small set of distinct values, otherwise known as categorical data, a PROC FREQ is useful for verifying that the summary counts in the source and destination match.

Step 3 – Means Statistics

For numeric variables with many values, otherwise known as continuous data, a PROC MEANS can verify that the values are the same.

These results were produced using a SAS macro, %verification_rep, which generates the reports using SAS PROCs with ODS, so that the results are generated as HTML in a frame and the data can be reviewed side by side. This provides a quick visual inspection that can easily identify differences or discrepancies introduced during data transformation to CDISC standards.
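The comparison logic behind Steps 2 and 3 can be sketched outside of SAS as well. The Python below is a simplified stand-in for what PROC FREQ and PROC MEANS let you check by eye. It is not the %verification_rep macro, and the function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def frequency_mismatches(source_values, target_values):
    """Categorical check: values whose counts differ, as (source, target)."""
    source_counts = Counter(source_values)
    target_counts = Counter(target_values)
    all_values = sorted(set(source_counts) | set(target_counts))
    return {
        value: (source_counts[value], target_counts[value])
        for value in all_values
        if source_counts[value] != target_counts[value]
    }


def summary_stats(values):
    """Continuous check: the basic statistics to compare (non-empty input)."""
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


def stats_match(source_values, target_values, tolerance=1e-9):
    """True when every summary statistic agrees within the tolerance."""
    source = summary_stats(source_values)
    target = summary_stats(target_values)
    return all(abs(source[key] - target[key]) <= tolerance for key in source)


# Made-up example: one subject's sex was changed during transformation.
sex_problems = frequency_mismatches(["M", "F", "M"], ["M", "F", "F"])
# Made-up example: same values in a different row order still match.
weights_ok = stats_match([61.5, 72.0, 80.25], [80.25, 61.5, 72.0])
```

Matching summaries do not prove that every row is identical, so this complements rather than replaces the printed sample from Step 1.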

One important objective in converting to CDISC standards is to gain the ability to perform an ISS (integrated safety summary) analysis across multiple studies. Once the data is in a standard format, it is easier to merge the data from a pool of studies, since they share the same CDISC structure. Even though the variable names and labels are standardized, the guideline does not strictly specify the length and other detailed attributes. In a recent set of studies I was working on, we ran into discrepancies between two studies even though they had been converted to CDISC. The following report illustrates this problem.

Most of the issues came about due to length differences between the two studies. This can lead to truncation if any of these key fields are merged. Other less common issues are things such as labels being different. This can be due to using different versions of the guidelines such as 3.1.1 versus 3.1.2.

I ran a %difftest macro, which revealed some differences among the attributes. This helped us standardize the data even though it was already considered standard because it was in CDISC format. Once we had updated these attributes to be truly standardized, the data was useful for the ISS.

Even when data is in CDISC format, that does not mean it is ready for an ISS. It is therefore recommended that attributes are reviewed for consistency, even on "standard" CDISC datasets, before they are used in an integrated safety summary analysis.
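To give an idea of the kind of check involved, here is a minimal Python sketch that compares variable attributes between two studies. It is not the %difftest macro. The attribute dictionaries are made up, and in practice they would be filled from something like PROC CONTENTS output for each study.

```python
def attribute_differences(study_a, study_b, attributes=("type", "length", "label")):
    """List (variable, attribute, value_a, value_b) for every mismatch.

    Only variables present in both studies are compared.
    """
    differences = []
    for variable in sorted(set(study_a) & set(study_b)):
        for attribute in attributes:
            value_a = study_a[variable].get(attribute)
            value_b = study_b[variable].get(attribute)
            if value_a != value_b:
                differences.append((variable, attribute, value_a, value_b))
    return differences


# Hypothetical attributes for two studies that are both "in CDISC".
study_1 = {
    "STUDYID": {"type": "char", "length": 7, "label": "Study Identifier"},
    "AETERM": {"type": "char", "length": 60,
               "label": "Reported Term for the Adverse Event"},
}
study_2 = {
    "STUDYID": {"type": "char", "length": 10, "label": "Study Identifier"},
    "AETERM": {"type": "char", "length": 60, "label": "Reported Term for AE"},
}
differences = attribute_differences(study_1, study_2)
```

Each row of the result points at one attribute to fix before pooling the studies, for example the STUDYID length of 7 versus 10.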

Tuesday, August 4, 2009

In the age of Google, information is liberated and readily available. However, SAS data is often locked behind servers and limited to analysts and statisticians. Facebook is the most popular social network and is a common way for users to share information. I will show you how you can share a SAS dataset on Facebook with the following steps.

View Data - You view your SAS datasets as before with Syview. Select the view which best represents your data.

Write Something - When posting on Facebook walls, it is recommended that you write a short message. This tells your readers what the data is about.

View Posting - Log on to Facebook and view the data posted. Note that a thumbnail image of the data view is created along with the message you typed. Users can click on the thumbnail for a detailed view.

Detail Data View - The user can then click on the thumbnail and is brought to the detailed view of the data. They can log in at this point to see the rest of the data.

SAS data is only useful to the users who view it. If it remains behind locked doors, no valuable information can be derived from it. By sharing it on popular social network sites such as Facebook, not only is the data viewable, but the comments and communication between users can create a conversation that leads to significant meaning.