The Complete SAS Format Guide

​Would you like to better understand how to use SAS formats to change the appearance of your data sets and output? Are you interested to learn how to use PROC FORMAT to create your own custom formats? Do you want to use custom formats to quickly and efficiently categorize your data and enhance your analysis? Are you looking for ways to permanently store and manage custom SAS formats?

This article will address all of the above and provide an extensive end-to-end guide on creating, using and managing SAS formats.

Before we continue, make sure you have SAS Studio or SAS 9.4 installed. Don't have the software? Download SAS Studio now. It's free!​

Data Sets

The following datasets from the SASHELP library will be used in this article:

CARS – Data about 2004 cars

ORSALES – Fictitious Sports and Outdoors Store Sales data

PRICEDATA - Simulated monthly sales data

Using Built-in SAS Formats

​SAS provides a vast array of built-in formats to modify the appearance of character, numeric and date values. With any SAS format, it is important to keep in mind that the format is not modifying the actual values in the dataset but only how it is displayed.

Both built-in formats and custom formats follow a specific naming convention. For both built-in and custom formats, character formats always start with a dollar sign ($) while numeric formats do not. With both character and numeric built-in formats, the format name ends in either a “w” (width) or a “w.d” for the width and number of digits that will be shown to the right of a decimal point.

Here are a few sample built-in SAS formats and their naming conventions:

$UPCASEw. – Example: $UPCASE9. is a character format called “UPCASE” with width 9

DATEw. – Example: DATE9. is a numeric format called DATE with width 9

DOLLARw.d – Example: DOLLAR10.1 is a numeric format called DOLLAR with width 10 and 1 decimal point to the right of the decimal place

Next, let’s walk through a few examples of using these formats to understand how SAS formats work.

Character Formats

A character format is a format that can be used with a character variable in a SAS data set. As mentioned, one example of a built-in character format is the $UPCASEw. format. The $UPCASE can format can be used to convert all the letters in a variable to upper case.

For example, to convert the names of the car makes in the SASHELP.CARS dataset to upper case, we can use the FORMAT statement with the $UPCASE format as follows:

data cars_upcase; set sashelp.cars;

format make $upcase.;run;

As you can see from the output dataset shown partially below, all the car makes are now in upper case and the properties of MAKE show that the $UPCASE format has been applied. Note that $UPCASE was automatically adjusted to $UPCASE13 in this case since the length of the original MAKE variable was 13:

​Recall that applying a format does not actually change the values of the variable, so it is easy to reverse or un-apply any formats. To remove the $UPCASE format and revert back to an unformatted MAKE variable, simply use the FORMAT statement as before but remove the $UPCASE portion:

data cars_noformat; set sashelp.cars;

format make;run;

Now in the output dataset, you can see the upper case values are gone and the original appearance of the values has returned:

Numeric Formats

As you might expect, numeric formats are formats which can be used with numeric variables. As mentioned earlier, a common numeric format is the DOLLARw.d format. The DOLLAR format can be used with numeric variables which contain dollar amounts to apply the dollar sign and adjust the number of decimal places shown.

To add the dollar sign to the profit values and only display a single digit after the decimal point, we can apply the DOLLAR format as follows:

data orsales_dollar; set sashelp.orsales;

format profit dollar8.1;run;

As you can see in the output dataset shown partially below, the PROFIT variable has now been formatted to include the dollar sign ($) and only display a single digit after the decimal place:

Date Formats

​While date formats are still a form of numeric formats, they only work with variables that SAS recognizes as dates to begin with. One example of a date format is MMDDYYw. Depending on the width used, the MMDDYYw format can convert dates to look like mm/dd/yy with a width of 8 or look like mm/dd/yyyy with a width of 10 applied.

The SASHELP.PRICEDATA contains the variable DATE, which is formatted as MONYY5 by default:

To modify the DATE variable to appear like mm/dd/yyyy, we can apply the mmddyy10. Format using the following syntax:

data pricedata_date; set sashelp.pricedata;

format date mmddyy10.;run;

As you can see in the PRICEDATA_DATE dataset shown partially below, we now have the date variable formatted as mm/dd/yyyy:

Proc Format

​The FORMAT procedure allows you to create your own custom formats. Using PROC FORMAT you can create both character and numeric formats and also create more complex grouping formats with ranges. Custom formats can be created for an individual SAS session or they can be stored permanently for future use.

While much of what you can accomplish with PROC FORMAT could ultimately be handled with DATA Step programming, PROC FORMAT is a much more efficient solution, particularly with larger datasets as it requires far fewer computational resources.

Creating a Simple Numeric Format

​The SASHELP.CARS dataset contains the numeric variable CYLINDERS which denotes the number of cylinders found in each model of vehicle. In this example, we would like to report on the frequency of the number of cylinders but Instead of displaying the numeric values such as 4,6 or 8, we would like to display the values in words such as “four”, “six” or “eight”.

Before we walk through how to create a numeric format with PROC FORMAT, let’s start by illustrating how this can be achieved with traditional SAS Data Step programming.

In the syntax below, we define a series of IF statements to create a new variable, cylinders_text, which contains the desired description for the number of cylinders in words. We can then verify the results by running PROC FREQ on both the original CYLINDERS variable as well as the newly created CYLINDERS_TEXT variable:

As you can see in the two tables output by the PROC FREQ call shown below, we have successfully created the CYLINDERS_TEXT variable.

Next, let’s look at how we can achieve a similar result using PROC FORMAT.

The PROC FORMAT call starts with a PROC FORMAT statement. By default, PROC FORMAT will store the custom formats in the WORK library and they will only be available during this SAS session. By using the LIBRARY option, you can specify the desired location for the PROC FORMAT catalog, however in this case we will save the catalog to WORK for simplicity.

Next, the VALUE statement is used to name the format and also define the characteristics of the format. In this example, our format is named CYLINDER_FMT and text values for 3,4,5,6,8,10 and 12 are defined by placing the desired words in quotations after the equal sign as shown in the syntax below:

After running the code above, the format is created and will be available in the WORK directory for the remainder of your SAS session. We can now use this format with PROC FREQ to achieve the desired results:

As you can see in output table, the results are consistent with the Data Step example:

While the same results can often be achieved using DATA Step programming or PROC FORMAT, PROC FORMAT can be better both in terms of efficiency and storage utilization:

Efficiency

The DATA Step method requires the use of IF statements to iterate through every observation in the data set whereas PROC FORMAT simply alters the metadata of the dataset.

The DATA Step method also requires you to read in and out a dataset to make the modification, whereas PROC FORMAT does not require writing out any new datasets.

Both using IF statements to iterate through all observations and reading/writing new datasets are potentially time consuming and resource intensive tasks, especially with large datasets.

Storage

The DATA Step methods requires you to create a new variable, whereas PROC FORMAT does not require you to create any new variables.

Creating additional variables could significantly increase the size of your data set, particularly if the formatted values are wide or if you have a large number of observations

Creating a Character Format

The PROC FORMAT syntax to create a custom format for a character variable is very similar to the syntax used for creating a custom numeric variable.

As before, we first start with a PROC FORMAT statement and specify that we would like to save the format in WORK with the library option (recall this is actually the PROC FORMAT default).

Next, we begin defining the format with a VALUE statement followed by the desired format name. There are 2 differences here when compared with creating a numeric format. First, the format name must start with a dollar sign ($) for a character format and second the values to be formatted must also be in quotation marks since they are character values.

In the following syntax, a character format $CAR_TYPE is created which can be applied to the SASHELP.CARS dataset variable TYPE. The $CAR_TYPE format expands the values of TYPE so that they are easier to understand:

Creating a Numeric format with Ranges

PROC FORMAT is also a useful tool for grouping your data to help with certain analyses, categorization, and data interpretation.

In this example, you would like to better understand the distribution of invoice prices for all the vehicle models in the SASHELP.CARS dataset. For this analysis, you’d like to know how many vehicles fall into the falling price categories:

$20,000 or less

$20,001 to $30,000

$30,001 to $50,000

$50,001 or more

Using PROC FORMAT, we can create a custom format called INVOICE_GROUPS to apply to the INVOICE variable in the SASHELP.CARS dataset. As before, we will use the default options of PROC FORMAT and create our new format in the WORK library.

Using the VALUE statement, we will define our new format as INVOICE_GROUPS. When defining the ranges, there are a few important points to consider:

Minimum and maximum values for each range are separated by a dash (-)

“low” and “high” can be used as minimum and maximum values when defining ranges to capture the true minimum and maximum values found within a dataset

Ranges within a format cannot overlap

Keeping the above under consideration, here is the PROC FORMAT syntax to create the INVOICE_GROUPS format:

To illustrate that the newly created format is working correctly, let’s first apply this to a data set and examine the results. Using a PUT statement, we can easily create a new variable, INVOICE_FORMAT and compare the formatted values of INVOICE to the original values of INVOICE. A the end of the Data Step, a keep statement is used to keep only the variables MAKE, MODEL, INVOICE, INVOICE_FORMAT in the output dataset:

data cars_invoice_groups; set sashelp.cars;

invoice_format = put(invoice,invoice_groups.);

keep make model invoice invoice_format;run;

In the output data set shown partially below, you can see that the groupings for the invoice prices are working correctly:

Now that we have verified the newly created format is working as expected, the format can be used to create a report with the grouped invoice prices to gain a better understanding of how many cars fall into each price category.

Using PROC FREQ with a TABLES statement and a FORMAT statement as shown below, we can easily generate this report:

As you can see in the Results shown below, we now have the frequency of cars which fall under each of the newly created invoice groups:

Saving and Retrieving a Permanent Format Catalog

So far, all the formats we have created were saved to the WORK directory, which is the default location that PROC FORMAT saves user created formats. If you would like to store your formats for future use without having to re-run the PROC FORMAT code each time, PROC FORMAT also has the ability to save format catalogs into a permanent SAS library.

To demonstrate how to store a permanent format catalog, let’s create a new simple format for classifying Miles Per Gallon in the City (MPG_CITY) in the SASHELP.CARS dataset.

Before you can save to permanent SAS library, you must first use the LIBNAME statement to define a new permanent library on your system. Note that if you are using SAS Studio you may be able to use the exact same LIBNAME statement shown below, but depending on the SAS version you are using and your system configuration, the path “/folders/myfolders” may need to be replaced with a different path that is available to you on your system.

The syntax for creating the actual format is the same as before, but this time we will use the library option to point to another location on your system:

If the format was successfully created and saved to the library MYLIB, you should now see a formats.sas7bcat file under “My Folders” (or in the location you specified if you didn’t use SAS Studio or the “/folders/myfolders” path):

Now that the format has been created and stored in MYLIB, by default SAS doesn’t know where to find this format and it won’t be available for you to use in this SAS session. To tell SAS where to find the format catalog, you’ll need to add the FMTSEARCH system option. After adding this option and specifying the MYLIB library, you’ll be able to use the newly created MPG_GROUPS format:

After running the code above with the FMTSEARCH option and PROC FREQ, you should now see the new MPG groups reported in the frequency table as shown below:

Viewing Formats in a Catalog

Once you have saved formats in a permanent catalog, you can review them and see the values you have defined. Using PROC FORMAT with the FMTLIB option, you can easily print out a list of all the formats found in that library:

proc format library = mylib fmtlib;run;

After running the code above, you can see the details of the MPG_GROUPS format in the Results shown below:

Importing and Exporting Format Catalogs

As you may have realized, manually typing out many custom format values can be quite time consuming. Fortunately PROC FORMAT also has a utility to create formats based on an existing dataset. This allows you to import tables which may already contain your format definitions and use them to create new custom formats.

Before you can import a SAS data set into PROC FORMAT, it must contain at least the following 4 variables:

FMTNAME – the name of the format you’d like to create

START – the minimum value of the number/character you’d like to format (if you have a character format or if your format will not include a range then this is simply the value you’d like to format)

LABEL – The formatted value you’d like to apply to your data points

TYPE – The type of format you’d like to create (C=Character or N=Numeric are the most common values used here)

(Optionally) END – the maximum value of the number range you’d like to apply a format to

There are other variables that can be found in this dataset as well for more advanced custom formats, but these are the mandatory variables you must have.

Using the following Data Step code, we will create a data set which will be compatible with PROC FORMAT. The format we are creating will be called ENGINE_GROUPS and can be used to group the ENGINESIZE variable from SASHELP.CARS into 3 categories: Small (1-2), Medium (2.1-3.5) and Large (3.6-8.3).

After running the code above, you should now have the MYFORMATS data in WORK as shown below:

Now that we have a dataset with the correct variables and values to define a new format, ENGINE_GROUPS, we can use the CNTLIN option with PROC FORMAT to read in this dataset:

proc format library = work cntlin = myformats;run;

After successfully importing the MYFORMATS dataset with the code above, the ENGINE_GROUPS format is now available for use. For example, we can use the PROC FREQ syntax code below to generate a frequency table on the engine sizes using our new ENGINE_GROUPS format:

After creating the MYFORMATS data set and running the PROC FORMAT and PROC FREQ code above, you should see the following frequency table:

Since all formats are stored in a formats.sas7bcat file, there may be situations where you also want to export your formats into a regular SAS data set (.sas7bdat) file. This process is essentially the reverse of the previous example, however this time we use the CNTLOUT option with PROC FORMAT to export a SAS data set instead of importing a SAS data set.

If you went through this entire article you should now have 3 formats in your WORK library: ENGINE_GROUPS, INVOICE_GROUPS and $CAR_TYPE. To save all these formats into a single dataset, ALL_WORK_FORMATS, we can add the CNTRLOUT option to the PROC FORMAT statement as shown below:

proc format library = work cntlout = work.all_work_formats;run;

After running the code above, you should now see an output dataset, WORK.ALL_WORK_FORMATS, which is shown partially below: