SAS programmers

Have you heard? SAS recently announced a new practical programming credential, SAS® Certified Specialist: Base Programming Using SAS® 9.4. This new practical programming credential is different than our previous exams. Now it requires you to take a performance-based exam, in which you access a SAS environment and then write and execute SAS code. Practical, right? At the end, your answers are scored for correctness.

This new SAS Certified Specialist credential will run in parallel with the current SAS Certified Base Programmer credential until June 2019. So make sure to check out the complete exam content guide for a list of objectives that are tested on the exam!

The new certification guide also includes a workbook that provides programming scenarios to help you prepare for the performance-based portion of the exam. Make sure to also review the SAS® 9.4 Base Programming Exam Experience Tutorial to help you prepare!

Here are some of the changes made to the certification guide:

• Removal of INFILE/INPUT statements. These statements are no longer taught in SAS® Programming 1 and SAS® Programming 2 courses or tested in the exam. Thus, these statements have been replaced with the SET statement and the IMPORT procedure.
• Removal of arrays. Arrays are no longer taught in SAS® Programming 1 and SAS® Programming 2 courses or tested in the exam.
• Addition of the TRANSPOSE and EXPORT procedures, as well as macro variables. These additions are tested and are taught in the SAS® Programming 1 and SAS® Programming 2 courses.
• Updating of existing examples and the addition of more annotated examples that are easier to follow and have better quality graphics. These changes help to improve the readability of examples, and there is a closer relationship between the sample questions in the book and the questions that appear on the actual exam.
• Installation of sample data, and the way you work with SAS libraries has been simplified.
• A new companion piece, the Quick Syntax Reference Guide, is also available for download.

More opportunities to test your skills

For even more programming practice check out Ron Cody’s Learning SAS by Example. This is full of practical examples, and exercises with solutions which allow you to test your programming skills for the exam.

Recently, I worked on a cybersecurity project that entailed processing a staggering number of raw text files about web traffic. Millions of rows had to be read and parsed to extract variable values.

The problem was complicated by the varying records composition. Each external raw file was a collection of records of different structures that required different parsing programming logic. Besides, those heterogeneous records could not possibly belong to the same rectangular data tables with fixed sets of columns.

Solving the problem

To solve the problem, I decided to employ a "divide and conquer" strategy: to split the external file into many files, each with a homogeneous structure, then parse them separately to create as many output SAS data sets.

My plan was to use a SAS DATA Step for looping through the rows (records) of the external file, read each row, identify its type, and based on that, write it to a corresponding output file.

But how do you switch between the output files? The idea came from SAS' Chris Hemedinger, who suggested using multiple FILE statements to redirect output to different external files.

Splitting an external raw file into many

As you know, one can use PUT statement in a SAS DATA Step to output a character string or a combination of character strings and variable values into an external file. That external file (a destination) is defined by a

In this code, the first INPUT statement retrieves the value of REC_TYPE. The trailing @ line-hold specifier ensures that an input record is held for the execution of the next INPUT statement within the same iteration of the DATA Step. It may not be used exactly as written, but the point is you need to capture the filed(s) of interest and stay on the same row.

The second INPUT statement reads the whole raw file record into the _infile_ DATA Step automatic variable.

Depending on the value of the REC_TYPE variable assigned in the first INPUT statement, SELECT block toggles the FILE definition between one of the four filerefs, out1, out2, out3, or out4.

Then the PUT statement outputs the _infile_ automatic variable value to the output file defined in the SELECT block.

Splitting a data set into several external files

Similar technique can be used to split a data table into several external raw files. Let’s combine the above two code samples to demonstrate how you can split a data set into several external raw files:

This code will read observations of the SASHELP.CARS data table, and depending on the value of ORIGIN variable, put _all_ will output all the variables (including automatic variables _ERROR_ and _N_) as named values (VARIABLE_NAME=VARIABLE_VALUE pairs) to one of the three external raw files specified by their respective file references (outasi, outeur, or outusa.)

You can modify this code to produce delimited files with full control over which variables and in what order to output. For example, the following code sample produces 3 files with comma-separated values:

You may use different delimiters for your output files. In addition, rather than using mutually exclusive SELECT, you may use different logic for re-directing your output to different external files.

Bonus: How to zip your output files as you create them

For those readers who are patient enough to read to this point, here is another tip. As described in this series of blog posts by Chris Hemedinger, in SAS you can read your external raw files directly from zipped files without unzipping them first, as well as write your output raw files directly into zipped files. You just need to specify that in your filename statement. For example:

UNIX/Linux

filename outusa ZIP '/sas/data/temp/cars_usa.txt.gz' GZIP;

Windows

filename outusa ZIP 'c:\temp\cars.zip' member='cars_usa.txt';

Your turn

What is your experience with creating multiple external raw files? Could you please share with the rest of us?

Several years ago, I wrote a paper about the top-ten questions about the DATA step that SAS Technical Support receives from customers. Those topics are still popular among people who contact us for help. In this blog, I’m sharing some additional questions that we’re asked on a regular basis. Those questions cover SAS dates, arrays, and how to reference local PC files from SAS® Enterprise Guide® and SAS® Studio when those applications connect to a SAS® server in UNIX operating environments.

About SAS® dates

Let’s begin with dates. We regularly hear customers say something similar to this: "I have a date, but I’m not sure how to use it or whether it’s even a SAS date yet." No worries--we can figure it out! A SAS date is a numeric variable whose value represents the number of days between January 1, 1960 and a specific date. For example, assume that you have a variable named X that has a value of 12398, but you’re not sure what that value represents. Is it a SAS date? Or does it represent January 23, 1998?

To determine what the value represents, you first need to run the CONTENTS procedure on the data set and determine whether the variable in question is character or numeric.

For this example, here is the partial output from the PROC CONTENTS step:

If X is a numeric variable, is a format shown in the FORMAT column for that variable? In this case, the answer is no. However, if the variable is numeric and there is no assigned format, this might be a SAS date that needs to be formatted to make sense of the value. If you run a simple DATA step to add any date format to that SAS date value, you will see that 12398 represents the date December 11, 1993.

data a;
mydate=12398;
format mydate worddate.;
run;

If you print the results of this program with the PRINT procedure, the output for data set A is as shown below:

Obs mydate
1 December 11, 1993

Is this a valid date in the context of this data sample? If you’re unsure, look at the other date values to see whether most of them are similarly structured. Most of the time, if a variable is stored as a SAS date, the variable is already assigned a date format, which is shown in the PROC CONTENTS output. If the value 12398 is a numeric variable such that the digits represent the month, day, and year of a given date (for example, January 23, 1998), you can convert it to a SAS date by running the following DATA step:

data a;
x=12398;
y=input(put(x,5.),mmddyy6.);
format y date9.;
run;

The PROC PRINT output from this step shows that the variable Y has a formatted value of 23JAN1998.

Obs x y
1 12398 23JAN1998

The format that you assign to the variable can be any SAS format or custom-date format.

If the original variable is a character variable, you can convert it to a SAS date by using the INPUT function and the MMDDYY6. informat.

data a;
x='12398';
y=input(x,mmddyy6.);
format y date9.;
run;

Using arrays in SAS

Many customers aren’t quite sure that they understand how to use arrays. Arrays are a common construct in many programming languages. Arrays can seem less complex when you remember that they are a temporary grouping of variables. When you perform the same operation on multiple variables, you have less to program if you can refer to a group of variables by a single name. You simply execute a DO loop that processes each variable in turn, and the task is complete!

We often see arrays used for "reshaping data" or transposing a data set from wide-to-long (or long-to-wide). For example, assume that you want to reshape a data set, comprised of three variables and four observations, into a data set that contains twelve variables. Using an array approach makes the programming much easier, as shown below:

In this example:

1. The variables X, Y, and Z are loaded into an array named VARS, which means that they can be referred to as VARS(1) – VARS(3) or by the variable names X, Y, and Z.2. A multidimensional array named ALL is created with twelve variables. The first number in parentheses represents rows, and the second represents columns.3. A DO loop processes each variable in the VARS array.4. The ALL array is populated one observation at a time by the value of I and the value of J as the DO loop increments.

Because the ALL array is populated by each observation as it is read from data set One, the END= option in the SET statement creates the variable LAST as a flag. This variable indicates when the last observation is read, and the IF statement tests variable LAST. If the variable has a value of 1 (which evaluates to "true"), the statement prints the contents of the program data vector to the output data set. Here's the starting data set and the reshaped result:

Managing PC files in client/server environments

When I began working in Technical Support many years ago, the only interface to Base SAS® software was the Display Manager System, which has separate Program Editor, Log, and Output windows. Now, you can run SAS in various ways, and many of our customers use SAS Enterprise Guide and SAS Studio as their interfaces. One of the most frequently asked questions from customers is about how to access local PC files from these applications that access SAS through a UNIX server.

SAS Enterprise Guide offers built-in tasks to upload and download data sets and other files. You can find these tasks on the Tasks->Data menu.

Two of the tasks, Upload Data Files to Server and Download Data Files to PC, allow you to copy SAS data sets directly between your local PC and your SAS libraries. The third task, Copy Files, allow you to copy any file (or group of files) between your local PC and the file system of the SAS session. See this article to learn how to apply a common pattern with this task: export and download any file from SAS Enterprise Guide. (Note: The Copy Files task was added in SAS Enterprise Guide 7.13. For earlier releases, you can follow the steps in this article.)

If you’re using the SAS Studio interface, you can upload and download files between the server and your PC.

Upload File and Download File buttons in SAS Studio

To download a file from the SAS server to your computer:

1. Select the file that you want to download from the folder tree.2. Click the download button and save the file according to the information in your browser dialog box.

To upload one or more files from your local computer:

1. Select the folder to which you want to upload the files and click the upload button.2. In the Upload Files window, click Choose Files to browse for the files that you want to upload.3. Select one or more files from your computer and click Open. The selected files are displayed as well as their size. An error message is displayed when you try to upload files where the total size exceeds 10 MB.4. Click Upload to complete the upload process.

Always go back to the basics

The three topics that are discussed here don't represent new features or challenges. However, these topics generate many calls to Technical Support. It's a reminder that even as SAS continues to add new features and technology, we still need to know how to tackle the basic building blocks of our SAS programs.

When I was growing up, there were two kinds of Sundays: regular Sundays and George Sundays. George was the proprietor of a local Italian restaurant in my hometown and hosted the extended LaRusso clan for Sunday lunch every few weeks. His restaurant, appropriately named George’s, owns some of my favorite childhood memories – and some of my worst.

Every couple of months, my aunts, uncles, a baker’s dozen of cousins, and my immediate family members would take over George’s backroom and see if we could challenge the city’s noise ordinance. George would do nothing to discourage us, appearing every so often to fire balls of uncooked dough at us or ply us with more caffeine-laced sugary drinks, despite instructions to the contrary from our parents.

Invariably, though, an otherwise pleasant afternoon took a turn for the worse as we were leaving the restaurant. That was when my parents, thinking they were doing us a favor, would let us choose one item off George’s famous “candy wall.” You see, George didn’t stock just one or two different kinds of candy, he had dozens. Every different kind of chocolate bar, brand of gum, and flavor of jelly beans beckoned from George’s Candy Wall. For a 6 or 7-year-old kid, it was just too much. All these choices literally paralyzed me. Ten minutes of indecisiveness and several ultimatums later my parents would usher me out of the restaurant, usually empty-handed and crying. Even on the rare occasions when I did settle on something, I spent the rest of the afternoon lamenting my decision, thinking I left behind something that I would have enjoyed more.

When it comes to the multitude of great support and learning resources we offer new users of SAS, I often wonder if it can feel like you’re staring at George’s Candy Wall as well. While support.sas.com remains the holy grail of SAS customer support, there are so many good choices, it can sometimes be hard to know where to start. That’s why we’ve put together a new resource to make things easier for new SAS users: the SAS Starter Kit.

Need help navigating SAS Support Resources? Here’s your guide

The SAS Starter Kit is the perfect place for SAS newbies to start, outlining the five essential steps to help you learn the basics, grow your skills and connect with other users from around the world.

Step 1 invites you to create a SAS profile. A profile provides you access to things like free, on-demand training, software downloads and access to our SAS Communities, where you can ask questions, get answers and connect with SAS experts from nearly every industry and around the world. You can

Step 2 is your SAS Resource Cheat Sheet. SAS Cares is your one stop listing of all the SAS resources you’ll ever need. Add it to your web favorites or print it out and add a little color to your cube. Keep this one close; it provides quick, one-click access to some of SAS’ most helpful resources.

Step 3 is designed to expand your SAS knowledge. This step introduces you to a full menu of free tutorials to binge watch, a number of free e-courses for a deeper dive and a number of other learning resources from e-books to webinars and more.

Step 4 is the perfect resource if you’re completely new to SAS or just trying something new. Our New SAS User Community is a great place to get coding help, share ideas and best practices, or just lurk! Our SAS Communities have more than 200,000 members ready to help get you unstuck or share what they know.

Finally, Step 5 introduces you to product-specific resources to help develop your skills with your specific tools. Here you’ll find the latest product news, code samples, and step-by-step instructional resources to guide you through common tasks using your product of choice.

We have had several requests from customers who want to use SAS® software to automate the download of data from a website when there is no application programming interface (API) to do it. As an example, the Yahoo Finance website recently changed their service to decommission their API, and this generated an interesting challenge for one of our customers. This SAS programmer wanted to download historical stock price data "unattended," without having to click through a web page. While working on this case, we discovered that the Yahoo Finance website requires a cookie-crumb combination to download. To help you automate downloads from websites that do not have an API, this blog post takes you through how we used the DEBUG feature of PROC HTTP to achieve partial automation, and eventually full automation with this case.

Partial automation

We click Historical Data --> Download Data and get a CSV file with historical stock price data for Apple. We could save this CSV file and read it into SAS. But, we want a process that does not require us to click in the browser.

Because we know the HTTP procedure, we right-click Download Data and then select Copy link address as shown from a screen shot using the Google Chrome browser below:

Note: The context menu that contains Copy link address looks different in each browser.

Using this link address, we expect to get a direct download of the data into a CSV file (note that your crumb= will differ from ours):

The log snippet reveals that we did not provide the Yahoo Finance website with a valid cookie. It is important to note that the response header for the URL shows crumb for the authentication method (the line that shows WWW-Authenticate: crumb. A little web research helps us determine that the Yahoo site wants a cookie-crumb combination, so we need to also provide the cookie. But, why did we not need this step when we were using the browser? We used a tool called Fiddler to examine the HTTP traffic and discovered that the cookie was cached when we first clicked in the browser on the Yahoo Finance website:

Luckily, starting in SAS® 9.4M3 (TS1M3), PROC HTTP will set cookies and save them across HTTP steps if the response contains a "set-cookie: <some cookie>" header when it successfully connects to a URL. So, we try this download in two steps. The first step does two things:

PROC HTTP sets the cookie for the Yahoo Finance website.

Adds the DEBUG statement so that we can obtain the crumb value from the log.

The second step uses the cached cookie from Yahoo Finance (indicated in the "CrumbStore" value), and in combination with the full link that includes the appropriate crumb value, downloads the CSV file into our c:\temp directory.

Full automation

This partial automation requires us to visit the website and right-click on the download link to get the URL. There’s nothing streamlined about that, and SAS programmers want full automation!

So, how can we fully automate the process? In this section, we'll share a "recipe" for how to get the crumb value -- a value that changes with each transaction. To get the current crumb, we use the first PROC HTTP statement to "screen scrape" the URL and to cache the cookie value that comes back in the response. In this example, we store the first response in the Output.txt file, which contains all the content from the page:

It is a little overwhelming to examine the web page in its entirety. And the HTML page contains some very long lines, some of them over 200,000 characters long! However, we can still use the SAS DATA step to parse the file and retrieve the text or information that might change on a regular basis, such as the crumb value.

In this DATA step we read chunks of the text data and scan the buffer for the "CrumbStore" keyword. Once found, we're able to apply what we know about the text pattern to extract the crumb value.

We feel so good about finding the crumb, we're going to treat ourselves to a whole cookie. Anybody care for a glass of milk?

Complete Code for Full Automation
The following code brings it all together. We also added a PROC IMPORT step and a bonus highlow plot to visualize the results. We've adjusted the file paths so that the code works just as well on SAS for Windows or Unix/Linux systems.

Disclaimer: As we've seen, Yahoo Finance could change their website at any time, so the URLs in this blog post might not be accurate at a later date. Note that, as of the time of this writing, the above code runs error-free with Base SAS 9.4M5. And it also works in SAS University Edition and SAS OnDemand for Academics!

Data in the cloud makes it easily accessible, and can help businesses run more smoothly. SAS Viya runs its calculations on Cloud Analytics Service (CAS). David Shannon of Amadeus Software spoke at SAS Global Forum 2018 and presented his paper, Come On, Baby, Light my SAS Viya: Programming for CAS. (In addition to being an avid SAS user and partner, David must be an avid Doors fan.) This article summarizes David's overview of how to run SAS programs in SAS Viya and how to use CAS sessions and libraries.

If you're using SAS Viya, you're going to need to know the basics of CAS to be able to perform calculations and use SAS Viya to its potential. SAS 9 programs are compatible with SAS Viya, and will run as-is through the CAS engine.

Using CAS sessions and libraries

Use a CAS statement to kick off a session, then use CAS libraries (caslibs) to store data and resources. To start the session, simply code "cas;" Each CAS session is given its own unique identifier (UUID) that you can use to reconnect to the session.

There are a few significant codes that can help you to master CAS operations. Consider these examples, based on a CAS session that David labeled "speedyanalytics":

What CAS sessions do I have running?

cas _all_ list;

Get the version and license specifics from the CAS server hosting my session:

cas speedyanalytics listabout;

I want to sign out of SAS Studio for now, so I will disconnect from my CAS session, but return to it later…

cas speedyanalytics disconnect;

...later in the same or different SAS Studio session, I want to reconnect to the CAS session I started earlier using the UUID I previous grabbed from the macro variable or SAS log:

cas uuid="&speedyanalytics_uuid";

At the end of my program(s), shutdown all my CAS sessions to release resources on the server:

cas _all_ terminate;

Using CAS libraries

CAS libraries (caslib) are the method to access data that is being stored in memory, as well as the related metadata.

From the library, you can load data into CAS tables in a couple of different ways:

Takes a sample data set, calculate a new measure and stores the output in memory

Proc COPY can bring existing SAS data into a caslib

Proc CASUTIL loads tables into caslibs

The Proc CASUTIL allows you to save your tables (named "classsi" data in David's examples) for future use through the SAVE statement:

proc casutil;
save casdata="classsi" casout="classsi";
run;

And reload like this in a future session, using the LOAD statement:

proc casutil;
load casdata="classsi" casout="classsi";
run;

When accessing your CAS libraries, remember that there are multiple levels of scope that can apply. "Session" refers to data from just the current session, whereas "Global" allows you to reach data from all CAS sessions.

Programming in CAS

Showing how to put CAS into action, David shared this diagram of a typical load/save/share flow:

Existing SAS 9 programs and CAS code can both be run in SAS Viya. The calculations and data memory occurs through CAS, the Cloud Analytics Service. Before beginning, it's important to understand a general overview of CAS, to be able to access CAS libraries and your data. For more about CAS architecture, read this paper from CAS developer Jerry Pendergrass.

The performance case for SAS Viya

To close out his paper, David outlined a small experiment he ran to demonstrate performance advantages that can be seen by using SAS Viya v3.3 over a standard, stand-alone SAS v9.4 environment. The test was basic, but performed reads, writes, and analytics on a 5GB table. The tests revealed about a 50 percent increase in performance between CAS and SAS 9 (see the paper for a detailed table of comparison metrics). SAS Viya is engineered for distributive computing (which works especially well in cloud deployments), so more extensive tests could certainly reveal even further increases in performance in many use cases.

Attention SAS administrators! When running SAS batch jobs on schedule (or manually), they usually produce date-stamped SAS logs which are essential for automated system maintenance and troubleshooting. Similar log files have been created by various SAS infrastructure services (Metadata server, Mid-tier servers, etc.) However, as time goes on, the relevance of such logs diminishes while clutter stockpiles. In some cases, this may even lead to disk space problems.

There are multiple ways to solve this problem, either by deleting older log files or by stashing them away for auditing purposes (zipping and archiving). One solution would be using Unix/Linux or Windows scripts run on schedule. The other is much "SAS-sier."

Let SAS clean up its "mess"

We are going to write a SAS code that you can run manually or on schedule, which for a specified directory (folder) deletes all .log files that are older than 30 days.
First, we need to capture the contents of that directory, then select those file names with extension .log, and finally, subset that file selection to a sub-list where Date Modified is less than Today's Date minus 30 days.

Perhaps the easiest way to get the contents of a directory is by using the X statement (submitting DOS’ DIR command from within SAS with a pipe (>) option, e.g.

However, SAS administrators know that in many organizations, due to cyber-security concerns IT department policies do not allow enabling the X statement by setting SAS XCMD system option to NOXCMD (XCMD system option for Unix). This is usually done system-wide for the whole SAS Enterprise client-server installation via SAS configuration. In this case, no operating system command can be executed from within SAS. Try running any X statement in your environment; if it is disabled you will get the following ERROR in the SAS log:

ERROR: Shell escape is not valid in this SAS session.

To avoid that potential roadblock, we’ll use a different technique of capturing the contents of a directory along with file date stamps.

Macro to delete old log files in a directory/folder

The following SAS macro cleans up a Unix directory or a Windows folder removing old .log files. I must admit that this statement is a little misleading. The macro is much more powerful. Not only it can delete old .log files, it can remove ANY file types specified by their extension.

3. Using explicit parameters

With this macro call, all files with extension .xls (Excel files) which are older than 10 days will be deleted from the specified directory.

Old file deletion SAS macro code explanation

The above SAS macro logic and actions are done within a single data _NULL_ step. First, we calculate the date from which file deletion starts (going back) deldate = today() - &dayskeep. Then we assign fileref indir to the specified directory &dirpath:

rc = filename('indir',"&dirpath");

Then we open that directory:

did = dopen('indir');

and if it opened successfully (did>0) we loop through its members which can be either files or directories:

do i=1 to dnum(did);

In that loop, first we grab the directory member name:

memname = dread(did,i);

and look for our candidates for deletion, i.e., determine if that name (memname) ends with "&ext". In order to do that we reverse both character strings and compare their first characters. If they don’t match (^=: operator) then we are not going to touch that member - the continue statement skips to the end of the loop. If they do match it means that the member name does end with "&ext" and it’s a candidate for deletion. We assign fileref inmem to that member:

rc = filename('inmem',"&dirpath/"!!memname);

Note that forward slash (/) Unix/Linux path separator in the above statement is also a valid path separator in Windows. Windows will convert it to back slash (\) for display purposes, but it interprets forward slash as a valid path separator along with back slash.
Then we open that file using fopen function:

fid = fopen('inmem');

If inmem is a directory, the opening will fail (fid=0) and we will skip the following do-group that is responsible for the file deletion. If it is file and is opened successfully (fid>0) then we go through the deletion do-group where we first grab the file Last Modified date as moddate, close the file, and if moddate <= deldate we delete that file:

rc = fdelete('inmem');

Then we close the directory and un-assign filerefs for the members and directory itself.

Deleting old files across multiple directories/folders

Macro %mr_clean is flexible enough to address various SAS administrators needs. You can use this macro to delete old files of various types across multiple directories/folders. First, let’s create a driver table as follows:

This driver table specifies how many days to keep files of certain extensions in each directory. In this example, perhaps the most beneficial deletion applies to the SAS_Backups folder since it contains SAS data tables (extension .sas7bdat). Data files typically have much larger size than SAS log files, and therefore their deletion frees up much more of the valuable disk space.

Put it on autopilot

When it comes to cleaning your old files (logs, backups, etc.), the best practice for SAS administrators is to schedule your cleaning job to automatically run on a regular basis. Then you can forget about this chore around your "SAS house" as %mr_clean macro will do it quietly for you without the noise and fuss of a Roomba.

Your turn, SAS administrators

Would you use this approach in your SAS environment? Any suggestions for improvement? How do you deal with old log files? Other old files? Please share below.

The Geo Map Visualization has several built-in geographical units, including country and region names and codes, US state names and codes, and US zip codes. You can also define your own geographic units. This paper describes how to identify any geographic point of interest, or collection of points, on a map to create custom maps in SAS.

Once upon a Time

Once upon a time, Oliver S. Füßling merely occupied a line in a SAS® program. But one day, he lost his last name, and a quest began to help our hero find the rest of his name.

Our Story Begins

The SAS Training Center wanted to re-create course data for the "Introduction to Programming 1" class. The updated class uses SAS® Studio, a new programming environment that incorporates a UTF-8 SAS session encoding. However, the course data sets contained national language characters, which are not available on an English keyboard. As a result, depending on how those programs were submitted in the new environment, they experienced the following transcoding problems:

character substitution

data truncation

invalid-data errors

Like the Training Center, you might encounter similar transcoding issues if you have programs that:

contain national language characters

are created in the WLatin-1 SAS session encoding

you move to a UTF-8 SAS session encoding.

This story explains how you can move such programs successfully to a UTF-8 environment and avoid substitution characters, data truncation, and invalid-data errors.

The programs in the "Introduction to Programming 1" class were originally submitted via an earlier English edition of the SAS® Foundation. However, the sample program in this story is created in SAS® 9.4 (English).

When the program is opened in the Enhanced Editor window, this is how a shortened version of the program looks:

Note: If you would like a copy of this program for your own testing, see the Epilogue heading at the end of this post.

In SAS 9.4 (English) for the Windows environment, the default session encoding is WLatin-1. You can see the encoding in the log by running either of the following sets of statements:

PROC OPTIONS OPTION=ENCODING;
RUN;

%PUT ENCODING= %SYSFUNC(getOption(ENCODING));

When programs are saved from the Enhanced Editor window, the encoding for the program file defaults to Default - Western (Windows), as shown below.

When the program file that is shown earlier, which contains the name Oliver S. Füßling, is uploaded and included into the SAS Studio code editor, Oliver's last name displays replacement characters rather than the expected national language characters.

Note: This display shows SAS Studio open in a Google Chrome browser. In this browser, you see two characters (diamonds with white question marks) that are substituted for the national language characters. If you use SAS Studio in Microsoft Internet Explorer, the display shows only one diamond, and it truncates the remainder of the name.

To begin resolving the display problem, you need to look at the code-editor status bar (bottom of the window).

Notice that there is a text-encoding setting that informs SAS Studio of the encoding of the external file. That setting is shown to the right on the status bar. In the display above, that encoding is UTF-8.

Be aware that this text-encoding setting differs from the UTF-8 SAS session encoding that is displayed by the SAS ENCODING system option, which is generated in the log when you run the OPTIONS procedure. In SAS Studio, the default text encoding is UTF-8, regardless of the session encoding. Because the pgm.sas program was saved originally from the Enhanced Editor in the default Western-Windows encoding, it is not in the encoding that the SAS Studio code editor expects.

To fix the display issue, you can use either of the following options:

The Windows code page 1252 represents the character set that is used by Western European languages, including English, in Microsoft Windows operating environments. The WLatin-1 encoding is the SAS equivalent for the 1252 Windows code page.[1]

3. Click OK to save your selection before you exit the dialog box.

Option 2

From the General tab in the Preferences dialog box, select a value for the default text encoding.

When you set the value in this way, the change is not reflected immediately in the existing code- editor window. You must close the program and re-open it for the setting to take effect. Any programs that you open later will retain the same setting unless you change the setting or override it by selecting another value in the Select Text Encoding dialog box.

Oliver's last name is displayed correctly in the code editor when you use the windows-1252 setting to open the file, as shown below:

However, Oliver's last name is truncated on the HTML Results tab when you submit the program.

The Plot Thickens

Although the problem is fixed in the code editor when you submit the program, Oliver's last name is truncated as Füßli in the output. However, you do not receive any notes or warnings in the log about that truncation. So, why does the truncation happen? The ü (U-umlaut) and the ß (German Eszett) are stored as single-byte characters (SBCS) in WLatin-1, but those characters require two bytes in UTF-8. As a result, there is not adequate space to print the remaining characters in the name.

When you submit programs that contain national language characters from a single-byte encoding to a UTF-8 environment, you must be prepared to modify the program to use wider informats when you create your variables. Otherwise, character truncation can occur.

You can correct this problem easily by enlarging the column to accommodate the extra bytes that are used to store the characters in UTF-8.

Here is the modified INPUT statement that successfully reads the data in a UTF-8 SAS session. The character informat for the Lname variable is increased from $7. to $9.

input StudID $12. Age Fname $6. Mi :$2. Lname $9.;

After you increase the informat, Oliver's last name is correct when you view it on the HTML Results tab.

A Subplot Appears

What if the program file is included and executed by using the %INCLUDE statement rather than by submitting it from the code editor?

In this situation, the program stops processing with the following errors:

Happily Ever After (or, The End)!

The moral of this story is that there are many ways to avoid transcoding problems when you have national language characters in SAS programs that you save from a SAS®9 (English) session and move to a UTF-8 environment. Hopefully, you can use the tips that are provided to avoid such issues. However, if you still have problems, you can call on another hero, SAS Technical Support, for help!

Epilogue

The following program is the one used throughout this story. You can copy and paste it for your own use.

Once upon a Time

Once upon a time, Oliver S. Füßling merely occupied a line in a SAS® program. But one day, he lost his last name, and a quest began to help our hero find the rest of his name.

Our Story Begins

The SAS Training Center wanted to re-create course data for the "Introduction to Programming 1" class. The updated class uses SAS® Studio, a new programming environment that incorporates a UTF-8 SAS session encoding. However, the course data sets contained national language characters, which are not available on an English keyboard. As a result, depending on how those programs were submitted in the new environment, they experienced the following transcoding problems:

character substitution

data truncation

invalid-data errors

Like the Training Center, you might encounter similar transcoding issues if you have programs that:

contain national language characters

are created in the WLatin-1 SAS session encoding

you move to a UTF-8 SAS session encoding.

This story explains how you can move such programs successfully to a UTF-8 environment and avoid substitution characters, data truncation, and invalid-data errors.

The programs in the "Introduction to Programming 1" class were originally submitted via an earlier English edition of the SAS® Foundation. However, the sample program in this story is created in SAS® 9.4 (English).

When the program is opened in the Enhanced Editor window, this is how a shortened version of the program looks:

Note: If you would like a copy of this program for your own testing, see the Epilogue heading at the end of this post.

In SAS 9.4 (English) for the Windows environment, the default session encoding is WLatin-1. You can see the encoding in the log by running either of the following sets of statements:

PROC OPTIONS OPTION=ENCODING;
RUN;

%PUT ENCODING= %SYSFUNC(getOption(ENCODING));

When programs are saved from the Enhanced Editor window, the encoding for the program file defaults to Default - Western (Windows), as shown below.

When the program file that is shown earlier, which contains the name Oliver S. Füßling, is uploaded and included into the SAS Studio code editor, Oliver's last name displays replacement characters rather than the expected national language characters.

Note: This display shows SAS Studio open in a Google Chrome browser. In this browser, you see two characters (diamonds with white question marks) that are substituted for the national language characters. If you use SAS Studio in Microsoft Internet Explorer, the display shows only one diamond, and it truncates the remainder of the name.

To begin resolving the display problem, you need to look at the code-editor status bar (bottom of the window).

Notice that there is a text-encoding setting that informs SAS Studio of the encoding of the external file. That setting is shown to the right on the status bar. In the display above, that encoding is UTF-8.

Be aware that this text-encoding setting differs from the UTF-8 SAS session encoding that is displayed by the SAS ENCODING system option, which is generated in the log when you run the OPTIONS procedure. In SAS Studio, the default text encoding is UTF-8, regardless of the session encoding. Because the pgm.sas program was saved originally from the Enhanced Editor in the default Western-Windows encoding, it is not in the encoding that the SAS Studio code editor expects.

To fix the display issue, you can use either of the following options:

The Windows code page 1252 represents the character set that is used by Western European languages, including English, in Microsoft Windows operating environments. The WLatin-1 encoding is the SAS equivalent for the 1252 Windows code page.[1]

3. Click OK to save your selection before you exit the dialog box.

Option 2

From the General tab in the Preferences dialog box, select a value for the default text encoding.

When you set the value in this way, the change is not reflected immediately in the existing code- editor window. You must close the program and re-open it for the setting to take effect. Any programs that you open later will retain the same setting unless you change the setting or override it by selecting another value in the Select Text Encoding dialog box.

Oliver's last name is displayed correctly in the code editor when you use the windows-1252 setting to open the file, as shown below:

However, Oliver's last name is truncated on the HTML Results tab when you submit the program.

The Plot Thickens

Although the problem is fixed in the code editor when you submit the program, Oliver's last name is truncated as Füßli in the output. However, you do not receive any notes or warnings in the log about that truncation. So, why does the truncation happen? The ü (U-umlaut) and the ß (German Eszett) are stored as single-byte characters (SBCS) in WLatin-1, but those characters require two bytes in UTF-8. As a result, there is not adequate space to print the remaining characters in the name.

When you submit programs that contain national language characters from a single-byte encoding to a UTF-8 environment, you must be prepared to modify the program to use wider informats when you create your variables. Otherwise, character truncation can occur.

You can correct this problem easily by enlarging the column to accommodate the extra bytes that are used to store the characters in UTF-8.

Here is the modified INPUT statement that successfully reads the data in a UTF-8 SAS session. The character informat for the Lname variable is increased from $7. to $9.

input StudID $12. Age Fname $6. Mi :$2. Lname $9.;

After you increase the informat, Oliver's last name is correct when you view it on the HTML Results tab.

A Subplot Appears

What if the program file is included and executed by using the %INCLUDE statement rather than by submitting it from the code editor?

In this situation, the program stops processing with the following errors:

Happily Ever After (or, The End)!

The moral of this story is that there are many ways to avoid transcoding problems when you have national language characters in SAS programs that you save from a SAS®9 (English) session and move to a UTF-8 environment. Hopefully, you can use the tips that are provided to avoid such issues. However, if you still have problems, you can call on another hero, SAS Technical Support, for help!

Epilogue

The following program is the one used throughout this story. You can copy and paste it for your own use.