SAS Viya

Multi-tenancy is one of the exciting new capabilities of SAS Viya. Because it is so new, there is quite a lot of misinformation going around about it. I would like to offer you five key things to know about multi-tenancy before implementing a project using this new paradigm.

All tenants share one SAS Viya deployment

Just as apartment units exist within a larger, common building, all tenants, including the provider, exist within one, single SAS Viya deployment. Tenants share some SAS Viya resources such as the physical machines, most microservices, and possibly the SAS Infrastructure Data Server. Other SAS Viya resources are duplicated per tenant such as the CAS server and compute launcher. Regardless, the key point here is that because there is one SAS Viya deployment, there is one, and only one, SAS license that applies to all tenants. Adding a new tenant to a multi-tenant deployment could have licensing ramifications depending upon how the CAS server resources are allocated.

Decision to use multi-tenancy must be made at deployment time

Many people, myself included, are not very comfortable with commitment. Making a decision that cannot be changed is something we avoid. Deciding whether your SAS Viya deployment supports multi-tenancy cannot be put off for later.

This decision must be made at the time the software is deployed. There is currently no way to convert a multi-tenant deployment to a single-tenant deployment or vice versa short of redeployment, so choose wisely. As with marriage, the decision to go single-tenant or multi-tenant should not be taken lightly and there are benefits to each configuration that should be considered.

Each tenant is accessed by separate login

Let’s return to our apartment analogy. Just as each apartment owner has a separate key that opens only the apartment unit they lease, SAS Viya requires users to log on (authenticate) to a specific tenant space before allowing them access.

SAS Viya facilitates this by giving each tenant its own sub-domain address. As shown in the diagram below, a user wishing to use the Acme tenant must access the deployment with a URL of acme.viya.sas.com while a GELCorp user would use a URL of gelcorp.viya.sas.com.

This helps create total separation of tenant access and allows administrators to define and restrict user access for each tenant. It does, however, mean that each tenant space is authenticated individually and there is no notion of single sign-on between tenants.

No content is visible between tenants

You will notice in both images above that there are brick walls between each of the tenants. This is to illustrate how tenants are completely separated from one another. One tenant cannot see any other tenant’s content, data, users, groups or even that other tenants exist in the system.

One common scenario for multi-tenancy is to keep business units within a single corporation separated. For example, we could set up Sales as a tenant, Finance as a tenant, and Human Resources as a tenant. This works very well if we want to truly segregate the departments' work. But what happens when Sales wants to share a report with Finance or Finance wants to publish a report for the entire company to view?

There are two options for this situation:
• We could export content from one tenant and import it into the other tenant(s). For example, we would export a report from the Sales tenant and import it into the Finance tenant, assuming that data the report needs is available to both. But now we have the report (and data) in two places and if Sales updates the report we must repeat the export/import process.
• We could set up a separate tenant at the company level for shared content. Because identities are not shared between tenants, this would require users to log off the departmental tenant and log on to the corporate tenant to see shared reports.

There are pros and cons to using multi-tenancy for departmental separation and the user experience must be considered.

Multi-tenancy also adds levels of administration: it requires an administrator persona for the provider of the environment and a separate administrator persona for each tenant. Each of these personas has a different scope governing which aspects of the deployment it can interact with. For example, the provider administrator can see all system resources, all system activity, logs and tenants, but cannot see any tenant content.

Tenant administrators can only see and interact with dedicated tenant resources such as their CAS server and can also manage all tenant content. They cannot, however, see system resources, other tenants, or logs.

Therefore, coordinating management of a complete multi-tenant deployment will require multiple administration personas, careful design of operating system group membership to protect and maintain sufficient access to files and processes, and possibly multiple logins to accomplish administrative tasks.

Now what?

I have pointed out a handful of key concepts that differ between the usual single-tenant deployments and what you can expect with a multi-tenant deployment of SAS Viya. I am obviously just scratching the surface on these topics. Here are a couple of other resources to check out if you want to dig in further.

Creating a map with SAS Visual Analytics begins with the geographic variable. The geographic variable is a special type of data variable where each item has a latitude and longitude value. For maximum flexibility, VA supports three types of geography variables:

Predefined

Custom coordinates

Custom polygons

This is the first in a series of posts that will discuss each type of geography variable and their creation. The predefined geography variable is the easiest and quickest way to begin and will be the focus of this post.

Once you have identified a variable in your dataset matching one of these types, you are ready to begin. For our example map, the dataset 'Crime' and variable 'State name' will be used. Let’s get started.

Creating a predefined geography variable in SAS Visual Analytics

Begin by opening VA and navigating to the Data panel on the left of the application.

Select the desired dataset and locate a variable that matches one of the predefined lookup types discussed above. Click the down arrow to the right of the variable and select ‘Geography’ from the Classification dropdown menu.

The ‘Edit Geography Item’ window will open. Depending upon the type of geography variable selected, some of the options on this dialog will vary. The 'Name' textbox is common for all types and will contain the variable selected from your dataset. Edit this label as needed to make it more user friendly for your intended audience.

The ‘Geography data type’ drop down list is where you select the desired type of geography variable. In this example, we are using the default predefined option.

Locate the 'Name or code context' dropdown list. Select the type of predefined variable that matches the data type of the variable chosen from your data. Once selected, VA scans your data and does an internal lookup on each data item. This process identifies latitude and longitude values for each item of your dataset. Lookup results are shown on the right of the window as a percentage and a thumbnail size map. The thumbnail map displays the first 100 matches.

If there are any unmatched data items, the first 5 will be displayed. This may provide a better understanding of your data. In this example, it is clear from the variable name which type should be selected (US State Names). However, in most cases that choice will not be this obvious. The lesson here: know your data!

Unmatched data items indicators

Once you are satisfied with the matched results, click the OK button to continue. You should see a new section in the Data panel labeled ‘Geography’. The name of the variable will be displayed beside a globe icon. This icon represents the geography variable and provides confirmation it was created successfully.

Icon change for geography variable

Now that the geography variable has been created, we are ready to create a map. To do this, simply drag it from the Data panel and drop it on the VA report canvas. The auto-map feature of VA will recognize the geography variable and create a bubble map with an OpenStreetMap background. Congratulations! You have just created your first map in VA.

Bubble map created with predefined geography variable

The concept of a geography variable was introduced in this post as the foundation for creating all maps in VA. Using the predefined geography variable is the quickest way to get started with Geo maps. In situations when the predefined type is not possible, using one of VA's custom geography types becomes necessary. These scenarios will be discussed in future blog posts.

Each day, more than 130 Americans die from opioid overdoses. Combating the opioid epidemic begins with understanding it, and that begins with data. SAS recently partnered with graduate students from Carnegie Mellon University (CMU) 's Heinz College of Information Systems and Public Policy to understand how data mining and machine [...]

When the Nordics team asked for support for providing SAS Viya infrastructure on Azure Cloud platform, I didn't hesitate to agree and started planning the environment.

Environment needs

Colleagues from the Nordic countries informed us their Hackathon currently included fourteen registered teams. Hence, they needed at least fourteen different environments with the latest and greatest SAS Viya tools like SAS Visual Analytics, SAS VDMML and SAS Text Analytics. In addition, participants wanted to get the chance to use open source technologies with SAS and asked us to install R-Studio and Jupyter. This would allow data scientists to develop models in a programming language of their choice and provide access to SAS predictive modeling capabilities.

The challenge I faced was how to automate this installation process. We didn't want to repeat an exact installation fourteen times! Also, in case of a failure we needed a way to quickly reinstall a fresh virtual machine in our environment. We wanted to create the virtual machines on the Azure Cloud platform. The goal was to quickly get SAS Viya instances up and running on Azure, with little user interaction. We ended up with a single script expecting one parameter: the name of the instance. Next, I provide an overview of how we accomplished our task.

The setup

As we need to deploy fourteen identical copies of the same SAS Viya software, we decided to make use of the SAS Mirror Manager, which is a utility for synchronizing SAS software repositories. After downloading the mirror repository, we moved the complete file structure to a Web Server hosted on a separate Nordics Hackathon repository virtual machine, but within a similar private network where the SAS Viya instances will run. This guarantees low latency when downloading the software.

Next, we create within the base image the SAS Viya Playbook as described in the SAS Viya Deployment Guide. That allows us to kick off a SAS Viya installation later. The Viya installation must occur later during the initial launch of a new VM based on that image. We cannot install SAS Viya beforehand because one of the requirements is a static IP address and a static hostname, which is different for each VM we launch. However, we can install R-Studio server on the base image. Another important file we make available on this base image is a script to initiate the Ansible installations of OpenLDAP, SAS Viya and Jupyter.

Deployment

After the common components are in place we follow the instructions from Azure on how to create a custom image of an Azure VM. This capability is available on other public cloud providers as well. Now all the prerequisites to create working Viya environments for the Hackathon are complete. Finally, we create a launch script to install a full SAS Viya environment with a single command and one parameter, the hostname, from the Azure CLI.

The script

Test whether the Nordics Hackathon Repository VM is running, because the software must be downloaded from our own locally created repository.

Launch a new VM, based on the SAS Viya Image we created during preparation, assign a public static IP address, and choose a Standard_E32-16s_v3 Azure VM.

Launch our own Viya-install script to perform the following three sub-steps:

Install OpenLDAP as the identity provider.

Install SAS Viya just as you would do by following the SAS Viya Deployment Guide.

Install Jupyter with a customized Ansible script made by my colleague Alexander Koller.
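The VM launch step of such a script can be sketched in Python as a function that assembles the Azure CLI call from the one parameter, the hostname. This is only an illustration, not the script we used: the resource group and image names are hypothetical placeholders, and only the VM size and the static public IP come from the description above.

```python
import shlex
from typing import List


def build_vm_create_command(
    hostname: str,
    resource_group: str = "hackathon-rg",   # hypothetical resource group name
    image: str = "sas-viya-base-image",     # hypothetical name of the custom image
    size: str = "Standard_E32-16s_v3",      # VM size named in the text
) -> List[str]:
    """Return the argv for launching one VM from the prepared base image."""
    return [
        "az", "vm", "create",
        "--resource-group", resource_group,
        "--name", hostname,
        "--image", image,
        "--size", size,
        "--public-ip-address-allocation", "static",
    ]


command = build_vm_create_command("team01")
print(shlex.join(command))
```

On a machine where the Azure CLI is installed and logged in, the list could be handed to subprocess.run(command, check=True); looping over fourteen hostnames then yields fourteen launches.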

The result is fourteen full SAS Viya installations ready in about one hour and 45 minutes. We recently posted a LinkedIn video describing the entire process.

Final thoughts

I am planning to write a blog on SAS Communities to share more technical insight on how we created the script. I am honored I was asked to be part of the jury for the Hackathon. I am looking forward to the analytical insights that the different teams will discover and how they will make use of SAS Viya running on the Azure Cloud platform.

In my blog series regarding SAS REST APIs (article 1, article 2, article 3) I outlined how to integrate SAS analytical capabilities into applications. I detailed how to construct REST calls, build body parameters and interpret the responses. I've not yet covered authentication for the operations, purposefully putting the cart before the horse. If you're not authenticated, you can't do much, so this post will help to get the horse and cart in the right order.

Consider this post a quick guide to summarize these resources and shed light on authenticating via authorization code and passwords.

What OAuth grant type should I use?

Choosing the grant method to get an access token with OAuth depends entirely on your application. You can get more information on which grant type to choose here. This post covers two grant methods: authorization code and password. Authorization code grants are generally used with web applications and considered the safest choice. Password grants are most often used by mobile apps and applied in more trusted environments.

The process, briefly

Getting an external application connected to the SAS Viya platform requires the following steps:

1. Use the SAS Viya configuration server's Consul token to obtain an ID Token to register a new Client ID.

2. Use the ID Token to register the new client ID and secret.

3. Obtain the authorization code.

4. Acquire the access OAuth token of the Client ID using the authorization code.

5. Call the SAS Viya API using the access token for the authentication.

Registering the client (steps 1 and 2) is a one-time process. You will need a new authorization code (step 3) if the access token is revoked. The access and refresh tokens (step 4) are created once and only need to be refreshed if/when the token expires. Once you have the access token, you can call any API (step 5) as long as the token remains valid.

Get an access token using an authorization code

Step 1: Get the SAS Viya Consul token to register a new client

The first step to register the client is to get the consul token from the SAS server. As a SAS administrator (sudo user), access the consul token using the following command:

The returned token can be lengthy. To assist in later use, create an environment variable from the returned token:

$ export IDTOKEN="eyJhbGciOiJSUzI1NiIsIm..."
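For readers who prefer Python to the command line, the request that trades the Consul token for an ID token can be sketched as below. The host name, the client ID, and the SASLogon path with its query parameters are assumptions for illustration; check them against the documentation for your release. The function only builds the request and sends nothing.

```python
import urllib.request


def build_id_token_request(base_url: str, consul_token: str,
                           service_id: str) -> urllib.request.Request:
    """Build the POST that exchanges a Consul token for an ID token."""
    # Assumed endpoint; serviceId carries the client ID to be registered.
    url = (f"{base_url}/SASLogon/oauth/clients/consul"
           f"?callback=false&serviceId={service_id}")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"X-Consul-Token": consul_token},
    )


# Hypothetical host and client ID; the token value is a placeholder.
request = build_id_token_request(
    "https://sasserver.example.com", "<consul-token>", "myclientid")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen() would send it; the response body is expected to carry the token that the export command above stores in IDTOKEN.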

Step 2: Register the new client

Change the client_id, client_secret, and scopes in the code below. Scopes should always include "openid" along with any other groups this client needs to get in the access tokens. You can specify "*" but then the user gets prompted for all their groups, which is tedious. The example below just uses one group named "group1".
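A rough Python equivalent of the registration request is sketched here. The keys client_id, client_secret, and authorized_grant_types appear in this article; the "scope" and "redirect_uri" key names, the redirect value, the inclusion of refresh_token, the endpoint path, and the host are assumptions made for the sketch.

```python
import json
import urllib.request


def build_client_registration(base_url, id_token, client_id, client_secret,
                              scopes=("openid", "group1")):
    """Build the POST that registers a new OAuth client."""
    body = {
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": list(scopes),                      # always include "openid"
        "authorized_grant_types": ["authorization_code", "refresh_token"],
        "redirect_uri": "urn:ietf:wg:oauth:2.0:oob",  # assumed out-of-band URI
    }
    return urllib.request.Request(
        f"{base_url}/SASLogon/oauth/clients",       # assumed endpoint
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {id_token}",
        },
    )


# Hypothetical values throughout.
registration = build_client_registration(
    "https://sasserver.example.com", "<id-token>",
    "myclientid", "myclientsecret")
print(json.loads(registration.data)["scope"])
```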

We use the returned token to authenticate and authorize the calls made between the client and SAS. We also get a refresh token we use to issue a new token when the current one expires. This way we can avoid repeating all the previous steps. I explain the refresh process further down.
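The authorization-code exchange (steps 3 and 4) can be sketched in Python as follows. It assumes the standard OAuth 2.0 shapes: an authorize URL with response_type=code that the user visits in a browser, and a form-encoded POST to a token endpoint authenticated with the client ID and secret. The SASLogon paths and the host are assumptions, and nothing is sent.

```python
import base64
import urllib.parse
import urllib.request


def build_authorize_url(base_url, client_id):
    """URL the user opens in a browser to obtain an authorization code."""
    query = urllib.parse.urlencode(
        {"client_id": client_id, "response_type": "code"})
    return f"{base_url}/SASLogon/oauth/authorize?{query}"  # assumed path


def build_token_request(base_url, client_id, client_secret, code):
    """Build the POST that swaps the authorization code for tokens."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    form = urllib.parse.urlencode(
        {"grant_type": "authorization_code", "code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/SASLogon/oauth/token",                 # assumed path
        data=form,
        method="POST",
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode(),
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


base = "https://sasserver.example.com"  # hypothetical host
authorize_url = build_authorize_url(base, "myclientid")
token_request = build_token_request(base, "myclientid", "myclientsecret", "abc123")
print(authorize_url)
```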

We will again create environment variables for the tokens.

$ export ACCESS_TOKEN="eyJhbGciOiJSUzI1NiIsImtpZCI6ImxlZ..."

$ export REFRESH_TOKEN="eyJhbGciOiJSUzI1NiIsImtpZC..."

Step 5: Use the access token to call SAS Viya APIs

The prep work is complete. We can now send requests to SAS Viya and get some work done. Below is an example REST call that returns user preferences.
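In Python, a call of this kind comes down to adding the access token as a bearer header. The sketch below builds such a request without sending it; the host is hypothetical and the "/preferences/" path is only a placeholder, so look up the exact endpoint in the API reference.

```python
import urllib.request


def build_api_request(base_url, path, access_token):
    """Build a GET request authenticated with an OAuth access token."""
    return urllib.request.Request(
        f"{base_url}{path}",
        method="GET",
        headers={"Authorization": f"Bearer {access_token}"},
    )


# Placeholder host, path, and token.
api_request = build_api_request(
    "https://sasserver.example.com", "/preferences/", "<access-token>")
print(api_request.get_method(), api_request.full_url)
```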

Note the access token is new, and the refresh token remains static. Use the new token for future REST calls. Make sure to replace the ACCESS_TOKEN variable with the new token. Also, the access token has a default life of ten hours before it expires. Most applications deal with expiring and refreshing tokens programmatically. If you wish to change the default expiry of an access token in SAS, make a configuration change in the JWT properties in SAS.
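One way an application might decide when to refresh is sketched below. It assumes the access token is a JWT carrying a standard "exp" claim (seconds since the epoch) and that the refresh call uses the standard OAuth refresh_token grant. The demo token is fabricated locally and unsigned, purely so the example runs; the signature is not verified here.

```python
import base64
import json
import time
import urllib.parse


def jwt_expiry(token):
    """Return the 'exp' claim from a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return int(claims["exp"])


def needs_refresh(token, now=None, margin_seconds=300):
    """True when the token expires within margin_seconds of 'now'."""
    now = time.time() if now is None else now
    return jwt_expiry(token) - now <= margin_seconds


def build_refresh_form(refresh_token):
    """Form body for a standard OAuth refresh_token grant."""
    return urllib.parse.urlencode(
        {"grant_type": "refresh_token", "refresh_token": refresh_token})


def _demo_token(exp):
    """Fabricate an unsigned JWT-shaped string for demonstration only."""
    def encode(obj):
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{encode({'alg': 'none'})}.{encode({'exp': exp})}.signature"


demo = _demo_token(exp=1000)
print(jwt_expiry(demo), needs_refresh(demo, now=900))
```

Checking a little ahead of the actual expiry (the margin above is an arbitrary five minutes) avoids sending a request with a token that expires in flight.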

Get an access token using a password

The steps to obtain an access token with a password are the same as with the authorization code. I highlight the differences below, without repeating all the steps.
The process for accessing the ID Token and using it to get an access token for registering the client is the same as described earlier. The first difference when using password authentication is when registering the client. In the code below, notice the key authorized_grant_types has a value of password, not authorization code.

From here, sending requests and refreshing the token steps are identical to the method explained in the authorization code example.
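The two places where the password flow differs can be sketched in Python as a client definition and a token request body. The authorized_grant_types key comes from this article; the "scope" key, the extra refresh_token grant, and the form field names (which follow the standard OAuth password grant) are assumptions, as are the user name and password.

```python
import urllib.parse


def password_client_definition(client_id, client_secret):
    """Client registration body for the password grant."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": ["openid"],
        # "password" replaces "authorization_code" here
        "authorized_grant_types": ["password", "refresh_token"],
    }


def password_token_form(username, password):
    """Form body for requesting tokens with a user's credentials."""
    return urllib.parse.urlencode(
        {"grant_type": "password", "username": username, "password": password})


# Hypothetical client and user.
definition = password_client_definition("myclientid", "myclientsecret")
form = password_token_form("sasdemo", "s3cret")
print(definition["authorized_grant_types"], form)
```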

Final thoughts

At first, OAuth seems a little intimidating; however, after registering the client and creating the access and refresh tokens, the application will handle all authentication components. This process runs smoothly if you plan and make decisions up front. I hope this guide clears up any questions you may have on securing your application with SAS. Please leave questions or comments below.

What do chocolate and toffee have to do with optimization? Read on and find out.

The application

When deciding on an example to use in this article, I wanted to focus on the interaction between the application and SAS, not app complexity. I decided to use an application created by my colleague, Deva Kumar. His OptModel1 is an application built on the restAF framework and demonstrates how SAS REST APIs can be used to build applications that exploit various SAS Viya functionalities. This application optimizes the quantities of chocolate and toffee to purchase based on a budget entered by the user.

Think of the application as comparable to the guns and butter economic model. The idea in the model is the more you spend on the military (guns), the less you spend on domestic programs and the civilian goods (butter). As President Johnson stated in 1968, "That bitch of a war, killed the lady I really loved -- the Great Society." In this article, I'll stick to chocolate and toffee, a much less debatable (and tastier) subject matter.

The OptModel1 application uses the runOptmodel CAS action to solve the optimization problem. After the application launches and authenticates the user, it requests a budget. Based on the amount entered, a purchase recommendation returns for chocolate and toffee. The user may also request a report based on returned values. In the application, OptModel1 and SAS interact through REST API calls. Refer to the diagram below for application code workflow.

Create the application

To create the application yourself, access the source code and install instructions on SAS' GitHub page. I recommend cloning, or at the least, accessing the repository. I refer to code snippets from multiple files throughout the article.

Application Workflow

Represented below is the OptModel1 work flow. Highlighted in yellow is each API call.

OptModel1 Work Flow

Outlined in the following sections is each step in the work flow, with corresponding numbers from the diagram.

Launch the application

Enter the URL http://localhost:5006/optmodel in a browser to access the login screen.

OptModel1 app login page

1. Login

Enter proper credentials and click the 'Sign In' button. The OptModel1 application initiates authentication in the logon.html file with this code:

Application landing page

Notice how the host and access token are part of the resulting url. For now, this is as far as I'll go on authentication. I will cover this topic in depth in a future article.

As I stated earlier, this is the simplest of applications. I want to keep the focus on what is going on under the covers and not on a flashy application.

2a. Application initialization

Once the app confirms authentication, the application initialization steps ensue. The app needs to be available to multiple users at once, so each session gets their own copy of the template Visual Analytics (VA) report. This avoids users stepping on each other’s changes. This is accomplished through a series of API calls as explained below. The code for these calls is in vaSetup.js and reportViewer.js.

2b. Copy data

The app copies data from the Public caslib to a temporary worklib – a worklib is a standard caslib like casuser. The casl code below is submitted to CAS server for execution. The code to make the API call to CAS is in vaSetup.js. The relevant snippet of javascript code is:

3. Enter budget

Enter a budget in the space provided (I use $10,000 in this example) and click the Optimize button. This action instructs the application to calculate the amount of chocolate and toffee to purchase based on the model.

Enter budget and optimize

4. & 5. Generate and execute CASL code

The code to load the CAS action set, run the CAS action, and store the results in a table, is in the genCode.js file:

Note: The drop table step at the end of the preceding code is important to force VA to reload the data for the report.

6. Get the results - table form

The results return to the application in table form. We now know to buy quantities of 370 chocolate and 111 toffee with our $10,000 budget. Please refer to the casTableViewer for code details of this step.

Data view in table format

6. Get the results - report form

Select the View Graph button. This action instructs OptModel1 to display the interactive report with the new data (the report we created in step 2f). Please refer to the onReport function in index.html for code details of this step.

Data view in report format

Now that we know how much chocolate and toffee to buy, we can make enough treats for all of the holiday parties just around the corner. More importantly, we see how to integrate SAS REST APIs into our application. This completes the series on using SAS REST APIs. The conversation is not over however. I will continue to search out and report on other topics related to SAS, open source languages, and agile technologies. Happy Holidays!

The REST architecture that SAS Viya is built on is, by its nature, open. This is a very powerful thing! In addition, the supplied command-line interfaces (CLIs) add a user-friendly interface to make it easier to make REST calls. Occasionally, however, it is necessary to call REST directly. This can occur when there is (currently) no CLI interface to a piece of functionality, or you wish to run a more complex task from a single command. In the SAS Global Enablement and Learning (GEL) group, as we staged our software images and developed our materials for our SAS Viya training, we found ourselves with some of these needs. As a result, we developed the GEL pyviyatools.

The GEL pyviyatools are a set of Python-based command-line tools that call the SAS Viya REST APIs. The tools can be used to make direct calls to any REST-endpoint (like a cURL command), and as a framework to build additional tools that make multiple rest calls to provide more complex functionality. The tools are designed to be used in conjunction with the sas-admin command line interfaces (CLI).

One of the challenges of making REST calls to SAS Viya is getting your authentication token. The tools simplify this issue by using the authentication mechanism provided by the SAS Viya command-line interfaces.

callrestapi (call_rest_api) is a general tool, and the building block for all the other tools. It calls a function callrestapi() that can also be used from any python program to build more complex tools.

The tools are self-documenting just like the Viya CLIs (just use the -h or --help option).

With callrestapi, you must pass a method and endpoint. You can optionally pass JSON data for a post request, content type headers, and the -o option to change the style of output.
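As an illustration, a callrestapi invocation could be assembled from Python like this. The -o option is mentioned above; the -m and -e flag names for method and endpoint, the script location, and the "json" output style are assumptions here, so run the tool with -h to confirm them before relying on this.

```python
import shlex
from typing import List, Optional


def build_callrestapi_command(method: str, endpoint: str,
                              output_style: Optional[str] = None,
                              script: str = "./callrestapi.py") -> List[str]:
    """Return the argv for one callrestapi call (flag names are assumed)."""
    command = [script, "-m", method, "-e", endpoint]
    if output_style is not None:
        command += ["-o", output_style]
    return command


# Example: list folders and ask for JSON output.
call = build_callrestapi_command("get", "/folders/folders", "json")
print(shlex.join(call))
```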

In addition to this basic cURL-like functionality, there are some tools built on top of callrestapi that perform more complex functions. Here are a few examples -- check out the GitHub project for a full list.

createdomain.py creates a SAS Viya authentication domain

updatedomain.py loads a set of userids and passwords to a Viya domain from a csv file

createfolders.py creates a set of SAS Viya folders from a csv file

explainaccess.py explains access for a folder, object or service endpoint

You can get the tools from GitHub, where the installation and usage instructions are documented.

Please try these tools if you need more command-line functions in your SAS Viya environment. In addition, if you want to contribute additional tools built on the framework, please see the CONTRIBUTING.md file in the GitHub repository. You can also report any issues or suggestions via GitHub issues.

Disclaimer: this article does not cover or promote any political views. It’s all about data and REST APIs.

I am relieved, thankful, elated, glad, thrilled, joyful (I could go on with more synonyms from my thesaurus.com search for 'happy') November 6, 2018 has come and gone. Election day is over. This means no more political ads on TV, and those signs lining the streets will be coming down! It is a joy to now watch commercials about things that matter. Things like injury lawyers who are on your side or discovering a copper colored pan is going to cook my food better than a black one.

The data

In the closing days of the election season, while being inundated with political advertising, I thought about how much money is spent during each cycle. The exact numbers vary depending on the source, but the range for this year’s mid-term elections is between four and five billion dollars.

A little research reveals that outside the candidates themselves, the biggest spenders on political ads are political action committees, aka PACs. The Center for Responsive Politics compiled the data set used in this article, which derives from a larger data set released by the Federal Election Commission. The data set lists a breakdown of PAC contributions to campaign finances.

CAS REST APIs

As I explained in the previous article, SAS publishes two sets of APIs. Which APIs to use depends on the service, the data organization, or the intended use of the data. Please refer to the SAS Viya REST API article for more information on each set of APIs.

CAS REST APIs use CAS actions to perform statistical methods across a variety of SAS products. You can also use the CAS REST APIs to configure and maintain the SAS Viya environment. Here, I focus on the CAS actions. Calling the CAS actions via the REST API allows users to access SAS data and procedures and integrate them into their applications.

The process

How to construct the API call

I start with the API documentation for information on how to construct and use the CAS REST APIs. The REST API can submit actions and return the results. Parameters and result data are in JSON format. To specify your parameters, encapsulate the attributes in a JSON object, then submit a POST method on the action. The URL for your action will include the UUID of your session in the format: /cas/sessions/{uuid}/actions/{action}. Replace {uuid} and {action} with the appropriate values.
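The URL pattern is easy to capture in a small Python helper. The host, port, and session UUID below are placeholders, not values from a real server.

```python
def cas_action_url(base_url: str, session_uuid: str, action: str) -> str:
    """Fill in /cas/sessions/{uuid}/actions/{action} for one action call."""
    return f"{base_url.rstrip('/')}/cas/sessions/{session_uuid}/actions/{action}"


# Placeholder server address and session UUID.
url = cas_action_url(
    "http://sasserver.example.com:8777",
    "1a2b3c4d-0000-0000-0000-000000000000",
    "simple.summary",
)
print(url)
```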

Create a session

The first requirement is to create a session. I use the following cURL command to create the session.

I’ll use the UUID for the session to build the URLs for the remainder of the REST calls.
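For comparison with the cURL approach, a Python sketch of the session request follows. It assumes the session is created with a POST to /cas/sessions, that a bearer token is accepted for authentication, and that the response is a JSON object whose "session" field holds the UUID; all three should be checked against the CAS REST API documentation. The host and port are placeholders.

```python
import json
import urllib.request


def build_create_session_request(base_url, access_token):
    """Build the POST that asks the CAS server for a new session."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/cas/sessions",
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def session_uuid_from_response(response_text):
    """Pull the session UUID out of the (assumed) JSON response."""
    return json.loads(response_text)["session"]


session_request = build_create_session_request(
    "http://sasserver.example.com:8777", "<access-token>")
# Fabricated response text, used only to exercise the parser.
example_uuid = session_uuid_from_response(
    '{"session": "1a2b3c4d-0000-0000-0000-000000000000"}')
print(session_request.full_url, example_uuid)
```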

Build the CAS REST API call body

Now we know the general structure of the CAS REST API call. We can browse the CAS actions by name to determine how to build the body text.

Using the simple.summary action definition, I build a JSON body to access the PAC spending from a CASTable, create a new table grouped by political views, and calculate total spending. The resulting code is below:
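A body of this shape can be sketched in Python as a dictionary that is serialized to JSON before posting. The output table name SPENDINGBYAFFILIATION comes from this article; the input table name, the caslib, and the column names are hypothetical stand-ins, and the parameter names (table, groupBy, inputs, subSet, casOut) should be confirmed against the simple.summary action definition.

```python
import json


def build_summary_body(write_table=True):
    """Build a simple.summary request body that totals spending by group."""
    body = {
        "table": {
            "name": "PACSPENDING",                 # hypothetical input table
            "caslib": "Public",                    # hypothetical caslib
            "groupBy": [{"name": "AFFILIATION"}],  # hypothetical column
        },
        "inputs": ["TOTAL"],                       # hypothetical column
        "subSet": ["SUM"],                         # request only the sum
    }
    if write_table:
        # Write results to a CAS table instead of returning them as JSON.
        body["casOut"] = {
            "name": "SPENDINGBYAFFILIATION",
            "caslib": "Public",
            "replace": True,
        }
    return body


with_table = build_summary_body()
json_only = build_summary_body(write_table=False)
print(json.dumps(with_table, indent=2))
```

Leaving casOut out of the body, as the second call does, corresponds to asking for the summary to come back in the response rather than being stored as a table.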

The REST call creates a new CASTable, SPENDINGBYAFFILIATION. Refer to the screen shot below.

SAS CASTable created by the simple.summary action

I also have the option of returning the data to create the SPENDINGBYAFFILIATION table in JSON format. To accomplish this, remove the casout{} line from the preceding call. Below is a snippet of the JSON response.

JSON response to the simple.summary REST call

Once parsed, the JSON response is ready for use by a web application, software program, or script.

Moving on

The Thanksgiving Day holiday is fast approaching here in the United States. I plan to eat a lot of turkey and sweet potato pie, welcome the out-of-town family, and watch football. It will be refreshing to not hear the back-and-forth banter and bickering between candidates during commercial breaks. Oh, but wait, Thanksgiving is the start of the holiday season. This means one thing: promotions on Black Friday deals for items I may not need will start airing and last through year's-end. I guess if it is not one thing filling the advertising air waves, it is another. I'll just keep the remote handy and hope I can find another ball game on.

What’s next?

I understand and appreciate political candidates’ needs to communicate their stance on issues and promote their agendas. This takes money. I don't see the spending trend changing direction in the coming years. I can only hope the use of the funds will promote candidates' qualifications, beliefs, and ideas, and not to bash or belittle their opponents.

My next article will demonstrate how to use both the SAS Viya and the CAS REST APIs under the umbrella of one web application. And I promise, no politics.

Disclaimer: this article does not cover or promote any political views. It’s all about data and REST APIs.

I am relieved, thankful, elated, glad, thrilled, joyful (I could go on with more synonyms from my thesaurus.com search for 'happy') November 6, 2018 has come and gone. Election day is over. This means no more political ads on TV, and those signs lining the streets will be coming down! It is a joy to now watch commercials about things that matter. Things like injury lawyers who are on your side or discovering a copper colored pan is going to cook my food better than a black one.

The data

In the closing days of the election season, while being inundated with political advertising, I thought about how much money is spent during each cycle. The exact numbers vary depending on the resource, but the range for this year’s mid-term elections is between four and five billion dollars.

A little research reveals that outside the candidates themselves, the biggest spenders on political ads are political action committees, aka PACs. The Center for Responsive Politics compiled the data set used in this article, and derives from a larger data set released by the Federal Election Commission. The data set lists a breakdown of PAC contributions to campaign finances.

CAS REST APIs

As I explained in the previous article, SAS publishes two sets of APIs. Which APIs to use depends on the service, the data organization, or the intended use of the data. Please refer to the SAS Viya REST API article for more information on each set of APIs.

CAS REST APIs use CAS actions to perform statistical methods across a variety of SAS products. You can also use the CAS REST APIs to configure and maintain the SAS Viya environment. Here, I focus on the CAS actions. Calling the CAS actions via the REST API allows users to access SAS data and procedures and integrate them into their applications.

The process

How to construct the API call

I start with the API documentation for information on how to construct and use the CAS REST APIs. The REST API can submit actions and return the results. Parameters and result data are in JSON format. To specify your parameters, encapsulate the attributes in a JSON object, then submit a POST request to the action's URL. The URL for your action includes the UUID of your session, in the format /cas/sessions/{uuid}/actions/{action}. Replace {uuid} and {action} with the appropriate values.
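As a rough illustration of that URL pattern, here is a minimal Python sketch. It is not the code from the original article: the helper name build_action_url, the host, the port, and the all-zero session UUID are placeholders I made up for the example.

```python
from urllib.parse import quote


def build_action_url(base_url: str, session_uuid: str, action: str) -> str:
    """Fill in /cas/sessions/{uuid}/actions/{action} for one CAS action."""
    # Percent-encode both path pieces so stray characters cannot break the URL.
    return (
        f"{base_url.rstrip('/')}/cas/sessions/"
        f"{quote(session_uuid, safe='')}/actions/{quote(action, safe='')}"
    )


# Placeholder host, port, and session UUID, for illustration only.
url = build_action_url(
    "http://cas.example.com:8777",
    "00000000-0000-0000-0000-000000000000",
    "simple.summary",
)
print(url)
```

In practice the UUID would come from the response to the session-creation call rather than being typed in by hand.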

Create a session

The first requirement is to create a session. I use the following cURL command to create the session.

I’ll use the UUID for the session to build the URLs for the remainder of the REST calls.

Build the CAS REST API call body

Now we know the general structure of the CAS REST API call. We can browse the CAS actions by name to determine how to build the body text.

Using the simple.summary action definition, I build a JSON body to access the PAC spending from a CASTable, create a new table grouped by political views, and calculate total spending. The resulting code is below:
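For readers who want a feel for what such a body might look like, here is a hedged Python sketch that builds (but does not send) the POST request. Only the action name and the output table name SPENDINGBYAFFILIATION come from this article. The host, session UUID, caslib, source table name, and column names are placeholders I invented, and the parameter names (table, groupBy, inputs, subSet, casOut) reflect my reading of the simple.summary documentation, so check them against the action reference before relying on them. Authentication headers are omitted.

```python
import json
from urllib.request import Request

# Placeholder URL: host, port, and session UUID are made up for this sketch.
ACTION_URL = (
    "http://cas.example.com:8777/cas/sessions/"
    "00000000-0000-0000-0000-000000000000/actions/simple.summary"
)

# Assumed names: CASUSER, PAC_SPENDING, affiliation, and amount are hypothetical.
body = {
    "table": {
        "caslib": "CASUSER",
        "name": "PAC_SPENDING",
        "groupBy": [{"name": "affiliation"}],  # group rows by political view
    },
    "inputs": [{"name": "amount"}],            # column to summarize
    "subSet": ["SUM"],                         # request only the total
    "casOut": {                                # write the result to a new CAS table
        "caslib": "CASUSER",
        "name": "SPENDINGBYAFFILIATION",
        "replace": True,
    },
}

# Build the request object only; nothing is sent over the network here.
request = Request(
    ACTION_URL,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(request.get_method(), request.full_url)
print(json.dumps(body, indent=2))
```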

The REST call creates a new CASTable, SPENDINGBYAFFILIATION. Refer to the screen shot below.

SAS CASTable created by the simple.summary action

I also have the option of returning the summarized data in JSON format instead of creating the SPENDINGBYAFFILIATION table. To accomplish this, remove the casout{} line from the preceding call. Below is a snippet of the JSON response.

JSON response to the simple.summary REST call

Once parsed, the JSON response is ready for use by a web application, software program, or script.

Moving on

The Thanksgiving Day holiday is fast approaching here in the United States. I plan to eat a lot of turkey and sweet potato pie, welcome the out-of-town family, and watch football. It will be refreshing to not hear the back-and-forth banter and bickering between candidates during commercial breaks. Oh, but wait, Thanksgiving is the start of the holiday season. This means one thing: promotions on Black Friday deals for items I may not need will start airing and last through year's end. I guess if it is not one thing filling the advertising airwaves, it is another. I'll just keep the remote handy and hope I can find another ball game on.

Prior to SAS Viya

With the creation of SAS Viya, the ability to run DATA Step code in a distributed manner became a reality. Before distributed DATA Step, programmers never had to think about achieving repeatable results when SAS7BDAT datasets were the sources for DATA Step code containing a BY statement. This is because, prior to SAS Cloud Analytic Services (CAS), DATA Step ran single-threaded and the source SAS7BDAT dataset was stored on disk. Every time we ran the code, we obtained repeatable results because the sequence of rows within each BY group was preserved between runs. To illustrate this, review figures 1, 2, and 3.

In figure 2, we see a BY statement with variable VAR1. This will ensure VAR1 is in ascending order. We are also using FIRST. processing to identify the first occurrence of the BY group. Because this data is stored on disk and because the DATA Step is executed using a single thread, the result table will be repeatable no matter how many times we run the DATA Step code.

Figure 2. Focus on the IF statement, especially VAR2

In figure 3, we see the output SAS7BDAT dataset WORK.TEST2.

_n_   VAR1   VAR2
1     1      N

Figure 3. WORK.TEST2 result dataset from running the code in Figure 2

In figure 4, we are running the same DATA Step but this time our source and target tables are CAS tables. The source table CASLIB.TEST1 was created by lifting the original SAS7BDAT dataset WORK.TEST1 (review figure 1) into CAS.

Figure 4. DATA Step executing in CAS

In figure 5, we see that the DATA Step logic is being respected in runs 1, 2 and 3; but we are not achieving repeatable results. This is due to CAS running on multiple threads. Note that the BY statement – which will group the data correctly for each BY group – is done on the fly. Also, the BY statement will not preserve the sequence of rows within the BY group between runs.

For some processes this is not a concern, but for others it could be. If you need repeatable results from DATA Step code that runs distributed in CAS, and need them to match your SAS 9 single-threaded DATA Step results, I suggest the following workaround.

Figure 5. DATA Step logic is respected but yields different results with each run
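The effect can be imitated outside SAS. The Python sketch below is a simulation, not CAS code: it treats "rows arrive at the BY grouping in an unpredictable order" as a stand-in for multithreaded execution, and it assumes the IF statement keeps a row when FIRST.VAR1 is true and VAR2 is 'N' (my inference from the result tables, since the original code figure is not reproduced here). The function name is hypothetical.

```python
from itertools import permutations

# Rows of WORK.TEST1 as (VAR1, VAR2), in their on-disk order.
TEST1 = [(1, "N"), (1, "Y"), (1, "Y"), (2, "Y"), (2, "Y"), (2, "N")]


def first_rows_flagged_n(rows):
    """Mimic BY VAR1 with FIRST.VAR1: keep a group's first row if VAR2 is 'N'."""
    # Stable sort: rows with the same VAR1 keep the order in which they arrived.
    ordered = sorted(rows, key=lambda r: r[0])
    out, seen = [], set()
    for var1, var2 in ordered:
        if var1 not in seen:          # FIRST.VAR1 is true for this row
            seen.add(var1)
            if var2 == "N":           # assumed IF condition
                out.append((var1, var2))
    return out


# Single-threaded, on-disk order: the same answer on every run.
print(first_rows_flagged_n(TEST1))    # [(1, 'N')]

# Every possible arrival order, standing in for multithreaded execution.
outcomes = {tuple(first_rows_flagged_n(p)) for p in permutations(TEST1)}
print(len(outcomes))                  # 4 distinct results
```

The grouping logic is identical in every case; only the row that happens to be first in each BY group changes, which is why the output differs.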

With SAS Viya

The workaround is simple to understand and implement. For each SAS7BDAT dataset being lifted into a CAS table (see figure 6), we need to add a new variable, ROW_ID.

_n_   VAR1   VAR2
1     1      N
2     1      Y
3     1      Y
4     2      Y
5     2      Y
6     2      N

Figure 6. Original SAS7BDAT dataset source WORK.TEST1

To accomplish this, we will leverage the automatic variable _N_ that is available to all DATA Step programmers. _N_ is initially set to 1. Each time the DATA step loops past the DATA statement, the variable _N_ increments by 1. The value of _N_ represents the number of times the DATA step has iterated. In our case, the value for each row is the row sequence in the original SAS7BDAT dataset. Figure 7 contains the SAS code we ran on the SAS 9.4M5 workspace server or the SAS Viya compute server to add the new variable ROW_ID.

Figure 7. Creating the new variable ROW_ID
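The idea behind ROW_ID can be shown with a short Python analogy. This is not the SAS code from figure 7: it uses enumerate as a stand-in for the automatic variable _N_, and the variable names are my own.

```python
# WORK.TEST1 rows as (VAR1, VAR2), in their on-disk order.
test1 = [(1, "N"), (1, "Y"), (1, "Y"), (2, "Y"), (2, "Y"), (2, "N")]

# enumerate(..., start=1) plays the role of _N_: 1 on the first pass, then +1 per row.
test1_with_id = [
    (var1, var2, row_id)
    for row_id, (var1, var2) in enumerate(test1, start=1)
]

for row in test1_with_id:
    print(row)
```

Because the numbering happens while the data is still read sequentially, each ROW_ID records the row's original position before the table is distributed.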

By reviewing figure 8 we can see the new variable ROW_ID in the SAS7BDAT dataset WORK.TEST1. Now that we have the new variable, we are ready to lift this dataset into CAS.

_N_   VAR1   VAR2   ROW_ID
1     1      N      1
2     1      Y      2
3     1      Y      3
4     2      Y      4
5     2      Y      5
6     2      N      6

Figure 8. WORK.TEST1 with the new variable ROW_ID

There are many ways to lift a SAS7BDAT dataset into CAS. One way is to use a DATA Step, as we did in figure 9.

Figure 9. DATA Step code to create distributed CAS table CASLIB.TEST1

To obtain repeatable results, we need to control the sequence of rows within each BY group. We accomplish this by adding the new variable ROW_ID as the last variable in the BY statement of our DATA Step code (see figure 10).

Figure 10. Add ROW_ID as last variable of the BY group

Figure 11 shows us the output CAS table created by the code in figure 10. By adding the new variable ROW_ID and using that variable as the last variable of the BY statement, we are controlling the sequencing of rows within the BY groups for all 3 runs.

VAR1   VAR2   ROW_ID
1      N      1

Figure 11. Distributed CAS table CASLIB.TEST2
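To see why the extra BY variable restores repeatability, here is a Python simulation of the same logic. It is a sketch under assumptions rather than SAS code: arbitrary arrival order stands in for multithreaded CAS execution, the IF condition (FIRST.VAR1 and VAR2 equal to 'N') is inferred from the result tables, and the function name is hypothetical.

```python
from itertools import permutations

# WORK.TEST1 rows as (VAR1, VAR2, ROW_ID).
TEST1 = [
    (1, "N", 1), (1, "Y", 2), (1, "Y", 3),
    (2, "Y", 4), (2, "Y", 5), (2, "N", 6),
]


def first_rows_flagged_n(rows):
    """Mimic BY VAR1 ROW_ID with FIRST.VAR1: keep a group's first row if VAR2 is 'N'."""
    # Sorting on (VAR1, ROW_ID) fixes the order inside each BY group.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    out, seen = [], set()
    for var1, var2, row_id in ordered:
        if var1 not in seen:          # FIRST.VAR1 is true for this row
            seen.add(var1)
            if var2 == "N":           # assumed IF condition
                out.append((var1, var2, row_id))
    return out


# Try every possible arrival order; the result never changes.
outcomes = {tuple(first_rows_flagged_n(p)) for p in permutations(TEST1)}
print(outcomes)                       # {((1, 'N', 1),)}
```

Since ROW_ID is unique, no two rows in a BY group can tie, so the first row of each group is the same no matter how the rows were distributed.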

Conclusion

With distributed DATA Step come great opportunities to improve runtimes. It also means we need to understand the differences between single-threaded processing of SAS7BDAT datasets stored on disk and distributed processing of CAS tables stored in memory. To help you with that journey, I suggest you read the SAS Global Forum paper, Parallel Programming with the DATA Step: Next Steps.