Machine learning. Artificial Intelligence

How to Buy SAS Visual Analytics

Stories about SAS Visual Analytics are among the most widely read posts on this blog. In the last two years I’ve received many queries from readers who complain that it’s hard to get clear answers about the software from SAS.

In software procurement, the customer has bargaining power until the deal closes; after that, power shifts to the vendor. In this post, I’ve compiled some key questions prospective customers should resolve before signing a license agreement with SAS.

SAS Visual Analytics (VA), first launched in 2012, is now in its seventh dot release. With a total of ~3,400 sites licensed, the most serious early release issues are resolved. The product itself has improved. In early releases, for example, it was impossible to join tables after loading them into VA; now you can. SAS has gradually added features to the product, and will continue to do so.

Privately, SAS account executives describe VA as a “Tableau-Killer”; a more apt description is “Tableau for SAS Lovers.” An experienced Tableau user will immediately notice features missing from VA. On the other hand, SAS offers some statistical features (SAS Visual Statistics) not currently available in Tableau, for an extra license fee.

As this chart shows, Tableau is still alive:

Source: Tableau Annual Report; SAS Revenue Press Release

SAS positions VA to its existing BI customers as a replacement product, and not a moment too soon; Gartner reports that organizations are rapidly pulling the plug on the legacy SAS BI product. SAS prices VA to sell, clearly seeking to underprice Tableau and build a footprint. Ordinarily, SAS pricing is a closely held secret, but SAS discloses its low VA pricing in the latest Gartner BI Magic Quadrant report.

Is VA the Right Solution?

VA works with SAS LASR Server, a proprietary in-memory analytic datastore, which should not be confused with in-memory databases like SAP HANA, Exasol or MemSQL. In-memory databases have many features that are missing from LASR Server, such as ACID compliance, ANSI SQL engines and automated archiving. Most in-memory databases can update data in real time; for LASR Server, you update a table by reloading it. Commercial in-memory databases support many different end-user products for visualization and BI, so you aren’t locked in with a single vendor. LASR Server supports SAS software only.

Like any other in-memory datastore, LASR Server is best for small high-value databases that will be queried by many users who require low latency. LASR Server reads an entire table into memory and persists it there, so the amount of available memory is a limiting factor.

Since LASR Server is a distributed engine you can add more servers if you need more memory. But keep in mind that while the cost of memory is declining, it is not free; it is still quite expensive per byte compared to disk storage. In practice, most working in-memory databases support less than a terabyte of data. By contrast, the smallest data warehouse appliances sold by vendors like IBM support thirty terabytes.

LASR Server’s principal selling point is speed. The product is fast because it persists data in memory, and separates the disk I/O bottleneck from the user experience. (You still need to load data into LASR Server, but you can do this separately, when the user isn’t waiting for a response.)

In contrast, Tableau uses a patented (i.e., proprietary) data engine that interfaces with your data source. For extracts not already cached on the server, Tableau submits a query whose runtime depends on the data source; if the supporting database is poorly tuned, the query may take a long time to run. In most cases, VA will be faster than Tableau, but it’s debatable how critical this is for a decision support application.

VA and LASR Server are the right solution for your business problem if all of the following conditions are true:

You work with less than a terabyte of data

You are willing to limit your visualization and BI tools to SAS software

You expect more than a handful of concurrent users

Your users require subsecond query response times
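
The checklist above is an all-or-nothing test, which a few lines of Python can make explicit. This is only a sketch: the `Workload` fields and the `HANDFUL` cutoff are hypothetical names and values of mine, since the post gives no number for "a handful" of users.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    data_tb: float            # total data to hold in memory, in terabytes
    sas_only_tools_ok: bool   # willing to limit BI/visualization tools to SAS
    concurrent_users: int     # expected number of simultaneous users
    needs_subsecond: bool     # users require subsecond query response


# Hypothetical cutoff for "more than a handful" of users; pick your own.
HANDFUL = 5


def va_is_a_fit(w: Workload) -> bool:
    """Return True only if all four conditions from the checklist hold."""
    return (
        w.data_tb < 1.0
        and w.sas_only_tools_ok
        and w.concurrent_users > HANDFUL
        and w.needs_subsecond
    )


print(va_is_a_fit(Workload(0.2, True, 40, True)))  # True
print(va_is_a_fit(Workload(5.0, True, 40, True)))  # False: too much data
```

Failing any single condition is enough to send you back to other options, which is why the function uses `and` rather than a score.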

If you are thinking of using VA and LASR Server in distributed mode (implemented across more than one server), keep in mind that distributed computing is an order of magnitude more difficult to deliver. Since SAS pitches a low-cost “Single Box Solution” as an entry-level product, most of those 3,400 customer sites run on a single server. Before you commit to licensing the product in a multi-server configuration, you should insist on additional proof of product viability from SAS. For example, insist on references from customers running in production in configurations at least as large as what you have in mind; and consider a full proof-of-concept (funded by SAS).

SAS’ low software pricing for VA makes it seem attractive. However, you need to focus on the total cost of ownership, which we discuss below.

Infrastructure Costs

According to SAS’ sizing guidelines for VA, a single 16-CPU server with 256GB of RAM can support a 20GB table with seven heavy users. (That’s 20 gigabytes of uncompressed data.)

For a rough estimate of the amount of hardware required:

(1) Determine the size of the largest table you plan to load

(2) Determine the total amount of data you plan to load

(3) Determine the planned number of “heavy” and “light” users. SAS defines a heavy user as “any SAS Visual Analytics Explorer user or a user who runs correlational analysis with multiple variables, box plots with four or more measures, or crosstabs with four or more class variables.” In practice, this means every user.

In Step #4, you write a large check to your preferred hardware vendor, unless you are working with tiny data.
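
For a sense of how large that check might be, here is a back-of-envelope sketch in Python built on the one reference point quoted above (256GB of RAM, a 20GB table, seven heavy users). The linear scaling is my assumption, not SAS' guidance, and the function names are hypothetical; it also ignores the largest-table constraint, so treat the result as a conversation starter for your hardware vendor rather than a sizing.

```python
import math

# Reference point quoted in the post from SAS' sizing guidelines:
# one 16-CPU server with 256 GB RAM supports a 20 GB table
# with seven heavy users.
REF_RAM_GB = 256
REF_TABLE_GB = 20
REF_HEAVY_USERS = 7


def rough_server_count(total_data_gb: float, heavy_users: int) -> int:
    """Back-of-envelope count of reference-sized servers.

    ASSUMPTION (not from SAS): capacity scales linearly with data volume
    and with heavy-user count, and the larger of the two drives the answer.
    """
    by_data = math.ceil(total_data_gb / REF_TABLE_GB)
    by_users = math.ceil(heavy_users / REF_HEAVY_USERS)
    return max(1, by_data, by_users)


def implied_ram_gb(total_data_gb: float) -> float:
    """RAM implied by the reference ratio of 256 GB per 20 GB of data."""
    return total_data_gb * REF_RAM_GB / REF_TABLE_GB


print(rough_server_count(100, 20))  # 5
print(implied_ram_gb(100))          # 1280.0
```

Under these assumptions, 100GB of uncompressed data already implies more than a terabyte of RAM, which is why the hardware quote deserves as much scrutiny as the software price.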

SAS will tell you that VA runs on commodity servers. That is technically true, but a little misleading. SAS does not require you to buy your servers from any specific vendor; however, the specs needed for good performance are quite different from a typical Hadoop node server. Not surprisingly, VA requires specially configured high-memory machines, such as these from HP.

Node servers are just the beginning of the story. According to an HP engineer with extensive VA experience, networking is a key bottleneck in implementations. Before you sign a license agreement for VA, check with your preferred hardware vendor to determine how much experience they have with the product. Ask them to provide a firm quote for all of the necessary hardware, and a firm schedule for delivery and installation.

Keep in mind that SAS does not actually recommend hardware for any of its software. While SAS will work with you to estimate volume and workload, it passes this information to the hardware vendors you specify for the actual recommended sizing and configuration. Your hardware vendor plays a key role in the success of your implementation of this product, so it’s important that you choose a vendor that has significant experience with this software.

Implementation

SAS publishes most of its documentation on its support website. For VA, however, SAS keeps technical documentation for installation, configuration and administration under lock and key. The implication is that it’s not pretty. Before you sign a license agreement, you should insist that SAS provide the documentation for your team to review.

There is more to implementing this product than software installation. Did you notice the fine print in SAS’ Hardware Sizing Guidelines? I quote:

“These guidelines do not address the data management resources needed outside of SAS Visual Analytics. Getting data into SAS Visual Analytics and performing other ETL functions are solely the responsibility of the user.”

VA’s native capabilities for data cleansing and transformation have improved since the first release, but they are still rudimentary. So unless your source data is perfectly clean and ready to use — ha ha — you’re going to need ETL processes to prepare your data. Unless your prospective users are ETL experts, they will need someone to build those feeds; and unless you have SAS developers sitting on the bench, you’re going to need SAS or a SAS Partner to provide developers who can do the job.

If you are thinking about licensing VA, you are almost certainly using legacy SAS products already. You may think that will make implementation easier, but think again: VA and LASR Server are fundamentally new products with a new architecture. Your SAS users and developers will all need training. Moreover, your existing SAS programs may need conversion to work with the new software.

Before you sign a license agreement for VA, insist on a firm, fixed price quote from SAS for all implementation tasks, including data feeds. Your SAS Account Executive will tell you that SAS “does not do” fixed price quotes. Nonsense. SAS will happily give away consulting services if they can win your software business, so don’t take “no” for an answer.

SAS will need to do an assessment, of course, before fixing the price, which is fine as long as you don’t have to pay for it.

Time to Value

When SAS first released VA, implementations ran around three months under ideal circumstances. Many ran much longer, due to unanticipated issues with networking and infrastructure. With more experience, SAS has a better understanding of the product’s infrastructure requirements, and can set expectations accordingly.

Nevertheless, there is no reason for you to assume the risk of delay getting the product into production. SAS charges you for a license to use the software from the moment you sign the contract; if the implementation project runs long, it’s on your dime.
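
To put a number on that dime, a minimal sketch: it assumes the first-year fee accrues evenly from the signing date, and the fee used in the example is a hypothetical figure, not a SAS price.

```python
def license_cost_before_go_live(annual_fee: float,
                                months_to_production: float) -> float:
    """License fee that accrues before anyone can use the product.

    Assumes the fee accrues evenly from contract signature, which is a
    simplification for illustration.
    """
    return annual_fee * months_to_production / 12


# Hypothetical first-year fee of 120,000. Three months is the best-case
# implementation time mentioned above; six months represents an overrun.
print(license_cost_before_go_live(120_000, 3))  # 30000.0
print(license_cost_before_go_live(120_000, 6))  # 60000.0
```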

You should insist on a firm contractual commitment from SAS to get the software up and running by a date certain, with financial penalties for failure to deliver. It’s unlikely that SAS will agree to deferred payment of the first-year fee, or an acceptance deal, since this impacts revenue recognition. But you should be able to negotiate an extended renewal anniversary based on the date of delivery and acceptance. You can also negotiate deferred payment of the fixed price consulting fee.

10 comments

Interesting talk today by SAS lead data scientist to our company:
a) They are still offering two in memory big data analytic platforms
b) They are working on out-of-memory big data analytics, similar to Spark-spill-over-into-disk-when-no-more-RAM capability

Yes, SAS still offers two different architectures (HPA and LASR). SAS rarely kills a product once built. However, SAS insiders see HPA as a white elephant, and most Sales and Marketing effort goes into LASR. There are still no public success stories for HPA, and only a handful of users.

Concerning (b), plain old Legacy SAS does out-of-memory to disk, so it’s not clear what SAS is working on. If you mean they are adding out-of-memory capability to LASR or HPA, that suggests a very muddled product vision. Why would a customer invest big $ in HPA or LASR just to see processing spill to disk? Might as well stick with SAS/STAT.

I’m ignorant of the variety of SAS analytics tools, but it was represented to me in the context of “SAS enterprise miner” running co-located in a Hadoop setup, where enterprise miner was presented as a production ready big data analytics platform.

I saw your talk June last year, which touched on Hadoop MapReduce and HPA/LASR incompatibility. While these issues appear to be addressed, do you know if it is the case with enterprise miner?

You can host SAS Enterprise Miner on an edge node in Hadoop. To consume HDFS data, you will also need SAS/ACCESS Interface to Hadoop. To deploy your models back into Hadoop, you will also need SAS Scoring Accelerator for Hadoop. Last I checked, that is only supported on Cloudera.

There are several issues with this architecture:

(1) You need to license a bunch of SAS products. The minimum price for that configuration is $600K — just for the first year.

(2) Reading the data into SAS and then exporting model scores back into Hadoop builds latency into the analytic cycle. It’s fine if you have plenty of time to do your work, but it doesn’t fly for low-latency analytics.

(3) The amount of data you can actually analyze is constrained by the size of your edge node. SAS can spill to disk when it runs out of memory, but performance falls off a cliff when it does so.

It’s a beautiful article. SAS people are projecting SAS VA in our company. If you share the original copy with me, it will be more helpful. I tried to message you through LinkedIn but it’s not happening. Please share the original PDF to vijayadvanz@gmail.com