In addition, the SAS System integrates with many SAS business solutions that support large-scale software deployments in areas such as human resource management, financial management, business intelligence, and customer relationship management.


Description of SAS

[Figure: SAS 8 on an IBM mainframe under 3270 emulation]

SAS is driven by SAS programs that define a sequence of operations to be performed on data stored as tables. Although graphical user interfaces for non-programmers exist (such as SAS Enterprise Guide), most of the time these GUIs are just front ends that automate or facilitate the generation of SAS programs. SAS components expose their functionality via application programming interfaces, in the form of statements and procedures.

A SAS program is composed of three major parts: the DATA step, procedure steps (effectively, everything that is not enclosed in a DATA step), and a macro language. SAS Library Engines and Remote Library Services allow access to data stored in external data structures and on remote computer platforms.

The DATA step section of a SAS program, like other database-oriented fourth-generation programming languages such as SQL or Focus, assumes a default file structure. It automates the process of identifying files to the operating system, opening the input file, reading the next record, opening the output file, writing the next record, and closing the files. This allows the user/programmer to concentrate on the details of working with the data within each record, in effect working almost entirely within an implicit program loop that runs once for each record.

All other tasks are accomplished by procedures that operate on the data set (the SAS term for "table") as a whole. Typical tasks include printing or performing statistical analysis, and may require the user/programmer to do no more than identify the data set. Procedures are not restricted to a single behavior and thus allow extensive customization, controlled by mini-languages defined within the procedures. SAS also has an extensive SQL procedure, allowing SQL programmers to use the system with little additional knowledge.

There are macro programming extensions that allow repetitive sections of a program to be rationalized. Proper imperative and procedural programming constructs can be simulated by use of the "open code" macros or the SAS/IML component. Macro code in a SAS program, if any, undergoes preprocessing. At runtime, DATA steps are compiled and procedures are interpreted, and both are run in the sequence in which they appear in the SAS program. A SAS program requires the SAS System to run.

Compared to general-purpose programming languages, this structure allows the user/programmer to be less familiar with the technical details of the data and how it is stored, and relatively more familiar with the information contained in the data. This blurs the line between user and programmer, appealing to individuals who fall more into the "business" or "research" area and less into the "information technology" area, since SAS does not enforce (although it recommends) a structured, centralized approach to data and infrastructure management.

The SAS System runs on IBM mainframes, Unix machines, OpenVMS Alpha, and Microsoft Windows, and code can be moved almost transparently between these environments. Older versions have supported PC-DOS, the Apple Macintosh, VMS, VM/CMS, Data General AOS and OS/2.
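The implicit DATA step loop described in this section can be illustrated with a minimal sketch. The data set name work.charges, the variables id, charge and surcharge, and the in-line records are all hypothetical and chosen only for illustration.

```sas
/* DATA step: the statements below run once per input record. */
/* No explicit loop, open, read, write or close is coded.     */
data work.charges;
  input id charge;            /* read the next record                  */
  surcharge = charge * 0.1;   /* work on the current record only       */
  datalines;
1 50
2 150
3 200
;
run;

/* Procedure step: operates on the data set as a whole. */
proc means data=work.charges;
  var charge surcharge;
run;
```

The DATA step writes one observation to work.charges at the end of each pass through its statements, which is why no output statement appears in the sketch.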

Early history of SAS
SAS was conceived by Anthony J. Barr in 1966.[1] As a North Carolina State University graduate student from 1962 to 1964, Barr had created an analysis of variance modeling language inspired by the notation of statistician Maurice Kendall, followed by a multiple regression program that generated machine code for performing algebraic transformations of the raw data. Drawing on those programs and his experience with structured data files,[2] he created SAS, placing statistical procedures into a formatted file framework. From 1966 to 1968, Barr developed the fundamental structure and language of SAS.

In January 1968, Barr and James Goodnight collaborated, integrating new multiple regression and analysis of variance routines developed by Goodnight into Barr's framework.[3][4] Goodnight's routines made the handling of basic statistical analysis more robust, and his later implementation (in SAS 76) of the general linear model greatly increased the analytical power of the system. By 1971, the SAS system was gaining popularity within the academic community, and by 1972 industry was making use of it. One strength of the system was analyzing experiments with missing data, which was useful to the pharmaceutical and agricultural industries, among others.

In 1973, John P. Sall joined the project, making extensive programming contributions in econometrics, time series, and matrix algebra. Other participants in the early years included Caroll G. Perkins, Jolayne W. Service, and Jane T. Helwig. Perkins made programming contributions. Service and Helwig created the early documentation.[3] In 1976, SAS Institute, Inc. was incorporated by Barr, Goodnight, Sall, and Helwig.

Components
The SAS System consists of a number of components, which organizations license and install separately as required.

• SAS Add-In for Microsoft Office: A component of the SAS Enterprise Business Intelligence Server, designed to provide access to data, analysis, reporting and analytics for non-technical workers (such as business analysts, power users, domain experts and decision makers) via menus and toolbars integrated into Office applications.
• Base SAS: The core of the SAS System is the so-called Base SAS software, which is used to manage data. SAS procedures software analyzes and reports the data. The SQL procedure allows SQL programming in lieu of DATA step and procedure programming. Library Engines allow transparent access to common data structures such as Oracle, as well as pass-through of SQL to be executed by such data structures. The Macro facility is a tool for extending and customizing SAS software programs and reducing overall program verbosity. The DATA step debugger is a programming tool that helps find logic problems in DATA step programs. The Output Delivery System (ODS) is an extensible system that delivers output in a variety of formats, such as SAS data sets, listing files, RTF, PDF, XML, or HTML. The SAS windowing environment is an interactive, graphical user interface used to run and test SAS programs.
• SAS Enterprise Business Intelligence Server: Includes both a suite of business intelligence (BI) tools and a platform to provide uniform access to data. The goal of this product is to compete with the offerings of Business Objects and Cognos.
• Enterprise Computing Offer (ECO): Not to be confused with Enterprise Guide or Enterprise Miner, ECO is a product bundle.
• Enterprise Guide: SAS Enterprise Guide is a Microsoft Windows client application that provides a guided mechanism to use SAS and publish dynamic results throughout an organization in a uniform way. It is marketed as the default interface to SAS for business analysts, statisticians, and programmers.
• Enterprise Miner: A data mining tool.
• ETL: Provides extract, transform, load services.
• SAS/ACCESS: Provides the ability for SAS to transparently share data with non-native data sources.
• SAS/ACCESS for PC Files: Allows SAS to transparently share data with personal computer applications including MS Access and Microsoft Office Excel.
• SAS/AF: Applications facility, a set of application development tools to create customized applications.
• SAS/ASSIST: Early point-and-click interface to the SAS System; has since been superseded by SAS Enterprise Guide.
• SAS/C
• SAS/CONNECT: Provides the ability for SAS sessions on different platforms to communicate with each other.
• SAS/DMI: A programming interface between interactive SAS and ISPF/PDF applications. Obsolete since version 5.
• SAS/EIS: A menu-driven system for developing, running, and maintaining enterprise information systems.
• SAS/ETS: Provides econometrics and time series analysis.
• SAS/FSP: Allows interaction with data using integrated tools for data entry, computation, query, editing, validation, display, and retrieval.
• SAS/GIS: An interactive desktop geographic information system for mapping applications.
• SAS/GRAPH: Although Base SAS includes primitive graphing capabilities, SAS/GRAPH is needed for charting on graphical media.
• SAS/IML: Matrix-handling SAS script extensions.
• SAS/INSIGHT: Dynamic tool for data mining. Allows examination of univariate distributions, visualization of multivariate data, and model fitting using regression, analysis of variance, and the generalized linear model.
• SAS/IntrNet: Extends SAS's data retrieval and analysis functionality to the Web with a suite of CGI and Java tools.
• SAS/LAB: Superseded by SAS Enterprise Guide.
• SAS/OR: Operations research.
• SAS/PH-Clinical

Terminology
Where many other languages refer to tables, rows, and columns or fields, SAS uses the terms data sets, observations, and variables, respectively. This usage derives from its statistical heritage and is shared by SPSS, another statistical package. There are only two kinds of variables in SAS: numeric and character (string). By default, all numeric variables are stored as real (floating-point) numbers, although the stored precision can be reduced. Date and datetime variables are numeric variables that follow the C tradition of counting from an epoch: they are stored as the number of days (for date variables) or seconds (for datetime variables) from 1960-01-01 00:00:00.
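The epoch arithmetic can be checked with a short sketch that uses SAS date and datetime literals. The variable names d0, d1 and dt1 are hypothetical; the step writes to the log only and creates no data set.

```sas
data _null_;
  d0  = '01JAN1960'd;             /* date literal: the epoch itself, stored as 0 */
  d1  = '02JAN1960'd;             /* one day after the epoch, stored as 1        */
  dt1 = '01JAN1960:00:01:00'dt;   /* one minute after the epoch, stored as 60    */
  put d0= d1= dt1=;               /* raw numeric values                          */
  put d1= date9. dt1= datetime20.;  /* same values displayed through formats     */
run;
```

The first PUT statement shows the underlying numbers (0, 1 and 60); the second shows that a format changes only how the value is displayed, not how it is stored.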

Features
• Read and write many different file formats.
• Process data in many different formats.
• The SAS programming language is a 4th generation programming language. Actually it is a "3.5 GL" programming language: SAS DATA steps are written in a third-generation procedural language very similar to PL/I, while SAS PROCs, especially PROC SQL, are non-procedural and therefore better fit the definition of a 4GL.
• Many built-in statistical and random number functions.
• Interaction with database products through SQL (and the ability to use SQL internally to manipulate SAS data sets).
• Direct output of reports to CSV, HTML, PCL, PDF, PostScript, RTF, XML, and more using ODS.
• Interaction with the operating system (for example, pipelining on Unix and Windows, and DDE on Windows).
• Fast development time, particularly from the many built-in procedures.
• Hundreds of built-in functions for manipulating character and numeric variables.
• An integrated development environment.
• Dynamic data-driven code generation using the SAS Macro language.
• Can process files containing millions of rows and thousands of columns of data.
• University research centers often offer SAS code for advanced statistical techniques, especially in fields such as Political Science, Economics and Business Administration.
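As an illustration of the ODS report output listed among the features, the following minimal sketch routes the output of one procedure to an HTML file. The file name class_report.html is a hypothetical choice, and sashelp.class is a sample data set that is assumed to be available in the installation.

```sas
/* Open an HTML destination; the file name is an illustrative assumption. */
ods html file='class_report.html';

/* Any procedure output produced while the destination is open is written to it. */
proc print data=sashelp.class;
run;

/* Close the destination so that the file is completed and released. */
ods html close;
```

Other destinations mentioned in the list, such as PDF or RTF, follow the same open, run, close pattern.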

Example SAS code
SAS uses DATA steps and procedures to analyze and manipulate data. By default, a DATA step iterates through each observation in a data set (much as a query visits every row of a SQL table). The following DATA step creates a new data set BBB containing those observations from data set AAA whose charge is greater than 100.
data BBB;
  set AAA;
  if charge > 100;  /* subsetting IF: keep only matching observations */
run;

SAS provides procedures that summarize data. The FREQ procedure (proc freq) shows the frequency distribution of a given variable in a data set.
proc freq data=BBB;
  table charge;
run;

SAS features a macro language, which can be used to generate SAS code. For instance, the above example could be reused in many pieces of code by rewriting it as a macro:
%macro freqtable(table, variable);
  proc freq data = &table;
    table &variable;
  run;
%mend freqtable;

%freqtable(BBB, charge)

SAS also features SQL, which can be used to query SAS data sets or external database tables accessed with a SAS libname engine. For example, duplicate records could be extracted from a table for analysis:
proc sql;
  create table dup_recs as
  select d.*
  from your_dataset d,
       (select count(*) as n, id
        from your_dataset
        group by id
        having count(*) > 1) t1
  where d.id = t1.id;
quit;

SAS can also display the contents of a data set, such as the result of a query. The PRINT procedure (proc print) is used for this:
proc print data=BBB;
run;

Version history

SAS 71
SAS 71 was the first limited release of the system. The first manual for SAS was printed at this time, approximately 60 pages long[5]. The DATA step was implemented. Regression and analysis of variance were the main uses of the program.

SAS 72
This more robust release was the first to achieve wide distribution. It included a substantial user's guide, 260 pages in length[6]. The MERGE statement was introduced in this release, adding the ability to perform a database JOIN on two data sets[7]. This release also introduced the comprehensive handling of missing data[8].

SAS 76
SAS 76 was a complete system level rewrite, featuring an open architecture for adding and extending procedures, and for extending the compiler[9]. The INPUT and INFILE statements were significantly enhanced to read virtually all data formats in use on the IBM mainframe[10]. Report generation was added through the PUT and FILE statements[11]. The capacity to analyze general linear models was added[12].

SAS 82
For IBM mainframes, SAS 82 no longer required that SAS databases be DSORG=DAU, because SAS 82 removed location-dependent information from databases. While this may seem trivial, it eliminated a major headache in administering SAS: restoring a disk-based SAS database from tape no longer required restoring the entire volume and then copying the database to another location.

Version 4 series
In the early 1980s, SAS Institute released Version 4, the first version for non-IBM computers. It was written mostly in a subset of the PL/I language, to run on several minicomputer manufacturers' operating systems and hardware: Data General's AOS/VS, Digital Equipment's VAX/VMS, and Prime Computer's PRIMOS. The version was colloquially called "Portable SAS" because most of the code was portable, i.e., the same code would run under different operating systems.

Version 5 series

Version 6 series
Version 6 represented a major milestone for SAS. While it looked superficially similar to the user, the major change was "under the hood", where the software was rewritten. After its FORTRAN origins, followed by PL/I and mainframe assembly language, the SAS System was rewritten in C for version 6, to provide enhanced portability between operating systems as well as access to a growing pool of C programmers compared to the shrinking pool of PL/I programmers. This was the first version to run on UNIX, MS-DOS and Windows platforms. The DOS versions were incomplete implementations of the Version 6 specification: some functions and formats were unavailable, as were SQL and related items such as indexing and WHERE subsetting. DOS memory limitations restricted the size of some user-defined items.

The mainframe version of SAS 6 changed the physical format of SAS databases from "direct files" (DSORG=DA) to "flat files" (DSORG=PS,RECFM=FS). The practical benefit of this change is that a SAS 6 database can be copied from any media with any copying tool.

In 1984 a project management component was added (SAS/OR?). In 1985 SAS/AF software, an econometrics and time series analysis component (SAS/ETS), and interactive matrix programming (SAS/IML) software were introduced. MS-DOS SAS (version 6.02) was introduced, along with a link to mainframe SAS. In 1986 a statistical quality improvement component (SAS/QC software) was added, and SAS/IML and SAS/STAT software were released for personal computers.

In 1987 concurrent update access was provided for SAS data sets with SAS/SHARE software, and database interfaces were introduced for DB2 and SQL-DS. In 1988 the MultiVendor Architecture (MVA) concept was introduced and SAS/ACCESS software was released. Support for UNIX-based hardware was announced. SAS/ASSIST software for building user-friendly front-end menus was introduced. New SAS/CPE software established SAS as an innovator in computer performance evaluation. Version 6.03 for MS-DOS was released.

Release 6.06 for MVS, CMS, and OpenVMS was announced in 1990. The same year, the last MS-DOS version (6.04) was released. Data visualization capabilities were added in 1991 with SAS/INSIGHT software. In 1992 SAS/CALC, SAS/TOOLKIT, SAS/PH-Clinical, and SAS/LAB software were released. In 1993 software for building customized executive information systems (EIS) was introduced, and Release 6.08 for MVS, CMS, VMS, VSE, OS/2, and Windows was announced. 1994 saw the addition of ODBC support, plus the SAS/SPECTRAVIEW and SAS/SHARE*NET components. Release 6.09 added a DATA step debugger; 6.09E was a release for MVS. Release 6.10 in 1995 was a Microsoft Windows release and the first release for the Apple Macintosh. Version 6 was the first and last series to run on the Macintosh; JMP, also produced by SAS Institute, is the software package the company produces for the Macintosh. Also in 1995, 6.11 (codenamed Orlando) was released for Windows 95, Windows NT, and UNIX. Release 6.12 was a Unix and Microsoft Windows release (and more?).

(Some of the following milestones in this sub-section may belong under version 7 or 8.) In 1996 SAS announced Web enablement of SAS software, and the Scalable Performance Data Server was introduced. In 1997 SAS/Warehouse Administrator and SAS/IntrNet software went into production. In 1998 SAS introduced a customer relationship management (CRM) solution and an ERP access interface, the SAS/ACCESS interface for SAP R/3. SAS was also the first to release OLE-DB for OLAP, and it released a HOLAP solution. Balanced scorecard, SAS/Enterprise Reporter, and HR Vision were released, as was the first release of SAS Enterprise Miner. 1999 saw the releases of HR Vision software, the first end-to-end decision-support system for human resources reporting and analysis, and Risk Dimensions software, an end-to-end risk-management solution. MS-DOS versions were abandoned because of Y2K issues and lack of continued demand. In 2000 SAS shipped Enterprise Guide and ported its software to Linux.

Version 7 series
The Output Delivery System debuted in version 7; as did long variable names (from 8 to 32 characters); storage of long character strings in variables (from 200 to 32,767); and a much improved built-in text editor, the Enhanced Editor. Version 7 saw the synchronisation of features between the various platforms for a particular version number (which previously hadn't been the case). Version 7 was a precursor to version 8. It was believed SAS Institute released a snapshot from their development on version 8 to meet a deadline promise. SAS Institute recommended that sites wait until version 8 before deploying the new software.

Version 8 series
Released about 1999; 8.0, 8.1, 8.2 were Unix, Microsoft Windows, and z/OS releases. Key features: long variable names, Output Delivery System (ODS).

Version 9 series
In version 9, SAS Institute added the SAS Management Console, parallel processing, JavaObj, ODS OO (experimental as opposed to alpha), and National Language Support. Again the SAS Institute recommended sites delay deployment until 9.1. SAS Version 9 is running on Windows (32 & 64 bit), Unix (64 bit), Linux, and z/OS. SAS 9.1 was released in 2003. SAS 9.1.2 was released in 2004. SAS 9.1.3 was released in 2005. SAS 9.1.3 Service Pack 4 is the latest release (April 2006).

SAS 9.2 is the next release[1] and was demonstrated at the SUGI31 Conference in March 2006[2]. A release between March 2007 and June 2007 is possible. There are several important additions to Base SAS in Version 9. The new hash object provides functionality similar to the MERGE statement without sorting data or building formats. The function library was enlarged, and many functions have new parameters. Perl regular expressions are now supported, in contrast to the old "Regular Expression" facility, which was incompatible with most other implementations of regular expressions. Long format names are now supported.
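A minimal sketch of the hash object used as a MERGE-like lookup follows. The data sets work.customers and work.charges, their variables, and the hash name cust are hypothetical; the point is only that the transaction data is matched without being sorted first.

```sas
/* Hypothetical lookup table: one row per customer id. */
data work.customers;
  input id name $;
  datalines;
1 Ann
2 Bob
;
run;

/* Hypothetical transactions, deliberately not sorted by id. */
data work.charges;
  input id charge;
  datalines;
2 150
1 40
3 75
;
run;

data work.charges_named;
  /* Define NAME in the program data vector without reading any rows. */
  if 0 then set work.customers;
  /* Load the lookup table into memory once, on the first iteration. */
  if _n_ = 1 then do;
    declare hash cust(dataset: 'work.customers');
    cust.defineKey('id');
    cust.defineData('name');
    cust.defineDone();
  end;
  set work.charges;
  /* find() returns 0 when the current id exists in the hash; keep only those rows. */
  if cust.find() = 0;
run;
```

In this sketch the observation with id 3 is dropped because it has no match in the lookup table, which corresponds to keeping only matched rows in a MERGE.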

Criticism

SAS has been criticized for its relatively poor graphics when compared with other statistical software packages. With the release of an experimental extension to SAS 9.1, the graphics have improved significantly. The enhanced graphics are not provided by default, though, and usually get stored in many pieces. SAS has also been criticized for its cost, especially when compared to open source competitors such as R. SAS is considered to be several years behind competitor products when it comes to the implementation of statistical algorithms. SAS has been criticized for its syntax, which is unlike that of any other popular programming language. Another criticism of SAS is the excessive amount of whitespace in its output.

2. Barr contributed to the development of the NIPS Formatted File System while working for IBM at the Pentagon from 1964 to 1966. FFS was one of the first data management systems to take advantage of files with a defined structure for efficiencies in data storage and retrieval.

Nov 1, 2006: The life sciences industry is filled with challenges: it is quite common to see pharma companies battling compliance issues, biotech majors concerned about clinical trial management, and medical device companies worrying about sales and distribution channels. HP (pre-merger Compaq), one of the first suppliers to enter the life sciences, recognized the growing need and business opportunity by forming a partnership with Celera back in 1998. Under the agreement, HP supplied Celera with the computing power to unravel the human genome. A little later, in 2000, IBM announced a $100 million life sciences initiative in which it agreed to provide solutions in high-performance computing, infrastructure, data management, and integration. One year later it launched DiscoveryLink, the second prong of its life sciences initiative, focused on helping the typical discovery scientist.

Undoubtedly, the life sciences sector cannot succeed by itself and relies extensively on service providers. Of all the service providers that support life sciences, however, IT is an indispensable enabler. IT solutions in storage, sales force automation, CRM, and data management are the lifelines of most life sciences companies today.

Hewlett Packard: First mover advantage

HP’s IT solutions enable agility in the life sciences by aligning IT infrastructure with business requirements. The IT behemoth has assisted pharmaceutical and other life sciences organizations in addressing challenges such as identifying new drugs and healthcare products more quickly, reducing time to market for new drugs and healthcare products, and increasing supply chain responsiveness and agility. According to International Data Corporation (IDC), HP is the leading supplier of computing solutions for life sciences research. It has worked on the Human Genome Project with many of the major research centers worldwide, including the world’s largest genomic sequencing facilities at the Wellcome Trust Sanger Institute in the United Kingdom and Celera Genomics in the United States. HP’s expertise lies in the following areas:

• High performance computing: HP provides infrastructures for applications that support an in-depth understanding of biological and chemical processes. The company specializes in grid technologies and how they can be designed to handle massive amounts of data in different formats and different locations, often owned by different companies.
• Clinical trials: HP delivers services and solutions to augment the clinical trials process, including mobility solutions for remote data capture, document management system design, and implementation for regulatory submissions. These solutions reduce the time it takes to register and prove the efficacy and safety of new products.
• Sales force automation: HP’s sales force solutions improve coordination of face-to-face and web-based activities. They enhance the value of customer interaction by providing accurate data capture and feedback at the point of contact, as well as offering one cohesive view of the company to the customer.
• Demand chain management: HP’s demand chain solutions integrate demand-driven customer-facing processes with back office and supply chain systems such as enterprise resource planning (ERP) and supply chain management (SCM). These solutions include channel management, order management, transportation and logistics, demand planning and forecasting, and customer relationship management (CRM).

IBM: Accelerating drug development

IBM’s product portfolio includes solutions that accelerate drug discovery and drug development. Its solutions identify and rank development risks and help companies move smoothly through compliance processes. The IBM BioPharmaceutical Solution enables life sciences companies to streamline business processes, support strategic growth and facilitate regulatory compliance. This solution helps drug development companies to:

• Deliver access, information, applications and services to specific employees based on their job responsibilities.
• Provide researchers with tools and information to help them collaborate more effectively.
• Help address the audit trail and electronic signature requirements outlined in 21 CFR Part 11.
• Track and document financial processes to comply with the Sarbanes-Oxley Act.
• Manage the transformation of data into knowledge to accelerate discovery.
• Make faster, more informed business decisions by consolidating information across business units.

Another offering from IBM for pharma companies is its grid computing solution. This solution is designed to offer increased computing capacity by allowing access to shared computing resources, enterprise-wide or outside the company. This enables pharma companies to conduct compute-intensive projects more efficiently by tapping additional processing capacity, to speed up drug discovery projects, and to use additional computing resources to help with a range of business needs.

The IBM Global Electronic Data Capture solution is for life sciences companies that are actively involved in clinical trials. It helps companies re-engineer the clinical trial process to move compounds rapidly, safely and accurately through development and approval. This solution can reduce costs considerably by decreasing the amount of clinical research associate work and travel time, as the data capture is electronic and the data quality monitoring can be done remotely. It also enables companies to increase revenues by bringing new discoveries to market faster: clinical data can be rapidly integrated with other management and analysis systems to further reduce the amount of time required for clinical development.

SAS: For faster commercialization of drugs

SAS life sciences solutions optimize the flow of valuable scientific and operational data, helping companies bring their drugs and therapies to market faster. The company provides end-to-end solutions to drive efficiencies throughout every stage of a drug’s lifecycle, from discovery through development to commercialization. SAS solutions address the needs of each of these unique stages. Powered by SAS technology, these solutions are designed to work collaboratively, enabling companies to combine solutions. SAS has two key solutions for life sciences companies: SAS Drug Development and CRM software.

SAS Drug Development provides a centralized, integrated system for managing, analyzing, reporting and reviewing clinical research information. The solution enables life sciences organizations to get better products to market faster by more effectively assessing the safety and efficacy of research compounds and by facilitating collaboration across trials, phases and therapeutic areas. Relying on its SAS 9 Intelligence Platform, SAS Drug Development specifically addresses the compliance requirements of life sciences organizations, while supporting your business objectives and aims to:

Its second solution is a CRM software package for pharma companies that combines enterprise data management, advanced analytics, and campaign planning and management to synthesize customer data across all lines of business and customer touch points.

SAP: Increasing efficiency through innovation

SAP boasts an impressive portfolio of software solutions, customized to address the regulatory and business needs of life sciences organizations globally. Built on the open architecture of the SAP NetWeaver platform, these solutions provide a flexible, innovative platform that enables companies to increase efficiency for the following business processes: