Developing reusable code and production processes

This case study was adapted from the SAS Global Forum 2010 paper: Creating Easily Reusable and Extensible Processes: Code that Thinks for Itself

It's easy to write code that answers only one need; it's more challenging to develop a hands-off process that adapts to many needs. Focusing on the bigger picture when projects and requests come across your desk lets you create flexible, extensible solutions that avoid maintainability issues and get results to users faster.

In creating a framework that consists of easily reusable and extensible code and processes, your aim should be a multi-layered, multi-component approach. The processes need to be adaptable and need to accommodate the presence or absence of any of the layers and some of the components.

The following are four steps for achieving that objective.

Step 1: Planning and requirements gathering

Before any code is written, investigate the needs, wants, wishes, goals and priorities of all stakeholders and key users. Ideally, both of these groups will be represented on the team, and gathering their input will help establish both the big picture and the details. Documenting and feeding this information back to the team will build consensus and maintain a record of principles.

Defining a hierarchy of components at this stage will aid in identifying which are required and which are optional. The process must flow smoothly whether or not the optional components are present. This will support a modular design and will also help break the project into manageable sub-tasks. For example, you might need information on the enterprise, division or project component.

It's important to distinguish the consistent from the distinctive elements: What will always be included, either at the enterprise level or the division level, and what aspects must always be customized for each division or project? You must also determine whether some of the customizations are filters that could be controlled by metadata or parameters, or whether they are truly unique requirements.

Identifying the intended recipients or users of each type of result at this early stage is helpful because it may expose details on what information should or shouldn't be included. Likely, the key users represent a cross-section, perhaps including business users, analysts and project planners. Each of these users will have different needs, which should be identified early.

The next step is to control the process. Can you anticipate what changes to the overall requirements are likely to occur in the future? If so, this should influence the design to maximize flexibility and ease maintainability.

The exploration of requirements should include a discussion of how you can deliver and validate your results. Delivering intermediate products for review and user acceptance during development will help structure the project and build respect and acceptance.

Step 2: Design

An effective design pattern is a blueprint for solving problems in a variety of situations and describes the process flow for a framework and how different components interact with each other.

2.a. Process and documentation

One of the keys to creating a reusable framework is separating the logic into compartments. Think of these compartments as little black boxes. They take something in (metadata, metric definitions) and give something back (data, reports). The users of the black box don't need to know anything about the inner workings, only what they input and get back in return.

An equally important part of preparing the framework is properly documenting it. Documentation can include such things as data definition tables, database table definitions, flow diagrams and reporting templates. The documentation should also outline how the various components communicate with each other to ensure a symbiotic relationship throughout.

Designing the framework to incorporate a metadata module from the beginning will save time now and in the future, as metadata will drive which sections of the framework are called upon and when. Metadata will not only help centralize conditional logic but will help with processing, scheduling, delivering and automating. Each project will have a set of definitions within metadata describing certain cues or functionality that the framework will handle.

Instead of manually checking what gets run when or where the results need to be delivered, you can place some of those conditions in metadata. This cuts down on quality-assurance processing and development work in the future and offers much more flexibility.
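As an illustration only, a metadata module can be as small as one control table plus a macro that loads a project's row into macro variables. The table name, column names, project identifiers and delivery paths below are hypothetical and do not come from the original paper.

```sas
/* Hypothetical control table: one row per project. All names and values are illustrative. */
data work.project_meta;
    infile datalines dsd truncover;
    length project_id $8 division $12 frequency $10
           run_validation $1 deliver_to $40;
    input project_id $ division $ frequency $ run_validation $ deliver_to $;
    datalines;
P001,RETAIL,MONTHLY,Y,/shared/retail/reports
P002,WHOLESALE,QUARTERLY,N,/shared/wholesale/reports
;
run;

/* Load one project's settings into global macro variables for downstream modules. */
%macro load_meta(project_id);
    %global division frequency run_validation deliver_to;
    proc sql noprint;
        select division, frequency, run_validation, deliver_to
            into :division trimmed, :frequency trimmed,
                 :run_validation trimmed, :deliver_to trimmed
        from work.project_meta
        where project_id = "&project_id";
    quit;
%mend load_meta;

%load_meta(P001)
%put NOTE: division=&division frequency=&frequency deliver_to=&deliver_to;
```

Adding a project, or changing where its results go, then means editing a row of data rather than editing code.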

It is important to document the coding modules you anticipate building. They should follow similar naming conventions, storage locations and input/output styles.

2.b. Modularizing your code

The way you architect the framework will have a huge impact on the coding techniques and tools that will be utilized. From the design documentation and planning, you should have an idea of the different modules you will want to build, how they differ and how they will be similar.

There are many possible levels of code and algorithm modularization. When determining which to employ, you should consider code complexity, applicability for generalization and ease of use.

Let's explore two modularization techniques used in this project: driver/source programs and macro modules.

Consider one very common approach to program creation: you locate an existing program similar in functionality to what you require, copy the code and store it as a new and unrelated program, tailor the code to the current task, and then run and test it as if it had never been used before. The result is a collection of lengthy programs, all of which may require changes as the business rules and data change.

Now consider the approach where you recognize that sections of your programs can easily be reused by just supplying the information that changes via macro variables or control datasets. You proceed to break the code into those sections and store them as separate, callable modules. This approach is what we call driver/source:

identify extensible code segments and store them as a single dated copy of each module (segment of source code);

document the input that each module requires; and

create a driver program that provides macro variable input for the current scenario and then calls the appropriate modules.
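The three steps above can be sketched in a few lines. This is a minimal sketch rather than the authors' code: the module contents, the macro variable names (project_id, min_age) and the use of SASHELP.CLASS as input are assumptions chosen so the example runs anywhere. In practice the source module would live in its own file; it is written to a temporary file here only to keep the sketch self-contained.

```sas
/* --- Source module (normally its own file, such as extract_data.sas) --- */
/* Required input: &project_id, &min_age                                    */
filename module temp;
data _null_;
    file module;
    /* Single quotes keep the macro references unresolved until %INCLUDE time */
    put 'data work.extract_&project_id;';
    put '    set sashelp.class;';
    put '    where age >= &min_age;';
    put 'run;';
run;

/* --- Driver program: supplies the inputs, then calls the module --- */
%let project_id = P001;
%let min_age    = 13;
%include module;

/* A second scenario reuses the same module with different inputs */
%let project_id = P002;
%let min_age    = 15;
%include module;
```

Each driver stays only a few lines long, and a change to the extraction rule is made once, in the module.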

A repeatable process like this requires additional and extensive testing to confirm that changes to the source code modules accommodate all drivers that call the modules. But the payoff is that when business rules change, there is only one source program to revise rather than a multitude. And once the flexible process is set up, reuse of code is simple.

Assuming that you have generalized your processes into source modules, when you are ready to start reporting for a new project, just create a driver with necessary parameter values as input to your source modules.

Now that source modules have been created and drivers are in place to run the process, we need to consider further task-specific modularization.

It should not come as a surprise that the SAS® Macro Language will likely be the choice for creating reusable modules within a framework. Macros have always been a great way to generalize by building code and passing parameters through a process to influence the results. We can write generic code that can handle several situations rather than repeating code over and over again. Macro tools will allow us to build the robust and versatile framework we desire.

Macro variables serve a similar purpose and are a component of the macro language. They can be used to feed information into code that only differs by certain parameters and can be used for passing down instructions and conditions from metadata at a global level within your framework. This will allow your code to be very generic and mutate when we instruct it to do so. Using metadata coupled with macros will truly enable hands-off processing, and through several iterations of code it will become obvious as to where these techniques can be implemented.

Macros can also be used to control which process and logic gets executed. Metadata coupled with macro logic can help execute code based on certain conditions.
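A minimal sketch of that idea follows, assuming a hypothetical run_validation flag that would normally be read from metadata. The macro name, its parameters and the SASHELP.CLASS input are illustrative only.

```sas
%macro run_project(project_id=, run_validation=N, frequency=MONTHLY);
    %put NOTE: Starting project &project_id (&frequency);

    /* Required component: always runs */
    proc means data=sashelp.class noprint;
        var height weight;
        output out=work.summary_&project_id mean=;
    run;

    /* Optional component: runs only when the metadata flag asks for it */
    %if %upcase(&run_validation) = Y %then %do;
        proc freq data=sashelp.class;
            tables sex / missing;
        run;
    %end;
    %else %put NOTE: Validation skipped for &project_id;
%mend run_project;

%run_project(project_id=P001, run_validation=Y)
%run_project(project_id=P002)
```

The process flows the same way whether or not the optional component is requested, which is the behavior the planning step calls for.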

2.c. Managing results

An important and often overlooked aspect of the planning process involves results management. Results may include logs, datasets, reports, spreadsheets and validation files. Effective naming conventions and storage locations help ensure easy retrieval of results and appropriate cross-referencing back to the code that generated the results.

The associated log filename should include the SAS program name, for easy reference, along with a date-time stamp corresponding to when the program ran.

To automatically create the log file with each run of a program, use PROC PRINTTO to begin log writing at the top of the SAS program. We recommend further generalization of the program by using macro variables for the filename and file pathing to enable easy changes and usage for other purposes within the program (such as naming other files).
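One way this might look is sketched below. The program name and the macro variable names are hypothetical, and the WORK library path is used as the log folder only so the sketch runs without setup; a real process would point to a permanent location.

```sas
%let prog_name = monthly_report;             /* hypothetical program name */
%let log_dir   = %sysfunc(pathname(work));   /* assumption: replace with a permanent log folder */

/* Date and time of the run as yyyymmdd_hhmmss */
%let run_stamp = %sysfunc(date(), yymmddn8.)_%sysfunc(compress(%sysfunc(time(), tod8.), :));

/* Reusable base name for the log and for any other output files */
%let run_name = &prog_name._&run_stamp;

/* Begin writing the log to its own file at the top of the program */
proc printto log="&log_dir/&run_name..log" new;
run;

/* ... program body ... */

/* Restore the default log destination at the end of the program */
proc printto;
run;

%put NOTE: Log for this run was written to &log_dir/&run_name..log;
```

Because the base name is held in one macro variable, the same stamp can be reused for datasets, reports and validation files, which keeps every result traceable to the run that produced it.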

Step 3: Testing and quality assurance

A highly modularized design lends itself to highly modularized testing and quality assurance (QA). Unit testing will ensure that your modules receive metadata and produce a uniform result regardless of the values of the metadata.

The essentials of QA are:

analysis of the contents of interim datasets that can easily be related back to the data requirements;

review of SAS logs from running each of the component modules;

walkthroughs of program logic to ensure that what has been implemented matches what was in your detailed requirements; and

analysis of results in the form of analytical datasets and/or reports.
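Some of these checks can be automated. The macro below is a hedged sketch of one such check: it confirms that an interim dataset exists and holds a minimum number of rows. The macro name, its parameters and the qa_status variable are hypothetical.

```sas
%macro check_output(ds=, min_obs=1);
    %global qa_status;
    %local dsid nobs rc;
    %let qa_status = FAIL;

    /* The dataset must exist before it can be inspected */
    %if %sysfunc(exist(&ds)) = 0 %then %do;
        %put ERROR: QA check failed, &ds does not exist.;
        %return;
    %end;

    /* Read the row count from the dataset header */
    %let dsid = %sysfunc(open(&ds));
    %let nobs = %sysfunc(attrn(&dsid, nlobs));
    %let rc   = %sysfunc(close(&dsid));

    %if &nobs < &min_obs %then
        %put ERROR: QA check failed, &ds has &nobs rows (expected at least &min_obs).;
    %else %do;
        %let qa_status = PASS;
        %put NOTE: QA check passed, &ds has &nobs rows.;
    %end;
%mend check_output;

%check_output(ds=sashelp.class, min_obs=10)
%check_output(ds=work.no_such_table)
```

Writing failures to the log as ERROR lines means the usual log review will surface them, and a calling driver can test qa_status to decide whether to continue.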

Step 4: Rollout

Once you've created the framework that will encompass your modules, you can start to focus on how to bring everything together in delivering the final product.

You must consider how the new data requests will be handled, how and when jobs will be scheduled, how results will get sent to users and how to automate it all.

Metadata will be the driving force in addressing ways to handle the rollout process. It's used to help describe the processes you're running and will be the gatekeeper to managing their execution and delivery.

Imagine trying to determine which processes need to run monthly by working through a list of jobs, checking start and end dates. This can be very tedious, not to mention prone to human error. New data requests will involve the creation of new driver programs. This can be as simple as altering the project identifier so that when the driver is executed it can refer back to the metadata that describes the execution rules for that project. This type of processing really simplifies the creation of new data or report delivery, since metadata drives the processing.

Delivering the data to end-users also can be driven from metadata cues. You can set up delivery locations or methods conditionally, based on this metadata.

A logical extension from generating a driver program for each unique project would be to have one super-driver program that invoked the relevant processes for the active projects for each reporting period. The examples that we have discussed can be taken a level higher, where your driver programs would be automatically generated and executed based on metadata. This would mean your metadata would need to be fairly extensive in describing all the rules and relationships needed to work properly. A master program could take metadata and generate the conditional calls and rules to run the processes, in addition to when to run them. We have the project Start and End dates, so we know when the framework needs to be invoked for each project. Divisional indicators can describe the people who receive results and where they receive them. Project definitions would individualize the data to the requestors' needs. These metadata elements will help drive the use of the modular items created and designed in earlier sections.
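A super-driver of this kind might be sketched as follows. The schedule table, its columns, the report_date variable and the run_driver macro are all hypothetical stand-ins; a real driver would call the source modules instead of writing a note to the log.

```sas
/* Hypothetical project metadata with start and end dates */
data work.project_schedule;
    infile datalines dsd truncover;
    length project_id $8 division $12;
    input project_id $ division $ start_date :date9. end_date :date9.;
    format start_date end_date date9.;
    datalines;
P001,RETAIL,01JAN2010,31DEC2010
P002,WHOLESALE,01JUL2009,30JUN2010
P003,RETAIL,01JAN2011,31DEC2011
;
run;

/* Stand-in for a real driver */
%macro run_driver(project_id=, division=);
    %put NOTE: Running &project_id for division &division;
%mend run_driver;

%let report_date = 15MAR2010;

/* Generate one driver call for each project active on the report date */
data _null_;
    set work.project_schedule;
    where start_date <= "&report_date"d and end_date >= "&report_date"d;
    /* %NRSTR delays macro execution until after this DATA step ends */
    call execute(cats('%nrstr(%run_driver(project_id=', project_id,
                      ', division=', division, '))'));
run;
```

With this sample data, P001 and P002 are active on the report date and P003 is not, so two driver calls are generated. No one has to work through a list of jobs by hand.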

Conclusion

Building reusable and extensible code requires planning and discipline, but the benefits outweigh the efforts. Once the framework is in place and metadata is available for a project, results can be delivered rapidly with little effort. By defining and utilizing minimally acceptable inputs and results, even if a component isn't available, the other components are delivered to provide immediate value. Code modularization not only helps reusability but chunks the logic into digestible pieces. Documenting and sharing knowledge about short, focused modules is far easier than doing it for thousands of lines of code in one chunk.

Keep in mind that a repeatable process like this requires extensive testing to confirm that changes to the source code modules accommodate all drivers that call the modules. In the long run, time and effort are saved by building a robust modular process with wide applicability.

Challenge

Solution

Strategies for developing reusable and extensible code and processes, including the SAS® Macro Language.

Benefits

Creating short, focused code modules saves both time and effort in applications development and is more efficient to document.
