The Importance of Perl

Despite all the press attention to Java and ActiveX, the real job of
"activating the Internet" belongs to Perl, a language that is all but
invisible to the world of professional technology analysts but looms
large in the mind of anyone -- webmaster, system administrator or
programmer -- whose daily work involves building custom web applications
or gluing together programs for purposes their designers had not quite
foreseen. As Hassan Schroeder, Sun's first webmaster, remarked: "Perl is
the duct tape of the Internet."

Perl was originally developed by Larry Wall as a scripting language for
UNIX, aiming to blend the ease of use of the UNIX shell with the power and
flexibility of a system programming language like C. Perl quickly became
the language of choice for UNIX system administrators.

With the advent of the World Wide Web, Perl usage exploded. The Common
Gateway Interface (CGI) provided a simple mechanism for passing data
from a web server to another program, and returning the result of that
program interaction as a web page. Perl quickly became the dominant
language for CGI programming.

With the development of a powerful Win32 port, Perl has also made significant
inroads as a scripting language for NT, especially in the areas of system
administration and web site management and programming.

For a while, the prevailing wisdom among analysts was that CGI programs--and
Perl along with them--would soon be replaced by Java, ActiveX and other
new technologies designed specifically for the Internet. Surprisingly,
though, Perl has continued to gain ground, with frameworks such as
Microsoft's Active Server Pages (ASP) and the Apache web server's mod_perl
allowing Perl programs to be run directly from the server, and interfaces
such as DBI, the Perl DataBase Interface, providing a stable API for
integration of back-end databases.

This paper explores some of the reasons why Perl will become increasingly
important, not just for the web but as a general purpose computer
language. These reasons include:

fundamental differences in the tasks best performed by scripting
languages like Perl versus traditional system programming languages like
Java, C++ or C.

Perl's ability to "glue together" other programs, or transform the output
of one program so it can be used as input to another.

Perl's unparalleled ability to process text, using powerful features like
regular expressions. This is especially important because of the
re-emergence via the web of text files (HTML) as a lingua franca across
all applications and systems.

The ability of a distributed development community to keep up with rapidly
changing demands, in an organic, evolutionary manner.

A good scripting language is a high-level software development language
that allows for quick and easy development of trivial tools while having
the process flow and data organization necessary to also develop complex
applications. It must be fast while executing. It must be efficient when
calling system resources such as file operations, interprocess
communications, and process control. A great scripting language runs on
every popular operating system, is tuned for information processing (free
form text) and yet is excellent at data processing (numbers and raw, binary
data). It is embeddable, and extensible. Perl fits all of these criteria.

When and Why a Scripting Language?

As John Ousterhout has elegantly argued in his paper "Scripting: Higher
Level Programming for the 21st Century":
"Scripting languages such as Perl and Tcl represent a very different style
of programming than system programming languages such as C or Java.
Scripting languages are designed for 'gluing' applications; they use
typeless approaches to achieve a higher level of programming and more
rapid application development than system programming languages. Increases
in computer speed and changes in the application mix are making scripting
languages more and more important for applications of the future."

Ousterhout goes on:

As we near the end of the 20th century a fundamental change
is occurring in the way people write computer programs. The
change is a transition from system programming languages
such as C or C++ to scripting languages such as Perl or
Tcl. Although many people are participating in the change,
few people realize that it is occurring and even fewer
people know why it is happening....

Scripting languages are designed for different tasks than
system programming languages, and this leads to fundamental
differences in the languages. System programming languages
were designed for building data structures and algorithms
from scratch, starting from the most primitive computer
elements such as words of memory. In contrast, scripting
languages are designed for gluing: they assume the
existence of a set of powerful components and are intended
primarily for connecting components together. System
programming languages are strongly typed to help manage
complexity, while scripting languages are typeless to
simplify connections between components and provide rapid
application development.

Scripting languages and system programming languages are
complementary, and most major computing platforms since the
1960's have provided both kinds of languages. However,
several recent trends, such as faster machines, better
scripting languages, the increasing importance of graphical
user interfaces and component architectures, and the growth
of the Internet, have greatly increased the applicability
of scripting languages. These trends will continue over the
next decade, with scripting languages used for more and
more applications and system programming languages used
primarily for creating components.

System administrators were among the first to capitalize on the power of
scripting languages. The problems are everywhere, on every operating system.
They usually appear as the requirement to automate repetitive tasks. Even
Macintosh operating systems need some user definable automation. It might be
as simple as an automated backup and recovery system, or as complex as a
periodic inventory of all the files on a disk, or all the system configuration
changes in the last 24 hours. Many times, there are existing utilities that
do part of the work, but automation requires a more general framework for
running programs, capturing or transforming their output, and coordinating
the work of multiple applications.
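A task like that last one takes only a few lines of Perl. The following
sketch (the directory to scan is just a placeholder argument) reports every
plain file under a tree that changed in the last 24 hours:

```perl
#!/usr/bin/perl
# Report every plain file under a directory tree that was modified
# in the last 24 hours. Pass a directory on the command line, or it
# defaults to the current directory.
use strict;
use warnings;
use File::Find;

my $root   = shift @ARGV || '.';
my $cutoff = time() - 24 * 60 * 60;    # 24 hours ago, in epoch seconds

my @recent;
find(sub {
    return unless -f $_;               # plain files only
    push @recent, $File::Find::name
        if (stat($_))[9] > $cutoff;    # mtime is field 9 of stat()
}, $root);

print "$_\n" for sort @recent;
```

Wrap the same loop in a cron job and the "periodic inventory" problem is
solved in an afternoon.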

Most systems have included some form of scripting language. VMS's DCL,
MS-DOS's .BAT files, UNIX's shell scripts, IBM's Rexx, Windows' Visual Basic
and Visual Basic for Applications, and Applescript are good examples of
scripting languages that are specific to a single operating system. Perl is
unusual in that it has broken the tight association with a single
operating system and become widely used as a scripting language on multiple
platforms.

Some scripting languages, most notably Perl and Visual Basic, and to a
lesser extent Tcl and Python, have gained wide use as general purpose
programming languages. Successful scripting languages distinguish themselves
by the ease with which they call and execute operating system utilities and
services. To reach the next level, and function as general purpose
languages, they must be robust enough that you can build entire complex
application programs. The scripting language is used to prototype, model,
and test. If the scripting language is robust and fast enough, the prototype
evolves directly into the application.

So why not use a general purpose programming language like C, C++ or Java
instead of a scripting language? The answer is simple: Cost. Development
time is more expensive than fast hardware and memory. Scripting languages
are easy to learn, and simple to use.

As Ousterhout points out, scripting languages typically lack data types.
They don't distinguish between integer and floating point numbers.
Variables are typeless. This is one of the ways that scripting languages
speed up development. The concept is to "leave the details for later."
Since scripting languages are generally good at calling system utilities
to do the dirty work, for instance, copying files and building directories
or file folders, the details can be handled by some small utility that,
if it doesn't exist and is necessary, will be easy to write in a compiled
language.
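A small illustration of this typeless style: the same Perl scalar can hold
a number, then a string, with conversions happening automatically in
whichever context the value is used:

```perl
use strict;
use warnings;

my $x = 42;                 # $x holds a number
my $sum = $x + 0.5;         # used numerically: 42.5
$x = "42 files";            # the same variable now holds a string
my ($count) = $x =~ /^(\d+)/;   # pull the number back out of the text
my $total = $count + 8;     # "42" converts to 42 in numeric context

print "$sum $total\n";      # prints: 42.5 50
```

No declarations, no casts: the details really are left for later.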

What do those data types do for compiled languages? They make memory
management easier for the system, but harder for the programmer. Think
about this: How much did a programmer make an hour when FORTRAN was in
the ascendant? How much did memory cost then? How about now? Times
have changed. Memory is cheap; programmers are expensive!

System languages need to have everything spelled out. This makes
compilation of complex data structures easier, but programming harder.
Scripting languages make as many assumptions as they can. As little as
possible needs to be spelled out. This makes the scripting language easier
to learn and faster to write in. The price to be paid is difficulty in
developing complex data structures and algorithms. Perl, however, is good
at both complex data structures and algorithms, without sacrificing ease
of use for simple applications.
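For example, Perl 5's references make nested structures such as a hash of
arrays straightforward (the host and service names below are invented for
illustration):

```perl
use strict;
use warnings;

# A hash of arrays: each host name maps to the list of services it runs.
my %services = (
    www1 => [ 'httpd', 'sshd' ],
    db1  => [ 'postgres', 'sshd', 'cron' ],
);

push @{ $services{www1} }, 'ftpd';      # grow a nested list in place

for my $host (sort keys %services) {
    my $count = scalar @{ $services{$host} };
    print "$host runs $count services: @{ $services{$host} }\n";
}
```

The structure grows on demand; there is no schema to declare up front.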

Interpreted vs. Compiled Languages

Most scripting languages are interpreted languages, which contributes to
the perception that they may be inappropriate for large scale programming
projects. This perception needs to be addressed.

With the exception of language specific hardware, it is true that interpreted
programs are slower than compiled languages. The advantage of interpreted
languages is that programs written in that language are portable to any
system that the interpreter will run on. The system-specific details are
handled by the interpreter, not by the application program.
(There are always exceptions to this rule. For example, the
application program may explicitly use a non-portable system resource.)

Operating system command interpreters such as MS-DOS's command.com and
early versions of the UNIX C shell are good examples of how interpreters
work: each command line is fed to the interpreter as it occurs in the
script. The worst blow to efficiency is in any looping; each line in the
loop is reinterpreted every time it is run. Some people think that all
scripting languages work like this... slowly, inefficiently, a line at
a time. This is not true.

However, there are middle languages, languages that are compiled to some
intermediate code which is loaded and run by an interpreter at run time.
Java is an example of this model; this is what will make Java a valuable
cross-platform application language. All the Java interpreters on
different hardware will be able to communicate and share data and process
resources. This is perfect for embedded systems, where each device is
actually a different kind of special purpose hardware. Java is not a
scripting language, however. It requires data declarations. It is compiled
ahead of time (unless you count Just-In-Time compilation -- really just
code generation -- as part of the process).

Perl is also a middle language. Blocks of Perl code are compiled as needed,
but the executable image is held in memory instead of written to a file.
The compilation only happens once for any block of the Perl script. The
advantages of Perl's design make all this optimization work worthwhile.
Perl maintains the portability of an interpreted language while achieving
nearly the speed of a compiled language. Perl, nearly a decade old, with
hundreds of thousands of developers, and now in its fifth incarnation, runs
lean and fast. There is some amount of startup latency, as the script is
initially compiled, but this is typically small relative to the overall
performance of the script. In addition, techniques such as FastCGI,
which keep the image of a frequently accessed CGI script in memory
for repeated re-execution, avoid this startup latency except on the
very first execution of a script.

In any event, Perl 5.005 will include a compiler, created by Malcolm
Beattie of Oxford University. The compiler eliminates the startup latency
of in-process compilation, and adds some other small speed-ups as well.
It also addresses the psychological barrier programmers of commercial
applications sometimes experience with respect to interpreted languages.
(With a compiled language, the source code is no longer available for
inspection by outside parties.)

Information Processing versus Data Processing

The World Wide Web is only one instance of a fundamental change in how
we interact with computers. This change is visible in the very name we
now give the industry. It used to be called "Data Processing," as in
"I'll have to submit my job to the data processing center at 4 AM so that
I can pick up my output before noon." Now we call it "Information Services"
as in "the Director of Information Services is working with our planning
committee." The interest and emphasis is now on "information" not "data."
It is clear there is more interest in information, which typically includes
a mix of text and numeric data, rather than just data. Perl excels at
handling information.

An important part of Perl's information-handling power comes from a special
syntax called regular expressions. Regular expressions give Perl enormous
power to perform actions based on patterns that it recognizes in a body of
free form text. Other languages support regular expressions as well (there
is even a freeware regular expression library for Java), but no other
language integrates them as well as Perl.
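For instance, a pattern that would take a page of C can be a single line of
Perl. In the sketch below (the sample text is invented), one expression
finds every date in a body of text, and a second rewrites them in place:

```perl
use strict;
use warnings;

my $text = 'Errors were found on 14 May 1997 and again on 02 Jun 1997.';

# Capture every date of the form "DD Mon YYYY" in one pass.
my @dates = $text =~ /(\d{2} \w{3} \d{4})/g;
print "$_\n" for @dates;

# Substitution with backreferences: reorder each date to "Mon DD, YYYY".
(my $rewritten = $text) =~ s/(\d{2}) (\w{3}) (\d{4})/$2 $1, $3/g;
print "$rewritten\n";
```

The match, capture, and substitute operators are part of the language
itself, not a bolted-on library, which is why Perl code of this kind reads
so naturally.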

For many years, the trend was to embed text in specialized application file
formats. Except for UNIX, which explicitly specified ASCII text as a
universal file format for exchange between cooperating programs, most
systems allowed incompatible formats to proliferate. This trend was
reversed sharply by the World Wide Web, whose HTML data format consists of
ASCII text with embedded markup tags. Because of the importance of the web,
HTML -- and ASCII text with it -- is now center stage as an interchange
format, exported by virtually all applications. There are even plans by
Microsoft to provide an HTML view of the desktop. A successor to HTML, XML
(eXtensible Markup Language) is widely expected to become a standard way of
exchanging data in a mixed environment.

The increasing prominence of HTML plays directly to Perl's strengths. It
is an ideal language for validating user input in HTML forms, for
manipulating the contents of large collections of HTML files, or for
extracting and analyzing data from voluminous log files.
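A sketch of the log-analysis case (the log entries below are invented, in
the common web-server log format): count the requests for each page and
report them in order of popularity:

```perl
use strict;
use warnings;

# Count requests per URL path from web-server log lines.
my @log = (
    '10.0.0.1 - - [19/Aug/1997:12:00:01] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [19/Aug/1997:12:00:03] "GET /perl.html HTTP/1.0" 200 2876',
    '10.0.0.1 - - [19/Aug/1997:12:00:07] "GET /index.html HTTP/1.0" 200 1043',
);

my %hits;
for my $line (@log) {
    # Pull the request path out of the quoted request field.
    next unless $line =~ /"(?:GET|POST)\s+(\S+)/;
    $hits{$1}++;
}

# Report paths from most to least requested.
for my $path (sort { $hits{$b} <=> $hits{$a} } keys %hits) {
    print "$hits{$path} $path\n";
}
```

In a real script the array would be replaced by a loop reading the log
file line by line; the pattern and the hash do all the work either way.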

That is only one side of the text processing power of Perl. Perl not only
gives you several ways to pick data apart, but also several ways to glue
data back together. Perl is thus ideal for taking apart an information
stream and reconfiguring it. This can be done on the fly as a way of
transforming information into input to other programs or for analysis and
reporting.
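The workhorses here are split and join. A minimal example, taking a UNIX
passwd-style record apart and gluing selected fields back together in a
new order:

```perl
use strict;
use warnings;

# Take a colon-delimited record apart, rearrange it, and glue it back
# together as comma-separated output -- the classic filter pattern.
my $passwd_line = 'alice:x:1001:100:Alice Example:/home/alice:/bin/sh';

my ($user, undef, $uid, undef, $name) = split /:/, $passwd_line;
my $csv = join ',', $name, $user, $uid;

print "$csv\n";    # prints: Alice Example,alice,1001
```

Put a loop around it and you have a one-screen program that converts one
program's output into another program's input.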

One can argue that the next generation of computer applications will not
be traditional software applications but "information applications", in
which text forms a large percentage of the user interface. Consider the
classic "Intranet" web application: a human resources system through which
employees can choose which mutual funds in which to invest their retirement
savings, track the performance of their account, and access information
that helps them to make better investment decisions. The interface to such
a system consists of a series of informational documents (typically
presented as HTML), a few simple forms-based CGI scripts, and links to
back-end systems (which may be outside services accessed via the Internet)
for real-time stock quotes.

To build an application like this using traditional software techniques
would be impractical. Each company's mix of available investments is
unique; the application would not justify the amount of traditional
programming required for such a localized application. Using the web as
a front end, and perl scripts as a link to back end databases, you are
essentially able to create a custom application in a matter of hours.

Or consider Amazon.com, perhaps the most visibly successful new web
business. Amazon provides an information front-end to a back-end database
and order-entry system, with--you guessed it--Perl as a major component
tying the two together.

Perl access to databases is supported by a powerful set of
database-independent interfaces called DBI. Perl + FastCGI + DBI is
probably the most widely used "database connector" on the web. ODBC
modules are also available.
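The DBI pattern is connect, prepare, execute, fetch. The sketch below is
illustrative only: the data source name, the credentials, and the "orders"
table are all hypothetical, so the connection is attempted only if a DSN
is configured in the environment:

```perl
use strict;
use warnings;

# Sketch of the DBI pattern: connect, prepare, execute, fetch.
# The DSN, credentials, and "orders" table are hypothetical; set the
# PERL_TEST_DSN environment variable to point at a real database.
my $sql = 'SELECT customer, total FROM orders WHERE total > ?';

if (my $dsn = $ENV{PERL_TEST_DSN}) {
    require DBI;
    my $dbh = DBI->connect($dsn, $ENV{DB_USER}, $ENV{DB_PASS},
                           { RaiseError => 1 });
    my $sth = $dbh->prepare($sql);
    $sth->execute(100);                         # bind the placeholder
    while (my ($customer, $total) = $sth->fetchrow_array) {
        print "$customer owes $total\n";
    }
    $dbh->disconnect;
}
```

The same five calls work unchanged whether the driver underneath is
Oracle, Sybase, mSQL, or ODBC; only the DSN string differs.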

Put together Perl's power to handle text on the front end, and connect
to databases on the back end, and you begin to understand why it will play
an increasingly important role in the new generation of information
applications.

Other applications of Perl's ability to recognize and manipulate text
patterns include biomedical research and data mining. Any large text
database, from the gene sequences analyzed by the Human Genome Project to
the log files collected by any large web site, can be studied and manipulated
by Perl. Finally, Perl is increasingly being used for applications such as
network-enabled research and specialized Internet search applications. Its
strength with regular expressions and facility with sockets, the
communications building block of the Internet, have made it the language of
choice for building Web robots, those programs that search the Internet for
information.
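The raw material of such a robot is just a socket and a pattern match. A
sketch using the standard IO::Socket::INET module follows; the host is an
example, and the network round trip runs only if explicitly enabled:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# The raw material of a web robot: open a TCP connection to a web
# server and speak HTTP over it. The host below is an example; the
# actual network round trip runs only when PERL_TEST_NET is set.
my $host    = 'www.example.com';
my $request = "GET / HTTP/1.0\r\nHost: $host\r\n\r\n";

my $status = '';
if ($ENV{PERL_TEST_NET}) {
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
    ) or die "connect failed: $!";
    print $sock $request;
    ($status) = <$sock> =~ /(\d{3})/;   # HTTP status from the first line
    close $sock;
}
```

From here, a robot is a loop: fetch a page, pattern-match the links out of
the HTML, and queue them for the next fetch.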

Perl for Application Development

Developers are increasingly coming to realize Perl's value as an
application development language. Perl makes it possible to realistically
propose projects that would be unaffordable in the traditional system
programming languages. Not only is it fast to build applications with Perl,
but they can be very complex, even incorporating the best attributes of
object-oriented programming if necessary.

It is easier to build socket-based client-server applications with Perl
than with C or C++. It is more efficient to build free-text parsing
applications in Perl than in any other language. Perl has a sophisticated
debugger (written in Perl), and many options for building secure
applications. There are publicly available Perl modules for every sort
of application. These can be dynamically loaded as needed.
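Dynamic loading can be as simple as a require at run time. A minimal
sketch, using the core POSIX module so the example stands on its own:

```perl
use strict;
use warnings;

# Load a module at run time rather than compile time. POSIX is a core
# module, so this require should succeed on any standard Perl.
my $loaded = eval {
    require POSIX;
    POSIX->import('floor');
    1;
};

my $result = $loaded ? POSIX::floor(3.7) : undef;
print "floor(3.7) = $result\n" if $loaded;
```

Because the require sits inside an eval, a program can probe for an
optional module and fall back gracefully when it is absent.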

Perl can be easily extended with compiled functions written in C/C++ or
even Java. This means that it is easy to include system services and
functions that may not already be native to Perl. This is particularly
valuable when working on non-UNIX platforms since the special attributes
of that operating system can be included in the Perl language.

Perl can also be called from compiled applications, or embedded into
applications written in other languages. Efforts are underway, for instance,
to create a standard way to incorporate Perl into Java, such that Java
classes could be created with Perl implementations. Currently, such
applications must embed the Perl interpreter. A new compiler back-end,
to be available in fourth quarter 1997 in O'Reilly & Associates' Perl
Resource Kit, will remove this obstacle, allowing some Perl applications
to be compiled to Java byte-code.

Graphical Interfaces

Because it was originally developed for the UNIX environment, where the
ASCII terminal was the primary input/output device (and even windowing
systems such as X preserved the terminal model within individual windows),
Perl doesn't define a native GUI interface. (But in today's fragmented GUI
world this can be construed as a feature.) Instead, there are Perl extension
modules for creating applications with graphical interfaces. The most
widely used is Tk, which was originally developed as a graphical toolkit for
the Tcl scripting language, but which was soon ported to Perl. Tk is
still specific to the X Window System, though it is currently being ported
to Microsoft Windows.

However, as noted earlier, the development of native windowing applications
is becoming less important as the web becomes the standard GUI for many
applications. The "webtop" is fast replacing the "desktop" as the universal
cross-platform application target. Write one Web interface and it works on
UNIX, Mac, Windows/NT, Windows/95...anything that has a Web browser.

In fact, an increasing number of sites use Perl and the Web to create new
easier-to-use interfaces to legacy applications. For example, the Purdue
University Network Computing Hub provides a web-based front-end to more than
thirty different circuit simulation tools, using Perl to interpret user
input into web forms and transform it into command sequences for programs
connected to the hub.

Multithreading

Threads are a desirable abstraction for doing multiple and concurrent
processing, particularly if you are programming for duplex communications
or event driven applications. A multi-threading "patch" to Perl has been
available since early 1997; it will be integrated into the standard
distribution as of Perl version 5.005, in the fourth quarter.

The multitasking model that Perl has historically supported is "fork"
and "wait." The granularity is the process. The flavor is UNIX.
Unfortunately, the Windows/NT equivalent isn't quite the same. This is
where the portability of Perl breaks down, at least for now. By building
cross-platform multi-process Perl applications with a layer of abstraction
between the process control and the rest of the application, the problems
can be avoided. Furthermore, work is underway, to be completed in the
fourth quarter of 1997, to reconcile the process-control code in the
UNIX and Win32 ports of Perl.
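The fork-and-wait idiom itself is compact; a minimal UNIX-flavored sketch:

```perl
use strict;
use warnings;

# The classic UNIX multitasking idiom Perl supports: fork a child
# process, do work in the child, and wait for it in the parent.
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child process: do some work here, then report success via the
    # exit code.
    exit 0;
}

# Parent process: block until the child finishes, then inspect its
# exit status (the high byte of $?).
waitpid($pid, 0);
my $child_status = $? >> 8;
print "child $pid exited with status $child_status\n";
```

Abstracting this pair of calls behind a small process-control layer is
what makes a multi-process Perl application portable to systems where
fork behaves differently.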

Perl on Win32 Systems

In 1996, Microsoft commissioned ActiveWare Internet Corporation
(now ActiveState Tool Corp) to create a port of Perl to Win32 for inclusion
in the NT Resource Kit. That port has since become widely available on
the net, and reportedly, nearly half of all downloads of the Perl source
code are for the Win32 platform.

Perl has taken off on Win32 platforms such as NT for several reasons.
Despite the presence of Visual Basic and Visual Basic for Applications,
native scripting support on Win32 is relatively weak. While VB is an
interpreted scripting language, it is still a typed language, which makes
it somewhat more cumbersome to use. It also lacks the advanced
string-handling capabilities that are so powerful in Perl. As efforts are
underway to create larger-scale NT sites, the limitations of Graphical
User Interfaces quickly become evident to administrators; scripting is
essential for managing hundreds or thousands of machines.

It is not insignificant that many of the experienced administrators being
called on to manage those sites cut their teeth on UNIX. Using Perl is a
good way to bring the best of UNIX with you to other platforms.

Nor should you underestimate the drawing power of the web. With thousands
of Perl-based CGI programs and site management tools now available,
Perl support is essential for any web server platform, including the
NT-based web servers from Microsoft, O'Reilly and Netscape that are
becoming a more important part of the web. In particular, ActiveState's
PerlScript(tm) implementation allows Perl to be used as an active
scripting engine on NT web servers such as Microsoft's IIS and O'Reilly's
WebSite that support the Active Server Pages (ASP) technology.

In addition to the core Perl language interpreter, the ActiveState Perl
for Win32(tm) port includes modules specifically targeted to the Win32
environment. For example, it provides full access to Automation objects.
As more and more system resources and components support that interface
under Windows, more aspects of the operating system are directly accessible
by Perl for Win32.

Extending the Power of Perl

Unlike languages such as Microsoft's Visual Basic or Sun's Java, Perl
does not have a large corporation behind it. Perl was originally developed
by Larry Wall and made available as freeware. Larry is assisted in the
further development of Perl by a group of about 200 regular contributors
who collaborate via a mailing list called perl5-porters. The list was
originally focused on porting Perl to additional platforms, but gradually
became the center for those adding to the core language.

In addition, Perl 5 includes an extension mechanism, by which independent
modules can be dynamically loaded into a Perl program. This has led to the
development of hundreds of add-in modules. Many of the most important
modules have become part of the standard Perl distribution; additional
modules are available via the Comprehensive Perl Archive Network (CPAN).
The best entry point to the CPAN is probably the www.perl.com site, which
also includes book reviews, articles, and other information of interest to
Perl programmers and users.

While there has been a historical bias against using freeware for mission
critical applications, this bias is crumbling rapidly, as it becomes widely
recognized that many of the most significant computing advances of the past
few decades have been developed by the freeware community. The Internet
itself was largely developed as a collaborative freeware project, and its
further development is still guided by a self-organizing group of visionary
developers. Similarly, the leading web server platform in terms of market
share, by a large margin, is Apache--again, a free software project created,
extended and managed by a large collaborative developer community.

In addition to ongoing development, the Perl community provides active
support via newsgroups and mailing lists. There are also numerous
consultancies and paid support organizations. Excellent documentation is
provided by numerous books, including most notably Programming Perl, by
Larry Wall, Randal Schwartz and Tom Christiansen. The Perl Journal and
www.perl.com provide information about the latest developments.

In short, because of the large developer base and the cooperative history
of the freeware community, Perl has access to development and support
resources matching those available to the largest corporations.

Application Stories

The following section includes a selection of user application stories,
ranging from the quick and dirty "Perl saves the day" applications familiar
to so many system administrators, to larger custom applications. Some of
these application stories are taken from presentations at the first annual
Perl Conference, held in San Jose, CA from August 19-21, 1997. The
application descriptions from the conference proceedings are labeled
with the names of their authors.

Case 1 - Technical Support for 30 Million Users

OK, so here's the situation. Your brand new exciting Internet company has
taken off and you're selling more browsers, servers, and web applications
than you ever hoped for, your company is growing by leaps and bounds, and
the latest market information says that your customer base has just passed
the 30 million mark in less than a year.

And the only downside is that these 30 million folks might have a few
problems with their browser; they might not know exactly what the Internet
is; they might want to call someone for support. They might want to call
*you* for technical support.

So, when this happens, you might think, "That's OK, I'll just put some
technical articles out on the web." But when you first look at the project,
you realize that you're going to need some sort of content management
system, some sort of distribution system, some log analysis, and a way of
gathering and reporting feedback from your customers on your site. And
you're going to want it yesterday.

Lucky for you, you know Perl. And with Perl you're able to get all of this
built in 3 months in the spare time of 4 very busy technical support
engineers.

Case 2 - A Quick and Dirty Conversion at BYTE

BYTE Magazine used to maintain its own information network and
conferencing system, BIX, that both editors and readers used for
exchanging ideas. The conferencing model was quite different from Usenet,
somewhat closer to a mail-list. Since several of the BYTE editors were
regular Usenet subscribers and preferred that model, BYTE built a gateway
that translated and maintained the BIX editorial discussion groups as a
private Usenet news group. The language was Perl. It took little more than
a hundred lines of code and a few days of work.

Case 3 - Routing customer inquiries to appropriate experts

The performance testing group at one of the world's leading computer
companies needed to automate query routing. They were directed to use
their world-wide corporate Intranet, but not given any budget to do the
project. Two engineers with only a few weeks of Perl experience created
a solution. The Perl scripts responded to the query by matching key
elements of queries with people with that expertise. The CGI programs
not only pointed the client to the experts' Web-pages and E-mail
addresses, but also passed the query on to all appropriate experts in their
E-mail. The solution took no more than a few man-weeks and so could be
absorbed into other budgets.

Case 4 - Collection and analysis of email survey data

An Internet market research firm that does its research using an E-mail
survey wanted to automate and generalize the handling of the anticipated
ten thousand responses. Perl was used to automate the process. The Perl
script generated input for SPSS, but would have been capable of doing
statistical analysis if the statistician had known Perl.

Case 5 - A Cross-Platform Harness for Running Benchmarks

SPEC (the Standard Performance Evaluation Corporation), an industry
consortium for benchmarking computer systems, radically changed its
governing program when the SPEC92 benchmarks evolved into SPEC95. SPEC
wanted to make it possible for their benchmarks to run on operating
systems other than UNIX without a major effort. The SPEC92 benchmarks were
managed by UNIX shell scripts, which were unportable and inflexible. The SPEC95
benchmarks are managed by a portable, extensible engine written in Perl.
The scripts take advantage of Perl's object oriented capabilities, Perl's
extensibility with C, and Perl's dynamic module loading. Porting SPEC95 to
Windows/NT was simple. The major problem with porting to VMS is its lack
of user level forks.

Case 6 - Consultant working with Perl

Despite the years that I have spent developing in C, I have found little
reason to continue to do so. Most of my work in the last ten years has
been developing code that retrieves, manages, and converts information, not
just data. The application programs I am involved in are merely graphical
controls front-ending information retrieval, management, and conversion
engines. Perl now fills the need for this kind of development better than
any other language--scripting or system programming language. Even though I
started using Perl merely as a glue scripting language and prototyping
language, I now use it for everything. It has replaced both C and my UNIX
shell programs. There will be times, I am sure, that I will have to write,
or at least patch, a program in C. I expect that Java will eventually fill
those requirements for me.

Cross-platform GUI interfaces are now done in HTML and run locally, in an
Intranet, or as part of the Web.

Perl provides me with fast indexing to simple data structures and modules
for talking to commercial databases. It provides me with system level
tools for process management, file management, and interprocess
communications wherever sockets are understood. It allows me to design
my applications using libraries, modules, packages, and subroutines. It
allows me to write applications that modify themselves; scary as that may
seem, it is sometimes necessary.

The greatest benefit of Perl to me is that I can build solutions to complex
problems in a fifth the time. This appeals to managers and clients, but
particularly to the people paying the bills.

Because of its robustness and flexibility, Perl has become the language of
choice for many programmers in MITRE's Center for Advanced Aviation System
Development (CAASD) for developing rapid prototypes of the concepts being
explored. The Traffic Flow Management Lab (T-Lab) has implemented hundreds
of Perl programs, ranging from simple data parsing and plot generation to
measuring the complexity of regions of airspace and calculating the transit
times of aircraft over those regions. These applications range in size from
about 10 lines to over 1200. Because many of them are very I/O intensive,
Perl, with its many parsing and searching features, was the natural choice.
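
A program at the small end of that range might look like the following
purely hypothetical sketch (the record format, region names, and times are
invented), which parses flight records and totals transit time per region:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Total aircraft transit time per airspace region.
my %transit;   # region => total seconds
while (my $line = <DATA>) {
    next if $line =~ /^#/;                        # skip comment lines
    my ($flight, $region, $secs) = split /\s+/, $line;
    $transit{$region} += $secs;
}
printf "%s: %d s\n", $_, $transit{$_} for sort keys %transit;

__DATA__
# flight  region  transit_secs
AAL123   ZDC     540
UAL456   ZDC     612
DAL789   ZNY     300
```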

Case 8 - Online Specialty Printing
Dave Hodson (dave@iprint.com)

The iPrint Discount Printing & CyberStationery Shop (http://www.iPrint.com)
is powered by a WYSIWYG desktop publishing application on the Internet that
is connected directly to a back-end printer and sits on top of a
sophisticated, real-time, multi-attributed product and pricing database.
Customers come to our site to create, proof, and order customized printed
items--business cards, stationery, labels, stamps, specialty advertising
items, and so on--online.

The iPrint system includes both a front-end (the website) and a back-end
process that eliminates nearly all of the manual pre-flight process that
printers perform and also provides all pertinent information to iPrint's
accounting system. Roughly 95% of the approximately 80,000 lines of code
that perform this work are written in Perl 5.003 running under Windows NT
4.0. iPrint relies heavily on an RDBMS (SQL Server), with all database
interaction performed by Perl via ODBC. iPrint uses many modules from the
CPAN archives, including MIME and Win32::ODBC.
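
The database side of such a system can be sketched as follows. This is a
hypothetical fragment, not iPrint's actual code: the DSN, credentials,
table, and column names are invented, and Win32::ODBC runs only on Windows.

```perl
use strict;
use warnings;
use Win32::ODBC;    # CPAN module named above; Windows only

# Connect to a SQL Server data source by DSN (all names invented).
my $db = Win32::ODBC->new("DSN=iprint;UID=web;PWD=secret")
    or die "Connect failed: " . Win32::ODBC::Error();

# Run a query and walk the result set.
if ($db->Sql("SELECT order_id, status FROM orders WHERE status = 'pending'")) {
    die "Query failed: " . $db->Error();
}
while ($db->FetchRow()) {
    my %row = $db->DataHash();          # column name => value for this row
    print "$row{order_id}: $row{status}\n";
}
$db->Close();
```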

Case 9 - Editorial Production at Amazon.com

Amazon.com used Perl to develop a CGI-based editorial production system
that integrates authoring (with Microsoft Word or Emacs), maintenance
(version control with CVS and searching with glimpse), and output
(with in-house SGML tools).

Writers use the CGI application to start an SGML document. They fill out a
short form, and the application generates a partially completed SGML
document in the writer's home directory, which may be mounted on their
Microsoft Windows PC. The writer then uses their favorite editor to finish
the document. With the CGI application, writers can view changes ('cvs
diff') and see their SGML rendered as HTML before submitting the document
('cvs commit'). They can also do keyword searches of the SGML repository
(by way of glimpse) and track changes ('cvs log'). Editors can schedule
content with the CGI application as well.

Amazon.com created a base SGML renderer class that is sub-classed to
render different sections of the web site in different modes (html with
graphics and html without graphics, and in the future, PointCast, XML,
braille, etc).

All of the code is in Perl. It uses the CGI and HTML::Parser modules.
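
The renderer-class design can be sketched with Perl's standard @ISA
inheritance. This is a hypothetical illustration, not Amazon.com's code;
the class and method names are invented:

```perl
use strict;
use warnings;

# Base class: renders generic SGML elements as HTML with graphics.
package Renderer;
sub new   { my $class = shift; return bless {}, $class }
sub image { my ($self, $src) = @_; return qq{<img src="$src">} }

# Subclass: overrides only what differs for the no-graphics mode.
package Renderer::NoGraphics;
our @ISA = ('Renderer');                 # inherit everything else
sub image { my ($self, $src) = @_; return "[image: $src]" }

package main;
my $plain = Renderer::NoGraphics->new;
print $plain->image("cover.gif"), "\n";  # prints the text-only form
```

Further output modes (PointCast, XML, braille) would each be another small
subclass overriding only the methods whose rendering differs.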

Case 10 - Specialty Print Servers at a New England Hospital

A major New England hospital uses twelve operating systems, from mainframes
to desktop PCs, and seven different network protocols. There are roughly
twenty thousand PC workstations, two thousand printers of one type or
another, and one thousand specialty printers. The network is spread over an
entire city using microwave, T1, T3, and private optical fiber. The problem
is network printing: specialty printers are required because the patient
registration and billing system runs on IBM and Digital mainframes, with
the output going through their proprietary networks. The goal is to have
all of the operating systems able to print to a standard printer through a
standard protocol.

A search for appropriate scalable print servers uncovered MIT Project
Athena's Palladium as a good starting point. However, its model of
standalone print servers didn't fit: the hospital needed a distributed
server model. When a two-month effort to port Palladium to the hospital's
platform (so that we could make the changes) proved uneconomical, we
decided to build exactly what we wanted in fast prototyping languages:
Perl for the core application and Tcl/Tk for the GUI administrative
interface. Palladium represents 30,000 lines of C; the more complex
distributed server model required only 5,000 lines of Perl and only four
man-months to achieve a first release. The Perl code proved sufficiently
fast on a 60MHz Pentium running a UNIX variant that none of it required
rewriting in C.

Case 11 - A Network-Based Virtual Laboratory ("The Hub")

In the future, computing may operate on a network-based and
service-oriented model much like today's electricity and telecommunications
infrastructures. This vision requires an underlying infrastructure capable
of accessing and using network-accessible software and hardware resources
as and when required. To address this need, we have developed a
network-based virtual laboratory ("The Hub") that allows users to access
and run existing software tools via standard world-wide web (WWW)
browsers such as Netscape.

The Hub, a WWW-accessible collection of simulation tools and related
information, is a highly modular software system that consists of
approximately 12,000 lines of Perl5 code. It has been designed to: a) have
a universally-accessible user-interface (via WWW browsers), b) provide
access-control (security and privacy) and job-control (run, abort, and
program status functions), and c) support logical (virtual)
resource-organization and management. The Hub allows users to: a) upload
and manipulate input-files, b) run programs, and c) view and download
output - all via standard WWW browsers. The infrastructure is a distributed
entity that consists of a set of specialized servers (written in Perl5)
which access and control local and remote hardware and software resources.
Hardware resources include arbitrary platforms, and software resources
include any program (though the current implementation does not support
interactive or GUI-based programs).

The Hub allows tools to be organized and cross-referenced according to
their domain. Resources can be added incrementally using a
resource-description language specifically designed to facilitate the
specification of tool and machine characteristics. For example, a new
machine can be incorporated into the Hub simply by specifying its
architecture (make, model, operating system, etc.) and starting a server
on the machine. Similarly, a new tool can be added by "telling" the Hub
the tool's location, its input behavior (e.g., command-line arguments),
what kinds of machines it can run on (e.g., Sparc5), and how it fits
into the logical organization of the Hub (e.g., circuit simulation tool).
Each of these tasks is typically accomplished in less than thirty minutes.
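
The paper does not show the resource-description language itself; the
following is a purely hypothetical sketch of the kind of machine and tool
specification described above (syntax, names, and paths all invented):

```
# Hypothetical resource descriptions -- not the Hub's actual syntax.
machine  falcon.ecn.purdue.edu
  arch     sparc
  model    Sparc5
  os       SunOS 5.5

tool  minimos
  location /opt/tools/minimos/bin/minimos
  input    command-line
  runs-on  Sparc5
  category circuit-simulation
```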

To facilitate this functionality, the Hub interprets URLs differently from
standard document-oriented web servers. The structure of the URL
is decoupled from that of the underlying filesystem and interpreted in a
context-sensitive manner (based on user-specific state stored by the server),
thus allowing virtual accounting and arbitrary access-control. The
lab-engine provides the Hub with its on-demand high-performance computing
capabilities. When a user requests the execution of a program, the lab-engine
uses information in the user-specified input file to predict (via an
artificial intelligence sub-system - also written in Perl5) the resources
required for the run, selects an appropriate platform (e.g., workstation
for a 2-D problem, supercomputer for a 3-D problem), transfers relevant
input files to the selected machine, and initiates the program (via the
remote server). When the run is completed, the remote server notifies
the lab-engine, which retrieves the output files and informs the user.
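
The decoupling of URLs from the filesystem can be sketched as a per-user
lookup table held by the server. This is a purely hypothetical illustration
(the session structure, path scheme, and names are invented), not the Hub's
implementation:

```perl
use strict;
use warnings;

# Hypothetical per-user state held by the server: URL components are
# resolved against this state, not against directories on disk.
my %session = (
    user    => 'jdoe',
    uploads => { 'diode.in' => '/spool/hub/jdoe/f83a2.in' },  # virtual name => real path
);

# Resolve a URL path in the context of a user's session.
sub resolve_url {
    my ($session, $path) = @_;
    my (undef, $area, $name) = split m{/}, $path;   # e.g. /files/diode.in
    if ($area eq 'files') {
        my $real = $session->{uploads}{$name}
            or die "no such file for $session->{user}\n";
        return $real;
    }
    die "unknown area: $area\n";
}

# Two users requesting the same virtual URL would get different real files.
print resolve_url(\%session, '/files/diode.in'), "\n";
```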

The initial prototype, the Semiconductor Simulation Hub, currently
contains thirteen semiconductor technology tools from four universities.
In less than one year, over 250 users have performed more than 13,000
simulations. New Hubs for VLSI design, computer architectures, and parallel
programming have been added in recent months; they currently contain a
modest complement of fourteen tools. These Hubs are currently being used in
several undergraduate and graduate courses at Purdue as well as to
facilitate collaborative research. Regular users include students at Purdue
University and researchers at several locations in the U.S. and Europe.