World Wide Web 10 Years Later (Part 1)

Yonina Cooper and Hal Berghel
Department of Computer Science
University of Nevada at Las Vegas

History/Background

The World Wide Web (or, popularly, "the Web") [5][6][7] was conceived
by Tim Berners-Lee and his colleagues at CERN (the European Laboratory for Particle
Physics) in 1989 as a shared information space which would support collaborative
work. At that time, Berners-Lee defined the communications protocol-pair, Hypertext
Transfer Protocol (HTTP) and Hypertext Markup Language (HTML), which forms the
backbone of the Web. Berners-Lee ushered in the World Wide Web in 1990 with
the first Web client navigator-browser, developed as a proof of concept, called
the WorldWideWeb which "was the only way to see the web" [8] at that
time. (Screen shots of that browser-editor can be seen at http://www.w3.org/People/Berners-Lee/WorldWideWeb.html.)
Nicola Pellow wrote the first cross-platform browser, which was released in 1991.
By 1992, interest in the Web had grown sufficiently to produce four additional
browsers - Erwise, Midas, and Viola for the X Window System, and Samba for the Macintosh.

Early in 1993, Marc Andreessen of the National Center for Supercomputing Applications
(NCSA) released the first version of Mosaic for the X Window System, which soon
became the browser standard against which all others would be compared. 1993
also saw the release of a number of other browsers, including Cello for Windows,
developed at the Cornell Law School, and Lynx 2.0, developed at the University
of Kansas. Lynx quickly became the preferred browser for non-graphics mode
(or character-mode) terminals, while the other Web clients shared the workstation
client-side market. In 1994, Andreessen left NCSA to co-found Netscape, whose
browser was released late in 1994 and proceeded to dominate the browser
market. When Microsoft released its Windows 95 operating system in August
1995, it included the Web browser Internet Explorer, which had conquered a
third of the market by the fall of 1996. Today, Internet Explorer is the leading
Web browser, having passed Netscape in 1999.

Despite its original design goal of supporting collaborative work, the Web
has diverged into many highly variegated uses all evolving from the two protocols:
HTML and HTTP. In this paper we examine the current state of HTML and related
technologies. HTTP will be examined in a later paper.

Hypertext Markup Language (HTML)

Pioneering and independent visions led to the hypertext orientation of HTML.
In 1945, Vannevar Bush [13] described a device which could create and follow
links between documents on microfilm. In the 1960's, Douglas C. Engelbart [16][17]
prototyped an online system that, among other things, enabled a browsing environment
enhanced with a new rapid cursor-movement innovation called a "mouse"
[18]. In 1965, Ted Nelson [23][24] coined the term 'hypertext' in a presentation
to the 20th National Conference of the Association for Computing Machinery.

From a technical perspective, HTML is a sequence of extensions to the original
concept of Berners-Lee which was text-oriented. The international standard underlying
HTML, the Standard Generalized Markup Language (SGML) [20], is based on the
Generalized Markup Language (GML), developed at IBM in 1969 by the research
team of Charles Goldfarb, Edward Mosher and Raymond Lorie. The markup described
the structure of a document, not its appearance. The document structure is written
in a Document Type Definition (DTD) that specifies a set of document elements
and their relationships together with a set of tags with which to mark up the
document. The SGML standard was adopted in the mid-1980's.

HTML started out as an SGML DTD. The tags were adapted for the distributed,
hyperlinked environment described above. The first version of HTML of the early
1990's provided only basic structure with rudimentary graphics and hypertext.
But by 1993, HTML standards were a moving target. There were two organizations
overseeing Web and Internet standards, including HTML: the World Wide Web Consortium
(W3C) and the Internet Engineering Task Force (IETF). (Berners-Lee currently
serves as the director of the W3C (www.w3.org), which he founded in 1994.) However,
Netscape had gone its own way, offering new features which were not endorsed
by the W3C/IETF, including some that were actually inconsistent with the purist
SGML orientation intended by the designers of HTML. Under pressure to gain market
share, navigator/browser developers continued this trend, attempting to add
as many useful "extensions" to the HTML standard as could be practicably
supported. This competition, initially called the "Mosaic Wars" [5],
still remains today, although with diminished impact.

HTML Version 2.0, proposed by the IETF (www.ietf.org) HTML Working Group, was
a specification which roughly corresponded "to the capabilities of HTML
in common use prior to 1994."[9] Basically, this version added forms and
lists. The IETF HTML Working Group closed in 1996. HTML+ and HTML 3.0 were HTML
versions which were never standardized. HTML 2.0 was replaced by HTML 3.2 in
January, 1997. Standards for HTML are now released as W3C Recommendations. HTML
3.2 aimed to "capture recommended practice as of early 1996"[25] and
included tables, applets, scripts, advanced Common Gateway Interface (CGI) programming,
security, and text flow around graphics. HTML 4.01, a subversion of HTML 4.0,
was released in 1999 and "supports a wider range of multimedia options,
scripting languages, style sheets, better printing facilities, and documents
that are more accessible to users with disabilities."[26] HTML 4.01 also
adds frames (including inline frames), client-side image maps, advanced forms
and tables, TTY and Braille support, compound documents with a hierarchy of
alternate rendering strategies, and internationalization, with document formatting
achieved via cascading style sheets (CSS). Scripting capabilities have been
added to most of the HTML elements. Specification of the document type, such as

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

is now required. This allows validation of the document using the W3C's validator (validator.w3.org),
which checks HTML documents for conformity to W3C Recommendations. Similarly,
CSS documents can be validated via the W3C CSS Validation Service (http://jigsaw.w3.org/css-validator/).
CSS2 [11] became a Recommendation on May 12, 1998, and CSS3 [22] is currently
in the works. A quick Web search will identify numerous validators and checkers,
whose services are available freely or commercially (e.g., http://www.w3.org/MarkUp/html-test/
from the World Wide Web Consortium).
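As an illustration of keeping presentation in a style sheet rather than in the
markup, a short CSS fragment might look like the following; the selectors and
values are invented for illustration.

```css
/* Hypothetical rules: presentation (typeface, color, spacing) lives in the
   style sheet, leaving the HTML markup to describe document structure only. */
h1     { font-family: sans-serif; color: navy; }
p.note { font-style: italic; margin-left: 2em; }
```

A document linking such a sheet can then be checked with the W3C CSS Validation
Service for conformance.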

Many innovations for HTML were invented by the browser vendors, particularly
Internet Explorer and Netscape, as they vied for market share, and many of these
innovations were incorporated into subsequent HTML standards. In many ways,
HTML and the independent browser implementations evolved away from HTML's carefully
thought-out roots. The original designers were careful not to confuse form with
content. However, HTML became a patchwork of ideas as it quickly evolved and,
as a result, muddied the distinction between form and content. The new kid on
the block, called XML (discussed below), is set to reunite HTML with its SGML
roots. Yet Web users and Web developers will still be faced with browsers which
do not implement, or only partially implement, the HTML standards. The introduction
of validators for HTML documents encourages adherence to the standards by Web
developers. As a complement
to online validators, the World Wide Web Test Pattern (www.uark.edu/~wrg/) developed
at the University of Arkansas in the mid 1990's provided a test bench for determining
the level of HTML compliance of arbitrary browsers. The "WWW Viewer Test
Page" developed at Lawrence Livermore Laboratory (www-eng.llnl.gov/documents/WWWtest.html)
allows the testing of a variety of media formats.

A step toward reuniting HTML with its SGML root is the conversion to the Extensible
Hypertext Markup Language (XHTML).

Extensible Hypertext Markup Language (XHTML)

XHTML is a reformulation of HTML 4.0 in XML, thus combining the strength of
HTML 4 with the power of XML. XHTML is intended to replace HTML as the primary
vehicle for describing Web content. The features of XHTML are richer, more robust,
and more extensible than those of HTML. W3C seeks to create standards for providing these
features on the ever increasing range of browser platforms, e.g. cell phones,
televisions, cars, wallet-size wireless communicators, kiosks and desktops.
Dave Raggett has created an open source utility, HTML Tidy [29], to assist Web
developers in converting HTML documents to XHTML and in general "tidying"
up sloppy HTML code and thereby rendering maintenance easier. Currently the
tool is only available for UNIX platforms. As an aside, there are mountains
of incorrect HTML code which is currently rendered by forgiving Web browsers.
Such problems disappear with XHTML.

XHTML is a family of document types which are XML based and ultimately designed
to be used in conjunction with XML based user agents. XHTML 1.0 [27] has three
document types: XHTML 1.0 STRICT, XHTML 1.0 TRANSITIONAL, and XHTML 1.0 FRAMESET
reformulated from HTML 4. Each has its own DTD that sets out the rules and regulations
for using HTML in a definitive, succinct manner. XHTML 1.1 [2] was released in
May, 2001; however, support is not ubiquitous. XHTML 1.1 continues the evolution
of separating presentation from content and in this sense is more restrictive
than XHTML 1.0. For instance, the XHTML 1.0 TRANSITIONAL and FRAMESET document
types contained a number of presentational elements, frames among them, which
are now relegated to being handled via style sheets or other mechanisms. Hence
XHTML 1.1 can roughly be considered XHTML 1.0 STRICT with certain features removed,
reflecting the underlying strategy of providing markup which is rich in structural
functionality while leaving presentation to style sheets. Understanding this
philosophy/strategy requires a study of the Extensible Markup Language (XML).
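To make the XHTML requirements concrete, the following minimal sketch of an
XHTML 1.0 STRICT document (the title and text are placeholders) shows the DOCTYPE
naming the DTD, the XHTML namespace, lowercase element names, and the explicit
closing of every element.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Example</title></head>
  <body>
    <p>Every element is closed,<br />even empty ones such as br.</p>
  </body>
</html>
```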

Extensible Markup Language (XML)

As previously noted, the Web was originally a publishing avenue for scientific
documents. Today, it is a full-fledged medium, not only on equal footing with
print and television but more importantly, an interactive medium. To accommodate
the phenomenal popularity and growth of the Web, HTML was repeatedly extended,
introducing new tags. The first version of HTML had only a dozen or so tags
while the last version (HTML 4) has nearly a hundred "official" tags,
not counting the non-standard, typography and format extensions.

During the early years of the Web, HTML technology grew rapidly spurred by
independent extensions created by the market leading browsers. The standards
lagged behind the implementation of features, creating inconsistent rendering.
Furthermore, a rich set of supporting technologies has been introduced: JavaScript,
Java, Flash, CGI, ASP, JSP, servlets, ESB, streaming media, and MP3, to name
a few. Although W3C developed some of these technologies, many were developed
or introduced by vendors, e.g. Sun, Netscape and Microsoft. Now it seems that
the tables have somewhat turned. New developments along with new standards are
occurring almost daily and the browsers and/or user-agents are lagging behind.
This phenomenon is understandable given the pace at which technologies are being
introduced and standardized. In fact the fast pace of innovation in Web technology
is truly unprecedented. "The combination of hypertext and a global Internet
started a revolution. A new ingredient, XML, is poised to finish the job."
[10].

HTML is currently supported by literally thousands of applications: browsers/navigators,
editors, e-mail software, spreadsheets, databases, contact managers, word processors,
and more. Even with the existing rich set of tags, there is a need for more
flexibility as specialized software seeks to utilize the basic Web infrastructure.
While on the one hand there is the demand for more tags, there is a conflicting
need to simplify in order to make Web use accessible to a broader range of computing
devices (e.g., PDAs, Japanese I-mode phones, European WAP phones, convergent
technologies, etc.) that may access pages with more markup than content.

XML development began in late 1996 and was completed in early 1998. Almost
immediately, it was extended to application domains in mathematics, science
and medicine. New applications seem to pop up almost daily. (See http://www.oasis-open.org/cover/sgmlnew.html
for up to date information on XML products and releases.) XML treads new territory
only where it is appropriate. XML will not replace HTML in the near term but
HTML will converge toward XML through the XHTML standard. Yet the International
Organization for Standardization (ISO) has standardized HTML (ISO/IEC 15445) on the conviction
that HTML will persist for another 25 years. As such, ISO expects W3C to remain
responsible for HTML.

The philosophy behind XML was to answer the conflicting demands being made
on HTML. The resolution was particularly simple and essentially two significant
changes were made to HTML: XML has no predefined tags and XML is strict. The
first is the eXtensible part; the author creates the tags needed for his/her
application. Secondly, HTML was most forgiving in the area of syntax - great
for lazy authors but taxing on browsers. According to some estimates, as much
or more than 50% of the code in a browser handles errors or sloppiness on the
part of the author. As a result, browsers are growing in size and becoming slower,
which does not bode well for owners of handheld devices. Thus the decision
for a strict syntax will facilitate the development of smaller, faster, lighter
browsers.
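Both points can be seen in a small, hypothetical XML fragment: the tag names
below are invented by the author rather than drawn from a predefined set, and
the syntax is strict in that every element is closed and every attribute value
is quoted.

```xml
<flightAwards program="example">
  <member id="12345">
    <name>J. Smith</name>
    <miles unit="statute">48200</miles>
  </member>
</flightAwards>
```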

XML does not describe how to render the data; it merely indicates the structure
and content of the data which, it should be remembered, was the original objective
of HTML qua SGML offspring. Thus XML is ideal for document publishing since
it is independent of format and delivery medium. As the figure below suggests,
documents that are created and maintained in XML may easily be transformed into
formats that are optimal for the manner or method of dissemination, be that
"smart" telephony or fax, the Web, or printing. While this is appealing,
the operative word is "automatically," and that is being addressed. If
the tool or platform is not available today, it will be tomorrow. See W3C's
website (www.w3.org) for a list of XML processors.

Traditionally the web page was a static HTML document offering minimal interactivity
and relying heavily on an overloaded server and CGI scripts. XML is poised to
offer web applications as opposed to just web pages. The ability of web sites
to do so much more than deliver text, graphics, even multimedia (and without
requiring massive amounts of Internet traffic) is the momentum behind the ascendancy
of XML. In a keynote address Adam Bosworth (then with Microsoft) demonstrated
two such web applications: an art auction which enabled a user to view and bid
on pieces of art as well as watch the bidding process with minimal round-trips
to the server; and a frequent flyer awards program allowing a user to review
their frequent flyer miles, determine their available awards, and plan
future flights in the context of building frequent flyer awards for the future [33].

Numerous tools are available for parsing and verifying XML documents. XML processors
are typically implemented as Java applications but regardless of the implementation
means, most still do not conform completely to the standard. The median seems
to be about 80% conformance. And because of varying and inconsistent support
for XML documents by the current popular browsers, many find it more convenient
to ignore the browser and apply style sheets on the Web servers to generate
HTML. The usual solution is to use the Extensible Stylesheet Language (XSL)[1]
to produce HTML which is then rendered by a current browser, or even a former-generation
browser. A subset of XSL, XSL Transformations (XSLT) [15], is the
standard for transforming data from one XML document to another XML document,
usually XHTML. Numerous tools exist for converting many document types using
XSL. See http://www.w3.org/Style/XSL. In the most general form, an XML document
with its XSLT style sheet is processed via an XSLT processor, producing XSL
Formatting Objects (XSL-FO), and an XSL-FO processor is then used. See http://dmoz.org/Computers/Data_Formats/Markup_Languages/XML/Style_Sheets/Implementations/
for a listing of available processors.
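As a sketch of this approach, the following XSLT 1.0 style sheet renders the
member elements of a hypothetical XML document (such as a frequent-flyer file)
as an HTML list; the element names are assumptions chosen for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!-- Replace the document root with an HTML list of member names. -->
  <xsl:template match="/">
    <ul>
      <xsl:for-each select="//member">
        <li><xsl:value-of select="name"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```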

Lastly, XML is a low-level syntax for representing data intended for supporting
a wide variety of applications. Implemented applications include the Mathematical
Markup Language (MathML) [14], the Chemical Markup Language (CML) [32], the
Synchronized Multimedia Integration Language (SMIL) [3], the Scalable Vector
Graphics (SVG) [19] format, the Resource Description Framework (RDF) [31] for
describing meta-data, etc. RDF is touted to be the venue for the Semantic Web
[10] in much the way HTML served the original Web. Implemented RDF applications
include the Platform for Internet Content Selection (PICS) [30], the Platform
for Privacy Preferences (P3P) [21], among others.

Dynamic HTML (DHTML)

DHTML is an all-in-one term for Web pages that use HTML and CSS and rely on
a scripting language to render the pages interactively. JavaScript, by far the
most popular choice, is a scripting language built into the Web browser that
controls HTML elements. Whereas Java is a high-level programming language for
building cross-platform applications, e.g., applets which can be embedded in
a Web page, DHTML is entirely a client-side technology, relying on the browser
to dynamically change the rendering and content of a document. This gives the
Web author the ability to create Web documents which interact with the user
without depending upon server-side programs or complicated sets of HTML pages
to achieve special effects. DHTML excels in creating low-bandwidth effects that
enhance a web page's functionality, such as creating animations, games, applications,
new ways of navigating through web sites, as well as out-of-this-world page
layouts which are impossible with just HTML. Although many of these features
can be duplicated with either Flash or Java, DHTML does not require plugins
or additional support from applications or embedded controls to make changes
but embeds seamlessly into the web page. Even though the underlying technologies
of DHTML (HTML, CSS, JavaScript) are all standardized, the Microsoft and Netscape
implementations differ radically and are largely incompatible. Cross-browser
DHTML is feasible but requires effort from the Web author.
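A common era-typical pattern for cross-browser DHTML is to branch on the object
model the browser exposes. The sketch below is illustrative only (the element
id and function name are invented), not a definitive recipe.

```html
<div id="banner" style="position:absolute; left:10px; top:10px;">Hello</div>
<script type="text/javascript">
function moveBanner(x) {
  var el;
  if (document.getElementById)        // W3C DOM (IE 5+, Netscape 6)
    el = document.getElementById("banner");
  else if (document.all)              // IE 4 object model
    el = document.all["banner"];
  else if (document.layers)           // Netscape 4 layer model
    el = document.layers["banner"];
  if (el && el.style) el.style.left = x + "px";
  else if (el) el.left = x;           // Netscape 4 layers position directly
}
moveBanner(100);
</script>
```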

The latest addition to DHTML is the Document Object Model (DOM) [28], which
introduces a new concept for event detection and the subsequent calling of event
handlers. The later versions of both the Microsoft and Netscape browsers support
the DOM, but the support differs dramatically.
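The divergence shows up even in something as simple as reading the element that
received a click. A hedged sketch of the usual defensive idiom follows; the id
and handler name are illustrative.

```html
<p id="msg" onclick="handleClick(event)">Click me</p>
<script type="text/javascript">
function handleClick(evt) {
  // IE exposes a global window.event; other browsers pass the event object.
  var e = evt || window.event;
  // The W3C DOM uses e.target; the IE model uses e.srcElement.
  var el = e.target || e.srcElement;
  if (el) el.style.fontWeight = "bold";
}
</script>
```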

CONCLUSION

The past decade has seen great strides in the development of new Web technologies,
particularly in the area of hypertext markup. Many technologies have
been added, and more are being added, almost daily it seems, to advance the Web
to the point of being fully interactive, participatory, and immersive. Advances
during the last decade in the programming technologies used for the Web will
be examined in Part 2 of this series.

[5] Berghel, H.: "The Client Side of the Web," Communications of
the ACM, 39:1 (1996), pp. 30-40. See also the revised version of the same name
in Kent, A. (ed.): Encyclopedia of Library and Information Science, Marcel
Dekker, 64:27 (1999), pp. 39-51.