Since I teach a variety of people about HTML, I find it appropriate to keep
a simple reference to HTML handy, as much of the HTML documentation is
either unwieldy or outdated. For example, the HTML 3.0 documentation
runs over 190 pages, and many of the traditional HTML references available
at the time that I first wrote this (ca. 1995) included deprecated
elements, such as <menu>. These days there are
good references, but I still find it convenient to have my own.

Unfortunately, the time pressures of academic life have not given me
sufficient opportunity to flesh out all of this document (e.g., HTML's
relationship to SGML). Nonetheless, both my students and I find it of
some use, particularly in the electronic form.

HTML, the HyperText
Markup Language, is a common
language for building hypertext documents for the World Wide Web.
Originally, authors had to build their documents in "raw HTML" because
no tools were available. Now that such tools are available, many people
no longer write HTML directly. Nonetheless, there are good reasons to
learn HTML, particularly because it helps you understand what is and
is not possible on the Web.

More recently, other markup languages have been developed and extensions
to HTML have been added. The extensions are often platform-specific.
The other languages are often standards and provide more facilities.
One key successor to HTML is XML (the Extensible Markup Language). HTML has
also been extended with CSS (Cascading Style Sheets) to give page
authors more control over appearance. This document covers neither the
additional languages nor the extensions.

Note that HTML is a markup language, not a programming
language. What's the difference? A markup language describes
the structure or purpose of pieces of information; a
programming language describes (more or less) how a
process executes.

In HTML, as in most markup languages, a page author marks up
the document, indicating the roles of various parts of the page. One
might indicate that something is the title of the document, the beginning
of a section, an item in a list, and so on and so forth.

In HTML, pieces of text are typically surrounded by tags, although
some tags stand alone and act as text elements themselves. A piece of
marked-up text looks something like

<TAG>some text</TAG>

Note that the TAG indicates something about the text. For
example, the TAG might be p for paragraph or
em for emphasis.
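As a small illustration of this pattern (the sentences themselves are made up), here is a paragraph that contains an emphasized word, plus the line-break tag <br>, one of the stand-alone tags that has no closing tag:

```html
<!-- The p tags mark the whole text as a paragraph -->
<p>
Tags mark the <em>role</em> of a piece of text.<br>
The em tags in the previous line mark one word as emphasized.
</p>
```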

In addition, certain tags may have attributes (additional
characteristics). For example, in Netscape's version of HTML, items in
a list may indicate the type of mark that accompanies the item. In such
cases, a piece of marked-up text looks something like

<TAG attribute="value">some text</TAG>
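Concretely, the list-mark attribute mentioned above is the type attribute. The sketch below (with made-up list items) requests a square mark for one item. This attribute was later standardized in HTML 3.2 and then deprecated in HTML 4 in favor of style sheets, so treat it as a historical example of the attribute syntax rather than current practice:

```html
<ul>
<!-- The type attribute requests a particular kind of mark for this item -->
<li type="square">an item with a square mark</li>
<!-- No attribute: the browser uses its default mark -->
<li>an item with the default mark</li>
</ul>
```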

For example, one might describe a table with a larger-than-normal
border by giving the table tag a border attribute:

<table border="4">..</table>

Note that there are two basic kinds of markup: logical
markup, in which one describes the roles of pieces of text
(e.g., this is a section heading), and physical markup, in
which one describes the appearance of text (e.g., this is
Times, twelve point, bold, centered). Logical markup supports better
information retrieval and permits readers to select the appearances they
find most appropriate. HTML provides a mixture of logical and physical
markup tags, with some bias towards logical markup. More recently,
HTML has added style sheets, which provide both a way to
define your own logical elements and a way to assign a
physical appearance to each logical element.
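To make the distinction concrete, the sketch below (with made-up sentences) marks up the same text twice. The em and strong tags are logical (emphasis and strong emphasis), while i and b are physical (italic and bold). Browsers typically render the two versions the same way, but only the first records why the words are set apart:

```html
<!-- Logical markup: says what role the text plays -->
<p>Read the <em>whole</em> page before you begin.</p>
<p><strong>Note:</strong> this sentence is important.</p>

<!-- Physical markup: says only how the text should look -->
<p>Read the <i>whole</i> page before you begin.</p>
<p><b>Note:</b> this sentence is important.</p>
```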

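Every HTML document has two parts: a head, which holds information
about the document, and a body, which holds its content.
As one might expect, the head is surrounded by <head> and
</head> tags, and the body is surrounded by
<body> and </body> tags. Netscape
extends the body tag with a background attribute, which names an image
to display behind the text. I feel that this makes documents unreadable,
but your mileage may vary.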

In addition, the whole document should be surrounded by
<html> and </html> tags.

A basic HTML document might therefore appear as follows:

<html>
<head>
<title>A basic HTML document</title>
</head>
<body>
<p>
This is the only line in the document.
</p>
</body>
</html>

There are a number of different things you can put in the head of the
document. You'll find that many documents include only a title. For
now, all that you really need to understand is the title tag.

<title>..</title> the
title of the document. This is what typically appears in the title bar,
bookmarks list, and history list.

<link rev="made" href="mailto:...">
a link to the author of the document. There are other
link tags, such as links to the previous and next documents.
This section needs to be expanded; see the HTML 3.0 specification for more.

<base href="URL"> the
base URL of the document, used for resolving relative addresses. Note
that base is a stand-alone tag; there is no closing </base> tag.

<meta ...> a meta tag, which describes
characteristics of the document or embeds the equivalent of HTTP
(HyperText Transfer Protocol) header fields.
For example,
<meta http-equiv="Refresh" content="1; url=nifty.left.html"> tells the browser to switch to another document after 1 second.
Some meta tags can also be included in the body of the document.
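Putting these together, a head might look like the following sketch. The title, the base URL (http://www.example.edu/~someone/), and the author address are hypothetical placeholders; the meta tag repeats the refresh example above. The base tag comes before the other tags that contain addresses, so that it applies to them:

```html
<head>
<title>A Quick Reference to HTML</title>
<!-- Hypothetical base URL: relative addresses in this document resolve against it -->
<base href="http://www.example.edu/~someone/">
<!-- Hypothetical author address for the rev="made" link -->
<link rev="made" href="mailto:someone@example.edu">
<!-- Switch to nifty.left.html after 1 second -->
<meta http-equiv="Refresh" content="1; url=nifty.left.html">
</head>
```

With this base, a link such as <a href="notes.html"> in the body refers to http://www.example.edu/~someone/notes.html, no matter where the page itself happens to be stored.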

HTML uses a hierarchical heading system, with labels from
<h1> to <h6>. Much of the documentation
suggests that you use them only in order. For example, you should always
begin your documents with an h1 tag, and you should never use
an h3 tag except within a section that begins with an h2 tag.
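As a sketch of that rule (the heading text is made up), the outline below starts at h1 and goes down only one level at a time. Returning from h3 to h2 to start a new section is fine; what the rule forbids is skipping a level on the way down:

```html
<h1>A Quick Reference to HTML</h1>
<h2>The Head</h2>
<!-- Each h3 sits inside the section begun by the h2 above it -->
<h3>The title tag</h3>
<h3>The meta tag</h3>
<!-- Going back up a level starts a new h2 section -->
<h2>The Body</h2>
<h3>Headings</h3>
```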

So far, the tags we've seen have described parts of almost any text,
whether on the Web or in printed form. Now let us consider the things
that make hypertext hypertext: the links from page to page, and the
anchors that let us link to particular parts of a page.

<a href="URL">..</a> a link to another
document.

<a name="text">..</a> a named section
of the present document.

<a href="#text">..</a> a link to a named
section in the current document.

<a href="URL#text">..</a> a link to a named
section in another document.
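The four forms might be combined as in the sketch below. The section name "tables" and the file other-page.html are hypothetical; the validator address is the one given later in this document:

```html
<!-- A named section of the present document -->
<h2><a name="tables">Tables</a></h2>

<!-- A link to another document -->
<p>Check your pages with the <a href="http://validator.w3.org/">W3C HTML Validator</a>.</p>

<!-- A link to the named section above, in the current document -->
<p>See <a href="#tables">the section on tables</a>.</p>

<!-- A link to a named section in another (hypothetical) document -->
<p>See also <a href="other-page.html#tables">the tables section of another page</a>.</p>
```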

While HTML tables were originally developed to support better presentation
of tabular data (that is, data that you'd like to organize into columns
and rows), tables are often used to provide more precise layout. When
you write HTML for me, you should use tables only for presenting tabular
data, not for layout.
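Here is a minimal sketch of a table used the recommended way, for tabular data. It relies on standard table tags not otherwise covered here: <tr> marks a row, <th> a heading cell, <td> an ordinary data cell, and <caption> a title for the table. The border value of 1 is an arbitrary choice:

```html
<table border="1">
<caption>Some HTML heading tags</caption>
<!-- First row: column headings -->
<tr>
<th>Tag</th>
<th>Meaning</th>
</tr>
<!-- Remaining rows: one row per tag, one column per property -->
<tr>
<td>h1</td>
<td>top-level heading</td>
</tr>
<tr>
<td>h2</td>
<td>second-level heading</td>
</tr>
</table>
```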

You know how to make an HTML document. So, how do you get it on the
Web? It all depends on your system and your Web server. I'm a Unix
(and Linux) person, so my answers will be biased towards typical
Unix installations.

First, you must create a Web directory. Typically, this directory
is called public_html and must be in your home directory.
The directory must be appropriately accessible. On Unix systems, this
means that the directory must be readable and executable by everyone,
so that the Web server can get to it.

Next, you put the file in that directory. If you're working on a
machine that shares a filesystem with the Web server, you can edit
the file in that directory. If you're working on another machine,
you'll need to transfer the file (typically with an FTP program).
Make sure the file is readable by everyone.

As you might guess, HTML has a formal specification. That means that
there are HTML documents that are grammatically correct, and documents
that are grammatically incorrect. (Content can also be correct or
incorrect, but that's outside of the scope of this document.) While most
Web browsers are fairly forgiving and will display grammatically incorrect
documents, you should make it a point to write correct documents.
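One of the most common grammatical errors is overlapping tags. In the sketch below (with made-up text), the first paragraph closes b before closing i, so the two elements overlap instead of nesting; many browsers will still display it, but it is not correct HTML. The second paragraph closes the tags in the reverse of the order in which they were opened:

```html
<!-- Incorrect: the b and i elements overlap -->
<p><b><i>important</b></i></p>

<!-- Correct: the tag opened last (i) is closed first -->
<p><b><i>important</i></b></p>
```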

The World Wide Web Consortium (W3C) provides an HTML Validator, which
is available at http://validator.w3.org. You should use it
to check your pages.

As the Web has grown, it has become increasingly important for Web
authors to support a wider variety of users. A modern Web page should
support the many different devices people use to browse the Web
(large-screen displays, cell phones, auditory browsers, text-only
browsers, and much more), the different skill sets people bring
to the Web (particularly with regard to language), and even different
physical abilities (e.g., limited sight, hearing, or mobility).

The W3C has created a set of guidelines for making your content accessible
to a wide variety of users (particularly those with limited sight, hearing,
or mobility). These guidelines are available at
http://www.w3.org/WAI/. It is also possible to have a
program check many basic accessibility issues. The most popular checker
is Bobby, which is available at
http://bobby.watchfire.com.
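Two of the most basic practices in those guidelines can be shown in plain HTML. The sketch below gives an image a text equivalent with the alt attribute, for readers (or auditory browsers) that cannot show the image, and marks a change of natural language with the lang attribute. The image file name and its description are hypothetical:

```html
<!-- alt supplies a text equivalent for readers who cannot see the image -->
<p><img src="campus-map.gif" alt="Map of campus showing the location of the library"></p>

<!-- lang marks a word in another language, so a speech browser can pronounce it correctly -->
<p>The French greeting <span lang="fr">bonjour</span> means "good day."</p>
```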

Disclaimer:
I usually create these pages on the fly, which means that I rarely
proofread them and they may contain bad grammar and incorrect details.
It also means that I tend to update them regularly (see the history for
more details). Feel free to contact me with any suggestions for changes.

This document was generated by
Siteweaver on Mon Dec 2 08:41:33 2002.
The source to the document was last modified on Mon Sep 2 08:26:02 2002.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS151/2002F/Readings/html-quick.html.