Text-related HTML tags comprise the richest set of all in the standard
language. That's because HTML emerged as a way to enrich the structure
and organization of text.

HTML came out of academia. What mattered to those early developers,
and still does, was that their mostly academic, text-oriented documents
could be scanned and read easily while still being distributable over
the Internet to a wide diversity of computer display platforms. (ASCII
text is the only universal format on the global Internet.) Multimedia
integration is something of an appendage to HTML, albeit an important
one.

And page layout is secondary to structure in HTML. We humans scan text
visually and infer its relationships and structure from how it looks;
machines can read only encoded markings. Because HTML documents contain
encoded tags that convey meaning, they lend themselves very well to
computer-automated searches and recompilation of content, features very
important to researchers. It's not so much how something is said in
HTML as what is being said.

Accordingly, HTML is not a page-layout language. In fact, given the
diversity of user-customizable browsers as well as the diversity of
computer platforms for retrieval and display of electronic documents,
all HTML strives to accomplish is to advise, not dictate, how the
document might look when rendered by the browser. You cannot force the
browser to display your document in any particular way. You'll hurt
your brain if you insist otherwise.

For instance, you cannot predict what font and what absolute size
(8- or 40-point Helvetica, Geneva, Subway, or whatever) will be used
for a particular user's text display. Okay, so the latest browsers now
support HTML style sheets and other desktop publishing-like features
that let you control the layout and appearance of your documents. But
users may change their browser's display characteristics and override
your carefully laid plans at will. Moreover, the majority of browsers
out there don't support these new layout features, and some browsers
are text-only, with no nice fonts at all. What to do? Concentrate on
content. Cool pages are a flash in the pan. Deep content will bring
people back for more and more.

Nonetheless, style does matter for readability,
and it is good to include it where you can, as long as it doesn't
interfere with content presentation. You can attach common style
attributes to your text with physical style
tags like the italic <i> tag in our simple example.
More importantly and truer to the language's original purpose,
HTML has content-based style tags that attach
meaning to various text passages. And you can
alter text display characteristics, such as font style and size,
color, and so on, with Cascading Style Sheets and JavaScript-based
Style Sheets.
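As a rough sketch of the style sheet approach, the hypothetical fragment below uses the Cascading Style Sheets style attribute to suggest a color, font, and size for one paragraph. The particular values are arbitrary choices for illustration. Browsers that don't support style sheets ignore the attribute, and users may still override the suggestion.

```html
<!-- Suggest, but don't dictate, how this paragraph should look -->
<p style="color: navy; font-family: Helvetica, sans-serif; font-size: 14pt">
  This paragraph asks for navy, 14-point Helvetica text.
</p>
<!-- A browser without style sheet support ignores the style attribute
     and displays the paragraph in its default font -->
```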

All of today's graphical browsers recognize the physical and
content-related text style tags and change the appearance of the
enclosed text to convey its meaning or structure visually. You just
can't predict exactly what that change will look like.

Content-based style tags indicate to the browser that a specific
portion of your HTML text has a specific usage or meaning. The <cite>
tag in our simple example, for instance, means the enclosed text is
some sort of citation: the document's author, in this case. Browsers
commonly, although not universally, display the citation text in
italic rather than as regular text. [the section called "Content-Based Style Tags"]
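To make the idea concrete, here is a minimal hypothetical fragment (the author's name is invented) that marks up an author credit with <cite>. Most browsers render the name in italic, but the tag's real value is that it labels the text as a citation, which a program could later recognize.

```html
<!-- <cite> labels its contents as a citation; here, the page's author -->
<p>This page was written by <cite>Jane Doe</cite>.</p>
```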

While it may or may not be obvious to the
current reader that the text is a citation, someday, someone might
create a computer program that searches a vast collection of HTML
documents for embedded <cite> tags and compiles
a special list of citations from the enclosed text. Similar software
agents already scour the Internet for HTML-embedded information
to compile listings, such as the infamous WebCrawler and the Lycos
Home Page databases of web sites.

The most common content-based
style used today is that of emphasis, indicated with the <em>
tag. And if you're feeling really emphatic, you might use
the <strong> content style. Other content-based
styles include <code>, for snippets of programming
code; <kbd>, to denote text entered by the user
via a keyboard; <samp>, to mark sample text; <dfn>,
for definitions; and <var>, to delimit variable
names within programming code samples. All of these tags have corresponding
end tags.
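The short fragment below sketches how those content-based tags might appear together in a paragraph about programming; the sentences are invented for illustration. Each tag says what its text is, and the browser decides how to show it.

```html
<p>
  <!-- emphasis, then stronger emphasis -->
  Save your work <em>before</em> you quit, and <strong>never</strong>
  switch off the computer while it is writing to disk.

  <!-- a definition, then a code snippet containing a variable name -->
  A <dfn>loop counter</dfn> is a variable that tracks repetitions, as in
  <code><var>count</var> = <var>count</var> + 1;</code>

  <!-- text the user types, and sample text the program prints -->
  Type <kbd>run</kbd> and the program prints <samp>Hello, world</samp>.
</p>
```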

Even the barest of barebones text processors support a few
traditional text styles, such as italic and bold characters. While
HTML is not a word-processing
tool in the traditional sense, it does provide tags that tell the
browser explicitly to display (if it can) a character, word, or
phrase in a particular physical style.

Although you
should use related content-based tags for the reasons we argue above,
sometimes form is more important than function. So use the <i>
tag to italicize text, without imposing any specific meaning; the
<b> tag to display text in boldface; or the <tt>
tag so that the browser, if it can, displays the text in a teletype-style
monospaced typeface. [the section called "Physical Style Tags"]
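For comparison, here is a sketch of the physical style tags in use, with invented sample text. Visually, <i> and <em> often look identical, but only <em> tells a program reading the document that the words are emphasized.

```html
<p>
  <!-- italic with no implied meaning, such as a ship's name -->
  The <i>Queen Mary</i> sailed at dawn.
  <!-- boldface -->
  <b>Note:</b> all times are local.
  <!-- teletype-style monospaced text, chosen purely for its look -->
  The sign on the door read <tt>OUT TO LUNCH</tt>.
</p>

<!-- Often rendered the same as <i>, but this tag carries meaning -->
<p>You <em>must</em> arrive on time.</p>
```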

It's
easy to fall into the trap of using physical styles when you should
really be using a content-based style instead. Discipline yourself
now to use the content-based styles, because, as we argue above,
they convey meaning as well as style, thereby making your documents
easier to automate and manage.

Not all text characters available to you for display by a browser can
be typed from the keyboard. And some characters have special meanings
in HTML, such as the brackets around tags. If such characters aren't
somehow differentiated when used as plain text, as with a less-than
sign (<) in a math equation, they will confuse the browser and trash
your document. HTML gives you a way to include any of the many
different characters that comprise the ISO Latin-1 character set
anywhere in your text: you write the character as a specially encoded
sequence called a character entity.

Like the copyright symbol in our simple example, a character entity
starts with an ampersand, followed by the character's name, and ends
with a semicolon. (Alternatively, in place of the name you may use a
pound or sharp sign (#) followed by the character's position number in
the ISO Latin-1 character set.) When rendering the document, the
browser displays the proper character, if it exists in the user's
font. [the section called "Character Entities"]
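As a small illustration, both paragraphs below should display the copyright symbol, if the user's font includes it: the first uses the named entity, the second the numeric form. The number 169 is the copyright symbol's position in the ISO Latin-1 character set; the surrounding text is made up.

```html
<!-- Named form: ampersand, name, semicolon -->
<p>Copyright &copy; by the authors</p>

<!-- Numeric form: ampersand, #, position number, semicolon -->
<p>Copyright &#169; by the authors</p>
```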

For obvious reasons, the most commonly used character entities are the
greater-than (&gt;), less-than (&lt;), and ampersand
(&amp;) characters. Check Appendix E, Character Entities, to find
what symbol the character entity &#166; represents.
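For instance, here is a sketch of a math statement, a mention of a tag, and a company name, all written with those three entities (the text is invented). No literal less-than, greater-than, or ampersand characters appear in the text itself, so the browser cannot mistake them for markup.

```html
<!-- Displays: If a < b and b < c, then a < c. -->
<p>If a &lt; b and b &lt; c, then a &lt; c.</p>

<!-- Displays: To start a paragraph, use the <p> tag. -->
<p>To start a paragraph, use the &lt;p&gt; tag.</p>

<!-- Displays: Smith & Sons -->
<p>Smith &amp; Sons</p>
```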

It's not obvious in our simple example,
but the common carriage returns we use to separate paragraphs in
our source document have no meaning in HTML, except in special circumstances.
You could have typed the document onto a single line in your text
editor and it would still appear as it does in Figure 2.1.[2]

[2]
We use a computer programming-like style of indentation so
our source HTML documents are more readable. It's not obligatory,
nor are there any formal style guidelines for source HTML document
text formats. We do, however, highly recommend you adopt your own
consistent style, so that you and others can easily follow your
source documents.

You'd soon discover, too, if you hadn't read it here first, that except
in special cases, browsers typically ignore leading and trailing spaces
and collapse a run of several spaces between words into a single space.
(If you look closely at the source example, the line "Greetings from"
looks like it should be indented by leading spaces, but it isn't in
Figure 2.1.)

A browser takes the text in the body of your
document and "flows" it onto the computer screen,
disregarding any common carriage-return or line-feed characters
in the source. The browser fills as much of each line of the display
window as possible, beginning flush against the left margin, before
stopping after the rightmost word and moving on to the next line.
Resize the browser window, and the text reflows to fill the new
space, a demonstration of HTML's inherent flexibility.
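To see these rules at work, consider this hypothetical fragment, typed with stray indentation, extra spaces, and line breaks in the middle of a sentence. A browser should treat each line break as a space, collapse the runs of spaces, drop the leading indentation, and flow the result as one ordinary paragraph that rewraps whenever the window is resized.

```html
<p>
      This    sentence is
  typed across      three
lines of source.
</p>
<!-- Displayed as one flowing line, wrapped to fit the window:
     This sentence is typed across three lines of source. -->
```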

Of course, readers would
rebel if your text just ran on and on, so HTML does provide both
explicit and implicit ways to control the basic structure of your
document. The most rudimentary and common ways are with the paragraph
(<p>) and the line-break (<br>)
tags. Both break the text flow, which consequently restarts on a
new line. The only apparent difference is that with most browsers,
the paragraph tag adds more vertical space after the line break.
[the section called "The <p> Tag"] [the section called "The <br> Tag"]
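Here is a minimal sketch of the two tags, using a made-up mailing address. Neither <p> tag is closed with an end tag in this fragment, and <br> has no end tag at all.

```html
<!-- <p> breaks the flow and usually adds extra vertical space -->
<p>Send your comments to the address below.
<p>
<!-- <br> just ends the line; the next text starts on a new line -->
Webmaster<br>
123 Main Street<br>
Springfield
```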

By the way, the HTML
standard includes an end tag for the paragraph tag, but not for
the line break tag. Few authors ever include the paragraph end tag
in their documents; the browser usually can figure out where one
paragraph ends and another begins.[3] Give
yourself a star if you knew that </p>
even exists.

[3]
The paragraph
end tag is being used more commonly now that the popular browsers
support the paragraph-alignment attribute.

Besides breaking your text into paragraphs, you
also can organize your documents into sections with headings. Just
as they do on this and other pages in this printed book, HTML headings
not only divide and entitle discrete passages of text but also
convey meaning visually. With HTML, however, headings also lend
themselves to machine-automated analyses.

There are
six HTML heading tags, <h1> through <h6>,
with corresponding end tags. Typically, the browser displays their
contents in, respectively, very large to very small font sizes,
and sometimes in boldface. The text inside the <h4>
tag is usually the same size as the regular text. [the section called "Heading Tags"]

The heading tags also typically break the current text
flow, standing alone on lines and separated from surrounding text,
even though there aren't any explicit paragraph or line-break
tags before or after a heading.
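A brief sketch of the heading tags, with placeholder text: each heading starts on its own line even though no paragraph or line-break tag surrounds it.

```html
<h1>A Very Large Heading</h1>
<h2>A Smaller Heading</h2>
<p>Some ordinary paragraph text.

<!-- <h4> text is usually about the same size as regular text -->
<h4>A Minor Heading</h4>
<p>More ordinary text follows the heading.

<!-- <h6> is typically the smallest of the six -->
<h6>The Smallest Heading</h6>
```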

Besides
headings, HTML also provides horizontal rule lines that help delineate
and separate the sections of your document.

When the
browser encounters an <hr> tag in your document,
it breaks the flow of text and draws a line completely across the
display window on a new line. The flow of text resumes immediately
below the rule. [the section called "The <hr> Tag"]
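The <hr> tag takes no end tag. In this sketch, with placeholder headings, it separates two sections of a document.

```html
<h2>Part One</h2>
<p>The last paragraph of the first part.

<!-- Break the text flow and draw a rule across the window -->
<hr>

<!-- Text resumes immediately below the rule -->
<h2>Part Two</h2>
```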

Occasionally,
you'll want the browser to display a block of text as-is:
for example, with indented lines and vertically aligned letters
or numbers that don't change even though the browser window
might get resized. The HTML <pre> tag rises to
those occasions. All text up to the closing </pre>
end tag appears in the browser window exactly as you type it, including
carriage returns and line feeds, leading, trailing, and intervening
spaces. Although very useful for tables and forms, <pre>
text turns out pretty dull; the popular browsers render the block
in a monospace typeface. [the section called "The <pre> Tag"]
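As a sketch, the small price table below (invented data) relies on <pre> to keep its columns aligned. The browser preserves every space and line break inside the block and displays it in a monospaced typeface.

```html
<!-- Spacing and line breaks inside this block display exactly as typed -->
<pre>
Item       Qty    Price
Widget       2     1.50
Gadget      10    12.00
</pre>
```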