reptile7's JavaScript blog is Andrew Peak's personal technical writing project: it focuses on JavaScript and the analysis of JavaScript scripts, although HTML, CSS, and anything else related to coding for the Web are also fair game.

In today's post we'll take up HTML Goodies' "JavaScript and HTML Tricks" tutorial. Authored by Joseph Myers and originally appearing at WebReference.com, "JavaScript and HTML Tricks" briefly runs through a grab bag of topics relating to lists or forms in some way. Here's what we've got on tap:
(1) The tutorial's first page (a) addresses the storage and extraction of list data in invisible HTML elements and then (b) shows how to randomize that data.
(2) The tutorial's second page discusses (a) the HTML fieldset element, (b) labels as a tool for checking radio buttons and checkboxes, and (c) the use of images for list item markers.
We've covered some of this stuff previously, but it won't kill us to revisit it.

Hide and extract

The tutorial's Introduction touts the use of hidden HTML elements as a great way to store various types of data. The subsequent Getting Data From Text Lines in Hidden HTML Elements section fleshes out this concept with an <input type="hidden"> control whose value holds a set of domain names:

When I first saw this, I thought, "If you don't want the user to see your raw data, then you shouldn't clutter the document body with it - you should just code it as an array from the get-go," but it subsequently occurred to me that an input element is at least a semantically appropriate container for data that will later be fed to a script of some sort. Of course, this is not to say you couldn't use some other element here: for example, you could store the domain name list in the same format as the content of a span element and then "hide" (zero out) that element with a display:none; style declaration so as to duplicate a hidden control's rendering (on my computer, X<input type="hidden" value="someValue">Y displays as XY - there isn't even a space between the X and the Y).

As shown above, a multiline hidden control value allows the enumeration of list data, and the annotation of that data with relevant comments, in a highly readable fashion. But as the previous paragraph implies, data is meant to be acted on: how might we extract the domain name list from the value in a useful way? Towards this end, the author applies to the value a series of functions that convert it to a sorted array of domain names. More specifically:
(1) A getLinesFromHidden( ) function converts the value to an unsorted array of data lines lacking end-of-line characters.
(2) A trimWhiteSpace( ) function removes the two space characters that begin each line.
(3) A removeEmpty( ) function would remove empty data lines if any were present (we'll modify this function below so that it removes // comments as well).
(4) Finally, the array is sorted lexicographically by the sort( ) method of the core JavaScript Array object.
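Taken together, steps (1) through (4) can be sketched in compact form. This is a minimal sketch, not the author's code: it swaps the author's explicit loops for the map( ) and filter( ) Array methods, and it assumes the hidden control's value has already been read into a string (in a page, that string would come from document.getElementById("domains").value), so it can run outside a browser. The function name extractSortedList and the sample value are hypothetical.

```javascript
// Hypothetical helper: turns a multiline string (e.g., a hidden control's value)
// into a sorted array of trimmed, non-empty data lines.
function extractSortedList(value) {
  // Step 1: normalize \r\n and \r line breaks to \n, then split into lines.
  const lines = value.replace(/\r\n|\r/g, "\n").split("\n");
  // Step 2: strip leading and trailing white space from each line.
  const trimmed = lines.map(line => line.replace(/^\s+|\s+$/g, ""));
  // Step 3: drop empty lines (the empty string is falsy).
  const nonEmpty = trimmed.filter(line => line);
  // Step 4: default sort, which compares strings by UTF-16 code units.
  return nonEmpty.sort();
}

// Hypothetical sample value: two leading spaces per line, plus blank lines.
const sample = "\n  os-i.org\n  allme.us\n\n  o-1.us\n";
console.log(extractSortedList(sample)); // [ 'allme.us', 'o-1.us', 'os-i.org' ]
```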
A quick summary of this code is given at the beginning of the Getting Data ... section, but a detailed deconstruction thereof is not provided, so perhaps we should do that.

The extraction action kicks off with a call to the getLinesFromHidden( ) function:

domains, the id value of the hidden control, is passed to getLinesFromHidden( ) and given an a identifier. getLinesFromHidden( )'s first statement gets the hidden control and gives it an e object reference:

var e = document.getElementById(a);
/* This means of access (vis-à-vis a document.forms[formIndex].elements[0] reference) obviates the need to wrap the hidden control in the form element, which can be thrown out. */

The preceding command is followed by a conditional that would return an empty array if e were null, i.e., if the getElementById( ) call didn't find the element it was looking for:

if (!e) return [ ];

This sort of code strikes me as pointless: "It's the author's responsibility to see to it that somewhere in the source there's an element having the id in question" is my attitude. Anyway, let's move on to the next line, which gets the hidden control value and gives it an s identifier:

var s = e.value;

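We're almost ready to split the hidden control value into an array of data lines. To facilitate that split, the value lines' end-of-line characters, which will vary depending on the user's operating system, are harmonized by the following statement:

s = s.replace(/\r\n|\r/g, "\n");

which converts each \r\n (Windows) or lone \r (classic Mac OS) line break to a \n. The normalized value is then split at its \n characters by a return s.split("\n"); statement, which sends the resulting array of data lines back to the caller.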
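Isolated from the DOM, this normalize-and-split step can be sketched as follows. It is a minimal sketch, assuming the value has already been read into a string; the function name splitLines is hypothetical, not the tutorial's.

```javascript
// Hypothetical helper: normalize line breaks, then split into lines.
function splitLines(s) {
  // Convert Windows (\r\n) and classic Mac OS (\r) line breaks to \n.
  // \r\n is listed first in the alternation so that it is replaced as one
  // unit; otherwise the \r alone would match and leave a stray \n behind.
  s = s.replace(/\r\n|\r/g, "\n");
  return s.split("\n");
}

console.log(splitLines("a\r\nb\rc\nd")); // [ 'a', 'b', 'c', 'd' ]
```

Had the alternatives been written in the opposite order, as /\r|\r\n/, each \r\n would become \n\n and yield an extra empty line.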

After giving domainList an a identifier, trimWhiteSpace( ) loops through the a data lines and replace( )s leading and trailing white space with empty strings. /^\s+|\s+$/ is a regular expression literal whose pattern matches either
(x) one or more white space characters immediately following the ^ start-of-string anchor, or
(y) one or more white space characters immediately preceding the $ end-of-string anchor.
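The literal's g flag tells replace( ) to replace every match rather than just the first one, so a line with white space at both ends has both (x) and (y) removed; in effect, the g flag converts the pattern's | boolean OR operator to a boolean AND.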
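The effect of the g flag is easy to see side by side. This is a minimal sketch modeled on the description of trimWhiteSpace( ), not the tutorial's verbatim code; the function name trimLines is hypothetical.

```javascript
// Hypothetical helper: trim leading and trailing white space from each line.
function trimLines(a) {
  const result = [];
  for (let i = 0; i < a.length; i++) {
    // With g, both the leading run and the trailing run are replaced.
    result[i] = a[i].replace(/^\s+|\s+$/g, "");
  }
  return result;
}

// Without g, only the first match (the leading run) is replaced:
console.log("  allme.us  ".replace(/^\s+|\s+$/, "") + "|"); // prints: allme.us  |
// With g, both ends are cleaned up:
console.log(trimLines(["  allme.us  "])[0] + "|"); // prints: allme.us|
```

In current JavaScript, String.prototype.trim( ) does essentially the same job as this regular expression.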

In our example, the a data lines don't have any trailing white space, although each begins with two space characters, which are removed by the replace( ) operation. The resulting lines are loaded into a new a array, which is returned to the domainList = removeEmpty(trimWhiteSpace(domainList)); line, which next passes the a array to the removeEmpty( ) function:

removeEmpty( ) loops through a's data lines and loads the non-empty ones into a new b array. b is initially declared as an empty array literal; each assignment to b[b.length] appends a line to the end of b and thereby increments b.length by 1. Empty a[i]s are left behind because an empty string converts to false as an if condition. The b array is returned to the domainList = removeEmpty(trimWhiteSpace(domainList)); line and assigned to domainList.
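The b[b.length] append idiom can be sketched as follows. This is modeled on the description of removeEmpty( ) rather than copied from the tutorial; the function name removeEmptyLines is hypothetical.

```javascript
// Hypothetical helper: copy only the non-empty lines into a new array.
function removeEmptyLines(a) {
  const b = [];
  for (let i = 0; i < a.length; i++) {
    // An empty string is falsy, so empty lines fail this test and are skipped.
    if (a[i]) {
      b[b.length] = a[i]; // writing at index b.length appends, so b.length grows by 1
    }
  }
  return b;
}

console.log(removeEmptyLines(["allme.us", "", "o-1.us", ""])); // [ 'allme.us', 'o-1.us' ]
```

Today, b.push(a[i]) is the more common way to write the append.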

Removing comments

Getting rid of single-line // comments is easy; at the removeEmpty( ) stage, recasting the loop conditional as

will keep //-beginning lines out of the b array. As you would expect, a / in a regular expression must be literalized with a backslash when using the literal syntax.
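Here is one possible way to write such a conditional. The /^\/\// test is an assumption on my part and may differ from the author's own version; the function name removeEmptyAndComments is hypothetical. Because the lines have already been trimmed at this stage, the ^ anchor can sit right before the two slashes.

```javascript
// Hypothetical helper: skip empty lines and lines that begin with //.
function removeEmptyAndComments(a) {
  const b = [];
  for (let i = 0; i < a.length; i++) {
    // Keep a line only if it is non-empty AND does not start with two slashes;
    // each / is escaped with a backslash inside the regex literal.
    if (a[i] && !/^\/\//.test(a[i])) {
      b[b.length] = a[i];
    }
  }
  return b;
}

console.log(removeEmptyAndComments(["// .us domains", "allme.us", "", "o-1.us"]));
// [ 'allme.us', 'o-1.us' ]
```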

Getting rid of multiline /* */ comments is trickier, particularly if such comments extend over more than two lines; nonetheless, this can be done at the getLinesFromHidden( ) stage via a

s = s.replace(/\/\*[\S\s]*?\*\//g, "");

statement placed just before or after the s = s.replace(/\r\n|\r/g, "\n"); line.

Regarding the \/\*[\S\s]*?\*\/ regular expression:
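• The / and * characters of the comment delimiters must of course be literalized with backslashes: the *s because * is otherwise a quantifier, the /s because a / would otherwise end the regular expression literal.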
• [\S\s] matches any character, including \n, which is not matched by the 'dot'.
• If the s in question has more than one /* */ comment, then it is necessary to make the * quantifier of [\S\s]* "non-greedy" by following it with a ? - otherwise, everything between the first /* and the last */, including any data lines that sit between the comments, will be removed as well. The ?-modulation of regular expression quantifiers is briefly discussed at the Mozilla JavaScript Guide's "Regular Expressions" page to which I've been linking, but I also encourage you to check out Regular-Expressions.info's "Repetition" page, which treats this subject in helpful detail.
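The points above can be checked directly on a small string. The sample value and its a.com/b.com/c.com lines are hypothetical placeholder data; the lazy pattern is the one given above, and the greedy and dot-only patterns are shown only for contrast.

```javascript
// Hypothetical value containing a two-line comment and a one-line comment.
const value = "a.com\n/* first\n   comment */\nb.com\n/* second */\nc.com";

// Non-greedy [\S\s]*?: each comment is removed on its own, so b.com survives.
const lazy = value.replace(/\/\*[\S\s]*?\*\//g, "");
// Greedy [\S\s]*: the match runs from the first /* to the LAST */, swallowing b.com.
const greedy = value.replace(/\/\*[\S\s]*\*\//g, "");
// Dot instead of [\S\s]: . cannot cross \n, so the two-line comment is left alone.
const dotOnly = value.replace(/\/\*.*?\*\//g, "");

console.log(JSON.stringify(lazy));    // "a.com\n\nb.com\n\nc.com"
console.log(JSON.stringify(greedy));  // "a.com\n\nc.com"
console.log(JSON.stringify(dotOnly)); // "a.com\n/* first\n   comment */\nb.com\n\nc.com"
```

The empty lines left behind by the removal are later dropped by the removeEmpty( ) stage.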

// comments can also be removed at the getLinesFromHidden( ) stage via a

s = s.replace(/\/\/.*/g, "");

statement placed just before the return s.split("\n"); line.
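A minimal sketch of that arrangement, assuming the line breaks have already been normalized to \n; the function name stripLineComments and the sample string are hypothetical.

```javascript
// Hypothetical helper: strip // comments from the value, then split it into lines.
function stripLineComments(s) {
  // . does not match \n, so each match stops at the end of its own line.
  s = s.replace(/\/\/.*/g, "");
  return s.split("\n");
}

console.log(stripLineComments("  allme.us // short\n// comment-only line\n  o-1.us"));
// [ '  allme.us ', '', '  o-1.us' ]
```

The leftover trailing space and empty string are exactly what the trimWhiteSpace( ) and removeEmpty( ) stages clean up afterward. Note that this pattern would also cut a line at any // inside the data itself (as in a URL such as http://...); plain domain names don't contain //, so that isn't a problem here.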

Sort and display

The domainList = removeEmpty(trimWhiteSpace(domainList)); line is followed by a

domainList.sort( );

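command that sorts the domainList array in lexicographic order: sort( )'s default comparison looks at the strings character by character by UTF-16 code unit value, which for all-lowercase domain names works much like a dictionary, with hyphens and digits ranking ahead of letters (e.g., allme.us follows the domains that begin with a digit, and o-1.us precedes os-i.org). The sorted domainList is finally displayed via a showList( ) function that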
(1) joins the domainList values into one big string and separates those values with br elements via the join( ) method of the core JavaScript Array object,
(2) wraps the join( )ed string in a blockquote element, and
(3) document.write( )s the blockquote element to the page.
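The sort and display steps can be sketched as follows. This is a minimal sketch under stated assumptions: document.write( ) is left out so the code runs outside a browser (in a page, the returned string would be handed to it), the function name buildListMarkup is hypothetical, and the exact br markup used by the author's join( ) call is assumed.

```javascript
// Hypothetical helper: join the list with br elements and wrap it in a blockquote.
function buildListMarkup(list) {
  return "<blockquote>" + list.join("<br>") + "</blockquote>";
}

const domainList = ["os-i.org", "o-1.us", "allme.us"];
domainList.sort(); // sorts in place by UTF-16 code unit order
console.log(buildListMarkup(domainList));
// <blockquote>allme.us<br>o-1.us<br>os-i.org</blockquote>

// A side effect of code-unit ordering: uppercase letters rank ahead of all lowercase ones.
console.log(["b.com", "Z.com", "a.com"].sort()); // [ 'Z.com', 'a.com', 'b.com' ]
```

If case-insensitive, dictionary-style ordering matters, passing sort( ) a comparator built on String.prototype.localeCompare( ) is one option.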

The printed result appears at the WebReference version of the tutorial but not at the HTML Goodies version. Inspection of the latter's document source does reveal the presence of code for a demo just before the "Later, I'll write an article ..." paragraph; exasperatingly, this code is defective because its brace and square-bracket script characters are specified in the form of numeric character references: { is coded as &#123;, } is coded as &#125;, etc. (This isn't the first time we've seen a script killed in this way at HTML Goodies.) Moreover and less importantly, the <h3>Result</h3> heading and the