Description

Documentation

This egg provides some utilities from the sxml-tools available in the SSAX/SXML Sourceforge project. It consists of the extensions defined in sxml-tools.scm plus sxpathlib and sxpath-ext. This is equivalent to the "low-level sxpath interface" described at the introduction to SXPath.

These utilities are useful when you want to query SXML document trees, but full sxpath would be overkill. Most of these procedures are faster than their sxpath equivalent, because they are very specific. But this also means they are very low-level, so you should use them only if you know what you're doing.

The initial documentation on this wiki page came straight from the comments in the extremely well-documented source code. It's recommended you read the code if you want to learn more.

sxml-tools

This section documents the procedures that come from sxml-tools. These include mostly-generic list and SXML operators.

Predicates

[procedure](sxml:empty-element? obj)

Predicate which returns #t if the given element obj is empty. Empty elements have no nested elements, text nodes, PIs, comments or entities, but may contain attributes or a namespace-id. It is the SXML counterpart of an XML empty element.
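As a rough illustration of the rule above, the following sketch checks three small elements. The module name sxpath-lolevel in the import is an assumption about how the egg is installed; this page does not state it.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

;; Attributes alone do not make an element non-empty.
(define empty-plain (sxml:empty-element? '(br)))                  ; => #t
(define empty-attrs (sxml:empty-element? '(br (@ (class "x")))))  ; => #t

;; A text node child does.
(define empty-text (sxml:empty-element? '(p "hi")))               ; => #f
```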

[procedure](sxml:shallow-normalized? obj)

Returns #t if the given obj is a shallow-normalized SXML element. The element itself has to be normalized, but its nested elements are not tested.

[procedure](sxml:normalized? obj)

Returns #t if the given obj is a normalized SXML element. The element itself and all its nested elements have to be normalized.

[procedure](sxml:shallow-minimized? obj)

Returns #t if the given obj is a shallow-minimized SXML element. The element itself has to be minimized, but its nested elements are not tested.

[procedure](sxml:minimized? obj)

Returns #t if the given obj is a minimized SXML element. The element itself and all its nested elements have to be minimized.

Accessors

These procedures obtain information about nodes, or their direct children. They don't traverse subtrees.

Normalization-independent accessors

These accessors can be used on arbitrary, non-normalized SXML trees. Because of this, they are generally slower than the normalization-dependent variants listed in the next section.

[procedure](sxml:name node)

Returns the name of the given SXML node. It exists for the sake of encapsulation.

[procedure](sxml:element-name obj)

A checked version of sxml:name, which returns #f if the given obj is not an SXML element. Otherwise returns its name.

[procedure](sxml:node-name obj)

Safe version of sxml:name, which returns #f if the given obj is not an SXML node. Otherwise returns its name.

The difference between this and sxml:element-name is that a node can be one of @, @@, *PI*, *COMMENT* or *ENTITY*, while an element must be a real element (any symbol not in that set is considered to be an element).
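A small sketch of that difference, using an attribute list as the non-element node. The sxpath-lolevel module name in the import is an assumption, not something this page states.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define attrs '(@ (class "greeting")))

(define elem-of-div   (sxml:element-name '(div "hi")))  ; => div
(define elem-of-attrs (sxml:element-name attrs))        ; => #f, @ is not an element
(define node-of-attrs (sxml:node-name attrs))           ; => @
(define node-of-text  (sxml:node-name "hi"))            ; => #f, a string has no name
```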

[procedure](sxml:ncname node)

Like sxml:name, except returns only the local part of the name (called an "NCName" in the [http://www.w3.org/TR/xml-names/|XML namespaces spec]).

The node's name is interpreted as a "Qualified Name": a colon-separated sequence of parts, the last of which is considered to be the local part. If the name contains no colons, the name itself is returned.

Important: Please note that while an SXML name is a symbol, this function returns a string.

[procedure](sxml:name->ns-id sxml-name)

Given a node name, return the namespace part of the name (called a namespace-id). If the name contains no colons, returns #f. See sxml:ncname for more info.

Important: Please note that while an SXML name is a symbol, this function returns a string.
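The symbol-in, string-out behaviour can be seen in a short sketch. The element name svg:rect is made up for illustration, and the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

;; Illustrative element with a qualified name.
(define rect '(svg:rect (@ (width "10"))))

(define full-name  (sxml:name rect))                    ; => svg:rect (a symbol)
(define local-part (sxml:ncname rect))                  ; => "rect" (a string)
(define ns-part    (sxml:name->ns-id (sxml:name rect))) ; => "svg" (a string)
(define no-ns      (sxml:name->ns-id 'rect))            ; => #f, no colon in the name
```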

[procedure](sxml:content obj)

Retrieve the contents of an SXML element or nodeset. Any non-element nodes (attributes, processing instructions, etc) are discarded, while the elements and text nodes are returned as a list of strings and nested elements in document order. This list is empty if obj is an empty element or empty list.

The inner elements are unmodified so they still contain attributes, but also comments or other non-element nodes.
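For instance, in the sketch below the attribute list and the comment of the outer element are dropped, while the nested element keeps its own attribute list. The sample element is invented, and the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define para
  '(p (@ (id "intro"))
      "Hello "
      (*COMMENT* "skip me")
      (b (@ (class "x")) "world")))

(define para-content (sxml:content para))
;; => ("Hello " (b (@ (class "x")) "world"))
```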

[procedure](sxml:text node)

Returns a string which combines all the character data from text node children of the given SXML element or "" if there are no text node children. Note that it does not include text from descendant nodes, only direct children.

Normalization-dependent accessors

"Universal" accessors are less effective but may be used for non-normalized SXML. These safe accessors are named with suffix '-u' for "universal".

"Fast" accessors are optimized for normalized SXML data. They are not applicable to arbitrary non-normalized SXML data. Their names have no specific suffixes.

[procedure](sxml:content-raw obj)

Returns all the content of a normalized SXML element except the attr-list and aux-list. Thus it includes PI, COMMENT and ENTITY nodes as well as the TEXT and ELEMENT nodes returned by sxml:content. Returns a list of nodes in document order, or the empty list if obj is an empty element or an empty list.

This function is faster than sxml:content.
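A minimal sketch on a normalized element (attribute list directly after the name). Unlike sxml:content, the comment node is kept. The sample data is invented, and the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define para
  '(p (@ (id "intro"))
      "Hello "
      (*COMMENT* "keep me")
      (b "world")))

(define raw-content (sxml:content-raw para))
;; => ("Hello " (*COMMENT* "keep me") (b "world"))
```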

[procedure](sxml:attr-list-u obj)

Returns the list of attributes for given element or nodeset. Analog of ((sxpath '(@ *)) obj). Empty list is returned if there is no list of attributes.

[procedure](sxml:aux-list obj)
[procedure](sxml:aux-list-u obj)

Returns the list of auxiliary nodes for given element or nodeset. Analog of ((sxpath '(@@ *)) obj). Empty list is returned if a list of auxiliary nodes is absent.

[procedure](sxml:aux-node obj aux-name)

Return the first aux-node with the name aux-name in the SXML element obj, or #f if such a node is absent.

NOTE: it returns just the first node found even if multiple nodes are present, so it's mostly intended for nodes with unique names. Use sxml:aux-nodes if you want all of them.

[procedure](sxml:aux-nodes obj aux-name)

Return a list of aux-nodes with aux-name given in SXML element obj or '() if such a node is absent.

[procedure](sxml:attr obj attr-name)

Returns the value of the attribute with name attr-name in the given SXML element obj, or #f if no such attribute exists.

[procedure](sxml:attr-from-list attr-list name)

Returns the value of the attribute called name in the given list of attributes attr-list, or #f if no such attribute exists. The list of attributes can be obtained from an element using the sxml:attr-list procedure.

[procedure](sxml:num-attr obj attr-name)

Returns the value of the numerical attribute with name attr-name in the given SXML element obj, or #f if no such attribute exists. This value is converted from a string to a number.
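The three attribute accessors above can be compared on one normalized element. This is only a sketch: the img element is made up, and the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define img '(img (@ (src "cat.png") (width "640"))))

(define src-value   (sxml:attr img 'src))        ; => "cat.png"
(define alt-value   (sxml:attr img 'alt))        ; => #f, no such attribute
(define width-value (sxml:num-attr img 'width))  ; => 640 (a number, not "640")

;; The same lookup against an attribute list fetched once up front.
(define src-from-list
  (sxml:attr-from-list (sxml:attr-list img) 'src))  ; => "cat.png"
```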

[procedure](sxml:attr-u obj attr-name)

Accessor for the attribute attr-name of the given SXML element obj, which may also be an attribute list or a nodeset (usually the content of an SXML element).

[procedure](sxml:ns-list obj)

Returns the list of namespaces for given element. Analog of ((sxpath '(@@ *NAMESPACES* *)) obj). The empty list is returned if there are no namespaces.

[procedure](sxml:ns-id->nodes obj namespace-id)

Returns a list of namespace information lists that match the given namespace-id in SXML element obj. Analog of ((sxpath '(@@ *NAMESPACES* namespace-id)) obj). The empty list is returned if there is no namespace with the given namespace-id.

[procedure](sxml:squeeze obj)

Returns a minimized and normalized SXML element obj with empty lists of attributes and aux-lists eliminated, in obj and all its descendants.

[procedure](sxml:clean obj)

Returns a minimized and normalized SXML element obj with empty lists of attributes and all aux-lists eliminated, in obj and all its descendants.

Sxpath-related procedures

[procedure](select-first-kid test-pred?)

Given a node, return the first child that satisfies test-pred?. Given a nodeset, traverse the set until a node is found whose first child matches the predicate. Returns #f if no such child is found.

[procedure](sxml:node-parent rootnode)

Returns a function of one argument (an SXML element) which returns that element's parent node using the *PARENT* pointer in the aux-list. '*TOP-PTR* may be used as a pointer to the root node. It returns an empty list when applied to the root node.

[procedure](sxml:add-parents obj [top-ptr])

Returns the SXML element obj annotated with *PARENT* pointers for obj and all its descendants. If obj is not the root node (a node with a name of *TOP*), you must pass in the parent pointer for obj as top-ptr.

Warning: This procedure mutates its obj argument.

[procedure](sxml:lookup id index)

Lookup an element using its ID. index should be an alist of (id . element).
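A minimal sketch of such an index and two lookups. The ids and elements are invented for illustration, and the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

;; An alist of (id . element) pairs, built by hand here.
(define id-index
  (list (cons "intro" '(p (@ (id "intro")) "Hello"))
        (cons "end"   '(p (@ (id "end")) "Bye"))))

(define found   (sxml:lookup "end" id-index))      ; => (p (@ (id "end")) "Bye")
(define missing (sxml:lookup "missing" id-index))  ; => #f
```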

Markup generation

XML

[procedure](sxml:attr->xml attr)

Returns a list containing tokens that when joined together form the attribute's XML output.

Warning: This procedure assumes that the attribute's values have already been escaped (ie, sxml:string->xml has been called on the strings inside it).

Examples:

(sxml:attr->xml '(href "http://example.com"))

=>

(" " "href" "='" "http://example.com" "'")

[procedure](sxml:string->xml string)

Escape the string so it can be used anywhere in XML output. This converts the <, >, ', " and & characters to their respective entities.

[procedure](sxml:sxml->xml tree)

Convert the tree of SXML nodes to a nested list of XML fragments. These fragments can be output by flattening the list and concatenating the strings inside it.
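One possible way to do the flattening and concatenation is sketched below. fragments->string is a hypothetical helper written for this example, not part of the egg, and the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

;; Hypothetical helper (not provided by the egg): walk the nested
;; list depth-first and join every string fragment in order.
(define (fragments->string x)
  (cond ((string? x) x)
        ((pair? x) (string-append (fragments->string (car x))
                                  (fragments->string (cdr x))))
        (else "")))  ; the empty list, or any non-string leaf

(define fragments (sxml:sxml->xml '(p "hi " (em "there"))))
(define xml-text (fragments->string fragments))
```

Walking the structure directly avoids building an intermediate flat list; writing each fragment to a port as it is visited would work just as well.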

HTML

[procedure](sxml:attr->html attr)

Returns a list containing tokens that when joined together form the attribute's HTML output. The difference with the XML variant is that this encodes empty attribute values to attributes with no value (think selected in option elements, or checked in checkboxes).

Warning: This procedure assumes that the attribute's values have already been escaped (ie, sxml:string->html has been called on the strings inside it).

[procedure](sxml:string->html string)

Escape the string so it can be used anywhere in HTML output. This converts the <, >, " and & characters to their respective entities.

[procedure](sxml:non-terminated-html-tag? tag)

Is the named tag one that is "self-closing" (ie, does not need to be terminated) in HTML 4.0?

[procedure](sxml:sxml->html tree)

Convert the tree of SXML nodes to a nested list of HTML fragments. These fragments can be output by flattening the list and concatenating the strings inside it.

Procedures from sxpathlib

Basic converters and applicators

A converter is a function

type Converter = Node|Nodelist -> Nodelist

A converter can also play a role of a predicate: in that case, if a converter, applied to a node or a nodelist, yields a non-empty nodelist, the converter-predicate is deemed satisfied. Throughout this file a nil nodelist is equivalent to #f in denoting a failure.

[procedure](nodeset? obj)

Returns #t if obj is a nodelist.

[procedure](as-nodeset obj)

Returns obj as is if it is a nodelist; otherwise wraps it in a list.
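The distinction between a single node and a nodelist can be sketched as follows. The sample data is invented, and the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

;; A list whose first item is a symbol is a single node, not a nodelist.
(define single-is-set (nodeset? '(p "a")))           ; => #f
(define list-is-set   (nodeset? '((p "a") (p "b")))) ; => #t
(define empty-is-set  (nodeset? '()))                ; => #t

(define wrapped   (as-nodeset '(p "a")))             ; => ((p "a"))
(define unchanged (as-nodeset '((p "a") (p "b"))))   ; => ((p "a") (p "b"))
```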

Node test

The following functions implement 'Node test's as defined in Sec. 2.3 of the XPath document. A node test is one of the components of a location step. It is also a converter-predicate in SXPath.

[procedure](sxml:element? obj)

Predicate which returns #t if obj is an SXML element, otherwise #f.

[procedure](ntype-names?? crit)

Takes a list of acceptable node names as a criterion and returns a function, which, when applied to a node, will return #t if the node name is present in criterion list and #f otherwise.

ntype-names?? :: ListOfNames -> Node -> Boolean

[procedure](ntype?? crit)

Takes a type criterion and returns a function, which, when applied to a node, will tell if the node satisfies the test.

ntype?? :: Crit -> Node -> Boolean

The criterion crit is one of the following symbols:

@

tests if the Node is an attributes-list

*

tests if the Node is an Element

*text*

tests if the Node is a text node

*data*

tests if the Node is a data node (text, number, boolean, etc., but not pair)

*PI*

tests if the Node is a processing instructions node

*COMMENT*

tests if the Node is a comment node

*ENTITY*

tests if the Node is an entity node

*any*

#t for any type of Node

other symbol

tests if the Node has the right name given by the symbol

Examples:

((ntype?? 'div) '(div (@ (class "greeting"))"hi"))

=>

#t

((ntype?? 'div) '(span (@ (class "greeting"))"hi"))

=>

#f

((ntype?? '*) '(span (@ (class "greeting"))"hi"))

=>

#t

[procedure](ntype-namespace-id?? ns-id)

This function takes a namespace-id, and returns a predicate Node -> Boolean, which is #t for nodes with the given namespace id. ns-id is a string. (ntype-namespace-id?? #f) will be #t for nodes with non-qualified names.
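A short sketch of both uses, with an invented svg prefix. The sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define in-svg?      (ntype-namespace-id?? "svg"))
(define unqualified? (ntype-namespace-id?? #f))

(define qualified-hit (in-svg? '(svg:rect)))   ; true: the prefix matches
(define qualified-miss (in-svg? '(rect)))      ; false: the name has no prefix
(define plain-hit (unqualified? '(rect)))      ; true: a non-qualified name
```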

[procedure](sxml:complement pred)

This function takes a predicate and returns it complemented, that is if the given predicate yields #f or '() the complemented one yields the given node and vice versa.

[procedure](node-eq? other)

Returns a predicate procedure that, given a node, returns #t if the node is the exact same as other.

[procedure](node-equal? other)

Returns a predicate procedure that, given a node, returns #t if the node has the same contents as other.

[procedure](node-pos n)

Returns a procedure that, given a nodelist, returns a new nodelist containing only the nth element, counting from 1. If n is negative, it returns a nodelist with the nth element counting from the right. If no such node exists, returns the empty list. n may not equal zero.

Examples:

((node-pos 1) '((div "hi")(span "hello")(em "really, hi!")))

=>

((div "hi"))

((node-pos 6) '((div "hi")(span "hello")(em "really, hi!")))

=>

()

((node-pos -1) '((div "hi")(span "hello")(em "is this thing on?")))

=>

((em "is this thing on?"))

[procedure](sxml:filter pred?)

Returns a procedure that accepts a nodelist or a node (which will be converted to a one-element nodelist) and returns only those nodes for which the predicate pred? does not return #f or '().

[procedure](take-until pred?)

Returns a procedure that accepts a nodelist or a node and returns everything before the first node for which the predicate pred? returns anything but #f or '(). In other words, it returns the longest prefix for which the predicate returns #f or '().

[procedure](take-after pred?)

Returns a procedure that accepts a nodelist or a node and returns everything after the first node for which the predicate pred? returns anything but #f or '().

[procedure](map-union proc lst)

Apply proc to each element of the nodelist lst and return the list of results. If proc returns a nodelist, splice it into the result (essentially returning a flattened nodelist).
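The sketch below runs sxml:filter, take-until and take-after over one nodelist, then shows the splicing behaviour of map-union. The sample nodes are invented, and the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define nodes '((h1 "Title") (p "one") (hr) (p "two")))

(define only-ps   ((sxml:filter (ntype?? 'p)) nodes))  ; => ((p "one") (p "two"))
(define before-hr ((take-until (ntype?? 'hr)) nodes))  ; => ((h1 "Title") (p "one"))
(define after-hr  ((take-after (ntype?? 'hr)) nodes))  ; => ((p "two"))

;; cdr yields each node's children as a list; map-union splices the
;; lists together and drops the empty one produced by (b).
(define all-kids
  (map-union cdr '((a "1" "2") (b) (c "3"))))          ; => ("1" "2" "3")
```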

[procedure](node-reverse node-or-nodelist)

Accepts a nodelist and reverses the order of the nodes inside it, without changing the order of each node's children. If a single node is passed to this procedure, it returns a nodelist containing just that node.

Converter combinators

Combinators are higher-order functions that transmogrify a converter or glue a sequence of converters into a single, non-trivial converter. The goal is to arrive at converters that correspond to XPath location paths.

From a different point of view, a combinator is a fixed, named pattern of applying converters. Given below is a complete set of such patterns that together implement the XPath location path specification. As it turns out, all these combinators can be built from a small number of basic blocks: regular functional composition, the map-union and filter applicators, and the nodelist union.

[procedure](select-kids pred?)

Returns a procedure that accepts a node and returns a nodelist of the node's children that satisfy pred? (ie, pred? returns anything but #f or '()).

[procedure](node-self pred?)

Similar to select-kids but applies to the node itself rather than to its children. The resulting Nodelist will contain either one component (the node), or will be empty (if the node failed the predicate).

[procedure](node-join . selectors)

Returns a procedure that accepts a nodelist or a node, and returns a nodelist with all the selectors applied to every node in sequence; that is, it folds (reduces) the list of converters with the nodelist as a seed. The selectors must function as converters, i.e. they must accept a node and output a nodelist.
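As a sketch, two select-kids steps joined this way behave roughly like the location path body/p applied to the html element. The document is invented, and the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define doc
  '(html (body (h1 "T")
               (p "one")
               (div (p "nested"))
               (p "two"))))

;; Step 1 selects the body children of html; step 2 selects the p
;; children of each body found by step 1.
(define body-paragraphs
  ((node-join (select-kids (ntype?? 'body))
              (select-kids (ntype?? 'p)))
   doc))
;; => ((p "one") (p "two"))
```

The p nested inside div is not returned, because each step only looks at direct children.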

[procedure](node-or . converters)

This combinator applies all converters to a given node and produces the union of their results. This combinator corresponds to a union, "|" operation for XPath location paths.
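A minimal sketch, roughly corresponding to the path h1 | p evaluated against a body element. The sample data is invented, and the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define body '(body (p "one") (h1 "T") (p "two")))

(define headings-and-paragraphs
  ((node-or (select-kids (ntype?? 'h1))
            (select-kids (ntype?? 'p)))
   body))
;; => ((h1 "T") (p "one") (p "two"))
```

In this example the results come back grouped by converter, in the order the converters were given, rather than in document order.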

[procedure](node-closure test-pred?)

Select all descendants of a node that satisfy a converter-predicate. This combinator is similar to select-kids but applies to grandchildren as well.
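For comparison with select-kids, the sketch below collects every p descendant of an invented document, including one nested inside a div. The sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define doc
  '(html (body (h1 "T")
               (p "one")
               (div (p "nested"))
               (p "two"))))

(define all-paragraphs ((node-closure (ntype?? 'p)) doc))
;; => ((p "one") (p "two") (p "nested"))
```

The nested p comes last here because shallower levels are visited before deeper ones.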

[procedure](node-trace title)

Returns a procedure that accepts a node or a nodelist, which it pretty-prints to the current output port, preceded by title. It returns the node or the nodelist unchanged. This is a useful debugging aid, since it doesn't really do anything besides print its argument and pass it on.

[procedure](sxml:node? obj)

Returns #t if the given obj is an SXML node, #f otherwise. A node is anything except an attribute list or an auxiliary list.

[procedure](sxml:attr-list node)

Returns the list of attributes for a given SXML node. The empty list is returned if the given node is not an element, or if it has no list of attributes.

This differs from sxml:attr-list-u in that this procedure accepts any SXML node while sxml:attr-list-u only accepts nodelists or elements. This means that sxml:attr-list-u will throw an error if you pass it a text node (a string), while sxml:attr-list will not.

[procedure](sxml:attribute test-pred?)

Like sxml:filter, but considers the attributes instead of the nodes. Returns a nodelist of attributes that match test-pred?.

[procedure](sxml:child test-pred?)

This procedure is similar to select-kids, but it returns an empty child-list for PI, Comment and Entity nodes.

[procedure](sxml:parent test-pred?)

Returns a procedure that accepts a root-node, and returns another procedure. This second procedure accepts a nodeset (or a node) and returns the immediate parents of the nodes in the set, but only those parents that match the predicate.

The root-node does not have to be the root node of the whole SXML tree -- it may be a root node of a branch of interest.

This procedure can be used with any SXML node.

Useful shortcuts

[procedure](node-parent node)

(node-parent rootnode) yields a converter that returns a parent of a node it is applied to. If applied to a nodelist, it returns the list of parents of nodes in the nodelist.

This is equivalent to ((sxml:parent (ntype?? '*any*)) node).

[procedure](sxml:child-nodes node)

Returns all the child nodes of the given node.

This is equivalent to ((sxml:child sxml:node?) node).

[procedure](sxml:child-elements node)

Returns all the child elements of the given node. (ie, excludes any textnodes).

This is equivalent to ((select-kids sxml:element?) node).

Procedures from sxpath-ext

SXML counterparts to W3C XPath Core Functions Library

[procedure](sxml:string object)

The counterpart to XPath 'string' function (section 4.2 XPath 1.0 Rec.). Converts a given object to a string.

Notes:

When converting a nodeset, document order is not preserved

number->string returns the result in a form which is slightly different from XPath Rec. specification

[procedure](sxml:boolean object)

The counterpart to XPath 'boolean' function (section 4.3 XPath Rec.). Converts its argument to a boolean.

[procedure](sxml:number object)

The counterpart to XPath 'number' function (section 4.4 XPath Rec.). Converts its argument to a number.

Notes:

The argument is not optional (yet?)

string->number conversion is not IEEE 754 round-to-nearest

NaN is represented as 0

[procedure](sxml:string-value node)

Returns the string value of a given node, in accordance with sections 5.1 - 5.7 of the XPath Rec.

[procedure](sxml:id id-index)

Returns a procedure that accepts a nodeset and returns a nodeset containing the elements in the id-index that match the string-values of each entry of the nodeset. XPath Rec. 4.1

The id-index is an alist with unique IDs as key, and elements as values:

id-index = ( (id-value . element) (id-value . element) ... )

Comparators for XPath objects

[procedure](sxml:list-head list n)

Returns the n first members of list. Mostly equivalent to SRFI-1's take procedure, except it returns the list if n is larger than the length of said list, instead of throwing an error.
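A brief sketch of the two cases, including the one where n exceeds the length of the list. The sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define first-two (sxml:list-head '(a b c) 2))   ; => (a b)

;; Asking for more members than the list holds is not an error.
(define all-three (sxml:list-head '(a b c) 10))  ; => (a b c)
```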

[procedure](sxml:merge-sort less-than? list)

Returns the sorted list, the smallest member first.

less-than? ::= (lambda (obj1 obj2) ...)

less-than? returns #t if obj1 < obj2 with respect to the given ordering.
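For example, any two-argument ordering predicate can be passed as less-than?. The sketch below uses the standard < and string<? procedures; the sxpath-lolevel module name is an assumption.

```scheme
;; Assumption: the procedures are exported by a module named
;; sxpath-lolevel; adjust the import to match your installation.
(import sxpath-lolevel)

(define sorted-numbers (sxml:merge-sort < '(3 1 2)))
;; => (1 2 3)

(define sorted-names (sxml:merge-sort string<? '("pear" "apple" "fig")))
;; => ("apple" "fig" "pear")
```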

[procedure](sxml:equality-cmp bool-op number-op string-op)

Returns a procedure that accepts two objects, looks at the first object's type and applies the correct comparison predicate to it. Type coercion takes place depending on the rules described in the XPath 1.0 spec, section 3.4 ("Booleans").

[procedure](sxml:equal? a b)
[procedure](sxml:not-equal? a b)

Equality procedures with the default comparison operators eq?, = and string=?, or their inverse, respectively.

[procedure](sxml:relational-cmp op)

A helper for XPath relational operations: <, >, <=, >= for two XPath objects. op is one of these operators.

Returns a procedure that accepts two objects and returns the value of the procedure applied to these objects, converted according to the coercion rules described in the XPath 1.0 spec, section 3.4 ("Booleans").

XPath axes

[procedure](sxml:ancestor test-pred?)

Like sxml:parent, except it returns all the ancestors that match test-pred?, not just the immediate parent.

[procedure](sxml:ancestor-or-self test-pred?)

Like sxml:ancestor, except also allows the node itself to match the predicate.

[procedure](sxml:descendant test-pred?)

Like node-closure, except the resulting nodeset is in depth-first order instead of breadth-first.

[procedure](sxml:descendant-or-self test-pred?)

Like sxml:descendant, except also allows the node itself to match the predicate.

[procedure](sxml:following test-pred?)

Returns a procedure that accepts a root node and returns a new procedure that accepts a node and returns all nodes following this node in the document source matching the predicate.

[procedure](sxml:following-sibling test-pred?)

Like sxml:following, except only siblings (nodes at the same level under the same parent) are returned.

[procedure](sxml:preceding test-pred?)

Returns a procedure that accepts a root node and returns a new procedure that accepts a node and returns all nodes preceding this node in the document source matching the predicate.

[procedure](sxml:preceding-sibling test-pred?)

Like sxml:preceding, except only siblings (nodes at the same level under the same parent) are returned.

[procedure](sxml:namespace test-pred?)

Returns a procedure that accepts a nodeset and returns the namespace lists of the nodes matching test-pred?.