http://zorba.io/modules/data-cleaning/consolidation

This library module provides data consolidation functions that generally take as input a sequence of XML nodes
and apply some rule in order to decide which node is best suited to represent the entire sequence.

The logic contained in this module is not specific to any particular XQuery implementation,
although the consolidation functions based on matching elements against XPath expressions require
some form of dynamic evaluation of XPath expressions.

Parameters

Returns

The node having the smallest number of distinct descendant attributes in the input sequence.

least-distinct-elements#1

declare function con:least-distinct-elements($s) as element(*)

Returns the single node having the smallest number of distinct descendant elements (sub-elements at any
depth) in a sequence of nodes provided as input.

If more than one answer is possible, the function returns the first node according to the order of the input sequence.

Example usage :

least-distinct-elements( ( <a><b/></a>, <b><c/></b>, <d/>) )

The function invocation in the example above returns :

(<d/>)

Parameters

s

A sequence of nodes.

Returns

element(*)

The node having the smallest number of distinct descendant elements in the input sequence.
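The selection rule above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is an illustration under assumptions, not the module's XQuery code. The documentation does not say when two descendants count as distinct, so the sketch treats two elements as equal when their tag, attributes, text, and children all match, as a rough stand-in for XPath `deep-equal`. The names `_key` and `least_distinct_elements` are hypothetical.

```python
import xml.etree.ElementTree as ET


def _key(e):
    # Assumed notion of "distinct": a structural key built from tag, sorted
    # attributes, leading text and children (each paired with its tail text).
    return (e.tag, tuple(sorted(e.attrib.items())), e.text or "",
            tuple((_key(c), c.tail or "") for c in e))


def least_distinct_elements(s):
    # e.iter() yields e itself first, so skip it to keep only descendants.
    # min() returns the first minimal node, which gives the input-order tie-break.
    return min(s, key=lambda e: len({_key(d) for d in e.iter() if d is not e}))


nodes = [ET.fromstring(x) for x in ("<a><b/></a>", "<b><c/></b>", "<d/>")]
print(ET.tostring(least_distinct_elements(nodes)).decode())  # <d />
```

Under this assumption, `<x><y/><y/></x>` has one distinct descendant element, not two.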

least-distinct-nodes#1

declare function con:least-distinct-nodes($s) as element(*)

Returns the single node having the smallest number of distinct descendant nodes (sub-nodes at any depth)
in a sequence of nodes provided as input.

If more than one answer is possible, the function returns the first node according to the order of the input sequence.

Example usage :

least-distinct-nodes( ( <a><b/></a>, <b><c/></b>, <d/>) )

The function invocation in the example above returns :

(<d/>)

Parameters

s

A sequence of nodes.

Returns

element(*)

The node having the smallest number of distinct descendant nodes in the input sequence.
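A Python sketch of the same rule for nodes, using `xml.etree.ElementTree` and hypothetical helper names. ElementTree stores text in the `text` and `tail` attributes rather than as separate nodes, and its default parser drops comments and processing instructions. The sketch therefore counts only element and non-empty text descendants. Distinctness uses an assumed structural comparison as a stand-in for `deep-equal`.

```python
import xml.etree.ElementTree as ET


def _key(e):
    # Assumed structural identity of an element (tag, attributes, text, children).
    return (e.tag, tuple(sorted(e.attrib.items())), e.text or "",
            tuple((_key(c), c.tail or "") for c in e))


def _descendant_keys(e):
    # Yield one key per descendant node: text chunks and sub-elements at any depth.
    if e.text:
        yield ("text", e.text)
    for c in e:
        yield ("elem", _key(c))
        yield from _descendant_keys(c)
        if c.tail:  # text following c is still inside e
            yield ("text", c.tail)


def least_distinct_nodes(s):
    # Fewest distinct descendant nodes; min() keeps the first node on ties.
    return min(s, key=lambda e: len(set(_descendant_keys(e))))


nodes = [ET.fromstring(x) for x in ("<a><b/></a>", "<b><c/></b>", "<d/>")]
print(ET.tostring(least_distinct_nodes(nodes)).decode())  # <d />
```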

least-elements#1

declare function con:least-elements($s) as element(*)

Returns the single node having the smallest number of descendant elements (sub-elements at any depth)
in a sequence of nodes provided as input.

If more than one answer is possible, the function returns the first node according to the order of the input sequence.

Example usage :

least-elements( ( <a><b/></a>, <b><c/></b>, <d/>) )

The function invocation in the example above returns :

(<d/>)

Parameters

s

A sequence of nodes.

Returns

element(*)

The node having the smallest number of descendant elements in the input sequence.
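Without a distinctness test, the rule reduces to counting descendant elements. The following is a minimal Python sketch with `xml.etree.ElementTree`. The function names mirror the XQuery ones but are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_descendant_elements(e):
    # iter() walks e and all sub-elements at any depth; subtract 1 for e itself.
    return sum(1 for _ in e.iter()) - 1


def least_elements(s):
    # The first node with the smallest count wins, as min() keeps the earliest minimum.
    return min(s, key=count_descendant_elements)


nodes = [ET.fromstring(x) for x in ("<a><b/></a>", "<b><c/></b>", "<d/>")]
print(ET.tostring(least_elements(nodes)).decode())  # <d />
```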

least-frequent#1

declare function con:least-frequent($s) as item()

Returns the single least frequent item in a sequence of items provided as input.

If more than one answer is possible, the function returns the first item according to the order of the input sequence.

Example usage :

least-frequent( ( "a", "a", "b") )

The function invocation in the example above returns :

("b")

Parameters

s

A sequence of items.

Returns

item()

The least frequent item in the input sequence.
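The frequency rule can be sketched in Python with `collections.Counter`. The sketch assumes the items are hashable values compared by ordinary equality, as with the strings in the example. Comparing XML nodes this way would need a structural key instead. The function name is hypothetical.

```python
from collections import Counter


def least_frequent(s):
    counts = Counter(s)  # occurrences of each distinct item
    # min() scans in input order and keeps the first item with the lowest count.
    return min(s, key=counts.__getitem__)


print(least_frequent(["a", "a", "b"]))  # b
```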

least-nodes#1

declare function con:least-nodes($s) as element(*)

Returns the single node having the smallest number of descendant nodes (sub-nodes at any depth)
in a sequence of nodes provided as input.

If more than one answer is possible, the function returns the first node according to the order of the input sequence.

Example usage :

least-nodes( ( <a><b/></a>, <b><c/></b>, <d/>) )

The function invocation in the example above returns :

(<d/>)

Parameters

s

A sequence of nodes.

Returns

element(*)

The node having the smallest number of descendant nodes in the input sequence.
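Counting every descendant node, not only elements, can be sketched in Python with `xml.etree.ElementTree`. ElementTree keeps text in the `text` and `tail` attributes, and its default parser drops comments and processing instructions. For that reason, this sketch counts sub-elements plus non-empty text chunks. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_descendant_nodes(e):
    # Text directly inside e, before its first child, counts as one node.
    n = 1 if e.text else 0
    for c in e:
        # The child itself, everything below it, and any text that follows it inside e.
        n += 1 + count_descendant_nodes(c) + (1 if c.tail else 0)
    return n


def least_nodes(s):
    return min(s, key=count_descendant_nodes)  # earliest node wins ties


nodes = [ET.fromstring(x) for x in ("<a><b/></a>", "<b><c/></b>", "<d/>")]
print(ET.tostring(least_nodes(nodes)).decode())  # <d />
```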

least-similar-edit-distance#2

declare function con:least-similar-edit-distance($s as xs:string*, $m as xs:string) as xs:string?

Returns the single least similar string, in terms of the edit distance metric relative to an input string,
in a sequence of strings provided as input. If more than one string has the minimum similarity (the maximum
value of the edit distance metric), the function returns the first string according to the order of the input sequence.
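The documentation does not name a particular edit distance. The Python sketch below assumes the Levenshtein distance (unit-cost insertions, deletions, and substitutions), computed with the usual two-row dynamic program. `levenshtein` and `least_similar_edit_distance` are hypothetical names.

```python
def levenshtein(a, b):
    # prev is the previous DP row: distances from a[:i-1] to each prefix b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def least_similar_edit_distance(s, m):
    # Largest distance means least similar; max() keeps the first string on ties,
    # and an empty sequence yields None, mirroring the xs:string? result type.
    return max(s, key=lambda x: levenshtein(x, m), default=None)


print(levenshtein("kitten", "sitting"))  # 3
print(least_similar_edit_distance(["apple", "apply", "banana"], "apple"))  # banana
```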

Parameters

Returns

The node having the largest number of distinct descendant elements in the input sequence.

most-distinct-nodes#1

declare function con:most-distinct-nodes($s) as element(*)

Returns the single node having the largest number of distinct descendant nodes (sub-nodes at any depth) in
a sequence of nodes provided as input.

If more than one answer is possible, the function returns the first node according to the order of the input sequence.

Example usage :

most-distinct-nodes( ( <a><b/></a>, <a><a/></a>, <b/>) )

The function invocation in the example above returns :

(<a><b/></a>)

Parameters

s

A sequence of nodes.

Returns

element(*)

The node having the largest number of distinct descendant nodes in the input sequence.
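This rule is the least-distinct-nodes rule with the comparison reversed. The Python sketch below uses `xml.etree.ElementTree` and counts only element and non-empty text descendants, because ElementTree has no separate text nodes and drops comments by default. It also assumes a structural comparison in place of `deep-equal`. All names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _key(e):
    # Assumed structural identity of an element (tag, attributes, text, children).
    return (e.tag, tuple(sorted(e.attrib.items())), e.text or "",
            tuple((_key(c), c.tail or "") for c in e))


def _descendant_keys(e):
    # One key per descendant node: text chunks and sub-elements at any depth.
    if e.text:
        yield ("text", e.text)
    for c in e:
        yield ("elem", _key(c))
        yield from _descendant_keys(c)
        if c.tail:
            yield ("text", c.tail)


def most_distinct_nodes(s):
    # Most distinct descendant nodes; max() keeps the first node on ties.
    return max(s, key=lambda e: len(set(_descendant_keys(e))))


nodes = [ET.fromstring(x) for x in ("<a><b/></a>", "<a><a/></a>", "<b/>")]
print(ET.tostring(most_distinct_nodes(nodes)).decode())  # <a><b /></a>
```

The first two inputs tie with one distinct descendant each, so the earlier one is returned.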

most-elements#1

declare function con:most-elements($s) as element(*)

Returns the single node having the largest number of descendant elements (sub-elements at any depth)
in a sequence of nodes provided as input.

If more than one answer is possible, the function returns the first node according to the order of the input sequence.

Example usage :

most-elements( ( <a><b/></a>, <a/>, <b/>) )

The function invocation in the example above returns :

(<a><b/></a>)

Parameters

s

A sequence of nodes.

Returns

element(*)

The node having the largest number of descendant elements in the input sequence.
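The following is a minimal Python sketch with `xml.etree.ElementTree`. It counts descendant elements and keeps the first node with the highest count. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_descendant_elements(e):
    # iter() includes e itself, so subtract 1 to count only sub-elements.
    return sum(1 for _ in e.iter()) - 1


def most_elements(s):
    # max() returns the earliest node among those with the highest count.
    return max(s, key=count_descendant_elements)


nodes = [ET.fromstring(x) for x in ("<a><b/></a>", "<a/>", "<b/>")]
print(ET.tostring(most_elements(nodes)).decode())  # <a><b /></a>
```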

most-frequent#1

declare function con:most-frequent($s) as item()

Returns the single most frequent item in a sequence of items provided as input.

If more than one answer is possible, the function returns the first item according to the order of the input sequence.

Example usage :

most-frequent( ( "a", "a", "b") )

The function invocation in the example above returns :

("a")

Parameters

s

A sequence of items.

Returns

item()

The most frequent item in the input sequence.
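The following is a Python sketch with `collections.Counter`. It assumes hashable items compared by ordinary equality, like the strings in the example. The function name is hypothetical. Using `max` with an explicit key makes the input-order tie-break visible.

```python
from collections import Counter


def most_frequent(s):
    counts = Counter(s)  # occurrences of each distinct item
    # max() keeps the first item, in input order, that reaches the highest count.
    return max(s, key=counts.__getitem__)


print(most_frequent(["a", "a", "b"]))  # a
```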

most-nodes#1

declare function con:most-nodes($s) as element(*)

Returns the single node having the largest number of descendant nodes (sub-nodes at any depth) in a
sequence of nodes provided as input.

If more than one answer is possible, the function returns the first node according to the order of the input sequence.

Example usage :

most-nodes( ( <a><b/></a>, <a/>, <b/>) )

The function invocation in the example above returns :

(<a><b/></a>)

Parameters

s

A sequence of nodes.

Returns

element(*)

The node having the largest number of descendant nodes in the input sequence.
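The following Python sketch uses `xml.etree.ElementTree`, where text lives in `text` and `tail` and comments are dropped by the default parser. It counts sub-elements plus non-empty text chunks as descendant nodes. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_descendant_nodes(e):
    # Leading text inside e counts as one node.
    n = 1 if e.text else 0
    for c in e:
        # The child, its own descendants, and the text that follows it inside e.
        n += 1 + count_descendant_nodes(c) + (1 if c.tail else 0)
    return n


def most_nodes(s):
    return max(s, key=count_descendant_nodes)  # earliest node wins ties


nodes = [ET.fromstring(x) for x in ("<a><b/></a>", "<a/>", "<b/>")]
print(ET.tostring(most_nodes(nodes)).decode())  # <a><b /></a>
```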

most-similar-edit-distance#2

declare function con:most-similar-edit-distance($s as xs:string*, $m as xs:string) as xs:string?

Returns the single most similar string, in terms of the edit distance metric relative to an input string,
in a sequence of strings provided as input. If more than one string has the maximum similarity (the minimum
value of the edit distance metric), the function returns the first string according to the order of the
input sequence.
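The documentation does not specify which edit distance is meant. This Python sketch assumes the Levenshtein distance (unit-cost insertions, deletions, and substitutions). `levenshtein` and `most_similar_edit_distance` are hypothetical names.

```python
def levenshtein(a, b):
    # Two-row dynamic program; prev holds distances from a[:i-1] to each b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def most_similar_edit_distance(s, m):
    # Smallest distance means most similar; min() keeps the first string on ties,
    # and an empty sequence yields None, mirroring the xs:string? result type.
    return min(s, key=lambda x: levenshtein(x, m), default=None)


print(most_similar_edit_distance(["banana", "apply", "apple"], "apple"))  # apple
```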

Parameters

Returns

The element that matches the largest number of XPath expressions producing a non-empty set of nodes.

shortest#1

declare function con:shortest($s as xs:string*) as xs:string?

Returns the single shortest string, in terms of the number of characters, in a sequence of strings provided as input.

If more than one answer is possible, the function returns the first string according to the order of the input sequence.

Example usage :

shortest( ( "a", "aa", "aaa") )

The function invocation in the example above returns :

("a")

Parameters

s as xs:string*

A sequence of strings.

Returns

xs:string?

The shortest string in the input sequence.
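In Python the rule reduces to a `min` over `len`. Like XQuery's `fn:string-length`, `len` counts Unicode code points. The following is a hedged sketch with a hypothetical name.

```python
def shortest(s):
    # min() keeps the first of several equally short strings; empty input gives None.
    return min(s, key=len, default=None)


print(shortest(["a", "aa", "aaa"]))  # a
```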

some-xpaths#2

declare function con:some-xpaths($s as element(*)*, $paths as xs:string*) as element(*)*

Returns the elements from a sequence of elements that, when matched against a given set of XPath expressions,
produce a non-empty set of nodes for at least one of those expressions.

Example usage :

some-xpaths( ( <a><b/></a>, <d><c/></d>, <d/>), (".//b", ".//c") )

The function invocation in the example above returns :

( <a><b/></a> , <d><c/></d> )

Parameters

s as element(*)*

A sequence of elements.

paths as xs:string*

A sequence of strings denoting XPath expressions.

Returns

element(*)*

The elements that, when matched to the given set of XPath expressions, return a non-empty set of nodes for at least one of the cases.
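The following is a Python sketch using `xml.etree.ElementTree`. Its `findall` understands only a limited subset of XPath, which is enough for paths such as `.//b`. An XQuery processor with dynamic evaluation would accept arbitrary XPath expressions, so this is an approximation. `some_xpaths` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def some_xpaths(s, paths):
    # Keep each element for which at least one path selects a non-empty node set.
    # findall() returns a list, and an empty list is falsy.
    return [e for e in s if any(e.findall(p) for p in paths)]


nodes = [ET.fromstring(x) for x in ("<a><b/></a>", "<d><c/></d>", "<d/>")]
print([ET.tostring(e).decode() for e in some_xpaths(nodes, [".//b", ".//c"])])
# ['<a><b /></a>', '<d><c /></d>']
```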

superstring#1

declare function con:superstring($s as xs:string*) as xs:string?

Returns the single string, from an input sequence of strings, that appears most frequently as part
of the other strings in the sequence. If no such string exists, the function returns an empty sequence.

If more than one answer is possible, the function returns the first string according to the order of the input sequence.

Example usage :

superstring( ( "aaa bbb ccc", "aaa bbb", "aaa ddd", "eee fff" ) )

The function invocation in the example above returns :

( "aaa bbb" )

Parameters

s as xs:string*

A sequence of strings.

Returns

xs:string?

The string that appears most frequently as part of the other strings in the sequence.
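One reading of the description above works as follows. For each string, count how many other strings in the sequence contain it as a substring, then return the first string with the highest positive count. The Python sketch below follows that reading. It is an interpretation rather than the module's code, and `superstring` here is a hypothetical Python name.

```python
def superstring(s):
    s = list(s)
    # hits[i] = number of *other* entries that contain s[i] as a substring.
    hits = [sum(1 for j, other in enumerate(s) if j != i and s[i] in other)
            for i in range(len(s))]
    if not hits or max(hits) == 0:
        return None  # no string occurs inside another one
    # index() finds the first position with the top count, preserving input order.
    return s[hits.index(max(hits))]


print(superstring(["aaa bbb ccc", "aaa bbb", "aaa ddd", "eee fff"]))  # aaa bbb
```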

validating-schema#2

declare function con:validating-schema($s as element(*)*, $schema as element(*)) as element(*)*

Returns the elements from an input sequence of elements that validate against a given XML Schema.