This is documentation for the ANVL Perl module, which provides a general framework for data represented in the ANVL format. ANVL (A Name Value Language) represents elements in a label-colon-value format similar to email headers. Conversions to XML, Turtle, JSON, CSV, PSV (Pipe Separated Value), and Plain (unlabeled text) are possible through File::OM, an "output multiplexer".

The OM package can also be used to build records from scratch in ANVL or the other formats. Below is an example of how to create a particular kind of ANVL record known as an ERC (which uses Dublin Kernel metadata). For the ANVL, Plain, and XML formats, the returned text string is wrapped to 72 columns by default.
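As a rough illustration of the text such a record contains, the sketch below assembles an ERC record by hand and wraps each element to 72 columns with the core Text::Wrap module. It does not use File::OM; the helper `erc_elem`, the four-space continuation indent, and the sample values are assumptions for illustration only.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper (not File::OM): format one element as "name: value",
# wrapping so that no line exceeds 72 columns. Continuation lines start
# with whitespace; the four-space indent is an assumption.
sub erc_elem {
    my ($name, $value) = @_;
    return "$name:\n" if !defined $value || $value eq '';
    local $Text::Wrap::columns  = 73;   # wrap() emits at most columns - 1 chars
    local $Text::Wrap::unexpand = 0;    # keep spaces; don't turn them into tabs
    return wrap('', '    ', "$name: $value") . "\n";
}

# "erc:" followed by the four Dublin Kernel elements; a blank line ends it.
my $record = erc_elem('erc', '')
    . erc_elem('who',   'Gibbon, Edward')
    . erc_elem('what',  'The History of the Decline and Fall of the Roman Empire')
    . erc_elem('when',  '1776')
    . erc_elem('where', 'http://example.org/gibbon/decline')
    . "\n";
print $record;
```

Building each element separately keeps the wrapping local to one value, so a long "what" never pushes the next label onto a continuation line.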

The getlines() function reads from $filehandle up to a blank line and returns the lines read. This is a general function for reading "paragraphs", which is useful for reading ANVL records. If unspecified, $filehandle defaults to *ARGV, which makes it easy to take input from successive file arguments on the calling program's command line (or from STDIN if there are none).
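The paragraph-reading idea can be sketched in plain Perl. The sub `read_paragraph` below is hypothetical, not the module's getlines(): it reads lines until an empty line that follows some non-whitespace content, or until EOF, keeps any leading blank lines, and returns undef only when nothing at all was read. An in-memory filehandle stands in for *ARGV.

```perl
use strict;
use warnings;

# Hypothetical sketch (not getlines()): read lines up to a blank line that
# follows some non-whitespace content, or to EOF. Leading blank lines are
# kept so that a later step can count them. Returns undef only at EOF.
sub read_paragraph {
    my ($fh) = @_;
    my ($text, $seen_content) = ('', 0);
    while (defined(my $line = <$fh>)) {
        $text .= $line;
        if ($line =~ /\S/) {
            $seen_content = 1;
        }
        elsif ($seen_content && $line eq "\n") {
            last;                                  # blank line ends it
        }
    }
    return length $text ? $text : undef;
}

# An in-memory filehandle stands in for *ARGV here.
my $input = "\n\na: 1\nb: 2\n\nc: 3\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
while (defined(my $para = read_paragraph($fh))) {
    print "--- paragraph ---\n$para";
}
```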

For convenience, trimlines() is often used to process the record just returned by getlines(). It strips leading whitespace, optionally counts lines, and returns undef if the passed record is undefined or contains only whitespace, both being equivalent to end-of-file (EOF).

These functions treat whitespace specially. Input is read until at least one non-whitespace character has been seen and a blank line (two newlines in a row) or EOF is reached. If EOF is reached and the record would contain only whitespace, undef is returned. Line counts for the leading whitespace that was trimmed ($wslines) and for the real record lines ($rrlines) can be returned through optional scalar references given to trimlines(). Together, these functions give the caller access to all input, accurate line counts, and a familiar "loop until EOF" paradigm, as in

while (defined(my $rec = trimlines(getlines(), \$wslcount, \$rrlcount))) {
    # ... process the record in $rec ...
}
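A rough, self-contained sketch of the trimming step and the loop it supports follows. `trim_record` and `next_paragraph` are hypothetical stand-ins for trimlines() and getlines(); in particular, counting every newline of the record (including its terminating blank line) as a "real record line" is an assumption chosen so the caller can compute the starting line of each record.

```perl
use strict;
use warnings;

# Hypothetical stand-in for trimlines(): strip leading whitespace and
# report line counts through optional scalar references. Counting every
# newline (including a terminating blank line) is an assumption that lets
# the caller track the line number of the next record.
sub trim_record {
    my ($rec, $r_wslines, $r_rrlines) = @_;
    return undef if !defined $rec || $rec !~ /\S/;   # treated as EOF
    my ($lead) = $rec =~ /\A(\s*)/;
    $rec = substr($rec, length $lead);
    $$r_wslines = ($lead =~ tr/\n//) if $r_wslines;  # trimmed lines
    $$r_rrlines = ($rec =~ tr/\n//)  if $r_rrlines;  # real record lines
    return $rec;
}

# Simplified paragraph reader so this block runs on its own (hypothetical).
sub next_paragraph {
    my ($fh) = @_;
    my ($text, $content) = ('', 0);
    while (defined(my $line = <$fh>)) {
        $text .= $line;
        $content ||= $line =~ /\S/;
        last if $content && $line eq "\n";
    }
    return length $text ? $text : undef;
}

my $input = "\n\na: 1\nb: 2\n\n\nc: 3\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
my ($ws, $rr, $lineno) = (0, 0, 1);
while (defined(my $rec = trim_record(next_paragraph($fh), \$ws, \$rr))) {
    $lineno += $ws;                                  # skip trimmed lines
    print "record at line $lineno:\n$rec";
    $lineno += $rr;                                  # move past this record
}
```

Note that the reader returns an explicit undef: because it is called inside an argument list (list context), a bare `return;` would shift the remaining arguments.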

The anvl_recarray() function splits an ANVL record into ANVL elements, returning them via the array reference given as the second argument. The n-th returned ANVL element corresponds to three Perl array elements as follows:

3n     line number (or octet offset) of the start of the element
3n+1   element name
3n+2   element value

This means, for example, that the first two ANVL element names would be found at Perl array indices 4 and 7. The first triple is special; array elements 0 and 2 are undefined unless the record begins with an unlabeled value (not strictly ANVL), such as,

Smith, Jo
home: 555-1234
work: 555-9876

in which case they contain the line number and value, respectively. Array element 1 always contains a string naming the format of the input, such as "ANVL", "JSON", "XML", etc.

The remaining triples are free-form except that the values will have been drawn from the original format and possibly decoded. The first item ("lineno") in each remaining triple is a number followed by a character, for example, "34:" or "6#". The number indicates the line number (or octet offset, depending on the origin format) of the start of the element. The character is either ':' to indicate a real element or '#' to indicate a comment; if the latter, the element name has no defined meaning and the comment is contained in the value. Here's example code that reads a 3-element record and reformats it.
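Separately from the module's own example, here is a minimal sketch of walking such an array. It does not call anvl_recarray(); the array is built by hand in the triple layout described above (the values are made up), and the helper `reformat_elems` is hypothetical. It skips the special first triple and any comment triple (lineno ending in '#').

```perl
use strict;
use warnings;

# Hypothetical helper: reformat (lineno, name, value) triples as
# "name: value" lines, skipping the special first triple and comments.
sub reformat_elems {
    my @elems = @_;
    my @out;
    for my $n (1 .. int(@elems / 3) - 1) {
        my ($lineno, $name, $value) = @elems[3*$n .. 3*$n + 2];
        next if $lineno =~ /#\z/;     # comment triple: name has no meaning
        push @out, "$name: $value";
    }
    return join '', map { $_ . "\n" } @out;
}

# Hand-built in the documented layout; a real program would obtain
# this array from anvl_recarray().
my @elems = (
    undef, 'ANVL', undef,             # triple 0: no unlabeled first value
    '1:',  'who',  'Smith, Jo',
    '2#',  undef,  'a comment',
    '3:',  'what', 'Field notes',
    '4:',  'when', '2009',
);
print reformat_elems(@elems);
```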

An optional third argument to anvl_recarray() gives the starting line number (default 1). An optional fourth argument is a reference to a hash containing options; the argument { comments => 1, autoindent => 0 } will cause comments to be kept (they are stripped by default) and recoverable indentation errors to be flagged as errors (they are corrected to continuation lines by default). This function returns the empty string on success, or a message beginning "warning: ..." or "error: ...".
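To make these options concrete, here is a rough sketch of a record parser that fills the same triple layout. It is not anvl_recarray(): the sub `parse_record` is hypothetical, it handles only "name: value" lines, continuation lines, and '#' comments, and its error text and its folding of continuation lines with a single space are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch, not anvl_recarray(): fill @$r_elems with
# (lineno, name, value) triples. Only "name: value" lines, continuation
# lines, and '#' comments are handled; messages and folding are assumed.
sub parse_record {
    my ($rec, $r_elems, $lineno, $opt) = @_;
    $lineno //= 1;
    my %o = (comments => 0, autoindent => 1, %{ $opt || {} });
    @$r_elems = (undef, 'ANVL', undef);        # special first triple
    my $first = 1;                              # at the record's first line?
    for my $line (split /\n/, $rec) {
        my $ln = $lineno++;
        next unless $line =~ /\S/;
        if ($line =~ /^#\s?(.*)$/) {                        # comment line
            push @$r_elems, "$ln#", undef, $1 if $o{comments};
        }
        elsif ($line =~ /^([^\s:#][^:]*):\s*(.*)$/) {       # "name: value"
            push @$r_elems, "$ln:", $1, $2;
        }
        elsif ($first && $o{autoindent}) {                  # unlabeled first line
            @$r_elems[0, 2] = ($ln, $line);
        }
        elsif (defined $r_elems->[-1] && ($line =~ /^\s/ || $o{autoindent})) {
            # Continuation line (indented, or recovered under autoindent).
            (my $more = $line) =~ s/^\s+//;
            $r_elems->[-1] .= " $more";
        }
        else {
            return "error: line $ln: unlabeled or misindented line";
        }
        $first = 0;
    }
    return '';
}

my @elems;
my $msg = parse_record("Smith, Jo\nhome: 555-1234\n# old\nwork: 555-9876",
                       \@elems, 1, { comments => 1 });
print 'status: ', ($msg eq '' ? 'ok' : $msg), "\n";
for my $n (0 .. int(@elems / 3) - 1) {
    print join(' | ', map { defined $_ ? $_ : '-' } @elems[3*$n .. 3*$n + 2]), "\n";
}
```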

erc_anvl_expand_array() inspects, and possibly modifies in place, the kind of element array resulting from a call to anvl_recarray(). It returns the empty string on success, otherwise an error message. This routine is useful for transforming a short-form ERC ANVL record into long form, for example, expanding "erc: a | b | c | d" into the equivalent,

erc:
who: a
what: b
when: c
where: d
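The short-to-long expansion can be sketched on plain strings. `expand_erc` is hypothetical and works on text rather than on the element array that erc_anvl_expand_array() modifies; splitting on '|' with surrounding whitespace trimmed, and mapping values positionally to the four kernel names shown above, are assumptions about the short-form syntax.

```perl
use strict;
use warnings;

# Dublin Kernel element names, in short-form positional order.
my @KERNEL = qw(who what when where);

# Hypothetical sketch: expand "erc: a | b | c | d" into the long form.
# Returns (long_form, '') on success or (undef, "error: ...").
sub expand_erc {
    my ($line) = @_;
    my ($short) = $line =~ /^erc:\s*(.*?)\s*$/
        or return (undef, "error: not an erc element");
    my @vals = split /\s*\|\s*/, $short;
    return (undef, "error: more than " . @KERNEL . " values")
        if @vals > @KERNEL;
    my $long = "erc:\n";
    $long .= "$KERNEL[$_]: $vals[$_]\n" for 0 .. $#vals;
    return ($long, '');
}

my ($long, $err) = expand_erc('erc: a | b | c | d');
print $err eq '' ? $long : "$err\n";
```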

The anvl_arrayhash() function takes the kind of element array resulting from a call to anvl_recarray() or erc_anvl_expand_array() and modifies the hash reference given as the second argument by storing, for each element name, a list of integers corresponding to the triples that bear that name. You should always undefine the hash first or you may see unexpected results. So to print the value (the 2nd array element past the start of the triple) of the first instance (index 0) of "who", add 2 to the first integer stored under "who" and print the element array entry at that index.
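The name index can be sketched as follows. `array_to_hash` is hypothetical, and storing the array index at which each triple starts (so the value sits at that index plus 2) is an assumption drawn from the description above. Like the function described, the sketch does not clear the hash, so the caller starts from an empty one.

```perl
use strict;
use warnings;

# Hypothetical sketch: for each element name, record the array index at
# which each of its triples starts. It does not clear the hash first.
sub array_to_hash {
    my ($r_elems, $r_hash) = @_;
    for my $n (1 .. int(@$r_elems / 3) - 1) {
        my $start = 3 * $n;
        my $name  = $r_elems->[$start + 1];
        next unless defined $name;                 # e.g. comment triples
        push @{ $r_hash->{$name} }, $start;
    }
    return '';
}

my @elems = (
    undef, 'ANVL', undef,
    '1:', 'erc',  '',
    '2:', 'who',  'Smith, Jo',
    '3:', 'who',  'Wong, K.',
    '4:', 'what', 'Field notes',
);
my %hash;                                          # fresh, hence empty
array_to_hash(\@elems, \%hash);
print $elems[ $hash{who}->[0] + 2 ], "\n";        # value of the first "who"
```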

The anvl_valsplit() function splits an ANVL value into sub-values (svals) and repeated values (rvals), returning them as an array of arrays via the array reference given as the second argument. The top-level of the array represents svals and the next level represents rvals. This function returns the empty string on success, or a message beginning "warning: ..." or "error: ...".
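A sketch of the two-level split follows. `split_value` is hypothetical, and the separators are assumptions: '|' is taken to separate sub-values (as in the short-form ERC above) and ';' to separate repeated values within each sub-value. No escaping or decoding is handled.

```perl
use strict;
use warnings;

# Hypothetical sketch: split a value into sub-values on '|' and each
# sub-value into repeated values on ';', trimming surrounding whitespace.
# The result is an array of arrays: $svals[$i][$j] is rval $j of sval $i.
sub split_value {
    my ($value, $r_svals) = @_;
    return "error: undefined value" unless defined $value;
    @$r_svals = map {
        my $sval = $_;
        [ map { s/^\s+|\s+$//gr } split /;/, $sval, -1 ]
    } split /\|/, $value, -1;
    return '';
}

my @svals;
split_value('Smith, J.; Wong, K. | Field notes | 2009', \@svals);
for my $i (0 .. $#svals) {
    print "sval $i: ", join(' / ', @{ $svals[$i] }), "\n";
}
```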

The anvl_decode() function takes an ANVL-encoded string and returns it after converting encoded characters to the standard representation (e.g., %vb becomes `|'). Some decoding, such as for the expansion block below,
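The basic substitution step might look like the sketch below. `decode_value` is hypothetical, and its table is deliberately partial: only the %vb mapping comes from the text above, while the %sc and %pe entries are assumptions to be checked against the ANVL specification. Contextual decoding, such as expansion blocks, is not attempted.

```perl
use strict;
use warnings;

# Partial decoding table. Only %vb => '|' is taken from the text; the
# other entries are assumptions for illustration.
my %DECODE = (vb => '|', sc => ';', pe => '%');

# Hypothetical sketch: replace each known two-letter %-code in a single
# pass (so decoded output is never decoded again); unknown codes are kept.
sub decode_value {
    my ($s) = @_;
    $s =~ s{%([a-z]{2})}{ exists $DECODE{$1} ? $DECODE{$1} : "%$1" }ge;
    return $s;
}

print decode_value('a %vb b'), "\n";
```

A single-pass substitution matters here: decoding "%pevb" should yield the literal text "%vb", not "|".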

The anvl_name_naturalize() function takes an ANVL string (aval) and returns it after inversion at any designated inversion points. The input string is returned unchanged if it does not end in a comma (`,'). The more terminal commas, the more inversion points tried. For example, the calls

take sort-friendly strings (commonly used to make ANVL records easy to sort) and return the natural word order strings,

Pat Smith
Sir Paul McCartney
Hu Jintao
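One inversion rule that reproduces the three results above is sketched here. `naturalize` is hypothetical, and both the rule and the sample inputs ("Smith, Pat,", "McCartney, Paul, Sir,,", "Hu Jintao,") are assumptions rather than the module's documented behavior: with k terminal commas, the string is split at up to k comma points and the pieces are reversed.

```perl
use strict;
use warnings;

# Hypothetical inversion rule: k terminal commas permit up to k inversion
# points; split there and reverse the pieces into natural word order.
sub naturalize {
    my ($s) = @_;
    my ($body, $commas) = $s =~ /^(.*?)(,+)\s*$/s
        or return $s;                         # no terminal comma: unchanged
    my @parts = split /,\s*/, $body, length($commas) + 1;
    return join ' ', reverse @parts;
}

print naturalize($_), "\n"
    for 'Smith, Pat,', 'McCartney, Paul, Sir,,', 'Hu Jintao,';
```

Under this rule "Hu Jintao," has a terminal comma but no inner comma to invert at, so only the comma is removed.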

The anvl_om() routine takes a formatting object created by a call to File::OM($format), reads a stream of ANVL records, processes each element, and calls format-specific methods to build the output. Those methods are typically configured by command-line options passed in at object creation time.

Options control various aspects of reading ANVL input records. The 'autoindent' option (default on) causes the parser to recover if it can when continuation lines are not properly indented. As a special case, if the first line of the record has no label, leaving 'autoindent' on will cause anvl_recarray() to preserve its value and line number in the first triple, which anvl_om() will detect and pass through with the synthesized name '_'.

The 'elem_order' option (default undefined) can be used to control which elements are output and in what order. If set to a reference to an array of element names, which may contain repeated names, the specified elements (and no others) are output in the specified order. When 'elem_order' is undefined, all elements present are output. Under the CSV and PSV formats, element order is by default inferred from the ordering of elements found in the first record.
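The selection and ordering behavior can be sketched on a list of name/value pairs. `order_elems` is hypothetical; treating a repeated name in the order list as a request for that name's next occurrence, and emitting an empty value when no occurrence is left (convenient for fixed PSV columns), are assumptions rather than documented behavior. With no order list, the sketch falls back to input order.

```perl
use strict;
use warnings;

# Hypothetical sketch of 'elem_order': given [name, value] pairs and an
# optional list of names, return the values to output, in order.
sub order_elems {
    my ($pairs, $order) = @_;
    return map { $_->[1] } @$pairs unless $order;   # default: everything
    my (%by_name, %used);
    push @{ $by_name{ $_->[0] } }, $_->[1] for @$pairs;
    return map {
        my $vals = $by_name{$_} || [];
        my $v    = $vals->[ $used{$_}++ ];   # next occurrence of this name
        defined $v ? $v : '';
    } @$order;
}

my @rec1 = ([who => 'Smith, Jo'], [what => 'Notes'], [when => '2009']);
my @rec2 = ([what => 'Letters'], [who => 'Wong, K.'], [who => 'Lee, A.']);

# PSV-style: infer the column order from the first record.
my @cols = map { $_->[0] } @rec1;
print join('|', order_elems($_, \@cols)), "\n" for \@rec1, \@rec2;
print join('|', order_elems(\@rec2, [qw(who who)])), "\n";
```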

The 'comments' option (default off) causes input comments to be preserved in the output, format permitting. The 'verbose' option inserts record and line numbers in comments. Pseudo-comments will be created for formats that don't natively define comments (JSON, Plain).

Like the individual OM methods, anvl_om() returns the built string by default, or the return status of print when a file handle is supplied as the 'outhandle' option (normally set to '') at object creation time, for example,
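The return convention described above can be sketched as follows: with no handle (or ''), the built text is returned; otherwise it is printed and print's status is returned. `emit` and its arguments are hypothetical; this is not File::OM's code.

```perl
use strict;
use warnings;

# Hypothetical sketch of the convention: with no handle (or ''), return the
# built text; otherwise print it and return print's true/false status.
sub emit {
    my ($text, $outhandle) = @_;
    return $text unless $outhandle;
    return print {$outhandle} $text;
}

my $s  = emit("erc:\nwho: Smith, Jo\n");       # string returned to the caller
my $ok = emit("what: Notes\n", \*STDOUT);      # printed; $ok is print's status
```

Returning the string by default lets callers compose output in memory, while the handle form avoids holding a large stream of records in one string.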

DEPRECATED: The anvl_rechash() function splits an ANVL record into elements, returning them via the hash reference given as the second argument. A hash key is defined for each element name found. Under that key is stored the corresponding element value, or an array of values if more than one occurrence of the element name was encountered. This function returns the empty string on success, or a message beginning "warning: ..." or "error: ...".
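For comparison, the deprecated hash layout (a plain value for a name seen once, an array reference once a name repeats) can be sketched like this. `pairs_to_hash` is hypothetical and starts from name/value pairs rather than parsing a record.

```perl
use strict;
use warnings;

# Hypothetical sketch of the deprecated layout: scalar value for a single
# occurrence, array reference of values once a name repeats.
sub pairs_to_hash {
    my ($pairs, $r_hash) = @_;
    for my $p (@$pairs) {
        my ($name, $value) = @$p;
        if (!exists $r_hash->{$name}) {
            $r_hash->{$name} = $value;
        }
        elsif (ref $r_hash->{$name} eq 'ARRAY') {
            push @{ $r_hash->{$name} }, $value;
        }
        else {
            $r_hash->{$name} = [ $r_hash->{$name}, $value ];
        }
    }
    return '';
}

my %h;
pairs_to_hash([[who => 'Smith, Jo'], [who => 'Wong, K.'], [what => 'Notes']], \%h);
print "what: $h{what}\n";
print 'who: ', join('; ', @{ $h{who} }), "\n";
```

With this layout, callers must check ref() before using a value, and element order and line numbers are lost; the triple array returned by anvl_recarray() keeps both.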

DEPRECATED: The anvl_recsplit() function splits an ANVL record into elements, returning them via the array reference given as the second argument. Each returned element is a pair of elements: a name and a value. An optional third argument, if true (default 0), rejects unindented continuation lines, a common formatting mistake. This function returns the empty string on success, or a message beginning "warning: ..." or "error: ...". Here's an example that extracts and uses the first returned element.
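Apart from the module's own example, here is a sketch of the name/value layout and of extracting the first element. `split_pairs` is hypothetical: it assumes names and values alternate in one flat array, handles only "name: value" lines and continuations, and uses its third argument as described above, rejecting unindented continuation lines instead of appending them.

```perl
use strict;
use warnings;

# Hypothetical sketch: split a record into a flat (name, value, name,
# value, ...) list. If $strict is true, an unindented line without a label
# is an error; otherwise it is appended to the previous value.
sub split_pairs {
    my ($rec, $r_elems, $strict) = @_;
    @$r_elems = ();
    for my $line (split /\n/, $rec) {
        next unless $line =~ /\S/;
        if ($line =~ /^([^\s:][^:]*):\s*(.*)$/) {
            push @$r_elems, $1, $2;
        }
        elsif (@$r_elems && ($line =~ /^\s/ || !$strict)) {
            (my $more = $line) =~ s/^\s+//;
            $r_elems->[-1] .= " $more";
        }
        else {
            return "error: unlabeled line: $line";
        }
    }
    return '';
}

my @elems;
my $msg = split_pairs("who: Smith, Jo\nwhat: Field\n  notes\n", \@elems);
die "$msg\n" if $msg =~ /^error:/;
my ($name, $value) = @elems[0, 1];              # first returned element
print "first element: $name = $value\n";
```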