11.6 srfi-13 - String library

Module: srfi-13

Defines a large set of string-related functions.
In Gauche, those functions are split into a number of files,
and the form (use srfi-13) merely sets up autoloading of
those files, so it is unlikely to slow down script startup.
See SRFI-13
for the detailed specification and discussion of design issues.
This manual serves as a reference of the function API.
Some SRFI-13 functions are Gauche built-ins and are not listed here.
Note: The SRFI-13 document suggests naming the modules that
implement these functions “string-lib” and “string-lib-internals”.
Gauche uses the name “srfi-13” for consistency.

11.6.1 General conventions

There are a few common factors in the string library API, which I don’t
repeat in each function description.

argument convention

The following argument names imply their types.

s, s1, s2

Those arguments must be strings.

char/char-set/pred

This argument can be a character, a character-set object,
or a predicate that takes a single character and returns a boolean value.
“Applying char/char-set/pred to a character” means the following:
if char/char-set/pred is a character, it is compared to the given
character; if char/char-set/pred is a character set, it is
checked whether the character set contains the given character; if
char/char-set/pred is a procedure, it is applied
to the given character. “A character satisfies char/char-set/pred”
means such application to the character yields a true value.

start, end

Lots of SRFI-13 functions take these two optional arguments, which
limit the operation to the part of the input string from the start-th
character (inclusive) to the end-th character (exclusive).
When specified, they must satisfy the condition
0 <= start <= end <= length of the string. The default values of
start and end are 0 and the length of the string, respectively.
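A small sketch of how start and end narrow the operation, again with string-index. Note that the returned index still counts from the beginning of the whole string.

```scheme
(use srfi-13)

;; Without start/end the whole string is searched.
(string-index "abcabc" #\a)       ; ⇒ 0
;; start = 1: the search begins at index 1.
(string-index "abcabc" #\a 1)     ; ⇒ 3
;; start = 1, end = 3: only "bc" is examined, so nothing is found.
(string-index "abcabc" #\a 1 3)   ; ⇒ #f
```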

shared variant

Some functions have variants with “/shared” attached to their names.
SRFI-13 defines those functions to allow the result to share part of
the input string, for better performance. Gauche doesn’t have a concept of
shared strings, and these functions are mere synonyms of their
non-shared variants. However, Gauche internally shares
the storage of strings, so generally you don’t need to worry
about the overhead of copying substrings.

right variant

Most functions work from left to right of the input string.
Some functions have variants with “-right” appended to their names,
which work from right to left.
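For example, string-index and its right variant differ only in the scanning direction, so on a string with repeated characters they find different occurrences:

```scheme
(use srfi-13)

(string-index       "abcabc" #\b)   ; ⇒ 1, scanning from the left
(string-index-right "abcabc" #\b)   ; ⇒ 4, scanning from the right
```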

11.6.2 String predicates

Function: string-null? s

[SRFI-13] Returns #t if s is an empty string, "".

Function: string-every char/char-set/pred s :optional start end

[SRFI-13] Checks whether every character in s satisfies
char/char-set/pred. If so, string-every returns
the value returned by the last application of char/char-set/pred.
If any application returns #f, string-every
returns #f immediately.
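A brief sketch of string-every. The last call uses a predicate that returns the character itself, to show that the result is the value of the last application rather than just #t.

```scheme
(use srfi-13)

(string-every char-numeric? "12345")   ; ⇒ #t
(string-every char-numeric? "12a45")   ; ⇒ #f
;; The value of the last application (here, the last character).
(string-every (lambda (c) (and (char-numeric? c) c)) "123")  ; ⇒ #\3
```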

Function: string-any char/char-set/pred s :optional start end

[SRFI-13] Checks whether any character in s satisfies
char/char-set/pred. If so, string-any returns
the value returned by that application. If no character
satisfies char/char-set/pred, #f is returned.
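A brief sketch of string-any with a predicate, a character set, and a predicate whose true value is the matching character:

```scheme
(use srfi-13)

(string-any char-upper-case? "hello World")   ; ⇒ #t
(string-any #[0-9] "no digits here")          ; ⇒ #f
;; The value returned by the successful application.
(string-any (lambda (c) (and (char-numeric? c) c)) "ab7c")  ; ⇒ #\7
```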

11.6.4 String selection

[SRFI-13] In Gauche, substring/shared is the same as substring, except
that the end argument is optional.

(substring/shared "abcde" 2) ⇒ "cde"

Function: string-copy! target tstart s :optional start end

[SRFI-13] Copies a string s into a string
target, starting at position tstart.
The target string must be mutable.
Optional start and end arguments limit the range of s.
If the copied string runs past the end of target, an error is
signaled.

(define s (string-copy "abcde"))
(string-copy! s 2 "ZZ")
s ⇒ "abZZe"

It is OK to pass the same string as target and s; this
always works, even if the source and destination regions overlap.
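A sketch of an overlapping copy within one string. A naive left-to-right copy would read characters it had already overwritten and produce "ababab"; the result below is the one expected from a correct overlapping copy.

```scheme
(use srfi-13)

(define s (string-copy "abcdef"))   ; string-copy gives a mutable string
;; Copy s[0..4) = "abcd" onto s itself, starting at index 2.
(string-copy! s 2 s 0 4)
s                                   ; ⇒ "ababcd"
```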

Function: string-take s nchars

Function: string-drop s nchars

Function: string-take-right s nchars

Function: string-drop-right s nchars

[SRFI-13] Returns the first nchars characters of s
(string-take) or s without its first nchars characters
(string-drop). The *-right variants count from
the end of the string. The returned string is guaranteed
to be a copy of s, even if no character is dropped.
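The four procedures side by side on the same input:

```scheme
(use srfi-13)

(string-take       "abcde" 2)   ; ⇒ "ab"
(string-drop       "abcde" 2)   ; ⇒ "cde"
(string-take-right "abcde" 2)   ; ⇒ "de"
(string-drop-right "abcde" 2)   ; ⇒ "abc"
```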

[SRFI-13]
If a string s is shorter than len, string-pad and string-pad-right
return a string of length len in which char is
padded to the left or right, respectively.
If s is longer than len, the rightmost
or leftmost len characters, respectively, are taken.
Char defaults to #\space.
If start and end are provided,
the substring of s is used as the source.
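The padding procedures described here are SRFI-13's string-pad (which pads or truncates on the left) and string-pad-right (which does so on the right). A short sketch of both directions:

```scheme
(use srfi-13)

(string-pad       "42" 5 #\0)   ; ⇒ "00042"  padded on the left
(string-pad-right "42" 5)       ; ⇒ "42   "  padded on the right with #\space
(string-pad       "abcdef" 3)   ; ⇒ "def"    rightmost 3 characters kept
(string-pad-right "abcdef" 3)   ; ⇒ "abc"    leftmost 3 characters kept
```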

[SRFI-13]
Removes characters that satisfy char/char-set/pred
from s: string-trim removes them from the
left of s, string-trim-right from the right,
and string-trim-both from both sides.
Char/char-set/pred defaults to #[\s], i.e. a char-set
of whitespace characters.
If start and end are provided,
the substring of s is used as the source.
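A sketch of the three trimming procedures with the default whitespace set, plus one call with an explicit character:

```scheme
(use srfi-13)

(string-trim       "  hi  ")       ; ⇒ "hi  "
(string-trim-right "  hi  ")       ; ⇒ "  hi"
(string-trim-both  "  hi  ")       ; ⇒ "hi"
(string-trim-both  "xxhixx" #\x)   ; ⇒ "hi"
```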

11.6.5 String comparison

[SRFI-13]
string-compare compares two strings s1 and s2 codepoint-wise from the left.
When a mismatch is found at index k of s1, it
calls proc< with k if s1’s codepoint is smaller than
the corresponding one in s2, or calls proc> if s1’s is
greater than s2’s. If the two strings are the same, it calls proc=
with the index of the last compared position in s1.

The case-insensitive variant, string-compare-ci, compares
each codepoint with character-wise case folding. It doesn’t handle
special case folding that maps one character to several, such as German eszett.
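A sketch that makes visible which of the three procedures is called and with which index. The helper name classify is only illustrative; the proc= branch ignores its index, since the exact value passed there is not needed for this example.

```scheme
(use srfi-13)

;; Hypothetical helper: report which procedure string-compare calls.
(define (classify s1 s2)
  (string-compare s1 s2
                  (lambda (i) (list '< i))   ; proc<
                  (lambda (i) '=)            ; proc=
                  (lambda (i) (list '> i)))) ; proc>

(classify "abc" "abd")   ; ⇒ (< 2), mismatch at index 2, #\c < #\d
(classify "abz" "abd")   ; ⇒ (> 2), mismatch at index 2, #\z > #\d
(classify "abc" "abc")   ; ⇒ =

;; The case-insensitive variant treats these as equal.
(string-compare-ci "ABC" "abc"
                   (lambda (i) '<) (lambda (i) '=) (lambda (i) '>))  ; ⇒ =
```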

Function: string= s1 s2 :optional start1 end1 start2 end2

Function: string<> s1 s2 :optional start1 end1 start2 end2

Function: string< s1 s2 :optional start1 end1 start2 end2

Function: string<= s1 s2 :optional start1 end1 start2 end2

Function: string> s1 s2 :optional start1 end1 start2 end2

Function: string>= s1 s2 :optional start1 end1 start2 end2

[SRFI-13]
Compare two strings s1 and s2. Optional arguments
can limit the portion of strings to be compared.
Comparison is done character-wise.
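A short sketch of these comparisons, including one that uses start1 and end1 to compare only part of s1. The comments say "a true value" rather than #t, since only truthiness is relied on here.

```scheme
(use srfi-13)

(string=  "abc" "abc")         ; a true value
(string=  "abc" "abd")         ; ⇒ #f
(string<> "abc" "abd")         ; a true value
(string<  "abc" "abd")         ; a true value
;; start1 = 1, end1 = 4 restricts s1 to "abc".
(string=  "xabcx" "abc" 1 4)   ; a true value
```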

Note: The builtin procedures string=? etc. can also be
used for character-wise string comparison, but they take
arguments differently. See String Comparison.

Function: string-ci= s1 s2 :optional start1 end1 start2 end2

Function: string-ci<> s1 s2 :optional start1 end1 start2 end2

Function: string-ci< s1 s2 :optional start1 end1 start2 end2

Function: string-ci<= s1 s2 :optional start1 end1 start2 end2

Function: string-ci> s1 s2 :optional start1 end1 start2 end2

Function: string-ci>= s1 s2 :optional start1 end1 start2 end2

[SRFI-13]
Compare two strings s1 and s2 in a case-insensitive way.
Optional arguments can limit the portion of strings to be compared.
Case folding and comparison are done character-wise, so they don’t
consider case folding that affects multiple characters.
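A sketch of the case-insensitive comparisons. The last call illustrates the limitation just described: character-wise folding does not turn ß into "ss", so the two spellings are not equal.

```scheme
(use srfi-13)

(string-ci= "Hello" "hELLO")      ; a true value
(string-ci< "apple" "BANANA")     ; a true value
(string-ci= "Strasse" "Straße")   ; ⇒ #f, ß is not folded to "ss"
```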

Note: We have two other sets of string comparison operations,
both named string-ci=? etc.
The builtin version (see String Comparison) does character-wise
comparison. The one in gauche.unicode uses full-string
case conversion (see Full string case conversion).
The R7RS version corresponds to the latter.

Function: string-hash s :optional bound start end

Function: string-hash-ci s :optional bound start end

[SRFI-13]
(Note: Gauche has builtin string-hash and string-ci-hash
according to SRFI-128. See Hashing for the details.
SRFI-13’s API is upward-compatible with SRFI-128’s. The underlying
hash algorithm is the same as the builtin ones, so string-hash
returns the same value as the builtin one for the same string
if the optional arguments are omitted.
On the other hand, the builtin string-ci-hash uses string case
folding (e.g. German eszett and SS are treated as the same), while
SRFI-13’s string-hash-ci uses character-wise case folding.
Unless there’s a strong reason, we recommend that new code use the
builtin SRFI-128 version instead of SRFI-13’s.)

Calculates the hash value of a string s. For string-hash-ci,
character-wise case folding is done before calculating the hash value.

If the optional bound argument is given, it must be a positive
exact integer, and the return value is a nonnegative integer less than bound.
The optional start and end arguments restrict the calculation
to that portion of s.
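A sketch of the two properties that can be relied on without knowing the concrete hash values: the range implied by bound, and equal hashes for strings that differ only in case under string-hash-ci. No specific hash number is shown, since it depends on the implementation.

```scheme
(use srfi-13)

;; With bound = 100, the result is an exact integer in [0, 100).
(define h (string-hash "hello" 100))

;; Character-wise case folding makes these two hash equally.
(= (string-hash-ci "Hello") (string-hash-ci "hELLO"))   ; ⇒ #t
```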

[SRFI-13]
string-prefix-length and string-suffix-length return the length of the
longest common prefix and suffix, respectively, of two strings
s1 and s2. The optional arguments restrict the range of
search. The *-ci variants use case-folding character comparison.
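A sketch using SRFI-13's string-prefix-length, string-suffix-length and string-prefix-length-ci:

```scheme
(use srfi-13)

(string-prefix-length    "abcde" "abxyz")   ; ⇒ 2, common prefix "ab"
(string-suffix-length    "hello" "jello")   ; ⇒ 4, common suffix "ello"
(string-prefix-length-ci "ABcde" "abCXX")   ; ⇒ 3, "abc" ignoring case
```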

11.6.7 String searching

Function: string-index s char/char-set/pred :optional start end

Function: string-index-right s char/char-set/pred :optional start end

[SRFI-13] Looks for the first character in a string s
that satisfies char/char-set/pred, and returns its index;
string-index-right looks for the last one, scanning from the right.
If no character in s satisfies char/char-set/pred, returns #f.
Optional start and end limit the range of s to search.
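A quick sketch, finding the first and last separators in a string:

```scheme
(use srfi-13)

(string-index       "a,b,c" #\,)   ; ⇒ 1
(string-index-right "a,b,c" #\,)   ; ⇒ 3
(string-index       "abc"   #\,)   ; ⇒ #f
```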

See also the Gauche built-in procedure string-scan
(String utilities), if you need speed over portability.

Function: string-skip s char/char-set/pred :optional start end

Function: string-skip-right s char/char-set/pred :optional start end

[SRFI-13] Looks for the first character in s that does not satisfy
char/char-set/pred and returns its index; string-skip-right looks for
the last such character, scanning from the right.
If no such character is found, returns #f.
Optional start and end limit the range of s to search.
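A sketch of skipping leading and trailing spaces; the results are the indices of the first and last non-space characters:

```scheme
(use srfi-13)

(string-skip       "   abc" #\space)   ; ⇒ 3, index of #\a
(string-skip-right "abc   " #\space)   ; ⇒ 2, index of #\c
(string-skip       "aaa" #\a)          ; ⇒ #f, every character matches
```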

Function: string-count s char/char-set/pred :optional start end

[SRFI-13] Counts the number of characters in s
that satisfy char/char-set/pred.
Optional start and end limit the range of s to count.
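A sketch of string-count with a character, a character set, and a start/end range:

```scheme
(use srfi-13)

(string-count "banana" #\a)       ; ⇒ 3
(string-count "banana" #[an])     ; ⇒ 5, three #\a plus two #\n
(string-count "banana" #\a 0 3)   ; ⇒ 1, only "ban" is examined
```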

Function: string-contains s1 s2 :optional start1 end1 start2 end2

Function: string-contains-ci s1 s2 :optional start1 end1 start2 end2

[SRFI-13] Looks for a string s2 inside another string s1.
If found, returns the index in s1 at which the match
begins. Returns #f otherwise.
Optional start1, end1, start2 and end2
limit the range of s1 and s2. String-contains-ci compares
characters case-insensitively.
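A sketch of substring search, case-sensitive and case-insensitive:

```scheme
(use srfi-13)

(string-contains    "hello world" "o w")     ; ⇒ 4
(string-contains    "hello world" "xyz")     ; ⇒ #f
(string-contains-ci "Hello World" "WORLD")   ; ⇒ 6
```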

See also the Gauche built-in procedure string-scan
(String utilities), if you need speed over portability.

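[SRFI-13]
String-titlecase, string-upcase and string-downcase convert a string s
to titlecase, uppercase or lowercase, respectively. These operations use
the character-by-character mapping provided by char-upcase etc. That is,
string-upcase and string-downcase behave as if char-upcase or
char-downcase were applied to each character of s in turn.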
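A minimal sketch of this character-by-character equivalence, assuming string-map applies a procedure to each character (as both the R7RS and SRFI-13 versions do for a single string). The names my-string-upcase and my-string-downcase are hypothetical, chosen so as not to shadow the real procedures.

```scheme
(use srfi-13)

;; Hypothetical character-wise equivalents: map char-upcase or
;; char-downcase over every character of s.
(define (my-string-upcase s)   (string-map char-upcase s))
(define (my-string-downcase s) (string-map char-downcase s))

(my-string-upcase   "Hello, World")   ; ⇒ "HELLO, WORLD"
(my-string-downcase "Hello, World")   ; ⇒ "hello, world"
```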

If you need full case mapping that handles the case when
a character is mapped to more than one character, use
the procedures with the same names in the gauche.unicode module
(see Full string case conversion).

The linear-update versions string-titlecase!, string-upcase!
and string-downcase! destroy s to store the result.
Note that in Gauche, using those procedures doesn’t save anything,
since string mutation is expensive by design. They are provided merely
for completeness.
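A short sketch of titlecasing and of a linear-update call. The string passed to string-upcase! is created with string-copy so that it is mutable.

```scheme
(use srfi-13)

(string-titlecase "hELLO wORLD")   ; ⇒ "Hello World"

(define s (string-copy "abc"))     ; a mutable string
(string-upcase! s)                 ; the result is stored in s
s                                  ; ⇒ "ABC"
```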

[SRFI-13]
String-unfold is a fundamental string builder. The p, f and g arguments are
procedures, each taking the current seed value. The stop predicate p
determines when to stop: if it returns a true value, string building
stops. The mapping function f returns a character
from the current seed value. The next-seed function g returns
the next seed value from the current seed value. The seed argument
gives the initial seed value.

The optional argument base, when given, is prepended to the
result string. Another optional argument, make-final,
is a procedure that takes the last return value of g and
returns a string that becomes the suffix of the result string.
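Two sketches of string-unfold: one driven by an integer seed, and one that walks a list of characters while also supplying base and make-final.

```scheme
(use srfi-13)

;; Seeds 0, 1, 2, ... map to #\a, #\b, ...; stop once the seed exceeds 4.
(string-unfold (lambda (i) (> i 4))                                ; p
               (lambda (i) (integer->char (+ i (char->integer #\a)))) ; f
               (lambda (i) (+ i 1))                                ; g
               0)                                                  ; ⇒ "abcde"

;; A list of characters as the seed; base is the prefix and the
;; string returned by make-final is the suffix.
(string-unfold null? car cdr '(#\a #\b #\c)
               ">>" (lambda (seed) "<<"))                          ; ⇒ ">>abc<<"
```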

[SRFI-13]
String-unfold-right is another fundamental string builder. The arguments
have the same meanings as for string-unfold. The only difference is
that the string is built right-to-left. The optional base,
if given, becomes the suffix of the result, and the result of
make-final becomes the prefix.
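The same list-walking calls with string-unfold-right: since characters are placed from the right, the list comes out reversed, and base and make-final swap sides.

```scheme
(use srfi-13)

(string-unfold-right null? car cdr '(#\a #\b #\c))    ; ⇒ "cba"
(string-unfold-right null? car cdr '(#\a #\b #\c)
                     ">>" (lambda (seed) "<<"))       ; ⇒ "<<cba>>"
```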

11.6.12 Other string operations

Function: string-replace s1 s2 start1 end1 :optional start2 end2

[SRFI-13]
Returns a new string whose content is a copy of a string s1, except
that the part beginning at index start1 (inclusive) and ending at
index end1 (exclusive) is replaced by a string s2. When the optional
start2 and end2 arguments are given, only that portion of s2 is inserted.
The size of the gap, (- end1 start1), doesn’t
need to be the same as the size of the inserted string.
Effectively, this is the same as concatenating the part of s1 before
start1, the (possibly trimmed) s2, and the part of s1 from end1 to its end.
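The sketch below exercises string-replace with and without start2/end2, and spells out the equivalent string-append expression for the first call.

```scheme
(use srfi-13)

(define s1 "abcdef")
(string-replace s1 "XYZ" 2 4)       ; ⇒ "abXYZef", "cd" is replaced
(string-replace s1 "XYZ" 2 4 0 1)   ; ⇒ "abXef", only "X" is inserted

;; Equivalent expression for the first call.
(string-append (substring s1 0 2)
               "XYZ"
               (substring s1 4 (string-length s1)))   ; ⇒ "abXYZef"
```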

[SRFI-13]
String-tokenize splits the string s into a list of substrings,
where each substring is a maximal non-empty contiguous
sequence of characters from the character set token-set.
The default of token-set is char-set:graphic
(see SRFI-14 Predefined character-set).
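A sketch of string-tokenize with the default token set, where tokens are runs of graphic (visible) characters separated by whitespace, and with an explicit set of digits:

```scheme
(use srfi-13)

(string-tokenize "Help make programs run, run, RUN!")
  ; ⇒ ("Help" "make" "programs" "run," "run," "RUN!")

;; Explicit token-set: maximal runs of digits.
(string-tokenize "a1b22c333" #[0-9])   ; ⇒ ("1" "22" "333")
```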

See also Gauche’s built-in string-split (see String utilities),
which provides similar features but different criteria.

Note: SRFI-13 was revised after finalization to switch
the order of the arguments char/char-set/pred and s of string-filter
and string-delete.
At the time of finalization, the order was
(string-filter s pred), and Gauche implemented it accordingly.
However, most existing implementations follow the revised order,
since that was what the SRFI-13 reference implementation had.

So, since 0.9.4, Gauche follows the current SRFI-13 spec,
but it also accepts the old order so as not to break old code.
We recommend that new code use the new order.
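A sketch using the recommended (revised) argument order, with char/char-set/pred first and the string second. String-delete is SRFI-13's complement of string-filter: it keeps the characters that do not satisfy the criterion.

```scheme
(use srfi-13)

;; Revised order: (string-filter char/char-set/pred s)
(string-filter char-alphabetic? "a1b2c3")   ; ⇒ "abc"
(string-filter #[0-9] "a1b2c3")             ; ⇒ "123"
(string-delete char-alphabetic? "a1b2c3")   ; ⇒ "123"
```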