[HemlockProgrammer Back to Table of Contents]
[[PageOutline]]
= 2. Representation of Text =
In Hemlock, text is represented as a sequence of lines. Newline characters
are never stored but are implicit between lines. The
implicit newline character is treated as the single character `#\Newline` by the
text primitives.
Text is broken into lines when it is first introduced into Hemlock. Text enters
Hemlock from the outside world in two ways: reading a file, or pasting text
from the system clipboard. Hemlock uses heuristics '''(which should be documented here!)'''
to decide what newline convention to use to convert the incoming text into its internal
representation as a sequence of lines. Similarly, it uses heuristics
'''(which should be documented here!)''' to convert the internal representation into
a string with embedded newlines in order to write a file or paste a region into
the clipboard.
== 2.1. Lines ==#Lines
A `line` is an object representing a sequence of characters with no line breaks.
`linep` line [Function]
This function returns t if line is a line object, otherwise nil.
`line-string` line [Function]
Given a line, this function returns as a simple string the characters
in the line. This is setf'able to set the line-string to any string
that does not contain newline characters. It is an error to
destructively modify the result of line-string or to destructively
modify any string after the line-string of some line has been set to
that string.
`line-previous` line [Function][[BR]]
`line-next` line [Function]
Given a line, line-previous returns the previous line or nil if there
is no previous line. Similarly, line-next returns the line following
line or nil.
`line-buffer` line [Function]
This function returns the buffer which contains this line. A line
may not be associated with any buffer, in which case line-buffer
returns nil.
`line-length` line [Function]
This function returns the number of characters in the line. This
excludes the newline character at the end.
`line-character` line index [Function]
This function returns the character at position index within line. It
is an error for index to be greater than the length of the line or
less than zero. If index is equal to the length of the line, this
returns a `#\Newline` character.
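
The line accessors above can be tried on a small piece of text. This is a minimal sketch, assuming the forms are evaluated inside a running Hemlock with its exported symbols accessible; it uses `string-to-region`, `region-start` and `mark-line` (described later on this page) only to obtain two line objects that are not in any buffer.

{{{
#!lisp
;; Build two lines of text that live outside any buffer.
(let* ((region (string-to-region (format nil "abc~%de")))
       (line-1 (mark-line (region-start region)))
       (line-2 (line-next line-1)))
  (list (line-string line-1)               ; "abc" (no newline is stored)
        (line-length line-1)               ; 3
        (line-character line-1 0)          ; #\a
        (line-character line-1 3)          ; #\Newline, the implicit line break
        (line-string line-2)               ; "de"
        (eq (line-previous line-2) line-1) ; T
        (line-previous line-1)))           ; NIL, there is no previous line
}}}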
`line-plist` line [Function]
This function returns the property-list for line. getf, setf of getf, and
remf can be used to change properties. This is typically used in
conjunction with line-signature to cache information about the line's
contents.
`line-signature` line [Function]
This function returns an object that serves as a signature for a
line's contents. It is guaranteed that any modification of text on the
line will result in the signature changing so that it is not eql to
any previous value. The signature may change even when the text
remains unmodified, but this does not happen often.
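
The caching idiom mentioned for `line-plist` and `line-signature` might look like the sketch below. The names `count-words`, `cached-word-count` and the `:word-count-cache` key are hypothetical, chosen for this illustration only; the sketch assumes it is evaluated inside a running Hemlock with its exported symbols accessible.

{{{
#!lisp
;; Hypothetical helper: number of maximal runs of non-space characters.
(defun count-words (string)
  (let ((count 0) (in-word nil))
    (loop for ch across string
          do (cond ((char= ch #\Space) (setf in-word nil))
                   ((not in-word) (setf in-word t) (incf count))))
    count))

;; Hypothetical helper: cache the word count on the line's property list,
;; together with the signature the count was computed for.
(defun cached-word-count (line)
  (let ((cache (getf (line-plist line) :word-count-cache))
        (signature (line-signature line)))
    (if (and cache (eql (car cache) signature))
        (cdr cache)                      ; signature unchanged: reuse the value
        (let ((count (count-words (line-string line))))
          (setf (getf (line-plist line) :word-count-cache)
                (cons signature count))
          count))))
}}}

Because the signature is guaranteed to change whenever the text changes, a stale cache entry is never returned; an occasional spurious signature change only costs one recomputation.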
== 2.2. Marks ==#Marks
A `mark` indicates a specific position within the text represented by a
line and a character position within that line. Although a mark is
sometimes loosely referred to as pointing to some character, it in
fact points between characters. If the charpos is zero, the previous
character is the newline character separating the previous line from
the mark's line. If the charpos is equal to the number of characters
in the line, the next character is the newline character separating
the current line from the next. If the mark's line has no previous
line, a mark with charpos of zero has no previous character; if the
mark's line has no next line, a mark with charpos equal to the length of
the line has no next character.
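
The boundary cases described above can be checked with `previous-character` and `next-character` (see Mark Functions below). This is a sketch, assuming a running Hemlock with its exported symbols accessible; the two-line text is built with `string-to-region` purely for illustration.

{{{
#!lisp
(let* ((region (string-to-region (format nil "ab~%cd")))
       (line-1 (mark-line (region-start region)))
       (line-2 (line-next line-1)))
  (list (previous-character (mark line-1 0))   ; NIL: no previous line
        (next-character     (mark line-1 0))   ; #\a
        (previous-character (mark line-1 1))   ; #\a
        (next-character     (mark line-1 1))   ; #\b
        (next-character     (mark line-1 2))   ; #\Newline after the first line
        (previous-character (mark line-2 0))   ; #\Newline before the second line
        (next-character     (mark line-2 2)))) ; NIL: no next line
}}}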
This section discusses the very basic operations involving marks, but
a lot of Hemlock programming is built on altering some text at a mark.
For more extended uses of marks see [HemlockProgrammer/AlteringAndSearchingText Altering And Searching Text].
=== 2.2.1. Kinds of Marks ===
A mark may have one of two lifetimes: temporary or permanent. Permanent
marks remain valid after arbitrary operations on the text; temporary
marks do not. Temporary marks are used because less bookkeeping
overhead is involved in their creation and use. If a temporary mark
is used after the text it points to has been modified, the results are
unpredictable. Permanent marks continue to point between the same two
characters regardless of insertions and deletions made before or after
them.
There are two different kinds of permanent marks which differ only in
their behavior when text is inserted at the position of the mark; text
is inserted to the left of a left-inserting mark and to the right of
a right-inserting mark.
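
The difference between the two permanent kinds shows up only when text is inserted exactly at the mark. The sketch below assumes a running Hemlock with its exported symbols accessible, and it relies on `insert-string`, which belongs to the text-altering functions documented in Altering And Searching Text rather than on this page. It also assumes that text outside any buffer may be modified from the calling context.

{{{
#!lisp
(let* ((region (string-to-region "ac"))
       (line (mark-line (region-start region))))
  ;; Both permanent marks start between "a" and "c"; WITH-MARK deletes
  ;; them again on exit.
  (with-mark ((left  (mark line 1) :left-inserting)
              (right (mark line 1) :right-inserting))
    (insert-string (mark line 1) "b")
    (list (line-string line)      ; "abc"
          (mark-charpos left)     ; 2: the new text went to the left of LEFT
          (mark-charpos right)))) ; 1: the new text went to the right of RIGHT
}}}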
=== 2.2.2. Mark Functions ===
`markp` mark [Function]
This function returns t if mark is a mark object, otherwise nil.
`mark-line` mark [Function]
This function returns the line to which mark points.
`mark-charpos` mark [Function]
This function returns the character position ''in the line'' of the character
after mark, i.e. the number of characters before the mark in the mark's line.
`mark-buffer` mark [Function]
This function returns the buffer containing mark.
`mark-absolute-position` mark [Function]
This function returns the character position ''in the buffer'' of the character
after the mark, i.e. the number of characters before the mark in the mark's
buffer.
`mark-kind` mark [Function]
This function returns one of `:right-inserting`, `:left-inserting` or
`:temporary` depending on the mark's kind. A corresponding setf form
changes the mark's kind.
`previous-character` mark [Function][[BR]]
`next-character` mark [Function]
These functions return the character immediately before
(previous-character) or after (next-character) the position of the
mark, or nil if there is no such character. These characters may be
set with setf when they exist; the setf methods for these forms signal
errors when there is no previous or next character.
=== 2.2.3. Making Marks ===
`mark` line charpos &optional kind [Function]
This function returns a mark object that points to the charpos'th
character of the line. Kind is the kind of mark to create, one
of `:temporary`, `:left-inserting`, or `:right-inserting`. The default is
:temporary.
`copy-mark` mark &optional kind [Function]
This function returns a new mark pointing to the same position and of
the same kind, or of kind kind if it is supplied.
`delete-mark` mark [Function]
This function deletes mark. Delete any permanent marks when you are
finished using them.
`with-mark` ({(mark pos [kind])}*) {form}* [Macro]
This macro binds each variable mark to a new mark of kind kind, which
defaults to `:temporary`, pointing to the same position as the
mark pos. On exit from the scope the mark is deleted. The value of the
last form is the value returned.
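
A typical use of `with-mark` is to move a scratch copy of a mark without disturbing the original. The function name `characters-to-line-end` below is hypothetical; the sketch assumes a running Hemlock with its exported symbols accessible and uses `line-end` from Moving Marks.

{{{
#!lisp
;; Hypothetical helper: number of characters between MARK and the end of
;; its line.  END is a temporary copy, so MARK itself is left untouched.
(defun characters-to-line-end (mark)
  (with-mark ((end mark))
    (line-end end)
    (- (mark-charpos end) (mark-charpos mark))))

;; Usage: a mark two characters into the line "hello".
(let* ((line (mark-line (region-start (string-to-region "hello"))))
       (m (mark line 2)))
  (list (characters-to-line-end m)   ; 3
        (mark-charpos m)))           ; 2, unchanged
}}}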
=== 2.2.4. Moving Marks ===
These functions destructively modify marks to point to new positions.
Other sections of this document describe mark moving routines specific
to higher level text forms than characters and lines, such as words,
sentences, paragraphs, Lisp forms, etc.
`move-to-position` mark charpos &optional line [Function]
This function changes the mark to point to the given character
position on the line line. Line defaults to mark's line.
`move-to-absolute-position` mark position [Function]
This function changes the mark to point to the given character
position in the buffer.
`move-mark` mark new-position [Function]
This function moves mark to the same position as the
mark new-position and returns it.
`line-start` mark &optional line [Function][[BR]]
`line-end` mark &optional line [Function]
These functions change mark to point to the beginning or the end of
line and return it. Line defaults to mark's line.
`buffer-start` mark &optional buffer [Function][[BR]]
`buffer-end` mark &optional buffer [Function]
These functions change mark to point to the beginning or end of
buffer, which defaults to the buffer mark currently points into. If
buffer is unsupplied, then it is an error for mark to be disassociated
from any buffer.
`mark-before` mark [Function][[BR]]
`mark-after` mark [Function]
These functions change mark to point one character before or after the
current position. If there is no character before/after the current
position, then they return nil and leave mark unmodified.
`character-offset` mark n [Function]
This function changes mark to point n characters after (n before if n
is negative) the current position. If there are fewer than n
characters after (before) the mark, then this returns nil and mark is
unmodified.
`line-offset` mark n &optional charpos [Function]
This function changes mark to point n lines after (n before if n is
negative) the current position. The character position of the
resulting mark is (min (line-length resulting-line) (mark-charpos
mark)) if charpos is unspecified, or (min (line-length resulting-line)
charpos) if it is. As with character-offset, if there are not n lines
then nil is returned and mark is not modified.
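
The following sketch walks one temporary mark through several of the moving functions. It assumes a running Hemlock with its exported symbols accessible; `position-of` is a hypothetical helper that reports a mark as a list of its line's text and its charpos.

{{{
#!lisp
;; Hypothetical helper: describe MARK as (line-text charpos).
(defun position-of (mark)
  (list (line-string (mark-line mark)) (mark-charpos mark)))

(let* ((region (string-to-region (format nil "abc~%de")))
       (m (copy-mark (region-start region) :temporary))
       (trace '()))
  (flet ((note (result) (push (list (and result t) (position-of m)) trace)))
    (note (line-end m))                        ; (T ("abc" 3))
    (note (line-offset m 1))                   ; (T ("de" 2)), charpos clipped
    (note (move-mark m (region-start region))) ; (T ("abc" 0))
    (note (character-offset m 4))              ; (T ("de" 0)), newline counts as 1
    (note (character-offset m 100))            ; (NIL ("de" 0)), mark unmodified
    (note (mark-before m))                     ; (T ("abc" 3))
    (note (line-offset m -1)))                 ; (NIL ("abc" 3)), no earlier line
  (nreverse trace))
}}}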
== 2.3. Regions ==#Regions
A `region` is simply a pair of marks: a starting mark and an ending
mark. The text in a region consists of the characters following the
starting mark and preceding the ending mark (keep in mind that a mark
points between characters on a line, not at them). By modifying the
starting or ending mark in a region it is possible to produce regions
with a start and end which are out of order or even in different
buffers. The use of such regions is undefined and may result in
arbitrarily bad behavior.
=== 2.3.1. Region Functions ===
`region` start end [Function]
This function returns a region constructed from the marks start and
end. It is an error for the marks to point to non-contiguous lines or
for start to come after end.
`regionp` region [Function]
This function returns t if region is a region object, otherwise nil.
`make-empty-region` [Function]
This function returns a region with start and end marks pointing to
the start of one empty line. The start mark is a `:right-inserting`
mark, and the end is a `:left-inserting` mark.
`copy-region` region [Function]
This function returns a region containing a copy of the text in the
specified region. The resulting region is completely disjoint
from region with respect to data references --- marks, lines, text, etc.
`region-to-string` region [Function][[BR]]
`string-to-region` string [Function]
These functions coerce regions to Lisp strings and vice versa. Within
the string, lines are delimited by newline characters.
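
As an illustration, the sketch below converts a string to a region and back, and then builds a smaller region from two marks with `region`. It assumes a running Hemlock with its exported symbols accessible; the sample text is arbitrary.

{{{
#!lisp
(let* ((text (format nil "abc~%de"))
       (whole (string-to-region text))
       (line-1 (mark-line (region-start whole)))
       (line-2 (line-next line-1))
       ;; From between "a" and "b" to between "d" and "e".
       (part (region (mark line-1 1) (mark line-2 1))))
  (list (string= (region-to-string whole) text)                      ; T
        (string= (region-to-string part) (format nil "bc~%d"))))     ; T
}}}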
`line-to-region` line [Function]
This function returns a region containing all the characters on
line. The first mark is `:right-inserting` and the last is
`:left-inserting`.
`region-start` region [Function][[BR]]
`region-end` region [Function]
These functions return the start or end mark of region.
`region-bounds` region [Function]
This function returns as multiple-values the starting and ending marks
of region.
`set-region-bounds` region start end [Function]
This function sets the start and end of region to start and end. It is
an error for start to be after or in a different buffer from end.
`count-lines` region [Function]
This function returns the number of lines in the region, first and
last lines inclusive. A newline is associated with the line it
follows, thus a region containing some number of non-newline
characters followed by one newline is one line, but if a newline were
added at the beginning, it would be two lines.
`count-characters` region [Function]
This function returns the number of characters in a given region. This
counts line breaks as one character.
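
A short sketch of both counting functions, assuming a running Hemlock with its exported symbols accessible. The regions are created with `string-to-region` and `make-empty-region` so that the example does not depend on any buffer.

{{{
#!lisp
(let ((two-lines (string-to-region (format nil "abc~%de")))
      (one-line  (string-to-region "hello"))
      (empty     (make-empty-region)))
  (list (count-lines two-lines)       ; 2
        (count-characters two-lines)  ; 6 = 3 + 1 (line break) + 2
        (count-lines one-line)        ; 1
        (count-characters one-line)   ; 5
        (count-lines empty)           ; 1, a single empty line
        (count-characters empty)))    ; 0
}}}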
[HemlockProgrammer Back to Table of Contents]