Re: [Sbcl-devel] Unparsing pathnames with non-initial dots in names

Richard M Kreuter <kreuter@...> writes:
> There's something possibly buggy (and at any rate confusing) in the
> following behaviors around pathnames that have non-initial dots in
> their name components.
>
> * (namestring (make-pathname :name "foo.txt" :type :unspecific))
>
> "foo.txt"
> * (namestring (make-pathname :name "foo.txt" :type nil))
>
> debugger invoked on a SIMPLE-ERROR:
> too many dots in the name: #<PATHNAME (with no namestring)
> :HOST #<SB-IMPL::UNIX-HOST {8028C81}>
> :DEVICE NIL
> :DIRECTORY NIL
> :NAME "foo.txt"
> :TYPE NIL
> :VERSION NIL>
That's a little bit odd, certainly, and probably wrong. (But the one
that's probably wrong is the first one, not the second.)
> The same behavior is expressed by enough-namestring.
>
> I think this disagreement may violate section 19.2.2.2.3.1 of the
> Hyperspec:
>
> | If a pathname is converted to a namestring, the symbols ‘nil’ and
> | :unspecific cause the field to be treated as if it were empty. That
> | is, both ‘nil’ and :unspecific cause the component not to appear in
> | the namestring.
Yes, it probably does.
> However, I don't understand why unparse-unix-file does what it does in
> this case.
Well, the first thing to realise is that namestrings don't really
correspond to a Unix file name; instead, they are a convenient (ha!)
way of designating a Lisp pathname. There are invariants for
namestrings that need to be satisfied, namely
(string= string (namestring (parse-namestring string)))
and
(equal pathname (parse-namestring (namestring pathname)))
and so when pathnames cannot be represented by a string in a non-lossy
way, you get the error that you saw.
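
The second invariant can be probed with a small helper. This is only a minimal sketch: the name namestring-round-trips-p is made up for illustration, and which pathnames pass depends on the implementation.

```lisp
;; Hypothetical helper, not part of SBCL or the standard.
(defun namestring-round-trips-p (pathname)
  "True if PATHNAME has a namestring that parses back to an EQUAL pathname."
  (handler-case
      (equal pathname (parse-namestring (namestring pathname)))
    ;; NAMESTRING may signal an error for a pathname that has no
    ;; namestring, as in the "too many dots" report quoted above.
    (error () nil)))

;; A pathname that came from a namestring is expected to round-trip.
(namestring-round-trips-p (parse-namestring "foo.txt"))

;; The pathname from the original report; the result depends on how the
;; implementation unparses a dotted name with a NIL type.
(namestring-round-trips-p (make-pathname :name "foo.txt" :type nil))
```

Treating an error from NAMESTRING as "does not round-trip" is a design choice of the sketch, not something the standard prescribes.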
> To confuse things even further, OPEN works in both cases:
>
> * (with-open-file (f (make-pathname :name "foo.txt" :type nil)
>                      :direction :output :if-exists :supersede)
>     (write-line "some text" f))
Right, because the way that a pathname is converted to a Unix filename
is different from the way in which it's converted into a namestring.
Does that help? (This isn't a complete answer, but maybe it provides
a few clues.)
Cheers,
Christophe

Christophe Rhodes writes:
> Well, the first thing to realise is that namestrings don't really
> correspond to a Unix file name...
Groan. I knew that this was the case for logical pathname namestrings,
but I'd thought that the standard meant for other namestrings to be
usable as filenames according to the file system's conventions. I guess
I'll have to read chapter 19 more closely.
> ... instead, they are a convenient (ha!) way of designating a Lisp
> pathname. There are invariants for namestrings that need to be
> satisfied, namely
> (string= string (namestring (parse-namestring string)))
> and
> (equal pathname (parse-namestring (namestring pathname)))
> and so when pathnames cannot be represented by string in a non-lossy
> way, you get the error that you saw.
For the moment, I believe the following invariant(?) is also violated by
SBCL's current behavior:
| enough-namestring returns an abbreviated namestring that is just
| sufficient to identify the file named by PATHNAME when considered
| relative to the DEFAULTS. It is required that
|
| (merge-pathnames (enough-namestring pathname defaults) defaults)
| == (merge-pathnames (parse-namestring pathname nil defaults) defaults)
|
| in all cases, and the result of enough-namestring is the shortest
| reasonable string that will satisfy this criterion.
* (let ((defaults (make-pathname :type "baz")))
    (merge-pathnames
     (parse-namestring (make-pathname :name "foo.bar" :type nil)
                       nil defaults)
     defaults))
#P"foo.bar.baz"
But
* (let ((defaults (make-pathname :type "baz")))
    (merge-pathnames
     (enough-namestring (make-pathname :name "foo.bar") defaults)
     defaults))
too many dots in the name: #<PATHNAME (with no namestring)
:HOST #<SB-IMPL::UNIX-HOST {8028C81}>
:DEVICE NIL
:DIRECTORY NIL
:NAME "foo.bar"
:TYPE NIL
:VERSION NIL>
[Condition of type SIMPLE-ERROR]
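
The comparison above can be wrapped in a predicate so that it is easy to try against other pathnames. This is a sketch under assumptions: enough-namestring-consistent-p is a made-up name, and counting an unparsing error as a failed check is my choice rather than anything the CLHS says.

```lisp
;; Hypothetical helper; the name is an assumption for illustration.
(defun enough-namestring-consistent-p (pathname defaults)
  "True if ENOUGH-NAMESTRING satisfies the CLHS merge identity for
PATHNAME and DEFAULTS. An error while unparsing counts as failure."
  (handler-case
      (equal (merge-pathnames (enough-namestring pathname defaults) defaults)
             (merge-pathnames (parse-namestring pathname nil defaults) defaults))
    (error () nil)))

;; The case from this message: a dotted name with a NIL type.
(let ((defaults (make-pathname :type "baz")))
  (enough-namestring-consistent-p
   (make-pathname :name "foo.bar" :type nil) defaults))
```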
> Right, because the way that a pathname is converted to a Unix filename
> is different from the way in which it's converted into a namestring.
In that case, what is the suggested way of making a Unix filename out of
a pathname, for example, to hand off to a Unix command?
(The pathname in question is constructed by asdf, which doesn't ensure
that component-pathnames have namestrings that are usable as file system
filenames, and which sets the pathname-type of a static-file component
to nil. Since vanilla asdf doesn't do anything with static-files,
nothing in asdf itself gets caught on this, but asdf extensions may do
so, e.g., to print files in a system via Unix commands.)
> Does that help?
Yes. Thanks as always.
--
RmK

Richard M Kreuter <kreuter@...> writes:
> Christophe Rhodes writes:
>
>> Well, the first thing to realise is that namestrings don't really
>> correspond to a Unix file name...
>
> Groan. I knew that this was the case for logical pathname namestrings,
> but I'd thought that the standard meant for other namestrings to be
> usable as filenames according to the file system's conventions. I guess
> I'll have to read chapter 19 more closely.
Well, consider :wild. If that's unparsed as "*", then how do you
unparse a literal *? Probably as "\\*", but now you are incompatible
with the file system. (You're compatible with the shell, of course,
in this case, but then consider :wild-inferiors...)
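
To see the distinction concretely, one can compare a :wild name with a literal "*" name. A minimal sketch, assuming a made-up helper describe-name-component; how a string "*" passed to MAKE-PATHNAME is treated, and how it is unparsed, is implementation-dependent.

```lisp
;; Hypothetical helper for inspecting the name component of a pathname.
(defun describe-name-component (pathname)
  "Return a plist with the name, its wildness and the namestring (or NIL
if the pathname cannot be unparsed)."
  (list :name (pathname-name pathname)
        :wild-p (and (wild-pathname-p pathname :name) t)
        :namestring (ignore-errors (namestring pathname))))

;; A :WILD name is wild by definition.
(describe-name-component (make-pathname :name :wild))

;; A string name containing an asterisk. Whether this is a literal name
;; (and gets escaped when unparsed) is up to the implementation.
(describe-name-component (make-pathname :name "*"))
```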
>> Right, because the way that a pathname is converted to a Unix filename
>> is different from the way in which it's converted into a namestring.
>
> In that case, what is the suggested way of making a Unix filename out of
> a pathname, for example, to hand off to a Unix command?
In SBCL, you can use NATIVE-NAMESTRING and PARSE-NATIVE-NAMESTRING,
but I think when I wrote those I expected them to have the same kind
of round-trip behaviour that NAMESTRING/PARSE-NAMESTRING are meant to.
In general, for portable code, I don't know of a way.
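
A rough sketch of how that might be wrapped for handing filenames to and from the OS. The wrapper names are made up; only SB-EXT:NATIVE-NAMESTRING and SB-EXT:PARSE-NATIVE-NAMESTRING are real, and the non-SBCL branches are a guess, not a recommendation.

```lisp
;; Hypothetical wrappers around the SBCL extensions.
(defun pathname->os-filename (pathname)
  "Return a string naming PATHNAME in the OS's own filename syntax."
  #+sbcl (sb-ext:native-namestring pathname)
  ;; Assumption: elsewhere the namestring is the best available guess.
  #-sbcl (namestring pathname))

(defun os-filename->pathname (filename)
  "Return a pathname for FILENAME as handed over by the OS."
  #+sbcl (sb-ext:parse-native-namestring filename)
  #-sbcl (parse-namestring filename))

;; The pathname from the original report, converted for a Unix command.
(pathname->os-filename (make-pathname :name "foo.txt" :type nil))
```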
> (The pathname in question is constructed by asdf, which doesn't ensure
> that component-pathnames have namestrings that are usable as file system
> filenames, and which sets the pathname-type of a static-file component
> to nil. Since vanilla asdf doesn't do anything with static-files,
> nothing in asdf itself gets caught on this, but asdf extensions may do
> so, e.g., to print files in a system via Unix commands.)
Mm, that's a good point. I don't really know how to solve this, I'm
afraid.
Cheers,
Christophe

Richard M Kreuter <kreuter@...> writes:
> In that case, what is the suggested way of making a Unix filename out of
> a pathname, for example, to hand off to a Unix command?
SB-EXT:NATIVE-NAMESTRING
(And SB-EXT:PARSE-NATIVE-NAMESTRING to make a pathname out of something
handed to you by the OS.)
Cheers,
-- Nikodemus
Schemer: "Buddha is small, clean, and serious."
Lispnik: "Buddha is big, has hairy armpits, and laughs."

Christophe Rhodes <csr21@...> writes:
> Richard M Kreuter <kreuter@...> writes:
>> Christophe Rhodes writes:
>>
>>> Well, the first thing to realise is that namestrings don't really
>>> correspond to a Unix file name...
>>
>> I'd thought that the standard meant for other namestrings to be
>> usable as filenames according to the file system's conventions. I
>> guess I'll have to read chapter 19 more closely.
>
> Well, consider :wild. If that's unparsed as "*", then how do you
> unparse a literal *? Probably as "\\*", but now you are
> incompatible with the file system. (You're compatible with the
> shell, of course, in this case, but then consider
> :wild-inferiors...)
I'm quite confused by this reasoning, if it's supposed to explain why
namestrings are meant to unambiguously designate pathnames, and not
``native'' filenames. Pathnames with :wild or :wild-inferiors could
be among those with no namestring.
In fact, if the pathnames supported by a Lisp implementation on Unix
were constrained by what can be expressed in POSIX glob syntax, then
the statement
| For example, Unix does not support :wild-inferiors in most
| implementations.
in 19.2.2.4.3 would begin to make sense.
--
RmK

Richard M Kreuter <kreuter@...> writes:
> Christophe Rhodes <csr21@...> writes:
>> Well, consider :wild. If that's unparsed as "*", then how do you
>> unparse a literal *? Probably as "\\*", but now you are
>> incompatible with the file system. (You're compatible with the
>> shell, of course, in this case, but then consider
>> :wild-inferiors...)
>
> I'm quite confused by this reasoning, if it's supposed to explain why
> namestrings are meant to unambiguously designate pathnames, and not
> ``native'' filenames. Pathnames with :wild or :wild-inferiors could
> be among those with no namestring.
They could, but that would make certain other people unhappy. (More
than with other areas of CL, there is a vast amount of historical
baggage in pathnames; people expect things like
(setf (logical-pathname-translations "FOO")
      '(("**;*.*.*" "/home/foo/**/*.*")))
to work, for instance).
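
For concreteness, a sketch of that expectation in use. The host name "DEMO" and the logical pathname are made up for illustration, and the exact translated pathname (case in particular) varies between implementations.

```lisp
;; "DEMO" is a hypothetical logical host used only for this sketch.
;; Both sides of the translation are wild namestrings, which is why wild
;; pathnames need a namestring syntax at all.
(setf (logical-pathname-translations "DEMO")
      '(("**;*.*.*" "/home/foo/**/*.*")))

;; Translate a logical pathname into a physical one under /home/foo/.
(translate-logical-pathname "DEMO:SRC;MAIN.LISP")
```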
> In fact, if the pathnames supported by a Lisp implementation on Unix
> were constrained by what can be expressed in POSIX glob syntax, then
> the statement
>
> | For example, Unix does not support :wild-inferiors in most
> | implementations.
>
> in 19.2.2.4.3 would begin to make sense.
Well, sure, but POSIX glob syntax isn't the same as Unix filesystem
syntax, so you would still have the problem that a Lisp namestring
didn't correspond in any direct way to a Unix filename.
Cheers,
Christophe

Richard M Kreuter <kreuter@...> writes:
> I'm quite confused by this reasoning, if it's supposed to explain why
> namestrings are meant to unambiguously designate pathnames, and not
> ``native'' filenames. Pathnames with :wild or :wild-inferiors could
> be among those with no namestring.
In addition to what Christophe said, the #p"..." syntax is specified
to use the namestring, so if pathnames with :wild or :wild-inferiors
are to be printable...
It is a bleeding mess.
Cheers,
-- Nikodemus
Schemer: "Buddha is small, clean, and serious."
Lispnik: "Buddha is big, has hairy armpits, and laughs."

Christophe Rhodes <csr21@...> writes:
> There are invariants for namestrings that need to be satisfied,
> namely
> (string= string (namestring (parse-namestring string)))
> and
> (equal pathname (parse-namestring (namestring pathname)))
I don't find what in the CLHS implies the second invariant, and it's
hard for me to imagine how an implementation could take advantage of
the stated permission to "quietly truncate filenames that exceed
length limitations imposed by the underlying file system, or ignore
certain pathname components for which the file system provides no
support" if pathnames had to be equal through unparsing and reparsing.
Can anybody point me in the right direction?
Thanks,
RmK

Richard M Kreuter <kreuter@...> writes:
> Christophe Rhodes <csr21@...> writes:
>
>> (equal pathname (parse-namestring (namestring pathname)))
>
> I don't find what in the CLHS implies the second invariant,
Oops. This invariant comes from equal's entry.
| Pathnames
| Two pathnames are ‘equal’ if and only if all the corresponding
| components (host, device, and so on) are equivalent. Whether
| or not uppercase and lowercase letters are considered
| equivalent in strings appearing in components is
| implementation-dependent. pathnames that are ‘equal’ should be
| functionally equivalent.
> it's hard for me to imagine how an implementation could take
> advantage of the stated permission to "quietly truncate filenames
> that exceed length limitations imposed by the underlying file
> system, or ignore certain pathname components for which the file
> system provides no support" if pathnames had to be equal through
> unparsing and reparsing.
Doesn't that wording of what equality is for pathnames combined with
the permission to truncate or ignore components imply that equivalence
for pathname components isn't #'string-equal or #'string= for strings
in components?
That is, if an implementation truncated a long pathname-name, then it
could satisfy the invariant by specifying that equivalence for
pathname names was something like
(defun pathname-name-equivalence (n1 n2)
  (cond ((and (symbolp n1) (symbolp n2))
         (eql n1 n2))
        ((and (stringp n1) (stringp n2))
         ;; Clamp the end index so that names shorter than the
         ;; (hypothetical) *pathname-name-limit* don't make SUBSEQ fail.
         (let ((trunc-n1 (subseq n1 0 (min (length n1) *pathname-name-limit*)))
               (trunc-n2 (subseq n2 0 (min (length n2) *pathname-name-limit*))))
           (string= trunc-n1 trunc-n2)))
        ;; If no other objects can be pathname-names in this
        ;; implementation
        (t
         (error "bad things to be comparing ~S ~S" n1 n2))))
Thanks,
RmK