Re: [saxon] Set path to look up xsl:include

Hi, I'm running Saxon-HE 9.2.0.3J from the command line like this:
% java -jar /usr/share/java/saxon.jar -s:- -xsl:MYXSLT <MYXML
In other words, the XML input is read from standard input and the XSLT file is
specified with the -xsl: argument.
My XSLT file has some include directives such as
<xsl:include href="common.xslt" />
I would like to set the directory used for these. At the moment Saxon uses the
base directory of the XSLT file, so that if I say -xsl:/foo/bar.xslt then it will
look for /foo/common.xslt when resolving the xsl:include.
How can I tell it to use some other directory such as the current directory?
I can see global/defaultCollection in the configuration file but I don't know if
that is the right thing. Ideally, I would pass the path to search on the command
line.
--
Ed Avis <eda@...>

On 13/05/2011 12:20, Ed Avis wrote:
> Hi, I'm running Saxon-HE 9.2.0.3J from the command line like this:
>
> % java -jar /usr/share/java/saxon.jar -s:- -xsl:MYXSLT <MYXML
>
> In other words, the XML input is read from standard input and the XSLT file is
> specified with the -xsl: argument.
>
> My XSLT file has some include directives such as
>
> <xsl:include href="common.xslt" />
>
> I would like to set the directory used for these. At the moment Saxon uses the
> base directory of the XSLT file, so that if I say -xsl:/foo/bar.xslt then it will
> look for /foo/common.xslt when resolving the xsl:include.
It's defined in the specs that relative URIs are resolved against the
base URI of the document that contains them. So one solution would be to
add the attribute xml:base="file:///c:/my/lib" in the MYXSLT document to
change the base URI to be that of the directory containing the included
stylesheets.
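A quick way to check what this resolution rule does is java.net.URI.resolve, which applies the generic RFC 2396 resolution algorithm. This is a minimal sketch: the class name BaseUriDemo is made up for illustration, and the paths assume a Unix-style layout. One detail is worth noticing. Resolution replaces the last path segment of the base, so an xml:base meant to name a directory needs a trailing slash. With xml:base="file:///c:/my/lib", a reference to common.xslt resolves to file:///c:/my/common.xslt rather than to a file inside lib.

```java
import java.net.URI;

public class BaseUriDemo {
    // Resolve a relative href (as in xsl:include/@href) against a base URI
    // using the generic algorithm implemented by java.net.URI.
    static URI resolve(String base, String href) {
        return URI.create(base).resolve(href);
    }

    public static void main(String[] args) {
        // Base corresponding to -xsl:/foo/bar.xslt
        System.out.println(resolve("file:///foo/bar.xslt", "common.xslt").getPath());
        // prints /foo/common.xslt

        // xml:base naming a directory, with a trailing slash
        System.out.println(resolve("file:///c:/my/lib/", "common.xslt").getPath());
        // prints /c:/my/lib/common.xslt

        // The same directory without the slash: "lib" is replaced like a file name
        System.out.println(resolve("file:///c:/my/lib", "common.xslt").getPath());
        // prints /c:/my/common.xslt
    }
}
```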
If that doesn't work for you, the alternative is to write your own
URIResolver, which can use any algorithm it wants to resolve the
relative URIs. Or you could use the "catalog resolver", which looks up
referenced URIs via a catalog file.
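A custom URIResolver of the kind suggested here might look like the following sketch. SearchPathResolver is a hypothetical class, not part of Saxon or JAXP. It tries each directory in a search list for a relative href. If nothing matches, it returns null so that the processor falls back to its normal resolution against the base URI. Searching a path like this departs from the spec's rule for resolving relative URIs, a point Michael Kay makes later in the thread.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver: looks for a relative href in each directory of a
// search path before falling back to ordinary resolution against the base URI.
public class SearchPathResolver implements URIResolver {
    private final List<File> searchPath;

    public SearchPathResolver(List<File> searchPath) {
        this.searchPath = new ArrayList<>(searchPath);
    }

    @Override
    public Source resolve(String href, String base) {
        URI ref;
        try {
            ref = new URI(href);
        } catch (URISyntaxException e) {
            return null; // not a valid URI reference: leave it to the processor
        }
        if (ref.isAbsolute() || ref.getPath() == null) {
            return null; // absolute URIs keep their normal meaning
        }
        for (File dir : searchPath) {
            File candidate = new File(dir, ref.getPath());
            if (candidate.isFile()) {
                // The system ID (a file: URI) becomes the included module's base URI.
                return new StreamSource(candidate);
            }
        }
        return null; // not on the path: resolve against the base URI as usual
    }
}
```

To apply it to xsl:include and xsl:import, register it on a javax.xml.transform.TransformerFactory before compiling, e.g. factory.setURIResolver(new SearchPathResolver(Arrays.asList(new File(".")))). The sketch ignores any query or fragment in the href and does not guard against hrefs containing "..".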
> How can I tell it to use some other directory such as the current directory?
> I can see global/defaultCollection in the configuration file but I don't know if
> that is the right thing.
No, that's all about the default-collection() function which is quite
unrelated to your problem.
> Ideally, I would pass the path to search on the command
> line.
>
A nice idea in theory but unfortunately URIs don't work this way.
Michael Kay
Saxonica

Thanks for your answer. If I understand rightly, the XSLT file is
specified as a URI not a filename, even when mentioned on the command
line. Because it's a URI, to conform with the spec, relative URIs such
as those in my xsl:include directives must be resolved relative to that
base.
My use case is that I want to run the command line from a script. I
would like to programmatically generate both the input XML and the
XSLT, although the XSLT may contain include directives. (In fact, it's
intended as a test suite which gets different versions of the XSLT from
the version control system and checks the result.)
For the input XML there is no problem - it can be passed on standard
input. For the XSLT I had hoped to write a file in /tmp and use that as
input - but doing so changes the path used for relative URI lookups. I
would prefer not to write a file in the current directory, since it may
not be writable.
My next question is whether it's possible to pass the XSLT on the
command line so it can be piped in. It appears that the '-' syntax for
standard input, while recognized for the XML argument, doesn't work as
an -xsl:- option.
Or, ultimately, to divorce the 'URI' of the XSLT file, which affects
the semantics used to process it, from the filename that the command
line tool should read from to get the contents of that URI. For command
line use this would be handy.
--
Ed Avis <eda@...>
______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
______________________________________________________________________
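At the JAXP API level (rather than the Saxon command line discussed here), a stylesheet's content and its base URI can already be supplied separately. StreamSource(InputStream, String systemId) reads the bytes from the stream and uses the system ID as the base URI. The sketch below uses the hypothetical class name StylesheetFromStdin. It reads the stylesheet from standard input and pretends it lives in the current directory, so relative xsl:include hrefs are resolved there. The file name stdin.xsl is an arbitrary placeholder and does not need to exist.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Paths;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StylesheetFromStdin {
    // Compile a stylesheet whose bytes come from 'xslt' but whose base URI is
    // 'systemId'; relative xsl:include/xsl:import hrefs resolve against systemId.
    static Templates compile(InputStream xslt, String systemId) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(xslt, systemId));
    }

    // Run the compiled stylesheet over one source document.
    static void transform(Templates templates, InputStream xml, OutputStream out)
            throws TransformerException {
        templates.newTransformer().transform(new StreamSource(xml), new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java StylesheetFromStdin input.xml < stylesheet.xsl");
            System.exit(2);
        }
        // Placeholder file name in the current directory; it need not exist,
        // only its directory matters when relative hrefs are resolved.
        String base = Paths.get("stdin.xsl").toAbsolutePath().toUri().toString();
        Templates templates = compile(System.in, base);
        try (InputStream xml = new FileInputStream(args[0])) {
            transform(templates, xml, System.out);
        }
    }
}
```

It could be invoked, for example, as: generate-xslt | java StylesheetFromStdin input.xml (generate-xslt stands for whatever script produces the stylesheet). Which XSLT processor actually runs depends on which TransformerFactory implementation is found on the classpath.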

If you're generating the main XSLT document, why not give it an xml:base
attribute on the xsl:stylesheet element pointing to the current
directory? Alternatively, why not generate absolute URIs for its
xsl:include and xsl:import directives?
Michael Kay
Saxonica
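Both of these suggestions can be applied programmatically to a generated stylesheet. The sketch below uses RebaseStylesheet, a hypothetical helper built on the standard DOM and JAXP APIs. It shows two options: adding xml:base to the root element, or rewriting each xsl:include and xsl:import href as an absolute URI. Adding xml:base to xsl:stylesheet also changes the base URI seen by anything else in the stylesheet that resolves relative URIs, such as document() calls. Rewriting the hrefs touches only the include and import declarations.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.net.URI;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RebaseStylesheet {
    private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // Option 1: add xml:base to the xsl:stylesheet element so that relative
    // hrefs resolve against libDir (which should end in '/').
    static String addXmlBase(String stylesheet, URI libDir)
            throws ParserConfigurationException, SAXException, IOException, TransformerException {
        Document doc = parse(stylesheet);
        doc.getDocumentElement().setAttributeNS(XMLConstants.XML_NS_URI, "xml:base", libDir.toString());
        return serialize(doc);
    }

    // Option 2: rewrite every xsl:include/xsl:import href as an absolute URI.
    static String absolutizeHrefs(String stylesheet, URI libDir)
            throws ParserConfigurationException, SAXException, IOException, TransformerException {
        Document doc = parse(stylesheet);
        for (String name : new String[] {"include", "import"}) {
            NodeList nodes = doc.getElementsByTagNameNS(XSL_NS, name);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // URI.resolve leaves an already-absolute href unchanged.
                e.setAttribute("href", libDir.resolve(e.getAttribute("href")).toString());
            }
        }
        return serialize(doc);
    }

    private static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed for getElementsByTagNameNS
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    private static String serialize(Document doc) throws TransformerException {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```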
On 16/05/2011 07:41, Ed Avis wrote:
> Thanks for your answer. If I understand rightly, the XSLT file is
> specified as a URI not a filename, even when mentioned on the command
> line. Because it's a URI, to conform with the spec, relative URIs such
> as those in my xsl:include directives must be resolved relative to that
> base.
>
> My use case is that I want to run the command line from a script. I
> would like to programmatically generate both the input XML and the
> XSLT, although the XSLT may contain include directives. (In fact, it's
> intended as a test suite which gets different versions of the XSLT from
> the version control system and checks the result.)
>
> For the input XML there is no problem - it can be passed on standard
> input. For the XSLT I had hoped to write a file in /tmp and use that as
> input - but doing so changes the path used for relative URI lookups. I
> would prefer not to write a file in the current directory, since it may
> not be writable.
>
> My next question is whether it's possible to pass the XSLT on the
> command line so it can be piped in. It appears that the '-' syntax for
> standard input, while recognized for the XML argument, doesn't work as
> an -xsl:- option.
>
> Or, ultimately, to divorce the 'URI' of the XSLT file, which affects
> the semantics used to process it, from the filename that the command
> line tool should read from to get the contents of that URI. For command
> line use this would be handy.
>

Michael Kay wrote:
>If you're generating the main XSLT document, why not give it an xml:base
>attribute on the xsl:stylesheet element pointing to the current
>directory? Alternatively, why not generate absolute URIs for its
>xsl:include and xsl:import directives?
That's what I am currently doing. I go through the XSLT and munge all the
xsl:include directives. However, it would be cleaner to run the test
suite using the original copy of the file rather than a munged version.
Previously I was using the xsltproc command from libxslt. That searches
the current directory for xsl:include, even if the filename passed is
somewhere else. To me, that seems the right behaviour, since it is merely
a filename and not a URI. (If the XSLT input were specified as
file:///something.xslt then it would be right to resolve relative to that
base URI, but by accepting a naked filename you've already diverged from
a strict URI-based lookup.)
It also accepts a --path option. These two features make it work better
as a command line tool.
If I made a patch for Saxon to use the current directory for file-based
lookups (but not when a full URI is given), would you accept it? That
might change the behaviour of some existing scripts. So perhaps better to
add the --path option. What do you think?
--
Ed Avis <eda@...>

> Previously I was using the xsltproc command from libxslt. That searches
> the current directory for xsl:include, even if the filename passed is
> somewhere else. To me, that seems the right behaviour, since it is merely a
> filename and not a URI.
The XSLT specification is explicit that relative URIs in the href
attribute of xsl:include and xsl:import are resolved relative to the
base URI of the stylesheet node that contains the reference. This
behaviour sounds non-conformant to me. There's latitude for how the base
URI of the main module is established, but there's no provision for any
kind of search.
> (If the XSLT input were specified as file:///something.xslt then it
> would be right to resolve relative to that base URI, but by accepting a
> naked filename you've already diverged from a strict URI-based lookup.)
I don't accept that. The command line uses an operating system filename
to identify a resource. That resource has a URI, which is used as its
base URI.
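The conversion described here, from an operating system filename to the file: URI that becomes the base URI, can be sketched with java.nio.file.Path.toUri(). The class FilenameToBaseUri is hypothetical, and the example paths assume a Unix-like filesystem (on Windows the URIs take the form file:///C:/...). Relative filenames are first made absolute against the current directory. That is why -xsl:bar.xslt and -xsl:/foo/bar.xslt lead to different lookups for common.xslt.

```java
import java.net.URI;
import java.nio.file.Paths;

public class FilenameToBaseUri {
    // Map an operating-system filename (as passed to -xsl:) to an absolute
    // file: URI; relative filenames are taken relative to the current directory.
    static URI baseUriFor(String filename) {
        return Paths.get(filename).toAbsolutePath().normalize().toUri();
    }

    public static void main(String[] args) {
        String filename = args.length > 0 ? args[0] : "/foo/bar.xslt";
        URI base = baseUriFor(filename);
        System.out.println("base URI:             " + base);
        // A relative href is then resolved against that base, as the spec requires.
        System.out.println("common.xslt resolves: " + base.resolve("common.xslt"));
    }
}
```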
> It also accepts a --path option. These two features make it work better
> as a command line tool.
It might well be a usable design, but if it's not conformant, that's a
show-stopper. (The XSLT specification gives freedom to implementations
on how an absolute URI is dereferenced, but not on how a relative URI is
converted to an absolute URI. Unfortunately the JAXP URIResolver design
doesn't reflect this, as it appears to give the URIResolver license to
decide how the relative URI is resolved. If the URIResolver doesn't
follow the rules, nasty things can happen.)
> If I made a patch for Saxon to use the current directory for file-based
> lookups (but not when a full URI is given), would you accept it?
No.
What I would accept is the ability to specify href="classpath:name.xml"
as an absolute URI using the classpath scheme, with name.xml being found
by a search of the classpath.
Michael Kay
Saxonica
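A resolver for such a scheme can keep to the rule described above. It first turns the href into an absolute URI using normal resolution against the base, and only then decides how to dereference it. The sketch below handles a hypothetical classpath: scheme that way. ClasspathResolver is an illustrative name, not an existing Saxon or JAXP class. Anything that is not a classpath: URI is left to the default resolver by returning null.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver for an absolute "classpath:" URI scheme.
public class ClasspathResolver implements URIResolver {
    @Override
    public Source resolve(String href, String base) throws TransformerException {
        URI absolute;
        try {
            // Step 1 (fixed by the spec): make the reference absolute.
            URI ref = new URI(href);
            absolute = (base == null || ref.isAbsolute()) ? ref : new URI(base).resolve(ref);
        } catch (URISyntaxException e) {
            throw new TransformerException(e);
        }
        if (!"classpath".equals(absolute.getScheme())) {
            return null; // not ours: let the default resolver dereference it
        }
        // Step 2 (up to the implementation): dereference by searching the classpath.
        String name = absolute.getSchemeSpecificPart();
        if (name.startsWith("/")) {
            name = name.substring(1); // ClassLoader resource names have no leading '/'
        }
        URL url = ClasspathResolver.class.getClassLoader().getResource(name);
        if (url == null) {
            throw new TransformerException("Not found on classpath: " + name);
        }
        try {
            // Use the resource's real URL as the system ID of the loaded module.
            return new StreamSource(url.openStream(), url.toExternalForm());
        } catch (IOException e) {
            throw new TransformerException(e);
        }
    }
}
```

Giving the loaded module its real URL as system ID lets the processor resolve relative includes inside it against that URL (for example a jar: URL) rather than against the opaque classpath: URI.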

Michael Kay wrote:
>>(If the XSLT input were specified as file:///something.xslt then it
>>would be right to resolve relative to that base URI, but by accepting a
>>naked filename you've already diverged from a strict URI-based lookup.)
>
>I don't accept that. The command line uses an operating system filename
>to identify a resource. That resource has a URI, which is used as its
>base URI.
Ah, I see. I thought that taking a filename was just a convenience
although not strictly conformant. But as you see it, a filename can be
converted to a file:/// URI, which is then an absolute path, and also has
the effect of specifying the base to resolve from.
It's a pity that the file: scheme, to the extent it is standardized at
all, does not allow you to specify the file to read from independently
from the 'base' to search.
Many web browsers nowadays accept a data: URI scheme. Would you consider
adding such support to Saxon?
>(The XSLT specification gives freedom to implementations
>on how an absolute URI is dereferenced, but not on how a relative URI is
>converted to an absolute URI.
Do you mean that a naked filename counts as a relative URI, and so is
covered by the specification?
Finally, Saxon accepts the syntax - to read from standard input for the
XML. It would be useful to have that for the XSLT too. Would you consider
that, or some other means to specify that XSLT should be read from stdin?
--
Ed Avis <eda@...>

> Ah, I see. I thought that taking a filename was just a convenience
> although not strictly conformant.
The command line interface is an API, and implementors have complete
freedom over API design. There's no similar freedom (in theory at least)
in what can go in stylesheet source - for example the contents of
xsl:include/@href must be a relative or absolute URI. Allowing a Windows
filename is technically non-conformant, though of course many processors
allow it. Saxon does in fact stretch the rules here, for example by
popular demand it allows the "jar" URI scheme to be used, even though
these (Sun-defined) "URIs" don't satisfy the RFC syntax for valid URIs.
> It's a pity that the file: scheme, to the extent it is standardized at
> all, does not allow you to specify the file to read from independently from
> the 'base' to search.
I think the whole URI mechanism is pretty flakey, to be honest. There
are certainly many cases in which one would like a relative URI
reference to be resolved by searching a path rather than by resolving
against an explicit base.
>
> Many web browsers nowadays accept a data: URI scheme. Would you
> consider adding such support to Saxon?
>
I would have added it already if it were easy. I couldn't find an
off-the-shelf library to enable this, or a test suite to test it.
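The decoding itself is the easy part. The sketch below uses a hypothetical, deliberately minimal class, DataUri. It handles both the percent-encoded and the ;base64 forms of RFC 2397 and wraps the result in a StreamSource. It ignores the media type and any charset parameter, and it assumes an ASCII payload apart from %XX escapes, so it is far from a tested implementation. A data: URI is opaque and provides no useful base for relative references, so the caller has to supply a system ID explicitly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.Locale;
import javax.xml.transform.stream.StreamSource;

// Minimal decoder for RFC 2397 data: URIs (data:[<mediatype>][;base64],<data>).
// Simplifications: media type and charset are ignored; the payload is assumed
// to be ASCII apart from %XX escapes.
public class DataUri {
    static byte[] decode(String uri) {
        if (!uri.regionMatches(true, 0, "data:", 0, 5)) {
            throw new IllegalArgumentException("not a data: URI: " + uri);
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("data: URI has no ',': " + uri);
        }
        String header = uri.substring(5, comma).toLowerCase(Locale.ROOT);
        byte[] payload = percentDecode(uri.substring(comma + 1));
        return header.endsWith(";base64") ? Base64.getDecoder().decode(payload) : payload;
    }

    private static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write((byte) c); // ASCII assumption, see above
            }
        }
        return out.toByteArray();
    }

    // The caller chooses the system ID (e.g. a file: URI in the current
    // directory) against which relative hrefs in the decoded document resolve.
    static StreamSource toSource(String uri, String systemId) {
        return new StreamSource(new ByteArrayInputStream(decode(uri)), systemId);
    }
}
```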
>
> Do you mean that a naked filename counts as a relative URI, and so is
> covered by the specification?
Not really. If you specify href="lib.xsl", that isn't a "naked
filename", and it doesn't "count as" a relative URI: it is a relative
URI reference by definition.
> Finally, Saxon accepts the syntax - to read from standard input for the
> XML. It would be useful to have that for the XSLT too. Would you consider
> that, or some other means to specify that XSLT should be read from stdin?
I think that might have worked at some stage - not sure why it doesn't
now. I'm not sure many people use this style of scripting nowadays, though.
Michael Kay
Saxonica

Michael Kay <mike@...> writes:
>>Do you mean that a naked filename counts as a relative URI, and so is
>>covered by the specification?
>
>Not really. If you specify href="lib.xsl", that isn't a "naked
>filename",
Sorry, I was referring to a filename passed on the command line, rather than
mentioned in an XML document.
If you take the position that -xsl:/somewhere/foo.xslt specifies a URI, then
clearly the standard requires that references be looked up relative to the
base of that URI.
If, on the other hand, the argument is taken as part of the command line API
of the tool, and the filename given is merely a filename, that gives more
latitude to implement different behaviours without breaking the spec.
>>Finally, Saxon accepts the syntax - to read from standard input for the
>>XML.
>I think that might have worked at some stage - not sure why it doesn't
>now.
OK. In my personal view it would be handy to have either -xsl:- for standard
input or -xsl:data:abcde for passing a literal string, so that the command line
tool can be used programmatically without creating temporary files.
I suppose that implicitly, either - or data:abcde would set the current
directory as the base URI for later lookups, even though technically neither of
them is a URI or even a filename and so they don't have the notion of a base
directory.
--
Ed Avis <eda@...>