Bruno Dumon wrote:
> Another solution would be to make a list of URL's for all these files
> and feed that to the crawler. The thing that makes this list would of
> course need to make some assumptions about how files on the filesystem
> are mapped into the URL space.
Or vice-versa.
I'm still stuck on this idea of having a LinkResolverTransformer which,
given a configuration of schemes and their respective source resolution,
would rewrite links as needed. It might be "boneheaded me", and
orthogonal/supplementary to the sitemap and what is currently put
forward, but I want to do my thinking in public.
Let me try to explain what I'm aiming at:
<warning>Steven's massive FS capabilities ahead ;-)</warning>
instance plop.xml:
<?xml version="1.0"?>
<document>
<p>This is a <link href="file:images/plop.png">plop</link></p>
</document>
pipeline:
<generate src="plop.xml"/>
<transform type="link" name="linkresolutionset1"/>
<transform ...
<serialize/>
and some config, perhaps using inputmodules, for that transformer:
<linkresolver>
<scheme name="file">
<match pattern="**">
<pipeline target="cocoon:/{1}"/>
</match>
</scheme>
<scheme name="javadoc">
<match pattern="**">
<static src="{context}/../ROOT/static/javadoc/{1}"/>
</match>
</scheme>
<scheme name="ldap">
<match pattern="**">
<ldapquery...
</match>
</scheme>
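So, fed through this transformer with the "file" scheme configured as
above, the example instance would come out with its link rewritten.
Something like this is what I have in mind (my guess at the intended
output):
<document>
<p>This is a <link href="cocoon:/images/plop.png">plop</link></p>
</document>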
Most of what this transformer does could be done using XSLT, but doing
it in code, using some hierarchical configuration à la JXPath, would be
coolio.
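To make the "in code" part concrete, here is a minimal sketch of the
scheme-based resolution. This is NOT Cocoon's actual Transformer API,
and it ignores the <match pattern="**"> wildcard machinery: it simply
treats everything after the scheme prefix as {1}. Class and method
names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkResolver {
    // scheme name -> target template; "{1}" stands for everything
    // after the "scheme:" prefix in the original href
    private final Map<String, String> schemes = new LinkedHashMap<>();

    public void addScheme(String scheme, String targetTemplate) {
        schemes.put(scheme, targetTemplate);
    }

    // Rewrite an href like "file:images/plop.png" according to its
    // scheme; hrefs with no scheme or an unconfigured scheme pass
    // through untouched.
    public String resolve(String href) {
        int colon = href.indexOf(':');
        if (colon < 0) return href;
        String scheme = href.substring(0, colon);
        String rest = href.substring(colon + 1);
        String template = schemes.get(scheme);
        if (template == null) return href;
        return template.replace("{1}", rest);
    }

    public static void main(String[] args) {
        LinkResolver r = new LinkResolver();
        r.addScheme("file", "cocoon:/{1}");
        r.addScheme("javadoc", "/context/../ROOT/static/javadoc/{1}");
        System.out.println(r.resolve("file:images/plop.png"));
        // prints cocoon:/images/plop.png
    }
}
```

The transformer itself would then just walk the SAX stream, spot link
elements, and run their href attributes through resolve() before
passing them on.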
Does this make sense at all?
</Steven>
--
Steven Noels http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://radio.weblogs.com/0103539/
stevenn at outerthought.org stevenn at apache.org