find with XPath over file system

XML can be considered as an external representation of in-memory
tree-like structures, and the XML-related standards -- as methods
of processing such data. I'm working on a reusable XPath
implementation, and the new find is a reference implementation
for adding XPath into the C programs.

You can use the variable base which is the reference to the base
directory:

./find -xpath './/*[@ino = $base/somefile/@ino]' -ls

Going to the parent directory outside the base directory increases
the depth level, so the -maxdepth parameter may affect parent
steps.

The -maxdepth parameter affects XPath predicates. For example,
the XPath expression a[b/c] always returns nothing if max depth
is 2 because the file a/b/c is never seen.

The parameter -xdev is ignored when getting the parent directory
or the root directory (programmer's fault).

The name of a file appears twice, as the name of the element and as
the value of the attribute name. The former allows to write simple
XPaths like /bin/cat and the latter allows to use the name as
XPath step: dir[res1/@name=res2/@name].

There are some XPath issues due to the programmer's issues. They are
discussed below.

There is the Scheme programming language, there is the SXML tools
for S-expression-based XML parsing, query and conversion. There is
Guile, an embeddable Scheme interpreter. We combine all together in
one application.

The application uses XPath engine (currently txpath edition) from
the SXML library. The file system tree is exposed as the SXML tree
to the library.

The binding is "lazy", so nodes in the SXML tree are instantiated
on demand. Guile was patched to allow lazy binding.

In order to switch between the file metadata (C layer) and nodes in
the SXML tree (Scheme layer), the application maintains a map
between them.

S-expressions have no parent links, so the parent and related steps
in SXPath are quite expensive. But in the application, due to metadata,
it's very easy to find the parent node for a node in the SXML tree.
We exploit it and redefine some SXPath axes in the file
sxpath-native.scm.

There are some differences between the file system tree and the XML
tree. At first, the paths /bin and ../../../../bin may point
to the same directory, but they look different for users.
At second, we can lazily walk from the base directory to the root
and at the same time descent from the root. A special attention
should be paid to a meeting point. But I decided to ignore it
due to the first issue, so nodes from relative and absolute XPaths
are never equal.

The main issue is garbage collection. I have no experience with it,
so I don't know if GC is safe in the program. For example,
can Guile delete an SXML node even if the map has a link to it?
I'm going to investigate the issue, but not right now.

Instead of compiling from sources, you can try to use the
binary version (you still need the package with source code
for scm-files). The program requires the following shared
libraries: find.so (included), libdl.so.2, libm.so.6,
libcrypt.so.1, libguile-ltdl.so.1, libc.so.6,
libpthread.so.0, /lib/ld-linux.so.2. You might need to
set up the LD_LIBRARY_PATH to help the system
in finding the find.so.