All,
the sitemap.xml solution works IF everybody (or at least most sites)
has robots.txt or sitemap.xml at the root directory. So, conceptually
speaking, it should be the way to go.

But a quick test on the LOD cloud returned 404 for many, if not most,
sites for both sitemap.xml and robots.txt...
Curiously, for many of those without a sitemap.xml, the
<c-name>/sparql URI format to access the SPARQL endpoint DOES
work...
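For reference, the kind of quick probe described above can be sketched
in a few lines of Python (a hypothetical helper, not the script actually
used: it sends a HEAD request to each of the three conventional paths
and records the HTTP status; the opener is injectable so it can be
exercised without network access):

```python
import urllib.error
import urllib.request

def probe(base, paths=("/sitemap.xml", "/robots.txt", "/sparql"),
          opener=urllib.request.urlopen):
    """Return the HTTP status seen at each conventional path for a site
    (None on connection failure)."""
    statuses = {}
    for path in paths:
        req = urllib.request.Request(base.rstrip("/") + path, method="HEAD")
        try:
            with opener(req, timeout=10) as resp:
                statuses[path] = resp.status
        except urllib.error.HTTPError as err:
            # e.g. the 404s seen for sitemap.xml / robots.txt
            statuses[path] = err.code
        except OSError:
            statuses[path] = None
    return statuses
```

A site matching the pattern above would then show 404 for /sitemap.xml
and /robots.txt but 200 for /sparql.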

So something is still missing. Either each dataspace maintainer who is
willing to provide a SPARQL endpoint also provides an (even if
minimal) sitemap.xml or voiD description, or at least follows this
convention.
This would greatly enhance the accessibility of the data, and enable
tools to automatically find them as needed...

Cheers
D

Sergio Fernández wrote:

On Sat, 2009-03-07 at 00:36 -0300, Daniel Schwabe wrote:

I could query the site for its sitemap extension (would it always be
<home url>/sitemap.xml?

Yes, you can do it in a programmatic way. But that URL (/sitemap.xml),
even if it's commonly used, is not mandatory, so you can't rely on it as
a constant. But there is one way, not so direct, but at least one that
is standard:
1) From /robots.txt you can take the Sitemap's URL ("Sitemap:" as [1]
specifies)
2) According to the extension proposed by DERI [2], you can check
whether the sitemap points to a SPARQL endpoint by looking for the
sc:sparqlEndpointLocation element.
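Those two steps could be sketched like this (a minimal Python sketch;
the sc: namespace URI is the one used by the DERI extension [2], and
the helper names are made up here):

```python
import re
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the DERI semantic sitemap extension [2] (assumed here)
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema#"

def sitemap_urls_from_robots(robots_txt):
    """Step 1: collect the URLs listed on 'Sitemap:' lines of robots.txt,
    as specified by the sitemaps.org protocol [1]."""
    return re.findall(r"(?im)^Sitemap:\s*(\S+)", robots_txt)

def endpoints_from_sitemap(sitemap_xml):
    """Step 2: extract every sc:sparqlEndpointLocation from a sitemap."""
    root = ET.fromstring(sitemap_xml)
    tag = "{%s}sparqlEndpointLocation" % SC_NS
    return [el.text.strip() for el in root.iter(tag) if el.text]

def discover_endpoints(site):
    """Fetch /robots.txt, follow its sitemaps, return SPARQL endpoints."""
    with urllib.request.urlopen(site.rstrip("/") + "/robots.txt") as f:
        robots_txt = f.read().decode("utf-8", errors="replace")
    endpoints = []
    for sm_url in sitemap_urls_from_robots(robots_txt):
        with urllib.request.urlopen(sm_url) as f:
            endpoints.extend(endpoints_from_sitemap(f.read()))
    return endpoints
```

Of course this only works for the sites that actually publish both
files, which was the problem observed above.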
Hope that helps.
Best,
[1] http://www.sitemaps.org/protocol.php
[2] http://sw.deri.org/2007/07/sitemapextension/