Also any other bells and whistles, e.g. RSS feeds for software updates, some form of RiscPkg integration/compatibility/cooperation

Reveal my creation to the world and allow me to take my rightful place as the king of the Internet

Points 2 and 3 are easy; there's plenty of software we can borrow code/algorithms/techniques from. The big issue is point 1, getting the XML format right.

So, do we have any XML gurus in the house? Has anyone seen any noteworthy attempts by other people to develop a similar system? (So that we may assassinate them, steal their ideas, or work with them, in my order of preference)

Message #80206, posted by tribbles at 22:34, 14/9/2006, in reply to message #80187

I'm an XML guru (have been doing XML for a heck of a lot of time). I was part of the WAP Forum for a bit, and got a couple of attributes added to the UAProf standard.

Also got the XSL/DTD RISC OS filetype numbers allocated.

Haven't defined a format for software packages, but have done the spidering/searching side of things a lot.

I'd imagine there are some other standards out there for other platforms - it's not something I've needed to look for. If there's no XML format, then it shouldn't be too difficult to design. All you need to do really is know what you want to keep.

Open Software Description? It looks a bit basic for our requirements (we'd need stuff for descriptions of the software, author, home page, etc.), but it's a good start (even if it doesn't have RISC OS as a valid OS or ARM as a valid CPU!). It also has dependencies between packages listed.

Although their examples don't show it, you can specify a SOFTPKG instead of a CODEBASE as a dependency. That way you can use the unique name of the package, thus allowing the system to identify a download location even if the original website of the dependent package died.
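To make that concrete, here's a hedged sketch of what an OSD entry with a SOFTPKG dependency might look like (package names, version strings and URLs are all invented, and as noted above RISC OS and ARM aren't official values in the spec):

```xml
<SOFTPKG NAME="org.example.poogle" VERSION="1,0,0,0">
  <TITLE>Poogle</TITLE>
  <ABSTRACT>Hypothetical example package</ABSTRACT>
  <IMPLEMENTATION>
    <OS VALUE="RISC OS"/>        <!-- not in the spec's list of values -->
    <PROCESSOR VALUE="ARM"/>     <!-- likewise -->
    <CODEBASE HREF="http://example.com/poogle.zip"/>
    <DEPENDENCY>
      <!-- reference the dependency by unique name rather than location -->
      <SOFTPKG NAME="org.example.sharedlib" VERSION="2,0,0,0"/>
    </DEPENDENCY>
  </IMPLEMENTATION>
</SOFTPKG>
```

Because the dependency is named rather than pinned to a URL, a resolver is free to find any current download location for it.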

If we go with the plan of having each XML file link to other XML files (just for the sake of linking to them), then you won't need a central repository. Each site that offers a search engine/RSS feed/whatever can spider its way through all the files and build its own registry.

Of course, that doesn't mean a central registry wouldn't be useful. But it would be good if the system was designed so that it didn't need one. Links to other XML files and a standard naming convention for hosting the XML files on websites (i.e. files should be called poogle.xml) should be all that's needed.
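The spidering side could be sketched along these lines (a toy model, not a real crawler: `fetch` stands in for an HTTP GET, and the `<link>` element name is invented since the format isn't settled):

```python
# Minimal sketch of the decentralised spidering idea: each site hosts a
# poogle.xml that describes its software and links to other sites' files.
# A crawler starts from a few seed URLs and follows the links, building
# its own registry -- no central repository needed.

import xml.etree.ElementTree as ET

def crawl(seed_urls, fetch):
    """Visit every reachable poogle.xml, returning {url: parsed tree}."""
    seen = {}
    queue = list(seed_urls)
    while queue:
        url = queue.pop()
        if url in seen:
            continue  # already spidered; avoids loops between sites
        root = ET.fromstring(fetch(url))
        seen[url] = root
        # Follow every <link href="..."/> to another site's file
        # (element name is an assumption -- the real format is undecided).
        for link in root.iter("link"):
            queue.append(link.get("href"))
    return seen
```

Cycles between sites (A links to B, B links back to A) are handled by the `seen` check, so two sites can happily link to each other.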

DOAP? It sounds like it was spawned from similar feelings about keeping software repositories up to date:

So many registries now exist that keeping them up to date has become a real problem. The release cycle for diligent software maintainers often involves visits to several Web sites to keep the information up to date, not to mention updating their own Web sites. However, such maintainers are few and far between, and it's not uncommon to find out-of-date information in a registry. That this data gets out of date is unsurprising when you consider the aspects that many modern software projects involve: mailing lists, IRC channels, Web sites, wikis, CVS repositories, and so on.

Use cases for project descriptions include:

Easy importing of projects into software directories

Data exchange between software directories

...

Of course this is designed for open source software, but I'm sure it can be applied to closed source as well.

Now, if only I had a program to convert something written in the RDF vocabulary description language thingy into plain English.

ROR looks like a good way of packaging the DOAP/whatever up onto a website. They've already standardised the use of ror.xml for providing a listing of whatever resources your website contains. Unfortunately they don't seem to have a "software" resource type.

Gah! Why does everything have to all be so conceptual! There's probably an RDF vocabulary for describing RDF vocabularies that describe RDF vocabularies that are described using RDF vocabularies and use confusing acronyms to confuse the reader and obfuscate the purpose of the RDF vocabulary that is being described!

I guess we could use the rdfs:seeAlso property to provide the list of other XML files. Or maybe a slight bastardisation of DocumentList where the document names are URLs (unless I'm missing something where you can name a document and give a URL).

Perhaps: A ROR containing DOAPs with a DocumentList at the end of the file.

Or we could just steal all their ideas and write our own vocabulary. Isn't there some way of specifying that one property maps onto the definition of another? So we could just use all their definitions but restructure it into what suits us?

* Skip ROR for now, it looks to be unnecessary
* Instead have an RDF file containing DOAP and DocumentList entries
* The DOAP entries would obviously describe the software
* The DocumentList entries would obviously list other RDF files which the spider bot should look for
* After looking at the DocumentList examples, it's clear that they can contain both a URL and a name.
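A minimal sketch of what one of these RDF files might look like, using rdfs:seeAlso as a stand-in for the DocumentList entries since that part is still up in the air (project name, URLs and version are all invented):

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project>
    <doap:name>Poogle</doap:name>
    <doap:homepage rdf:resource="http://example.com/poogle/"/>
    <doap:shortdesc>Hypothetical example project</doap:shortdesc>
    <doap:release>
      <doap:Version>
        <doap:revision>1.00</doap:revision>
        <doap:file-release rdf:resource="http://example.com/poogle.zip"/>
      </doap:Version>
    </doap:release>
    <!-- point the spider at other sites' files -->
    <rdfs:seeAlso rdf:resource="http://othersite.example/poogle.xml"/>
  </doap:Project>
</rdf:RDF>
```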

When a mirror site decides to mirror your software, it can produce a DOAP entry that's near identical to the original, but add doap:download-mirror to list the mirrored download, doap:screenshots for its own screenshot page, etc.
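As a sketch of that (fragment only, namespace declarations omitted, all URLs invented), the mirror's entry might look like:

```xml
<doap:Project>
  <doap:name>Poogle</doap:name>  <!-- same unique name as the original -->
  <doap:download-mirror rdf:resource="http://mirror.example/poogle.zip"/>
  <doap:screenshots rdf:resource="http://mirror.example/poogle/shots.html"/>
</doap:Project>
```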

Since the software name is meant to be unique, a search engine crawler could aggregate all the information from different mirror sites to produce one long list of mirrors for a particular piece of software (complete with indexing them by what version of the software each mirror provides)
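The aggregation step could be sketched like this (entries are plain dicts standing in for parsed DOAP; the field names are invented for the sketch):

```python
# Sketch of the aggregation idea: given DOAP-ish entries scraped from
# many sites, group the download locations by the (unique) project name
# and by the version each mirror offers.

from collections import defaultdict

def index_mirrors(entries):
    """Build {name: {version: [download urls]}} from a flat entry list."""
    index = defaultdict(lambda: defaultdict(list))
    for e in entries:
        index[e["name"]][e["version"]].append(e["download"])
    return index
```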

We could also suggest that people include a DOAP XML file in the .zip file for each piece of software. This would be useful for identifying the software either while it's still on the internet (e.g. if the crawler can't decide what version the software is by looking at referring DOAP entries), or once downloaded to someone's machine (so the user or some package management thingy can identify what the software is)
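Identifying a downloaded package that way could be as simple as this sketch (the in-archive filename doap.xml is an assumption):

```python
# Sketch: pull the doap:name out of a doap.xml shipped inside a zip,
# so a package manager (or the user) can identify a downloaded package.

import io
import zipfile
import xml.etree.ElementTree as ET

DOAP_NS = "http://usefulinc.com/ns/doap#"

def identify(zip_bytes):
    """Return the doap:name from doap.xml inside the archive, or None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        if "doap.xml" not in zf.namelist():
            return None  # no identification file shipped
        root = ET.fromstring(zf.read("doap.xml"))
        name = root.find(".//{%s}name" % DOAP_NS)
        return None if name is None else name.text
```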