Hi Davide and Santiago,
Welcome to the list. The MARKOS/Crawler sounds interesting.
To integrate it into Allura, you would most likely want to make it a separate tool. Allura
is pluggable - its capabilities can be augmented by external tools. Our documentation on
creating a tool from scratch is not great (maybe nonexistent yet?), but for a simple example
see http://sourceforge.net/p/forgepastebin.
This is a tool that is separate from the Allura codebase, but which we use in the SourceForge
instance of Allura. It's simple enough that you can probably see how to write your own tool
just by reading through the code.
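In broad strokes, an external tool is just a Python package that Allura discovers through a setuptools entry point. As a rough sketch (the package, module, and class names below are made up for illustration - check forgepastebin's own setup.py for the exact conventions it uses):

```python
# setup.py for a hypothetical external Allura tool package.
# "ForgeMarkos", "forgemarkos.main", and "ForgeMarkosApp" are illustrative
# names only; the [allura] entry-point group is how Allura discovers tools.
from setuptools import setup, find_packages

setup(
    name='ForgeMarkos',
    version='0.1',
    packages=find_packages(),
    install_requires=['Allura'],
    entry_points="""
    [allura]
    markos = forgemarkos.main:ForgeMarkosApp
    """,
)
```

Once a package like this is installed in the same environment as Allura, the tool should show up as installable from a project's admin pages.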
If you have any questions as you go, don't hesitate to ask here!
--
Tim Van Steenburgh
On Tuesday, January 15, 2013 at 3:43 PM, Santiago Lizardo wrote:
> This is my first message to the list. Nice to meet you all. (I've joined
> this list because I've been a SourceForge user for 10 years and I'm looking
> forward to contributing to the development of Allura.)
>
> I'm an experienced web scraping programmer, so I think I can team up with
> you, Davide, on the task of collecting data from other sites.
>
> I've written my own crawlers, but I'm also familiar with Scrapy (
> http://scrapy.org/), a well-made and easy-to-use Python crawling/scraping
> framework.
>
> Please let me know if you need a hand on this.
>
> On 01/15/2013 07:45 PM, Rich Bowen wrote:
> > On Jan 15, 2013, at 12:31 PM, Davide Galletti wrote:
> >
> > > Hi everybody,
> > >
> > > my name is Davide Galletti and I am working on an EEC-funded research project
> > > named MARKOS.
> > Welcome!
> > > Within MARKOS I will build a component called "Crawler" which will be responsible
> > > for gathering as much information on OSS projects as possible from forges, metaforges,
> > > and any other source we might find interesting. The first release is expected in 2013,
> > > and development will continue until the end of 2014, of course under an OSS license.
> > > We also expect to contribute in other directions; for instance, we might consider
> > > helping out Apache with the maintenance of the DOAP files for their projects.
> > >
> >
> > I'm very interested in this. In fact, just last week I was looking at the DOAP listing
> > (https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/files.xml) and
> > noticed that numerous projects are missing. We'd love to see that list be complete.
> >
> > > I would be happy if the Crawler component could become useful to the Allura platform;
> > > the benefit would be that a user searching on Allura would also find projects hosted
> > > elsewhere. Within Allura there could be a detail page for the project, from which the
> > > user could then jump to the external project or download pages.
> > >
> > > If this makes sense, I hope that you will keep an eye on this project and maybe
> > > also give me some hints,
> > >
> >
> >
> > We'd love to see more of your ideas in this direction. I'm more on the community
> > side than technical, so I'll leave it to others to give you specific technical direction,
> > but mostly, we'd love to see you jump in and make things happen.