[OSM-dev] Google Summer of Code

On 3 April 2012 20:02, Paul Norman <penorman at mac.com> wrote:
> The problem with detecting when changesets are closed is that there is no
> way to determine exactly when they are closed short of an API query. You
> can fake it by assuming changesets are closed an hour after the last change
> to them and 24 hours after the first change to them.
>
Open: (http://www.openstreetmap.org/api/0.6/changeset/11187430)
<osm version="0.6" generator="OpenStreetMap server">
<changeset id="11187430" user="regedi" uid="645826"
  created_at="2012-04-05T10:28:21Z" open="true"
  min_lat="50.0106489" min_lon="36.3515771"
  max_lat="50.0112144" max_lon="36.3586195">
<tag k="created_by" v="Potlatch 2"/>
<tag k="build" v="2.3-375-g9f05171"/>
<tag k="version" v="2.3"/>
</changeset>
</osm>
Closed: (http://www.openstreetmap.org/api/0.6/changeset/11167430)
<osm version="0.6" generator="OpenStreetMap server">
<changeset id="11167430" user="bergfrei" uid="327035"
  created_at="2012-03-31T15:11:30Z" closed_at="2012-03-31T15:16:55Z"
  open="false" min_lat="47.9912789" min_lon="9.7206276"
  max_lat="48.0492344" max_lon="9.8521079">
<tag k="comment" v="Hochdorf Ausgleich Luftbildversatz"/>
<tag k="created_by" v="JOSM/1.5 (5047 de)"/>
</changeset>
</osm>
Or have I missed something?
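
A minimal sketch of that check in Python, assuming the response looks like the
two examples above. The helper name changeset_status is made up for
illustration, and the XML is a trimmed copy of those responses rather than a
live API call:

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two API 0.6 responses quoted above (tags omitted).
OPEN_XML = """<osm version="0.6" generator="OpenStreetMap server">
<changeset id="11187430" user="regedi" uid="645826"
  created_at="2012-04-05T10:28:21Z" open="true"/>
</osm>"""

CLOSED_XML = """<osm version="0.6" generator="OpenStreetMap server">
<changeset id="11167430" user="bergfrei" uid="327035"
  created_at="2012-03-31T15:11:30Z" closed_at="2012-03-31T15:16:55Z"
  open="false"/>
</osm>"""


def changeset_status(xml_text):
    """Return (changeset id, is_open, closed_at or None) for one response."""
    changeset = ET.fromstring(xml_text).find("changeset")
    is_open = changeset.get("open") == "true"
    # closed_at is only present once the changeset has been closed.
    return changeset.get("id"), is_open, changeset.get("closed_at")


for doc in (OPEN_XML, CLOSED_XML):
    print(changeset_status(doc))
```

This still costs one API request per changeset, which is the query Paul
mentions.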
> It is better to detect problems when they occur, not up to 24 hours after
> they’ve occurred.
>
That's correct. Good practice would be to code it as abstractly as
possible, so that it only parses create/modify/delete sets and ignores
where they came from (minutely diff, hourly diff, or changeset).
I will try to take this into account in my proposal.
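
A rough sketch of what "only parse create/modify/delete sets" could look like,
assuming the input is an osmChange document (the format used by the
replication diffs). The sample data and the name iter_changes are invented for
illustration:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample in osmChange layout; real diffs carry more attributes.
SAMPLE_OSC = """<osmChange version="0.6" generator="example">
  <create>
    <node id="1" version="1" changeset="100" lat="50.0" lon="36.3"/>
  </create>
  <modify>
    <way id="2" version="3" changeset="101"><nd ref="1"/></way>
  </modify>
  <delete>
    <node id="3" version="2" changeset="100" lat="47.9" lon="9.7"/>
  </delete>
</osmChange>"""


def iter_changes(osc_text):
    """Yield (action, element type, element id, changeset id) per object."""
    root = ET.fromstring(osc_text)
    for block in root:
        # Only the three action blocks matter; the origin of the file does not.
        if block.tag not in ("create", "modify", "delete"):
            continue
        for element in block:
            yield block.tag, element.tag, element.get("id"), element.get("changeset")


changes = list(iter_changes(SAMPLE_OSC))
# Group the objects by changeset so that a rating can be computed per changeset.
per_changeset = Counter(changeset for _, _, _, changeset in changes)
print(changes)
print(per_changeset)
```

Because the function only looks at the action blocks, the same code should
work no matter which source delivered the document.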
Thanks for all of your ideas! It's time to finish my proposal :)
Regards,
Morris
> From: kabum [mailto:uu.kabum at gmail.com]
> Sent: Tuesday, April 03, 2012 2:20 AM
> To: Derick Rethans
> Cc: OpenStreetMap dev list
> Subject: Re: [OSM-dev] Google Summer of Code
>
> Hi,
>
> On 2 April 2012 22:20, Paul Norman <penorman at mac.com> wrote:
> > A tool that operates on the changeset level is
> > https://github.com/pnorman/osm-weirdness
> > It detects changesets that have a high probability of being an import or
> > mechanical edit. The detection is pretty crude but it does find a fair
> > number of undocumented imports, mechanical edits, and other weirdness. If
> > you point it at an old state.txt file it will start in the past and work
> > up to the present.
>
> I'll have a look at your script later today.
>
> > When working with the minutely diffs there are some limitations:
> >
> > - Limited knowledge of changesets. In practice, if you start your
> >   detection an hour in the past you can have a list of all open
> >   changesets, but it is not possible to know the tags of the changesets.
> > - No knowledge of the previous state of objects. You know where deleted
> >   objects were, but you can’t tell how far an object is moved or what its
> >   tags were before. To tell this you need to query a service with a full
> >   history DB, and handling full history files is difficult.
> > - No knowledge of way geometry if using existing nodes. Iandees’
> >   https://github.com/pnorman/osm-weirdness/tree/way_check solves this by
> >   fetching nodes in a way that aren’t also in the changeset from jxapi and
> >   it can then detect bad geometry (e.g. ways that trace over themselves).
>
> > If you were to code a vandalism detection tool I think it should work on
> > the minutely replication diffs
> > (http://wiki.openstreetmap.org/wiki/Planet.osm/diffs)
>
> I had thought about analysing the data after the changeset is closed, but
> these diffs also sound good. I will look into this approach :) Thanks!
>
> On 3 April 2012 09:38, Derick Rethans <osm at derickrethans.nl> wrote:
> > On Mon, 2 Apr 2012, kabum wrote:
> > > Result:
> > > - each changeset has a total rating -> use a threshold value to divide
> > > them into suspicious and not suspicious
> >
> > Instead of just using static thresholds, I think that something like SVM
> > (http://en.wikipedia.org/wiki/Support_vector_machine) might be highly
> > beneficial here; and it's another cool technology to play with. There is
> > a cool library for this (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) and
> > I know there is at least an extension to use it from PHP:
> > http://phpir.com/support-vector-machines-in-php
>
> Thanks for this method ... it seems to be very suitable for our use case.
>
> I already have some years of PHP experience, but I wouldn't prefer it for
> this part of the project. I was thinking of Python (libsvm has native
> Python bindings ;))
>
> > > Some questions came up within this preparation:
> > > - Is there a preferred language? Does this have to be specified within
> > > the proposal? (language skill has to be rated, so I would decide this
> > > during the project phase)
> >
> > Not really any preferred language. What did you have in mind? For the
> > front end I was thinking PHP, but the engine, I wouldn't know. I think
> > something highly performant (so C or C++) might be beneficial.
>
> My thinking was that Python is easy to set up, and that it can easily be
> called from a terminal or included in other Python scripts (e.g. a web
> frontend).
>
> If C++ is necessary because of its speed, then I think I could master
> it. Last semester I took part in a software engineering practical course
> at university (in a team of five fellow students), where we made
> extensive use of C++ (https://github.com/brainafk/Empire).
>
> > > - I would also like to discuss the libraries and frameworks to use
> > > during the project phase, or should I decide this in my proposal too?
> > > - Should the frontend integrate into the current website (Ruby on Rails
> > > project) or should this just be an optional feature?
> >
> > I think it can easily live as its own website.
>
> Ok :)
>
> > > - How detailed should the proposal be? Is it enough to formulate this
> > > draft?
> >
> > That's a tricky one, the more information you provide the better I
> > think, as it shows you have thought about it :-)
>
> I think it is growing a lot through this discussion, and I will try to be
> as detailed as possible. :)
>
> Thanks for the response :)
>
> Regards,
> Morris