Yves Raimond wrote:
> On Sat, Aug 2, 2008 at 5:17 PM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>
>> Yves Raimond wrote:
>>
>>> Hello!
>>>
>>>
>>>
>>>> I would like to suggest that publishers of new linked data spaces that
>>>> plug
>>>> into the growing LOD include the following:
>>>>
>>>> 1. cross-link information
>>>>
>>>>
>>> I would also suggest we find a better measure for interlinkage than a
>>> raw number of triples linking one dataset to another.
>>> For example, http://dbtune.org/musicbrainz/ creates its own identifier
>>> for languages (http://dbtune.org/musicbrainz/directory/language),
>>> which are owl:sameAs'ed to the corresponding languages in Lingvoj when
>>> applicable, whereas linkedmdb directly links to the Lingvoj
>>> identifiers. In the latter case, the raw number of interlinks will be
>>> higher, but could be reduced a lot by creating identifiers for
>>> languages and using owl:sameAs.
>>>
>>> The same applies for geographic locations, for example. Some datasets
>>> use foaf:based_near to link to Geonames, some others create their own
>>> identifiers, and then link to the corresponding Geonames locations
>>> through owl:sameAs. For the same dataset, these two methodologies will
>>> lead to completely different numbers.
>>>
>>> To boost the statistics of a dataset, we could simply link each person
>>> or group in it to http://dbpedia.org/class/yago/Entity100001740
>>> through rdf:type :-D
>>>
>>>
>> Amen!
>>
>> And it also means we start to expose the fact that LOD is not an "instance
>> level only" linked data space (a sad misconception).
>>
>>
>>> So I think we should agree on what we count as "interlinks" before
>>> publishing such statistics, so that we can actually use these values?
>>>
>>>
>> We should basically express linkages across instance and schema/data
>> dictionary vectors. This also helps those looking to build LOD applications.
>>
>> Of course there is more to come re. the injection of "data dictionary /
>> schema" linkage aspects of LOD, but no harm in getting our thoughts in order
>> re. "best practices" for the growing cloud :-)
>>
>>> My recommendation would be to always go for the lowest value - the one
>>> you'd obtain by creating your own identifiers and using owl:sameAs
>>> (which would be equivalent to the number of distinct external URIs
>>> mentioned in your dataset).
>>>
>>> What do you think?
>>>
>>>
>> Good idea, so share your page as a nice example :-)
>>
>>
>
> I just gave it a shot on Jamendo, counting the results of a SELECT
> DISTINCT query, and this is indeed a bit depressing.
> http://dbtune.org/jamendo/
> For example, the Geonames interlinking drops from 3244 to 289 :-)
>
Smarts vs Size, which do you choose? I find this elating :-)
Kingsley
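
For anyone who wants to reproduce the two numbers on their own dataset, here is a rough sketch of the difference between the raw count and the SELECT DISTINCT style count. It assumes the triples are already loaded as plain (subject, predicate, object) tuples; the function name `interlink_counts` and the toy data are made up for illustration and are not taken from Jamendo or any real dump.

```python
from urllib.parse import urlparse


def interlink_counts(triples, target_host):
    """Return (raw, distinct) interlink counts towards target_host.

    raw      = number of triples whose object is a URI on target_host
    distinct = number of distinct such object URIs (the "lowest value")
    """
    # Literal objects have no network location, so they are skipped here.
    targets = [o for (_s, _p, o) in triples
               if urlparse(o).netloc == target_host]
    return len(targets), len(set(targets))


# Hypothetical toy data: three artists, two distinct Geonames places.
BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"
TRIPLES = [
    ("http://example.org/artist/1", BASED_NEAR, "http://sws.geonames.org/2988507/"),
    ("http://example.org/artist/2", BASED_NEAR, "http://sws.geonames.org/2988507/"),
    ("http://example.org/artist/3", BASED_NEAR, "http://sws.geonames.org/2643743/"),
    ("http://example.org/artist/1", "http://xmlns.com/foaf/0.1/name", "Some Artist"),
]

raw, distinct = interlink_counts(TRIPLES, "sws.geonames.org")
print(raw, distinct)
```

The distinct figure does not change whether a dataset links directly or goes through its own identifiers plus owl:sameAs, which is why it seems the fairer number to publish.
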
> Some similar statistics from Musicbrainz at
> http://dbtune.org/musicbrainz/ , which I'll publish when I get some
> time to figure out how to tweak d2r templates :-)
>
> Distinct DBpedia albums - 22426
> Distinct DBpedia artists - 39877
> Distinct MySpace artists (on http://dbtune.org/myspace/) - 14668
> Distinct DBpedia countries - 245
> Distinct Lingvoj languages - 185
>
> Cheers!
> y
>
>
>
>> Kingsley
>>
>>> Cheers!
>>> y
>>>
>>>
>>>
>>>
>>>> 2. cross-link visual derived from the LOD cloud diagram.
>>>>
>>>> The Linked Movies Database has nice examples of both [1].
>>>>
>>>> Links:
>>>>
>>>> 1. http://www.linkedmdb.org:8080/Main/Interlinking
>>>>
>>>> --
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
>>>> President & CEO OpenLink Software Web: http://www.openlinksw.com
>>>
>
>
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com