Hi Orbiter, I think it would be great to help YaCy with all the data gathered by loklak. But I wonder whether the new field added to the YaCy index should be a raw value, such as the number of links from social media, rather than an already processed ranking value. Wouldn't that be clearer for users and also allow finer tuning and customization of ranking on this new field?

Absolute numbers are misleading because the absolute number of harvested messages varies widely over time. For example, there may be 200 million messages in total in one month and 100 million in another. To normalize this, the best approach is to compute the likelihood of a click landing on a specific domain if all the links harvested in a given time window were available for a random click. Then different time frames can be compared. That's the same approach Google takes when it computes PageRank: not the absolute number of references, but the likelihood of reaching a specific page if a user clicks randomly.
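
To make the normalization concrete, here is a minimal sketch, assuming the harvested links for one time window are available as a plain list of URLs. The function name `click_likelihood` and the sample data are made up for illustration; this is not the actual loklak or YaCy code.

```python
from collections import Counter
from urllib.parse import urlparse


def click_likelihood(links):
    """Return {domain: share of all links} for the links of one time window.

    The share is the probability that one uniformly random click
    among all harvested links lands on that domain.
    """
    domains = Counter(urlparse(url).netloc.lower() for url in links)
    total = sum(domains.values())
    if total == 0:
        return {}
    return {domain: count / total for domain, count in domains.items()}


# Hypothetical sample data: twice as many links were harvested in January.
january = [
    "http://example.org/a",
    "http://example.org/b",
    "http://example.net/x",
    "http://example.com/y",
]
february = [
    "http://example.org/c",
    "http://example.net/z",
]

jan_scores = click_likelihood(january)
feb_scores = click_likelihood(february)
```

In this sample the absolute count for example.org halves from January to February, but its likelihood is the same in both windows, so the two months remain comparable.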

Thank you for the explanation. You are right: storing a number of links only makes sense relative to the total number, which changes each time a document is indexed. So it is more practical to store the ratio. By the way, clearly documenting the formula used to feed this new field will surely help users.

It occurs to me that there may be some benefit to considering how many followers the Twitter user who tweets a link has. It's not immediately obvious to me what the algorithm should be. The naive algorithm would be to linearly weight the Twitter links by the number of followers the poster has, but I have no reason to believe that this would actually provide optimal results.
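
For illustration, the naive linear weighting could look like the sketch below. It assumes each harvested post is available as a (url, follower_count) pair; the function name `weighted_likelihood` and the sample numbers are hypothetical. The weighting function is a parameter so that alternatives can be compared; the `math.log1p` variant is only one arbitrary example of a non-linear weight, not a recommendation.

```python
import math
from collections import defaultdict
from urllib.parse import urlparse


def weighted_likelihood(posts, weight=lambda followers: followers):
    """Return {domain: weighted share} for the posts of one time window.

    posts:  iterable of (url, follower_count) pairs
    weight: maps a follower count to the weight of one link;
            the default is the naive linear weighting.
    """
    scores = defaultdict(float)
    for url, followers in posts:
        scores[urlparse(url).netloc.lower()] += weight(followers)
    total = sum(scores.values())
    if total == 0:
        return {}
    return {domain: score / total for domain, score in scores.items()}


# Hypothetical sample: one link from a large account, two from small ones.
posts = [
    ("http://example.org/a", 900),
    ("http://example.net/b", 50),
    ("http://example.net/c", 50),
]

linear = weighted_likelihood(posts)                    # naive linear weighting
damped = weighted_likelihood(posts, weight=math.log1p) # illustrative alternative
```

With linear weighting the single large account dominates the result, and a poster with zero followers contributes nothing at all. A damped weight reduces that dominance, but whether either choice improves ranking would have to be measured.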