Recent Activity

Yesterday

@mpopov—the graphs look good. As mentioned on IRC, percentages or some other normalization would be helpful in figuring out the best response rates among the question formats and comparing yes/no/etc. rates among answers.

By eye, it looks like "would they want to read this article" gets slightly more engagement, and "would this article be relevant" and "would you click on this page" get slightly less, but I wouldn't be surprised if they were all statistically indistinguishable. I wonder if the question format has any effect on yes/no ratios, too. There may not be enough data to tell, though.
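The normalization idea above can be sketched quickly. This is just an illustration of converting raw response counts per question format into percentages so formats with different sample sizes can be compared; all counts are made up, not the actual survey data.

```python
# Sketch: normalize raw response counts per question format into
# percentages, so response rates are comparable across formats with
# different sample sizes. All counts below are invented for illustration.
counts = {
    "would they want to read this article": {"yes": 120, "no": 60, "dismiss": 320, "timeout": 500},
    "would this article be relevant":       {"yes": 90,  "no": 70, "dismiss": 340, "timeout": 600},
    "would you click on this page":         {"yes": 80,  "no": 50, "dismiss": 300, "timeout": 570},
}

def rates(responses):
    """Convert a dict of counts into percentages of the total."""
    total = sum(responses.values())
    return {k: round(100 * v / total, 1) for k, v in responses.items()}

for question, responses in counts.items():
    r = rates(responses)
    engagement = 100 - r["timeout"]  # treat everything except timeouts as engagement
    print(f"{question}: {r} -> engagement {engagement:.1f}%")
```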

Erik pointed out that people don't like Ian Bannen (actor in the 1970s version of Tinker Tailor Soldier Spy) very much, but if you go by a simple ratio of yes/no votes, he still comes in 3rd, which is reasonable. (Ha! I just got the survey while looking at his page. It seemed only fair to dismiss it, though I wanted to vote yes.)
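For what it's worth, the "simple ratio of yes/no votes" ranking could look something like this. The +1 smoothing term is my own addition (to avoid dividing by zero when a page has no "no" votes), and the vote counts are invented:

```python
# Sketch of ranking pages by yes/no vote ratio. The +1 in the
# denominator is a smoothing term (not part of the original "simple
# ratio") to avoid division by zero. Vote counts are invented.
votes = {"Page A": (30, 5), "Page B": (12, 2), "Page C": (8, 4)}  # (yes, no)

ranked = sorted(votes, key=lambda p: votes[p][0] / (votes[p][1] + 1), reverse=True)
print(ranked)  # ['Page A', 'Page B', 'Page C']
```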

I think the results are promising. In places where the wisdom of the crowd disagrees with me, the results are at least understandable. For example, the query "yesterday beetles" gets all horrible results, but the least horrible is a different John Lennon song. That is at least tangentially related: it's a bad result, but it is also the best result available.

I also wonder if the timeout proportion is a useful signal, or even the lack of responses (that at least points to a lack of popularity for the results page). Seems possible, but it's not immediately clear how to use them.

@mpopov Do you think you will be able to give us insights in the next few days? Our principal investigator for namespace correlations is only available until the end of next week, so if you manage to get back to it before then, it would make it much easier for us to evaluate :)

Can we use golden to collect the data that are not on the dashboard and keep them in /srv/published-datasets/discovery?
For those on the dashboard but with a max_data_points limit, can we just create extra reports without the max_data_points limit? @mpopov Any other ideas? ;)

Wed, Aug 9

@mpopov
In the current search interface, there are four options: content articles, multimedia, everything, and advanced. When you click on advanced, you get the table of all namespaces and can choose them individually. If we understand your query correctly, you only look at searches that have profile=advanced in the URL. The first three options have other profiles, though. For our needs, we would need to have these searches included, too.
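The inclusion being asked for could be sketched as a small URL filter. Only profile=advanced appears in the thread; the other profile values in this sketch are placeholders, not the actual names used by the search interface:

```python
# Sketch: instead of keeping only searches with profile=advanced,
# accept any of the four search-interface profiles. The profile values
# other than "advanced" are hypothetical placeholders -- check the
# actual logged URLs for the real names.
from urllib.parse import urlparse, parse_qs

ACCEPTED_PROFILES = {"default", "images", "all", "advanced"}  # hypothetical names

def keep_search(url):
    """Return True if the search URL's profile is one we want to include."""
    qs = parse_qs(urlparse(url).query)
    profile = qs.get("profile", ["default"])[0]  # assume a default when absent
    return profile in ACCEPTED_PROFILES

print(keep_search("https://en.wikipedia.org/w/index.php?search=foo&profile=advanced"))
```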

Mon, Aug 7

Status update: JK will look into giving Chelsy and/or me some kind of access so we can take a look into it. I have previous experience with Google APIs and building R bindings to web APIs, so that will be helpful here :D

I think all the fields in the schema can be white-listed and kept indefinitely (except for EventCapsule's userAgent).
I was assuming you wanted to keep the data for longer.
Otherwise, there's no action needed, because the default behavior for new schemas is "auto-purge after 90 days".

CRAN's submission policy recommends waiting ~6 months before submitting another version, and the most recent version available on CRAN went up on 2017-06-14, so the version that we actually want probably won't go up for another ~4 months (mid-December at the earliest).

@hashar Thank you for making this ticket and emailing the R Foundation/R Development Core Team! Heh, yesterday I emailed @Ottomata & @Gehel asking if setting up our own CRAN mirror would be a reasonable thing.

It's because we changed the sampling rates on April 19th, decreasing enwiki's rate and increasing every other wiki's. Since enwiki generally has a high PaulScore, we effectively lowered the overall PaulScore by decreasing enwiki's contribution.
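The effect described here is just a sampling-weighted average shifting. A toy illustration with made-up scores and shares (not the real sampling rates or PaulScores):

```python
# Toy illustration: the overall PaulScore is a sampling-weighted
# average, so shifting sample share away from a high-scoring wiki
# (enwiki) lowers the overall score even though no per-wiki score
# changed. All numbers are made up.
scores = {"enwiki": 0.60, "other": 0.40}

def overall(share_enwiki):
    """Weighted average of per-wiki scores given enwiki's sample share."""
    return scores["enwiki"] * share_enwiki + scores["other"] * (1 - share_enwiki)

before = overall(0.50)  # enwiki was half the sample
after = overall(0.25)   # enwiki's sampling rate decreased
print(before, after)    # 0.5 0.45
```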