Re: [Wikipedia-l] what's being done about performance?

On Sat, 13 Jul 2002, Daniel Mayer wrote:
> Much is now being done to remedy performance problems -- so I do believe
> what you said is needlessly rude (even if there is a grain of truth to it).
> This is an issue that has crept up upon the developers as new features were
> added -- many of which were asked for by the users.

If performance issues can creep up on you, then you have not designed your system for performance measurement and monitoring. This means you are clueless. It is like driving a car without a speedometer, and being all surprised when you are caught for speeding.

In the last week, my script made 410 attempts at 20-minute intervals to reach the page http://www.wikipedia.com/wiki/Chemistry. Of these, only 86% were served in less than 5 seconds. Five percent of the calls timed out (my limit is 60 seconds). Now, this is far better than the worst problems that Wikipedia saw in April or May, but it is still pretty miserable. The non-English Wikipedias show very similar numbers.
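For the curious, the kind of probe described above is easy to sketch: fetch a page on a timer, record how long each fetch took (or that it timed out), and bucket the results. This is an illustrative reconstruction, not the actual script; the 5-second and 60-second thresholds are the ones quoted above.

```python
import time
import urllib.request

def probe(url, timeout=60):
    """Fetch url once; return elapsed seconds, or None on timeout/error."""
    start = time.monotonic()
    try:
        urllib.request.urlopen(url, timeout=timeout).read()
    except Exception:
        return None
    return time.monotonic() - start

def summarize(samples):
    """Return (% served in under 5 s, % timed out or failed)."""
    total = len(samples)
    fast = sum(1 for s in samples if s is not None and s < 5)
    failed = sum(1 for s in samples if s is None)
    return 100 * fast / total, 100 * failed / total
```

Run `probe()` from cron every 20 minutes, append the result to a log, and `summarize()` the week's samples to get figures like the 86% / 5% above.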

The Sevilla project (http://enciclopedia.us.es/) serves 96% of all my attempts in under 2 seconds, and 99% in under five seconds. This should probably be attributed to luck rather than skill, but it helps move people from the Spanish Wikipedia over to the breakout project.

> Software development seems to often work a lot like article development --

That's OK, but just like the basic Wiki software defines the concept of an article (it can be written, reviewed, modified, removed, and its history tracked), the software should define a framework for new functionality, so that each feature's impact on performance can be measured and the feature turned on or off. Think modules.
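One way such a module framework could look, sketched in Python purely for illustration (this is not actual Wikipedia code, which was PHP): every feature registers under a name, its calls are timed, and it carries a switch so it can be disabled if it proves too costly.

```python
import time
from functools import wraps

# name -> {"enabled": bool, "calls": int, "seconds": float}
FEATURES = {}

def feature(name, enabled=True):
    """Register a feature: time every call, honor an on/off switch."""
    FEATURES[name] = {"enabled": enabled, "calls": 0, "seconds": 0.0}
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            stats = FEATURES[name]
            if not stats["enabled"]:
                return None          # feature switched off
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                stats["calls"] += 1
                stats["seconds"] += time.monotonic() - start
        return wrapper
    return decorate

@feature("related_pages")
def related_pages(title):
    # stand-in for an expensive query a new feature might run
    return []
```

With this in place, an administrator can read `FEATURES` to see which features eat the most time, and flip `enabled` to shed load without a code change.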

Random selection from the first list will search, on average, (50+50+20+50+100+250+500+1000+2500+5000) / 10 = 952 pages.

Random selection from the second list will search, on average, (50+50+20+50+100+250+500) / 10 = 102 pages --

a reduction in load of almost an order of magnitude.
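The arithmetic above can be checked in a couple of lines; the numbers are the page counts from the two lists, with the second list dropping the three largest loads (the text divides both sums by the same ten searches):

```python
# Page counts searched per query, from the two lists in the text.
first = [50, 50, 20, 50, 100, 250, 500, 1000, 2500, 5000]
second = [50, 50, 20, 50, 100, 250, 500]  # three largest loads removed

avg_first = sum(first) / 10   # 952.0 pages per search
avg_second = sum(second) / 10  # 102.0 pages per search
ratio = avg_first / avg_second  # roughly 9.3x, i.e. almost an order of magnitude
```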

Removing these big outlier loads may well take some of the strain off ordinary page loads that happen to occur at the same time.

------------------------------------------------

SUGGESTION #2:

The 'Unsuccessful search' pages can be enormous: they accumulate every failed search over a whole month. As Wikipedia becomes more popular, these pages have grown huge, and they now take a long time to load. We should generate them weekly or daily instead of monthly, and perhaps split up the old ones using a script.
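The splitting script could be as simple as the sketch below. It assumes, purely as a guess at the format, one "YYYY-MM-DD<TAB>search term" entry per line in the monthly log; the real pages may be stored differently.

```python
from collections import defaultdict

def split_by_day(lines):
    """Group log lines by their leading YYYY-MM-DD date stamp."""
    days = defaultdict(list)
    for line in lines:
        date = line.split("\t", 1)[0]
        days[date].append(line)
    return days

# Each days[date] list would then be written out to its own daily
# file, e.g. failed_searches_2002-07-13.txt.
```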

This will also improve the 'most wanted' ranking of frequently missed searches, since currently only one instance per month counts.

Or perhaps they should be generated as a special page from the database?