Plugin Benchmarks

Summary

Plugin authors can document the performance of their Plugins. Administrators can look at the Plugin benchmark data to see if it is in an acceptable range before they install a new Plugin. Some Plugins can slow down a system; the benchmarks are an incentive for Plugin authors to write high-quality Plugins with good performance.

Plugin Benchmark Report

How to interpret the benchmark data

This section is for administrators who want to understand the benchmark data.

The benchmarks are new; over time you will see more Plugins document them, as the CalendarPlugin already does. New Plugins are based on the NewPluginTemplate, which contains a Plugin Info table row for the benchmark data.

The percentage indicates the relative page load speed of a system where the Plugin is installed and enabled, compared to the same system without that Plugin.

The GoodStyle topic is an indicator of how the Plugin performs on a short page without Plugin rendering; ideally, the system should perform close to 100%, and should not be below 95%.

The FormattedSearch topic is an indicator of how the Plugin performs on a page with dynamic data but without Plugin rendering; ideally, the system should perform close to 100%, and should not be below 95%.

The actual Plugin topic is an indicator of how the Plugin performs on a short page with Plugin rendering (the Plugin topic normally contains some Plugin-specific rendering for testing); the system performance depends on the amount of Plugin-specific rendering.

How to create the benchmark data

This section is for Plugin authors who want to measure and document the performance of their Plugin.

Please use the PluginBenchmarkAddOn to measure the relative performance of your Plugin. It is all automated; simply run this from the shell:

% ./pluginbenchmark MyOwnPlugin GoodStyle FormattedSearch

You are encouraged to performance tune your Plugin. There are several ways to write speedy Plugins, for example by loading modules only when required (as documented in TWikiPlugins.)

Alternatively, follow these manual steps to measure the performance:

Run tests in plain cgi mode:

In case you are using an accelerator like mod_perl or SpeedyCGI, disable it

Measure absolute page load times with enabled Plugin:

For each of the three topics (GoodStyle, FormattedSearch, and your Plugin topic), measure the page load time 10 times and take the average of the 5 fastest runs. Good utilities on Linux are time and wget (you can use TWiki's geturl instead of wget). Here is an example session:

% cd /tmp
% time wget http://localhost/cgi-bin/view/TWiki/GoodStyle
% time wget http://localhost/cgi-bin/view/TWiki/GoodStyle
etc.
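The "average of the 5 fastest of 10 runs" step can be scripted. The sketch below feeds hypothetical timing values (in seconds) to a small helper; in a real session you would collect the elapsed times reported by time wget instead.

```shell
# average_fastest: read one timing sample (seconds) per line on stdin and
# print the average of the N fastest, rounded to two decimals.
average_fastest() {
    local keep="$1"
    sort -n | head -n "$keep" |
        awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# Hypothetical elapsed times from ten "time wget" runs, in seconds:
printf '%s\n' 1.42 1.18 1.15 1.31 1.10 1.12 1.55 1.16 1.27 1.38 |
    average_fastest 5   # prints 1.14
```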

Measure absolute page load times without the Plugin:

Disable your Plugin by renaming the Plugin module, e.g. from MyOwnPlugin.pm to MyOwnPlugin.pm.DISABLE

Follow the same steps as above to measure the page load time of each topic

Restore your Plugin module

Calculate the relative performance

For each topic, divide the absolute page load time without the Plugin by the page load time with the Plugin; convert the ratio to an integer percentage.
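As a sketch, that division can be done with a small shell function; the timing values used here are hypothetical examples, not measurements.

```shell
# relative_performance: page load time without the plugin divided by the
# load time with it, expressed as a rounded integer percentage.
relative_performance() {
    awk -v without="$1" -v with="$2" \
        'BEGIN { printf "%d\n", int(without / with * 100 + 0.5) }'
}

# Hypothetical averages: 1.14s without the plugin, 1.20s with it enabled.
relative_performance 1.14 1.20   # prints 95
```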

A TWiki.org service is a nice idea, but in this case it cannot be applied because the performance would be skewed by the network speed and latency. Automation needs to be done server-side, e.g. with the PluginBenchmarkAddOn and a small patch to the TWiki core.

Measuring the performance manually is not that painful; it only takes a few minutes per Plugin.

We need more Plugin authors to document the performance of their Plugins. Ten months after introduction, only 8% of the 141 Plugins have benchmark data.

Appeal to all Plugin authors & maintainers: Please help in getting the benchmark data. This is also a good opportunity to improve Plugin code for better performance.

But what about other parts of a plugin that may use, for instance, the afterSaveHandler?

I realise that most people only care about how long a page takes to load, but I was playing with an Immediate Notification variant that uses the standard Mailer Contrib (with modifications for a new mode - see MailerContribDev) and wanted to ensure the load of the subscriptions was not excessively delaying the save.

I have never included benchmark results in any of my plugins, for a number of reasons:

They only benchmark a small subset of the many plugin handlers.

They have to be generated manually, which takes time (yes, I know it's quite quick, but it's one more thing to have to do).

The benchmark addon doesn't have a MANIFEST, so can't be pseudo-installed.

The benchmark addon doesn't even work on any of my installs. It has a hardcoded cgi-bin path in it. However it doesn't realise this, and still reports results - for the oops page. Also, the installation advice for the addon is incorrect; it advises applying a redundant patch.

The only thing the benchmarks measure is compile and view performance.

Compile performance is below the threshold of measurability for the measurement technique used (it is lost in the noise at that resolution). View performance is affected by any common tags that the plugin publishes, but no account is taken of the impact different parameters to those tags might have. For example, the performance of a tag that generates a complex chart is highly dependent on the complexity of the data it is fed. Unless the plugin is incredibly badly written, the impact it has on rendering time is so small that natural variations in load time tend to swamp any performance measurements.

I think this sort of benchmark is dangerously misleading; there is a risk that people believe them and select plugins on the basis of the benchmark results. And if they don't use them this way, what is the value of the benchmarks?

The idea of encouraging extension authors to publish benchmarks is good; I just don't believe in the current methodology. Yes, I know I could have contributed to improving it; but I just haven't had time.

The current statistics may have a very serious negative impact. Right now, they imply a level of service that a plugin will give: it will only slow down your TWiki by X%.

The benchmark being run in some unknown person's twiki environment, with plausibly few, if any, users, and plausibly few, if any, twiki apps and topics / webs, leads to even less valid or applicable statistics.

It's one of those infernal questions: the engineer in me knows that poor statistics are worse than none, because they allow readers to misinterpret them in many ways, but I also can't think of a way to make them useful. Having a table of 100 performance indicators would be more accurate, but just as useless.

Putting benchmarks in front of people raises the awareness of performance. I have seen a number of contributors who tweaked their plugin code because of this benchmark. Granted, this benchmark add-on needs some overhaul. The current benchmark measures actual performance as seen by users, not some esoteric internal measurements that do not map to an actual use case. If you do not like the current solution please create a better one, but please do not simply delete the benchmark row from the template topic!

Before I express my view let me share an old saying that I repeat again and again.

I have respect for people trying to fool others

I have some respect even for people trying to fool me

I have absolutely no respect for people trying to fool themselves.

And then there is the current expectation that plug-in authors run a benchmark with a broken benchmark tool that hardly anyone has successfully run, and may even have run wrong.

What is it we want to benchmark in the first place? How quickly a chart plugin can paint a chart? In my view it does not matter how long a plugin takes to do something very special like a chart.

What matters is - can I install 30 plug-ins without destroying the overall performance of TWiki?

In my view the important part is how much the plug-in slows down TWiki when it is not being used, except if the plugin influences the normal view features, i.e. whether it slows down some of the features you have on a normal page in PatternSkin.

In the ideal world the plug-in should add no additional delay when it is not used in a topic. And in my view the benchmark should be how much it delays (in percent, relative to not being installed) a standard benchmark topic. The measurement should be done this simple way:

Disable crond

Make sure no processes that consume CPU are running (check top)

Run ab -n 20 http://mytwiki/bin/view/TWiki/BenchMarkTopic

Repeat it a few times and pick the best number

Enable the plugin in configure

Run ab -n 20 http://mytwiki/bin/view/TWiki/BenchMarkTopic

Repeat it a few times and pick the best number

Calculate the benchmark
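The steps above can be sketched in the shell. This is only a sketch: it assumes the number compared between the two ab runs is the mean "Time per request" line from ab's report, and the millisecond values in the example are hypothetical, not measurements.

```shell
# mean_ms: pull the mean "Time per request" value (milliseconds) out of
# ab's report; the relevant line looks like:
#   Time per request:       12.345 [ms] (mean)
mean_ms() {
    awk '/Time per request.*\(mean\)/ { print $4; exit }'
}

# benchmark_percent: load time without the plugin relative to with it,
# as a rounded integer percentage.
benchmark_percent() {
    awk -v without="$1" -v with="$2" \
        'BEGIN { printf "%d%%\n", int(without / with * 100 + 0.5) }'
}

# In practice the two means would come from runs such as
#   ab -n 20 http://mytwiki/bin/view/TWiki/BenchMarkTopic | mean_ms
# once with the plugin disabled and once with it enabled.
# Hypothetical values: 118.2 ms without the plugin, 120.6 ms with it.
benchmark_percent 118.2 120.6   # prints 98%
```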

The TWiki/BenchMarkTopic should be a typical plain topic with a few common features on it.

TOC

Headers

Text with fixed space, bold, italic, and a lot of plain text

A simple table with a header and maybe 5 rows.

One simple text file as an attachment, so the attachment table is active

No searches other than what is used in left bar. Searching is too dependent on what other topics you have.

This method is simple. A properly behaving plug-in should hit 99%-100% when it is not used. And if all plug-ins consume an absolute minimum of resources, then our customers can install 10 or 40 plugins with only a little visible effect. The benchmark should be the light-weight quality stamp showing that, when not used, the plugin behaves well.

Then we can leave it to the plugin author to document a 2nd benchmark: the plugin in use. This we can never standardize, because plug-ins do so many different things. But the plug-in author can provide good information with a one-line description of the conditions and a number.

Here are a couple of examples of how a plugin can post its benchmarks.

Performance Benchmark

TWiki.BenchMarkTopic: 99%, One simple graph: 98%

Performance Benchmark

TWiki.BenchMarkTopic: 99%, ExamplePlugin topic with examples: 97%

This would be informative, easy to perform for everyone, easy to reproduce, and accurate enough (though not highly accurate) for the purpose.

Today the situation is that hardly anyone gives the numbers. And the FormattedSearch number is totally useless for 99% of plugins, unless they relate to searching.

If we continue like we have done until now we try to fool ourselves.

If we implement my proposal, we get useful information with almost no effort. The only effort would be the benchmark topic, which naturally should contain information on how to perform the benchmark test. A one-hour job that I will gladly do.

Kenneth: Thanks for the constructive feedback! What you describe makes sense, although it is better to automate this so that people are more likely to do the benchmarks. What you describe is actually pretty much the design goal and implementation of the existing PluginBenchmarkAddOn:

Measures non-plugin topics, comparing them with and without the plugin to see the relative performance

Measures speed repeatedly and removes the slow ones

Gives a relative performance number to show the impact

Measures in the real environment, e.g. as users will see it

Automates as much as possible (simply run a script)

The add-on is used to measure two non-plugin topics (simple GoodStyle, and more extensive FormattedSearch) and the actual plugin topic (FooBarPlugin), which typically contains plugin-specific variables. The script measures the performance with the plugin enabled and disabled to give a relative performance number. Each run is done 10 times (or 100 times if Time::HiRes is not available), and only the fastest runs are taken for the benchmarks. This add-on is dated and needs to be re-packaged.

Crawford:

The add-on is designed to measure the most important performance: Topic view time of non-plugin topics.

I think it would be more constructive to help repackage this add-on instead of taking the missing manifest and hard-coded cgi-bin path as an excuse for not running the benchmarks (and for removing the benchmark from the template, preventing others from doing so).

And did you fix the PluginBenchmarkAddOn at the same time so it now works? Just adding that row back into the table achieves nothing; it just maintains the status quo. As Kenneth says, "If we continue like we have done until now we try to fool ourselves." And in the process we lose all credibility with anyone who looks below the surface of the numbers.

I'm sorry to have to come across as negative, but when I see something broken, I don't believe in just sweeping it under the carpet.

If you insist on publishing benchmarks,

benchmarks should be part of the appraisal, and should not be on the extension topic, where their prominence is misleading,

benchmarks should be runnable (and therefore updatable) by anyone, using a simple methodology such as that described by Kenneth,

benchmark documentation must be absolutely clear about what is being measured, otherwise we lose all credibility with users.

But I'm not going to do it, because despite what Kenneth says, I still don't believe the benchmarks have any value, even with the improved methodology.

OK, lack of feedback suggests that no-one is going to resolve this, so I am taking the initiative and re-opening Bugs:Item4218. If you want to record benchmarks, do it on the appraisal; having it on the plugin topic is misleading and uninformative.

PluginBenchmarkAddOn now has a MANIFEST, and paths are read from LocalSite.cfg. But in its current state I agree that this plugin should be used by developers only, to improve view performance; the numbers will scare away users (they will think "hey, each plugin eats away 1% of performance, so 30 plugins will really slow down my installation!").