The need for performance metrics and comparison

Since we released AspectWerkz 1.0, and more generally with every release of any AOP / interceptor framework (AspectWerkz, AspectJ, JBoss AOP, Spring AOP, cglib, dynaop, etc.), the same questions are raised: "What is the performance cost of such an approach?" and "How much do I lose per method invocation when an advice / interceptor is applied?"

This is indeed an issue that needs to be carefully addressed, and one that has in fact shaped the design of every sufficiently mature framework.

We have probably all been scared by the cost of java.lang.reflect despite its power, and usually, even before starting to evaluate semantics, robustness, and general ease of use, we start with some Hello World benchmark.

We started AWbench for that purpose: to offer a single place to measure the relative performance of AOP / interceptor frameworks, and to let you measure it on your own.

Beyond providing a performance comparison, AWbench is a good place to see the semantic differences and ease of use of each framework, since each one is applied to the same rather simple task. A "lines of code" metric will be provided in a future report.

Current performance results

This table provides the figures from a benchmark, in nanoseconds per advised method invocation. A plain method invocation is roughly 5 ns/iteration on the hardware/software used for the benchmark. Note that an advised application does more work than a non-advised one, so you should not compare the non-advised version directly to the advised version. AWbench does not yet provide metrics for a hand-written implementation of the AOP concepts.

The results were obtained with 2 million iterations.
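The measurement scheme behind such figures can be sketched as follows. This is a minimal illustration, not the actual AWbench harness; the class and method names are placeholders.

```java
// Minimal sketch of computing "ns per method invocation" over 2 million
// iterations: warm up first so the JIT compiles the hot path, then time.
public class BenchSketch {

    static final int ITERATIONS = 2_000_000;

    // Stand-in for an advised method of the test application; a real run
    // would execute the advice chain inside this call.
    static int counter = 0;
    static void advisedMethod() {
        counter++;
    }

    public static void main(String[] args) {
        // Warm-up pass (not measured).
        for (int i = 0; i < ITERATIONS; i++) {
            advisedMethod();
        }
        // Measured pass.
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            advisedMethod();
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("ns/invocation: " + (elapsed / ITERATIONS));
    }
}
```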

In this table, the first two lines (in bold) are the most important ones. In a real-world application, it is likely that a before or around advice will interact with the code it advises, and to do that it needs access to runtime (contextual) information such as method parameter values and the target instance. It is also likely that a join point is advised by more than one advice.

Conversely, it is very unlikely to have just a before advice that does nothing, but that case gives us a good estimate of the minimal overhead we can expect.

Note: comparing such results when the difference is small (e.g. 15 ns vs 10 ns) may not be meaningful. Before doing so, you should run the benchmark several times and compute an average after discarding the lowest and highest measurements.
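To make the row labels concrete, the bold "before, args() target()" case can be hand-coded roughly as below. This is an illustrative sketch of what the benchmarked construct does, not AWbench code; all names are invented.

```java
// Hand-coded equivalent of a "before" advice with args() and target()
// binding: the advice receives the target instance and the argument value,
// and runs before the original method body.
public class BeforeWithContext {

    static StringBuilder log = new StringBuilder();

    // The "advice": gets contextual information (target and args).
    static void before(Object target, int arg) {
        log.append("before(").append(arg).append(") ");
    }

    int value;

    // The "advised method": a weaver would insert the advice call here.
    void setValue(int v) {
        before(this, v);   // before advice, bound to target + args
        this.value = v;    // original method body
    }

    public static void main(String[] args) {
        BeforeWithContext t = new BeforeWithContext();
        t.setValue(42);
        System.out.println(log + "value=" + t.value);
    }
}
```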

| AWBench (ns/invocation) | aspectwerkz | awproxy | aspectwerkz_1_0 | aspectj | jboss | spring | dynaop | cglib | ext:aopalliance | ext:spring | ext:aspectj |
|---|---|---|---|---|---|---|---|---|---|---|---|
| **before, args() target()** | 10 | 25 | 606 | 10 | 220 | 355 | 390 | 145 | - | 220 | - |
| **around x 2, args() target()** | 80 | 85 | 651 | 50 | 290 | 436 | 455 | 155 | 465 | 476 | - |
| before | 15 | 20 | 520 | 15 | 145 | 275 | 320 | 70 | - | 40 | 10 |
| before, static info access | 30 | 30 | 501 | 25 | 175 | 275 | 330 | 70 | - | 35 | - |
| before, rtti info access | 50 | 55 | 535 | 50 | 175 | 275 | 335 | 75 | - | 35 | - |
| after returning | 10 | 20 | 541 | 10 | 135 | 285 | 315 | 85 | - | 45 | 15 |
| after throwing | 3540 | 3870 | 6103 | 3009 | 5032 | - | 6709 | 8127 | - | - | 3460 |
| before + after | 20 | 30 | 511 | 20 | 160 | 445 | 345 | 80 | - | 35 | 20 |
| before, args() primitives | 10 | 20 | 555 | 10 | 195 | 350 | 375 | 145 | - | 210 | - |
| before, args() objects | 5 | 25 | 546 | 10 | 185 | 325 | 345 | 115 | - | 200 | - |
| around | 60 | 95 | 470 | 10 | - | 225 | 315 | 75 | - | - | 90 |
| around, rtti info access | 70 | 70 | 520 | 50 | 140 | 250 | 340 | 80 | 70 | 70 | - |
| around, static info access | 80 | 90 | 486 | 25 | 135 | 245 | 330 | 75 | 80 | 80 | - |

The same figures can also be read with AspectWerkz 2.0.RC2-snapshot as the reference for each category. Expressed that way, the first line illustrates that for the simplest before advice, AspectWerkz is about 13 times faster than JBoss AOP 1.0.

Some figures are not available where the underlying framework does not support the feature. For the ext: entries, this can be due to pending work (the AOP Alliance interfaces can emulate a before advice, just as is done in JBoss AOP).

The after throwing advice appears slow because of two sources of overhead: first, throwing the exception itself (user code), and second, catching the exception and doing an instanceof check on the exception type (advice code).
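These two costs can be seen in a hand-written sketch of what a weaver conceptually generates for an after throwing advice. Names are illustrative, not framework code.

```java
// Sketch of the "after throwing" overhead: the user code throws (expensive,
// since a stack trace is captured), and the generated wrapper must catch,
// type-check with instanceof, run the advice, and rethrow.
public class AfterThrowingSketch {

    static int adviceRuns = 0;

    // The "advice": runs only when a matching exception escapes the method.
    static void afterThrowingAdvice(Exception e) {
        adviceRuns++;
    }

    // The original method: user code that throws.
    static void originalMethod() {
        throw new IllegalStateException("failure");
    }

    // What the weaver conceptually generates around the method.
    static void advisedMethod() {
        try {
            originalMethod();
        } catch (RuntimeException e) {
            if (e instanceof IllegalStateException) { // advice type check
                afterThrowingAdvice(e);
            }
            throw e; // rethrow, preserving the original semantics
        }
    }

    public static void main(String[] args) {
        try {
            advisedMethod();
        } catch (IllegalStateException expected) {
            // the benchmark harness swallows the expected exception
        }
        System.out.println("advice ran " + adviceRuns + " time(s)");
    }
}
```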

Latest run: Dec 20, 2004, updated as per Spring Framework team feedback.

AWbench internals

Summary

AWbench is a micro-benchmark suite that aims to stay simple. The test application is very simple, and AWbench is mainly the glue around it that applies one or more very simple advice / interceptors from the framework of your choice.

AWbench comes with an Ant script that lets you run it on your own box, and you can contribute improvements if you know of any for a particular framework.

What is the scope for the benchmark?

So far, AWbench covers only method execution pointcuts, since call-side pointcuts are not supported by proxy-based frameworks (Spring AOP, cglib, dynaop, etc.).

The awbench.method.Execution class is the test application and contains one method per construct to benchmark. An important point is that bytecode-based AOP may provide much better performance for before and after advice, as well as much faster access to contextual information.
Indeed, proxy-based frameworks are very likely to use reflection to give the user access to intercepted method parameters at runtime from within an advice, while bytecode-based AOP can use more advanced constructs to provide access at the speed of statically compiled code.
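The reflection path a proxy-based framework takes can be sketched with a plain JDK dynamic proxy. This is a generic illustration of the technique, not the code of any of the benchmarked frameworks; the interface and class names are invented.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Sketch of proxy-based interception: every call goes through
// Method.invoke, and the "advice" sees the arguments as an Object[]
// (with boxing for primitives), which is where the reflection cost lies.
public class ProxySketch {

    interface Service {
        int add(int a, int b);
    }

    static class ServiceImpl implements Service {
        public int add(int a, int b) { return a + b; }
    }

    // Wrap a target in a JDK dynamic proxy with a trivial "before advice".
    static Service proxy(final Service target) {
        return (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(),
                new Class<?>[] { Service.class },
                new InvocationHandler() {
                    public Object invoke(Object proxy, Method m, Object[] a)
                            throws Throwable {
                        // "before advice": args arrive boxed in an Object[]
                        System.out.println("before " + m.getName());
                        return m.invoke(target, a); // reflective dispatch
                    }
                });
    }

    public static void main(String[] args) {
        Service proxied = proxy(new ServiceImpl());
        System.out.println(proxied.add(1, 2));
    }
}
```

A bytecode-weaving framework can instead compile the advice call and argument access directly into the target class, avoiding both the reflective dispatch and the boxing.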

11 Comments

Jonas, interesting stuff. I compiled my own benchmarks a while ago and didn't get exactly the same results, but I'll look into that later.

Although I don't think the small invocation overhead in Spring applications is relevant (oftentimes, Spring AOP is used for transaction management, security, etcetera: all resource-intensive stuff, generating much more overhead than the Spring AOP machinery itself), you can make it about 10 to 15% faster by including the following as properties for the ProxyFactoryBean (PFB):

Optimization differs per proxy type with Spring. When using CGLIB, for example, you won't be able to change the advice configuration of an already created proxy. In 1.3, the option to freeze the configuration will also be added, giving an even bigger performance increase. Setting the proxy to opaque disables the feature that allows you to cast the proxy to Advised (to inspect the proxy's advisors, etcetera). The latter won't increase performance a lot; the former will, however!
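As a sketch of the options described in this comment, a ProxyFactoryBean definition with these flags set might look as follows. This is an assumption-laden illustration, not a verbatim copy of the commenter's configuration; `myTarget` and `myInterceptor` are placeholder bean names.

```xml
<bean id="myService"
      class="org.springframework.aop.framework.ProxyFactoryBean">
  <property name="target" ref="myTarget"/>
  <!-- use a CGLIB subclass proxy instead of a JDK interface proxy -->
  <property name="proxyTargetClass" value="true"/>
  <!-- disallow advice changes once the proxy is created (the bigger win) -->
  <property name="frozen" value="true"/>
  <!-- prevent casting the proxy to Advised (the smaller win) -->
  <property name="opaque" value="true"/>
  <property name="interceptorNames">
    <list><value>myInterceptor</value></list>
  </property>
</bean>
```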

Also, what about the different deployment models available with JBoss, for example? I assume you've been precompiling the aspects for JBoss (performance will degrade by a factor of 2 if you do online weaving, at least using my benchmark).

Just to add to what Alef has written: the default configuration in Spring allows advice to be added/removed on the fly and uses JDK proxies rather than CGLIB. In situations where you don't want to change the advice chain, you can 'freeze' the proxy once it is created. Using this in conjunction with the CGLIB proxies in Spring gives quite a performance enhancement. Once I have downloaded the code, I will submit any optimizations for Spring if I can make them.

The table here is not in the output order of "ant run:all". It was more interesting to have the more realistic schemes, like "before with context exposure" and "2 around advices with context exposure", appear first and in bold.
The "157" you obtain for "ext:spring" (Spring aspects within the AspectWerkz runtime) with the label "before advice" corresponds to the "before" line for "ext:spring" in the table, which reads "40" (ns/iteration); the difference is due to hardware differences, VM differences, etc.
I have attached to the page the full log of the benchmark run that led to this table (the output of "ant run:all").

Anonymous

We have been using AspectWerkz and JBoss AOP and found very slow performance in both implementations. JBoss was executing 2000/sec and AspectWerkz up to 19,000/sec. We have finally got to the bottom of it and found that it is the scope definition slowing everything down. When the scope is set to "perJVM", performance is around 2,000,000/sec, and when it is "perInstance" we get 19,000/sec. This is a fairly significant difference, and our experience with CGLib or proxy-based implementations has shown better performance than those implementations. Do you have any suggestions on how to improve performance?
Regards, Neil.