Since the indexing dependency resolver I have been working on does not yield the performance improvements I was hoping for, I am now benchmarking the dependency project in isolation. This makes profiling a lot easier, since there is far less going on than when starting up the application server. My thinking now is that we should do the same for the other MC modules.

One improvement is to not try to resolve contexts that are in the INSTALLED state (or where the toState is invalid) from AbstractController.resolveContexts(boolean). This shaves off about 20% for contexts installed in the right order. See http://community.jboss.org/message/524747#524747 for a description of the benchmark.

We do quite a lot of adding to and removing from AbstractController.installed and the sets stored in AbstractController.contextsByState. They are currently implemented as CopyOnWriteArraySets which perform badly when there are large numbers of entries in the sets already. I tried replacing these with ConcurrentSet (based on ConcurrentHashMap), but that performed worse. I've spoken to Jason who gave me a few ideas, but will put those off until I've picked off a few other things.

AbstractController.resolveCallbacks() was spending a lot of time setting/unsetting the thread context classloader, so I have changed it to skip that unless there actually are some callbacks to install.

These calls are probably heavier than they would normally be in the AS, since the tests run with a security manager enabled, meaning we need to create the PrivilegedAction and go via AccessController.doPrivileged. Even so, calling Thread.getContextClassLoader()/setContextClassLoader() seems to have some overhead of its own, especially when it was being done once each time a context enters a state.
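The guard described above could look roughly like the following. This is only a sketch: CallbackInstaller, its method signature and the switch counter are hypothetical names for illustration, not the actual AbstractController code, and the security manager path (PrivilegedAction via AccessController.doPrivileged) is left out.

```java
import java.util.List;

// Hypothetical sketch: CallbackInstaller is an illustrative name,
// not the real AbstractController implementation.
final class CallbackInstaller {
    /** Counts how often the context class loader was switched (illustration only). */
    int classLoaderSwitches = 0;

    void resolveCallbacks(List<Runnable> callbacks, ClassLoader loader) {
        // Fast path: nothing to install, so do not touch the
        // thread context class loader at all.
        if (callbacks.isEmpty()) {
            return;
        }
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        classLoaderSwitches++;
        try {
            for (Runnable callback : callbacks) {
                callback.run();
            }
        } finally {
            // Always restore the previous loader, even if a callback throws.
            current.setContextClassLoader(previous);
        }
    }
}
```

With this shape, a context that enters a state without any callbacks pays only for the isEmpty() check.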

What I forgot to mention about the 'installed' and 'contextsByState' sets is that their performance is related to their size. COWAS performs better than ConcurrentSet when a small number of existing entries are present, which probably makes it ok for the 'installing' set since that will not contain that many entries. Its performance does however degrade rapidly when there are many entries, so it is probably not the best choice for the sets stored in 'contextsByState'.

The times in ms for various amounts of prefill, for CopyOnWriteArraySet (COWAS), ConcurrentSet (CS), normal HashSet (HS) and ConcurrentSkipListSet (CSLS). Each number is one run:

prefill 0

------------

COWAS: 4648 4397 4618 4430 4460

CS: 5141 5026 5177 4908 5139

HS: 4267 3954 3762 4070 3952

CSLS: 5475 5504 5461 5649 5406

prefill 10

----------

COWAS: 6781 7026 6502 6505

CS: 5009 4954 4791 5168

HS: 4165 4403 4206

CSLS: 6046 6291 6216

prefill 100

-----------

COWAS: 29138 31600

CS: 5115 5208 5179 5511

HS: 4378 4525 4098 4181

CSLS: 6836 7029 6721

prefill 1000

------------

COWAS: About 5 minutes

CS: 5030 5212

CSLS: 8488 7627 7797

HS: 4141 4085

So this might explain why CS makes performance worse when used for contextsByState's sets: in my benchmark most states will have very few entries and only a few will have lots of entries. CSLS always seems to be slower than CS, so we won't go with that.
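For anyone who wants to reproduce the trend, a prefill micro-benchmark along these lines should show it. This is a minimal sketch, not the benchmark used for the numbers above: the class name, the iteration count and the add/remove workload are assumptions, and ConcurrentHashMap.newKeySet() stands in for JBoss's ConcurrentSet. There is no JIT warm-up, so treat the absolute numbers with suspicion.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical benchmark sketch; not the code behind the numbers in this thread.
final class SetPrefillBenchmark {
    /**
     * Prefills the set with 0..prefill-1, then times repeated add/remove
     * of one extra element. Returns the elapsed time in nanoseconds.
     */
    static long timeAddRemove(Set<Integer> set, int prefill, int iterations) {
        for (int i = 0; i < prefill; i++) {
            set.add(i);
        }
        Integer extra = prefill; // not among the prefilled values
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            set.add(extra);
            set.remove(extra);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 100_000; // assumed workload size
        for (int prefill : new int[] {0, 10, 100, 1000}) {
            System.out.println("prefill " + prefill);
            report("COWAS", new CopyOnWriteArraySet<Integer>(), prefill, iterations);
            // Stand-in for JBoss ConcurrentSet (also backed by ConcurrentHashMap).
            report("CS", ConcurrentHashMap.<Integer>newKeySet(), prefill, iterations);
            report("HS", new HashSet<Integer>(), prefill, iterations);
            report("CSLS", new ConcurrentSkipListSet<Integer>(), prefill, iterations);
        }
    }

    private static void report(String name, Set<Integer> set, int prefill, int iterations) {
        long ms = timeAddRemove(set, prefill, iterations) / 1_000_000;
        System.out.println("  " + name + ": " + ms + " ms");
    }
}
```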

A few of Jason's suggestions were:

- Create own COWAS implementation based on FastCopyHashMap

- Try a normal synchronized HashSet (need to make sure first that the way we use it we don't iterate over the entries)

- ConcurrentSkipListMap, although its performance degrades with size (see the CSLS numbers above; ConcurrentSkipListSet is backed by ConcurrentSkipListMap)
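On the synchronized HashSet suggestion, the iteration caveat is the important part: the wrapper returned by Collections.synchronizedSet guards individual calls, but iteration has to be synchronized on the set by the caller. A minimal sketch, with SynchronizedSetSketch and its methods being hypothetical names rather than anything in the controller:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "normal synchronized HashSet" option.
final class SynchronizedSetSketch {
    private final Set<String> contexts =
            Collections.synchronizedSet(new HashSet<String>());

    boolean add(String name) {
        return contexts.add(name); // single calls are guarded by the wrapper
    }

    boolean remove(String name) {
        return contexts.remove(name);
    }

    /**
     * Iteration is NOT guarded by the wrapper, so copy the contents
     * while holding the set's own lock and iterate over the copy.
     */
    List<String> snapshot() {
        synchronized (contexts) {
            return new ArrayList<String>(contexts);
        }
    }
}
```

If the controller does iterate over these sets, every iteration would need such a lock or copy, which is the thing to check before going this way.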

I created a FastCopyOnWriteSet (FCOWS) implementation. Unfortunately its performance degrades a lot with size as well:

prefill 0: 7634ms

prefill 10: 9127ms

prefill 100: 27898ms

I have attached the source, but I guess it is the copying upon modification that is the main overhead of both FCOWS and COWAS? Putting this into contextsByState, I see a slight worsening in performance.
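To illustrate where the cost comes from, here is a stripped-down copy-on-write hash set. It is not the attached FastCopyOnWriteSet: it copies a plain java.util.HashSet instead of using FastCopyHashMap, and the class name and the copy counter are made up for the example. Every successful add or remove copies all existing entries, so the cost of a modification grows with the size of the set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical sketch; uses HashSet copying as a stand-in for FastCopyHashMap.
final class CopyOnWriteHashSetSketch<E> implements Iterable<E> {
    private volatile Set<E> current = new HashSet<E>();
    /** Number of full copies made so far (illustration only). */
    int copies = 0;

    synchronized boolean add(E e) {
        if (current.contains(e)) {
            return false; // no change, so no copy
        }
        Set<E> next = new HashSet<E>(current); // O(n) copy on every modification
        next.add(e);
        copies++;
        current = next;
        return true;
    }

    synchronized boolean remove(E e) {
        if (!current.contains(e)) {
            return false;
        }
        Set<E> next = new HashSet<E>(current); // O(n) copy again
        next.remove(e);
        copies++;
        current = next;
        return true;
    }

    boolean contains(E e) {
        return current.contains(e); // lock-free read of the latest snapshot
    }

    int size() {
        return current.size();
    }

    @Override
    public Iterator<E> iterator() {
        // Readers iterate over the snapshot that was current when they started.
        return Collections.unmodifiableSet(current).iterator();
    }
}
```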

I expect a fair bit. Before I started, AbstractDependencyInfo.resolve() was taking ~50% of the time, being called ~600K times, with 95% of this going into calling getInstalledContext() 600K times and getContext() 1200K times.

After my last commit we got rid of a third of the work by removing 600K of the calls to getContext().

With this fix all ADInfo.resolve() will do is call getContext() once, effectively cutting out two thirds of what it was originally doing.

With this change, 1000 contexts in the wrong order take ~2650ms, down from ~3500ms with the original AbstractDependencyItem. I'll profile dependency a bit more in isolation tomorrow to look for anything else that is obvious. After that I will start looking at kernel, which should be interesting since there are more dependencies on things like MDR.

I tried indexing the dependencies by state, since a lot of time is spent iterating over the dependencies to determine the unresolved ones. However, this had an adverse effect, making it a bit slower.

The current benchmarks only use one dependency per context, so indexing might be faster when there is more than one dependency. I won't add this for now, but will keep it in mind.

Find B, B gets moved to CREATE (it is before A in the COWAS for the state)

Find A, A gets moved to CREATE

break out since we incremented contexts

next iteration:

inner loop until CREATE:

Find B, B gets moved to START, break out

Find A, A gets moved to START, break out

break out since we incremented contexts

next iteration:

inner loop until START:

Find B, B gets moved to INSTALLED, break out

Find A, A gets moved to INSTALLED, break out

break out since we incremented contexts

New model:

C goes through all the states until INSTALLED

next outer iteration:

inner loop until INSTANTIATED:

Find A, A has its dependencies resolved and gets moved to CONFIGURED

next inner loop CONFIGURED

Find A, A gets moved to CREATE

next inner loop CREATE

Find A, A gets moved to START

next inner loop START

Find A, A gets moved to INSTALLED

next outer iteration

inner loop until INSTANTIATED:

Find A, do nothing since B is not in CONFIGURED

Find A, A has its dependencies resolved and gets moved to CONFIGURED

next inner loop CONFIGURED

Find A, A gets moved to CREATE

next inner loop CREATE

Find A, A gets moved to START

next inner loop START

Find A, A gets moved to INSTALLED

I was talking about replacing COWAS for contextsByState earlier, but this exercise has convinced me that COWAS is needed, because its iterator returns contexts in the order they were added.
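The ordering property is easy to check in isolation. The sketch below (class and method names are made up for the example) adds three names to a set and records the order the iterator returns them in. CopyOnWriteArraySet preserves insertion order, while HashSet makes no such guarantee.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical sketch comparing iteration order of set implementations.
final class IterationOrderSketch {
    /** Adds the names in the given order and returns them in iteration order. */
    static List<String> iterationOrder(Set<String> set, String... names) {
        for (String name : names) {
            set.add(name);
        }
        return new ArrayList<String>(set);
    }

    public static void main(String[] args) {
        // COWAS iterates in insertion order: prints [C, B, A]
        System.out.println(iterationOrder(new CopyOnWriteArraySet<String>(), "C", "B", "A"));
        // HashSet order depends on hashing and is unspecified
        System.out.println(iterationOrder(new HashSet<String>(), "C", "B", "A"));
    }
}
```

Any replacement for the contextsByState sets would have to keep this insertion-order iteration, which rules out a plain hash-based set.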

So in both cases C enters CONFIGURED, CREATE, START and INSTALLED before B, and similarly B enters those states before A. This is as it should be, at least according to the old ServiceController training notes: if a context has a dependency, then the context we depend on will be created by the time we are created, and the context we depend on will be started by the time we are started.

The only way I can think of breaking this is if B had a DependencyItem{iDependOn=D whenRequired=CREATE dependentState=INSTALLED}, in which case B would hang in CONFIGURED until D is installed, meaning A would enter CREATE, START and INSTALLED before B. But that is true in both models.