CDO/Tweaking Performance

by Simon McDuff, October 9, 2008


Revision as of 05:31, 19 December 2012

The purpose of this document is to describe ways of using CDO optimally. It is intended for both basic and expert CDO users and is based on CDO 2.0.0 (HEAD at the time of writing).

Speeding up CDO is our constant goal. If you have any questions or suggestions, do not hesitate to contact any member of the CDO team.

Setting EMF Parameters

The first piece of advice for improving CDO performance concerns model definition. It does not involve CDO directly. However, CDO works with EMF models, so a poorly tuned model can make CDO seem slow. Here are a few things to consider while defining a model:

For one-to-many relationships, the Unique property should be set to “false”. Otherwise, every add and set operation has to check the list for duplicates, which fetches all objects in the list.

If the Unique property absolutely has to be “true”, the reference should at least be a containment or have a many-to-one opposite (a bidirectional relationship). That way, EMF (starting from version 2.5) can accelerate insertion by looking up the inverse reference (eContainer or the opposite reference) instead of crawling the list.

The Resolve Proxies property should also be set to “false” for one-to-many relationships. Otherwise, performance can decrease in some cases. CDO's internal structures never create EMF proxies, even for references to external data in a non-CDO resource; CDO loads the referenced objects when the list is accessed.

In any case, Unique and Resolve Proxies should rarely both be set to “true” at the same time, especially without a single-valued opposite reference.

With these simple changes, CDO users can get a twentyfold performance improvement in their application. It is worth trying: add 10,000 elements to a list, once with and once without these changes, and compare.
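The following small Java simulation shows what Unique = “true” costs. LazyRefList and UniqueAddDemo are hypothetical classes and not part of EMF or CDO. The list stores only IDs. The sketch assumes that a unique add must inspect every existing element to rule out a duplicate, and it counts each inspected element as one fetch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a lazily loaded reference list (not EMF or CDO API).
 * The list holds only IDs; each element it has to inspect counts as one fetch.
 */
class LazyRefList {
    private final List<Integer> ids = new ArrayList<>();
    private long fetches = 0;

    /** Inspecting an element requires its target object, i.e. one simulated fetch. */
    private int resolve(int index) {
        fetches++;
        return ids.get(index);
    }

    /** Unique = "true": scan the whole list for a duplicate before adding. */
    boolean addUnique(int id) {
        for (int i = 0; i < ids.size(); i++) {
            if (resolve(i) == id) {
                return false; // duplicate found, nothing is added
            }
        }
        ids.add(id);
        return true;
    }

    /** Unique = "false": append without looking at the existing elements. */
    void addNonUnique(int id) {
        ids.add(id);
    }

    long fetches() {
        return fetches;
    }

    int size() {
        return ids.size();
    }
}

class UniqueAddDemo {
    public static void main(String[] args) {
        int n = 10_000;
        LazyRefList unique = new LazyRefList();
        LazyRefList nonUnique = new LazyRefList();
        for (int id = 0; id < n; id++) {
            unique.addUnique(id);
            nonUnique.addNonUnique(id);
        }
        System.out.println("Unique=true:  " + unique.fetches() + " simulated fetches");
        System.out.println("Unique=false: " + nonUnique.fetches() + " simulated fetches");
    }
}
```

Adding 10,000 distinct elements costs the unique variant 49,995,000 simulated fetches, which is n(n-1)/2. The non-unique variant costs none. In a real model, you get the same effect by setting the reference's Unique property to “false”, either in the Ecore editor or in the .ecore file.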

Batch processing existing metamodels (using XQuery)

If you have an XQuery processor installed (BaseX, http://basex.org, BSD licence, is a good choice), you can try the following XQuery script to batch process your existing .ecore files and apply the performance hints above. Only the resolveProxies and unique properties of references whose upper bound is -1 are checked.
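If you have no XQuery processor, the following Java sketch applies the same two hints with the JDK's DOM API. It looks for every eStructuralFeatures element whose xsi:type is ecore:EReference and whose upperBound is -1, and sets unique="false" and resolveProxies="false" on it. The class name EcoreTuner is hypothetical. The sketch assumes the usual .ecore XMI layout, with the xsi prefix bound to the XML Schema instance namespace.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical Java alternative to the XQuery batch script. */
class EcoreTuner {
    private static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    private static DocumentBuilder builder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required to read xsi:type via getAttributeNS
        return factory.newDocumentBuilder();
    }

    /** Parses .ecore content held in a string (handy for testing). */
    static Document parse(String xml) {
        try {
            return builder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse ecore content", e);
        }
    }

    /** Applies the hints in place and returns the number of references changed. */
    static int tune(Document doc) {
        NodeList features = doc.getElementsByTagName("eStructuralFeatures");
        int changed = 0;
        for (int i = 0; i < features.getLength(); i++) {
            Element feature = (Element) features.item(i);
            boolean isReference = "ecore:EReference".equals(feature.getAttributeNS(XSI_NS, "type"));
            boolean isMany = "-1".equals(feature.getAttribute("upperBound"));
            if (isReference && isMany) {
                feature.setAttribute("unique", "false");
                feature.setAttribute("resolveProxies", "false");
                changed++;
            }
        }
        return changed;
    }

    /** Usage: java EcoreTuner a.ecore b.ecore ... (the files are overwritten). */
    public static void main(String[] args) throws Exception {
        Transformer writer = TransformerFactory.newInstance().newTransformer();
        for (String path : args) {
            File file = new File(path);
            Document doc = builder().parse(file);
            int changed = tune(doc);
            writer.transform(new DOMSource(doc), new StreamResult(file));
            System.out.println(path + ": " + changed + " reference(s) updated");
        }
    }
}
```

The sketch overwrites the files it is given. It re-serializes them through a Transformer, so attribute order and formatting may change. Keep a copy or work under version control.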

Loading Partial Collections – CDOCollectionLoadingPolicy

When the oid1 object is fetched for the first time, only the first ten CDOIDs are loaded for each of its list features. This changes nothing for the ref1 list, since it contains only three items. However, only ten items of the ref2 list will be loaded:

As soon as an element beyond the tenth is accessed, CDO asks the CDOCollectionLoadingPolicy to load more elements. The example policy would load twenty more CDOIDs into the list.

Also, when the list is accessed by index, CDO does not need to fetch every item from the beginning of the list up to that index. It fetches only the chunk defined by the CDOCollectionLoadingPolicy.
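The chunking behaviour described above can be sketched as a small simulation. ChunkedIdList is a hypothetical class and not the CDO implementation. It makes two assumptions. First, the initial chunk arrives together with the owning object. Second, each miss loads resolveChunkSize CDOIDs starting at the accessed index, and counts as one round trip.

```java
/**
 * Hypothetical simulation of chunked CDOID loading (not the CDO implementation).
 * The client knows the list size, but IDs arrive from the "server" in chunks.
 */
class ChunkedIdList {
    private final long[] serverIds; // what the repository holds for this list
    private final Long[] loaded;    // null means "CDOID not loaded on the client yet"
    private final int resolveChunkSize;
    private int roundTrips = 0;

    ChunkedIdList(long[] serverIds, int initialChunkSize, int resolveChunkSize) {
        this.serverIds = serverIds;
        this.loaded = new Long[serverIds.length];
        this.resolveChunkSize = resolveChunkSize;
        // Assumption: the initial chunk comes with the owning object,
        // so it is not counted as an extra round trip.
        fill(0, initialChunkSize);
    }

    private void fill(int from, int count) {
        int to = Math.min(from + count, serverIds.length);
        for (int i = from; i < to; i++) {
            loaded[i] = serverIds[i];
        }
    }

    /** Returns the CDOID at index, loading one chunk starting at index on a miss. */
    long get(int index) {
        if (loaded[index] == null) {
            roundTrips++;
            fill(index, resolveChunkSize); // elements before index are not fetched
        }
        return loaded[index];
    }

    int loadedCount() {
        int count = 0;
        for (Long id : loaded) {
            if (id != null) {
                count++;
            }
        }
        return count;
    }

    int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        long[] ids = new long[100];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = 1000 + i;
        }
        ChunkedIdList list = new ChunkedIdList(ids, 10, 20);
        list.get(50);
        System.out.println(list.loadedCount() + " IDs loaded after " + list.roundTrips() + " round trip(s)");
    }
}
```

Take a 100-element list with an initial chunk of 10 and a resolve chunk of 20. Reading index 50 costs one round trip and loads indexes 50 to 69. Indexes 10 to 49 stay unloaded.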

Based on some tests, good performance can be achieved by using the following settings:

The CDORevisionPrefetchingPolicy feature of the CDOView allows CDO users to fetch many objects at a time.

The difference between the CDOCollectionLoadingPolicy feature and the CDORevisionPrefetchingPolicy feature is subtle. The CDOCollectionLoadingPolicy feature determines how and when to fetch CDOIDs, while the CDORevisionPrefetchingPolicy feature determines how and when to resolve CDOIDs (i.e. fetch the target objects).

What happens when list items are being accessed? The list fetches objects one at a time.

As an example, here is what happens while iterating through the ref1 list:

1. iterator.next();
2. oid3 is not in the cache, load oid3
3. iterator.next();
4. oid4 is not in the cache, load oid4
5. iterator.next();
6. oid5 is not in the cache, load oid5

Steps 2, 4 and 6 are the slowest operations. Since oid3 is not in the cache, it has to be fetched from the server, and the same is true for oid4 and oid5. Every object is fetched sequentially, in its own round trip.
Why not be smarter and load more objects at a time? This would reduce the number of client-server round trips: when oid3 is loaded, oid4 and oid5 could be loaded in the same call.

1. iterator.next();
2. oid3 is not in the cache, load oid3, oid4, oid5
3. iterator.next();
4. oid4 is in the cache
5. iterator.next();
6. oid5 is in the cache

Instead of three calls, only one is made to the server. How many calls would be saved for a list containing 100 or 10,000 items?
This feature uses CDOView.setRevisionPrefetchingPolicy. For example:

End users can also provide their own implementation of the CDORevisionPrefetchingPolicy interface.
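The round-trip arithmetic above can be checked with a small Java simulation. PrefetchPolicy, FixedDepthPolicy, RevisionCacheSim and PrefetchDemo are all hypothetical names. In particular, PrefetchPolicy does not have the real signature of the CDORevisionPrefetchingPolicy interface. The sketch assumes only that a cache miss costs one server call, however many revisions the policy asks for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical prefetching policy (not the actual CDORevisionPrefetchingPolicy signature). */
interface PrefetchPolicy {
    /** Returns the IDs to load in the same server call as the ID at accessIndex. */
    List<String> idsToLoad(List<String> list, int accessIndex);
}

/** Loads the accessed ID plus the next (depth - 1) IDs of the list; depth must be >= 1. */
class FixedDepthPolicy implements PrefetchPolicy {
    private final int depth;

    FixedDepthPolicy(int depth) {
        this.depth = depth;
    }

    @Override
    public List<String> idsToLoad(List<String> list, int accessIndex) {
        int end = Math.min(accessIndex + depth, list.size());
        return new ArrayList<>(list.subList(accessIndex, end));
    }
}

/** Simulated client-side revision cache that counts server calls. */
class RevisionCacheSim {
    private final Map<String, String> cache = new HashMap<>();
    private final PrefetchPolicy policy;
    private int serverCalls = 0;

    RevisionCacheSim(PrefetchPolicy policy) {
        this.policy = policy;
    }

    String resolve(List<String> list, int index) {
        String id = list.get(index);
        if (!cache.containsKey(id)) {
            serverCalls++; // one round trip, however many IDs the policy returns
            for (String toLoad : policy.idsToLoad(list, index)) {
                cache.putIfAbsent(toLoad, "revision of " + toLoad);
            }
        }
        return cache.get(id);
    }

    int serverCalls() {
        return serverCalls;
    }
}

class PrefetchDemo {
    /** Iterates over a list of size IDs and returns the number of server calls. */
    static int callsToIterate(int size, int depth) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            ids.add("oid" + (i + 3)); // oid3, oid4, oid5, ... as in the ref1 example
        }
        RevisionCacheSim cache = new RevisionCacheSim(new FixedDepthPolicy(depth));
        for (int i = 0; i < size; i++) {
            cache.resolve(ids, i);
        }
        return cache.serverCalls();
    }

    public static void main(String[] args) {
        for (int size : new int[] {3, 100, 10_000}) {
            System.out.println(size + " items: " + callsToIterate(size, 1) + " calls without prefetching, "
                + callsToIterate(size, 10) + " calls with a depth of 10");
        }
    }
}
```

Without prefetching (a depth of 1), iterating costs one call per item: 3, 100 and 10,000 calls. With a depth of 10, the same lists cost 1, 10 and 1,000 calls.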

Prefetching Nested Objects Explicitly – cdoPrefetch()

As of CDO 3.0, the CDOObject interface supports prefetching the revisions of nested objects, for example:

object.cdoPrefetch(CDORevision.DEPTH_INFINITE);
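The depth argument controls how many levels of nested objects are collected for a single request, as the following hypothetical Java model illustrates. It collects the IDs of the objects nested below a root down to a given depth. TreeObject and PrefetchSim are illustrative names. The sketch uses 0 for "no depth" and -1 for "infinite", which is its own assumption and not a statement about the CDO constants.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for an object with contained children (not a CDOObject). */
class TreeObject {
    final String id;
    final List<TreeObject> children = new ArrayList<>();

    TreeObject(String id) {
        this.id = id;
    }

    TreeObject add(TreeObject child) {
        children.add(child);
        return this;
    }
}

/** Collects the IDs an explicit prefetch of the given depth would request in one go. */
class PrefetchSim {
    static final int NO_DEPTH = 0;        // assumption for this sketch
    static final int INFINITE_DEPTH = -1; // assumption for this sketch

    static Set<String> collect(TreeObject root, int depth) {
        Set<String> ids = new LinkedHashSet<>();
        walk(root, depth, ids);
        return ids;
    }

    private static void walk(TreeObject obj, int depth, Set<String> ids) {
        if (depth == NO_DEPTH) {
            return;
        }
        int childDepth = depth == INFINITE_DEPTH ? INFINITE_DEPTH : depth - 1;
        for (TreeObject child : obj.children) {
            ids.add(child.id);
            walk(child, childDepth, ids);
        }
    }

    public static void main(String[] args) {
        TreeObject root = new TreeObject("root")
            .add(new TreeObject("a").add(new TreeObject("a1").add(new TreeObject("a11"))))
            .add(new TreeObject("b"));
        System.out.println(collect(root, 1));              // [a, b]
        System.out.println(collect(root, INFINITE_DEPTH)); // [a, a1, a11, b]
    }
}
```

Consider a root with children a and b, where a has a child a1 and a1 has a child a11. A depth of 1 yields only a and b. A depth of -1 yields the whole subtree in one request.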

Defining Fetch Rules Dynamically – CDOFetchAnalyzer

In many applications, hard-coded rules are used to determine what to fetch, mainly to speed the application up. Basically, these rules define, for a specific context, which path to load from a root object, so that only the data that is actually needed gets loaded. Such rules are usually hard to maintain, because models and applications keep changing.

The CDOFetchAnalyzer feature can be used to define rules, but it does so in a dynamic fashion. It detects patterns in the way objects are accessed in a specific context and, when that context comes back, it loads the same path from different root objects.
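The idea can be sketched as a recorder that learns feature paths per context and replays them from a new root. AccessPatternRecorder, Obj and FetchPatternDemo are hypothetical classes written for this illustration. They do not reflect how CDOFetchAnalyzer is actually implemented. The context here is simply a string chosen by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical object with named, multi-valued references (not a CDOObject). */
class Obj {
    final String id;
    final Map<String, List<Obj>> refs = new HashMap<>();

    Obj(String id) {
        this.id = id;
    }

    Obj ref(String feature, Obj... targets) {
        refs.computeIfAbsent(feature, f -> new ArrayList<>()).addAll(Arrays.asList(targets));
        return this;
    }
}

/**
 * Hypothetical sketch of dynamic fetch rules: learn which feature paths are
 * navigated from a root in a given context, then replay them for new roots.
 */
class AccessPatternRecorder {
    private final Map<String, Set<List<String>>> pathsByContext = new HashMap<>();

    /** Records that, in this context, the application navigated featurePath from a root. */
    void recordAccess(String context, List<String> featurePath) {
        pathsByContext.computeIfAbsent(context, c -> new LinkedHashSet<>())
            .add(new ArrayList<>(featurePath)); // defensive copy: the list is used as a set element
    }

    /** When the context comes back, lists the objects on the learned paths from a new root. */
    Set<String> objectsToPrefetch(String context, Obj root) {
        Set<String> ids = new LinkedHashSet<>();
        for (List<String> path : pathsByContext.getOrDefault(context, Collections.emptySet())) {
            follow(root, path, 0, ids);
        }
        return ids;
    }

    private static void follow(Obj obj, List<String> path, int step, Set<String> ids) {
        if (step == path.size()) {
            return;
        }
        for (Obj target : obj.refs.getOrDefault(path.get(step), Collections.emptyList())) {
            ids.add(target.id);
            follow(target, path, step + 1, ids);
        }
    }
}

class FetchPatternDemo {
    public static void main(String[] args) {
        AccessPatternRecorder recorder = new AccessPatternRecorder();
        // First visit: the "report" context navigates members, then each member's address.
        recorder.recordAccess("report", Arrays.asList("members", "address"));

        Obj p2 = new Obj("p2").ref("address", new Obj("addr2")).ref("manager", new Obj("m2"));
        Obj team2 = new Obj("team2").ref("members", p2);
        System.out.println(recorder.objectsToPrefetch("report", team2)); // [p2, addr2]
    }
}
```

Once the path members/address has been recorded for the report context, a new root in that context yields its members and their addresses. Objects off the learned path, such as the manager, are not included.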