Details

Description

As discussed on the dev list, the fo tree memory usage is somewhat excessive due
mainly to a large number of properties objects being generated. A number of
patches should be added here as these issues are slowly addressed.

Richard Wheeldon
added a comment - 27/Nov/06 03:17 On the fo test example I'm using here, this reduces the instance count for
EnumNumber from ~100,000 to 3 and the instance count for EnumProperty from
~50000 to ~200.

Comments out only those properties whose getters/setters do not form part of
the public API. For example Block retains commonAural despite this never being
used, because it forms part of the API via Block.getCommonAural().

Richard Wheeldon
added a comment - 27/Nov/06 04:44 Comments out only those properties whose getters/setters do not form part of
the public API. For example Block retains commonAural despite this never being
used, because it forms part of the API via Block.getCommonAural().

Been pondering a bit more about this myself, and thought up a few things to keep in mind in the
process:

1) In some cases a Maker uses instance methods of the FONode for which the property is being created.
Even the most trivial FONode.getNameId() can already result in a slightly different Property instance
being created.
2) Of specific interest are percentages and other relative values.
Their specified value -say 70%- may be the same, but the underlying LengthBase has an FObj
instance member that comes into play when resolving that percentage during layout. A Property
instance for a percentage value created for one FObj can currently not be re-used for another FObj.
Percentages need to be excluded, I think, and should always trigger the creation of new instances,
unless there's also a design-change making it possible to reset the FObj member of a single
LengthBase.

3) Also of interest: ToBeImplementedProperty. We definitely should check whether we really need
different instances, or if one instance would be enough. Nothing is done with them anyway. At the very
least: one instance per distinct initial value should suffice.

4) A LengthProperty instance for a property with a specified absolute value of "10pt" could be re-used
for all <length> properties (font-size, line-height, ...). This is what your patch currently demonstrates
for the Enums. Again, a very nice proof-of-concept!

Just one tiny question/remark: would it make sense IYO to move the propertyCache to a central
location, say a PropertyPool, sort of a central map structured by the property types, instead of keeping
these caches local to the types themselves? Provide two access points -get() and put()- and keep all
the other related logic encapsulated in there...

Andreas L. Delmelle
added a comment - 27/Nov/06 10:27 Richard,
Nice job, so far!
Been pondering a bit more about this myself, and thought up a few things to keep in mind in the
process:
1) In some cases a Maker uses instance methods of the FONode for which the property is being created.
Even the most trivial FONode.getNameId() can already result in a slightly different Property instance
being created.
2) Of specific interest are percentages and other relative values.
Their specified value - say 70% - may be the same, but the underlying LengthBase has an FObj
instance member that comes into play when resolving that percentage during layout. A Property
instance for a percentage value created for one FObj can currently not be re-used for another FObj.
Percentages need to be excluded, I think, and should always trigger the creation of new instances,
unless there's also a design-change making it possible to reset the FObj member of a single
LengthBase.
3) Also of interest: ToBeImplementedProperty. We definitely should check whether we really need
different instances, or if one instance would be enough. Nothing is done with them anyway. At the very
least: one instance per distinct initial value should suffice.
4) A LengthProperty instance for a property with a specified absolute value of "10pt" could be re-used
for all <length> properties (font-size, line-height, ...). This is what your patch currently demonstrates
for the Enums. Again, a very nice proof-of-concept!
Just one tiny question/remark: would it make sense IYO to move the propertyCache to a central
location, say a PropertyPool, sort of a central map structured by the property types, instead of keeping
these caches local to the types themselves? Provide two access points - get() and put() - and keep all
the other related logic encapsulated in there...

Andreas L. Delmelle
added a comment - 06/Dec/06 14:55 I wonder... especially how this would impact multi-thread processing (see
synchronization)
Seems to break nothing, but I currently have no good testcases to be able to
judge whether this would mean an improvement. Purely from an aesthetical point
of view, it looks better, though...
Opinions?

Looking again at your patch, one more remark:
The only thing that doesn't completely sit right with me, is the lookup based on an instance of the class
in question... This seems to me to defeat the purpose of reducing the instantiation rate. True, the
instance's scope remains limited to the getInstance() method, but they're created (and have to be GC'ed)
anyway :/
[Compare it to String internalization: I doubt the JVM creates a String instance first and throws it away
afterwards if an internalized String for that text already exists. I'd think the extra instance is never even
created...?]

See my proposed change in Marker.java: the MarkerAttribute constructor is called only when no
corresponding instance exists in the map yet. I'm still chewing on how to generalize such an approach
in the properties package.
In the meantime, also did something similar for fo.expr.NumericProperty... again nothing broke, but I
still have to dig up a testcase, to see how precisely this impacts the resource consumption before
committing such stuff.

Example of the expected results of the change in Marker.java:
suppose 200 Markers, one-element subtree with attribute-set of 10 attributes

The rest lies somewhere in between.
The general use-cases would incline towards Case 1 more than Case 2 (or even a case where there are
200 distinct attribute-sets that have absolutely no common attributes). The more distinct names, the
larger the Map, but this is limited to PROPERTY_COUNT keys. The Lists can be expected to remain
reasonably sized (practical limitations on the number of distinct possible values)

If the synchronization works out as expected, a generalized approach for all Property subtypes could
vastly reduce memory usage in multi-thread context (the above figures multiplied by a factor of 15, or
more), especially if all the different threads use the same set of templates, generating heaps of identical
attributes.

Andreas L. Delmelle
added a comment - 07/Dec/06 12:33
Richard,
(In reply to comment #1)
> Created an attachment (id=19176) [edit]
> Replace EnumProperty and EnumNumber constructors with getInstance()
Looking again at your patch, one more remark:
The only thing that doesn't completely sit right with me, is the lookup based on an instance of the class
in question... This seems to me to defeat the purpose of reducing the instantiation rate. True, the
instance's scope remains limited to the getInstance() method, but they're created (and have to be GC'ed)
anyway :/
[Compare it to String internalization: I doubt the JVM creates a String instance first and throws it away
afterwards if an internalized String for that text already exists. I'd think the extra instance is never even
created...?]
See my proposed change in Marker.java: the MarkerAttribute constructor is called only when no
corresponding instance exists in the map yet. I'm still chewing on how to generalize such an approach
in the properties package.
In the meantime, also did something similar for fo.expr.NumericProperty... again nothing broke, but I
still have to dig up a testcase, to see how precisely this impacts the resource consumption before
committing such stuff.
Example of the expected results of the change in Marker.java:
suppose 200 Markers, one-element subtree with attribute-set of 10 attributes
BEFORE:
2000 separate MarkerAttribute instances w/ 4 String references each, good for 8000 Strings in toto.
Case 1
-> recurring identical attribute-set
AFTER: 1 Map with 10 entries, 10 Strings plus 10 Lists consisting of one MarkerAttribute instance each,
makes a total of only 10 instances (40 Strings) and 2000 references to one of those 10 instances.
Case 2
-> distinct attribute-sets (recurring names, different values)
NEW: 1 Map with 10 entries, 10 Strings plus 10 Lists consisting of 200 MarkerAttribute instances each...
Slight drawback because of the added Map + List instances, and the overhead of the lookup.
The rest lies somewhere in between.
The general use-cases would incline towards Case 1 more than Case 2 (or even a case where there are
200 distinct attribute-sets that have absolutely no common attributes). The more distinct names, the
larger the Map, but this is limited to PROPERTY_COUNT keys. The Lists can be expected to remain
reasonably sized (practical limitations on the number of distinct possible values)
If the synchronization works out as expected, a generalized approach for all Property subtypes could
vastly reduce memory usage in multi-thread context (the above figures multiplied by a factor of 15, or
more), especially if all the different threads use the same set of templates, generating heaps of identical
attributes.

I am not sure that your approach is worth the trouble. An EnumProperty and an
Attribute object contain only a name and a value, and two more strings in the
latter case. It is quite efficient to use that as a key, and creating an object
for lookup is not very costly in view of the efficient hash lookup you achieve.
First doing a lookup on name and then on value may cost more than it saves.

Of course a map in which the key and the value are always identical seems
strange. I have looked into Set, which guarantees uniqueness of the objects, but
it does not allow efficient retrieval.

At first I was wary about Richard's commenting out of properties. In principle
every property has an effect on layout and must be looked up. But if we wish to
release a production version, we cannot waste performance on such a principle.
Therefore I think a careful commenting out of properties not looked up is useful.

Regards, Simon

> Richard,
>
> (In reply to comment #1)
> > Created an attachment (id=19176) [edit][edit]
> > Replace EnumProperty and EnumNumber constructors with getInstance()
>
> Looking again at your patch, one more remark:
> The only thing that doesn't completely sit right with me, is the lookup based
on an instance of the class
> in question... This seems to me to defeat the purpose of reducing the
instantiation rate. True, the
> instance's scope remains limited to the getInstance() method, but they're
created (and have to be GC'ed)
> anyway :/
>
> See my proposed change in Marker.java: the MarkerAttribute constructor is
called only when no
> corresponding instance exists in the map yet. I'm still chewing on how to
generalize such an approach
> in the properties package.
> In the meantime, also did something similar for fo.expr.NumericProperty...
again nothing broke, but I
> still have to dig up a testcase, to see how precisely this impacts the
resource consumption before
> committing such stuff.

Simon Pepping
added a comment - 08/Dec/06 13:14 (In reply to comment #6)
I think Richard's approach is OK for read-only properties.
Andreas, why is your valueCache not also a map on the value?
I am not sure that your approach is worth the trouble. An EnumProperty and an
Attribute object contain only a name and a value, and two more strings in the
latter case. It is quite efficient to use that as a key, and creating an object
for lookup is not very costly in view of the efficient hash lookup you achieve.
First doing a lookup on name and then on value may cost more than it saves.
Of course a map in which the key and the value are always identical seems
strange. I have looked into Set, which guarantees uniqueness of the objects, but
it does not allow efficient retrieval.
At first I was wary about Richard's commenting out of properties. In principle
every property has an effect on layout and must be looked up. But if we wish to
release a production version, we cannot waste performance on such a principle.
Therefore I think a careful commenting out of properties not looked up is useful.
Regards, Simon
> Richard,
>
> (In reply to comment #1)
> > Created an attachment (id=19176) [edit] [edit]
> > Replace EnumProperty and EnumNumber constructors with getInstance()
>
> Looking again at your patch, one more remark:
> The only thing that doesn't completely sit right with me, is the lookup based
on an instance of the class
> in question... This seems to me to defeat the purpose of reducing the
instantiation rate. True, the
> instance's scope remains limited to the getInstance() method, but they're
created (and have to be GC'ed)
> anyway :/
>
> See my proposed change in Marker.java: the MarkerAttribute constructor is
called only when no
> corresponding instance exists in the map yet. I'm still chewing on how to
generalize such an approach
> in the properties package.
> In the meantime, also did something similar for fo.expr.NumericProperty...
again nothing broke, but I
> still have to dig up a testcase, to see how precisely this impacts the
resource consumption before
> committing such stuff.

I thought of that too, but since we're trying to save on resources here I'm a bit wary of the extra objects
a Map generates under-the-hood.
ArrayLists have very low overhead in that respect. Very little more space needed on top of the instances
it stores. Given that the number of distinct values for one attribute is expected to remain at a
reasonable level, I'd even go as far as stating that the list iteration will perform roughly the same as a
map lookup. All we lose by using a List is the extra space needed for the Map.Entrys...

> I am not sure that your approach is worth the trouble.

Indeed, see also Joerg's response on fop-dev.
I was not either, but it would get interesting if such an approach could be used on the complex types.

On the other hand, even if you start from a type containing only two String instance members, having
thousands of references to one singular instance of that type is mathematically always better than
creating thousands of instances with the two String references being identical.
The benefit of Richard's approach is that it could rather easily be ported to all Property types. My
approach seems more appropriate for the Attribute type. Looking closer, it would get very difficult to
port it to the properties.

> An EnumProperty and an Attribute object contain only a name and a value,
> and two more strings in the latter case. It is quite efficient to use that as a key,
> and creating an object for lookup is not very costly in view of the efficient hash
> lookup you achieve.
> First doing a lookup on name and then on value may cost more than it saves.

Could be... I really should try to measure it.

> Of course a map in which the key and the value are always identical seems
> strange. I have looked into Set, which guarantees uniqueness of the objects, but
> it does not allow efficient retrieval.

Hence, index-based list-retrieval! 8)
The uniqueness follows from the implementation, I think...
If the Maps and Lists are properly synchronized, then there will always be only one distinct instance per
name/value.
Each get() following the initial put() will return that singular instance.

> At first I was wary about Richard's commenting out of properties. In principle
> every property has an effect on layout and must be looked up. But if we wish to
> release a production version, we cannot waste performance on such a principle.
> Therefore I think a careful commenting out of properties not looked up is useful.

Andreas L. Delmelle
added a comment - 08/Dec/06 14:05 (In reply to comment #7)
> Andreas, why is your valueCache not also a map on the value?
I thought of that too, but since we're trying to save on resources here I'm a bit wary of the extra objects
a Map generates under-the-hood.
ArrayLists have very low overhead in that respect. Very little more space needed on top of the instances
it stores. Given that the number of distinct values for one attribute is expected to remain at a
reasonable level, I'd even go as far as stating that the list iteration will perform roughly the same as a
map lookup. All we lose by using a List is the extra space needed for the Map.Entrys...
> I am not sure that your approach is worth the trouble.
Indeed, see also Joerg's response on fop-dev.
I was not either, but it would get interesting if such an approach could be used on the complex types.
On the other hand, even if you start from a type containing only two String instance members, having
thousands of references to one singular instance of that type is mathematically always better than
creating thousands of instances with the two String references being identical.
The benefit of Richard's approach is that it could rather easily be ported to all Property types. My
approach seems more appropriate for the Attribute type. Looking closer, it would get very difficult to
port it to the properties.
> An EnumProperty and an Attribute object contain only a name and a value,
> and two more strings in the latter case. It is quite efficient to use that as a key,
> and creating an object for lookup is not very costly in view of the efficient hash
> lookup you achieve.
> First doing a lookup on name and then on value may cost more than it saves.
Could be... I really should try to measure it.
> Of course a map in which the key and the value are always identical seems
> strange. I have looked into Set, which guarantees uniqueness of the objects, but
> it does not allow efficient retrieval.
Hence, index-based list-retrieval! 8)
The uniqueness follows from the implementation, I think...
If the Maps and Lists are properly synchronized, then there will always be only one distinct instance per
name/value.
Each get() following the initial put() will return that singular instance.
> At first I was wary about Richard's commenting out of properties. In principle
> every property has an effect on layout and must be looked up. But if we wish to
> release a production version, we cannot waste performance on such a principle.
> Therefore I think a careful commenting out of properties not looked up is useful.
Agreed here. We can always reactivate them later.

(In reply to comment #8)
> ... I'd even go as far as stating that the list iteration will perform roughly the same as a
> map lookup. All we lose by using a List is the extra space needed for the Map.Entrys...

Just FYI:
Finally got around to performing the measurements, and I was wrong about this. Using Maps is much
faster, up to about five times faster with value-lists as small as 10 elements... Richard's is definitely the
way to go.

Andreas L. Delmelle
added a comment - 16/Dec/06 14:14 (In reply to comment #8)
> ... I'd even go as far as stating that the list iteration will perform roughly the same as a
> map lookup. All we lose by using a List is the extra space needed for the Map.Entrys...
Just FYI:
Finally got around to performing the measurements, and I was wrong about this. Using Maps is much
faster, up to about five times faster with value-lists as small as 10 elements... Richard's is definitely the
way to go.

(In reply to comment #9)
> Finally got around to performing the measurements, and I was wrong about this. Using Maps is much
> faster, up to about five times faster with value-lists as small as 10 elements... Richard's is definitely the
> way to go.

Andreas L. Delmelle
added a comment - 17/Dec/06 04:02 (In reply to comment #9)
> Finally got around to performing the measurements, and I was wrong about this. Using Maps is much
> faster, up to about five times faster with value-lists as small as 10 elements... Richard's is definitely the
> way to go.
As a consequence of these results, I adapted my change in Marker.java to use Maps instead of Lists, and
committed it: http://svn.apache.org/viewvc?view=rev&rev=487972

Some notes regarding the comments so far:
1. An immutable class is guaranteed to be thread safe. I'm aiming at changing
all property classes to be this way, as I believe they were intended to be.
2. Constructing lots of short lived objects and comparing them against a few
long lived ones is much quicker than than creating lots of long lived objects.
If we were writing in C this would not always be the case - there would be no
tree traversal and an equal number of malloc/frees. Given that this is Java, we
reduce the number of GC passes and allow the newer collectors to work more
efficiently.
3. String.intern() does require an existing String.
4. The location of PropertyCache and the cached values can be changed later if
required.

Richard Wheeldon
added a comment - 18/Dec/06 03:46 Some notes regarding the comments so far:
1. An immutable class is guaranteed to be thread safe. I'm aiming at changing
all property classes to be this way, as I believe they were intended to be.
2. Constructing lots of short lived objects and comparing them against a few
long lived ones is much quicker than than creating lots of long lived objects.
If we were writing in C this would not always be the case - there would be no
tree traversal and an equal number of malloc/frees. Given that this is Java, we
reduce the number of GC passes and allow the newer collectors to work more
efficiently.
3. String.intern() does require an existing String.
4. The location of PropertyCache and the cached values can be changed later if
required.

Just a minor clarification:
> 3. String.intern() does require an existing String.

Obviously, explicit internalization does, but what I was referring to earlier is the difference between
explicit (run-time) and implicit (compile-time) internalization: in the latter case, I'm pretty sure a separate
instance never gets created...

Andreas L. Delmelle
added a comment - 18/Dec/06 09:32 (In reply to comment #11)
Thanks for the info, Richard.
Just a minor clarification:
> 3. String.intern() does require an existing String.
Obviously, explicit internalization does, but what I was referring to earlier is the difference between
explicit (run-time) and implicit (compile-time) internalization: in the latter case, I'm pretty sure a separate
instance never gets created...

Relative to revision 496642. Includes previous changes and further
modifications to CommonXXX and XXXProperty classes. Breaks TXTHandler and will
affect any other code which relies on Mutable properties. For the most part,
the patch consists of adding hashCode() and equals() methods and replacing
constructor calls with calls to static getInstance methods.

Richard Wheeldon
added a comment - 16/Jan/07 02:25 Relative to revision 496642. Includes previous changes and further
modifications to CommonXXX and XXXProperty classes. Breaks TXTHandler and will
affect any other code which relies on Mutable properties. For the most part,
the patch consists of adding hashCode() and equals() methods and replacing
constructor calls with calls to static getInstance methods.

This looks about as good as it's going to get for the moment. The TXTHandler
issues turned out not to be relevant - nothing seems to be accessing this code
and simply removing it (as this patch does) affects nothing else insofar as I
can tell. I'd be interested in any feedback on this. Unless this patch breaks
something I haven't accounted for, I'd suggest applying it to trunk as-is.

Richard Wheeldon
added a comment - 25/Apr/07 09:35 This looks about as good as it's going to get for the moment. The TXTHandler
issues turned out not to be relevant - nothing seems to be accessing this code
and simply removing it (as this patch does) affects nothing else insofar as I
can tell. I'd be interested in any feedback on this. Unless this patch breaks
something I haven't accounted for, I'd suggest applying it to trunk as-is.

I've finally found some time to finish reviewing your patch, and it looked good at first glance, apart
from some style issues (javadocs, bracing conditional branches even if there's only one statement,
blabla) and what I take to be some small typos (for example: in CommonHyphenation.equals(), you
compared the country, language and script properties to themselves instead of to their counterparts
from the other instance).

Unfortunately, when I ran the junit tests, there seems to be a lot of work before we can apply it as is... I
get 70 errors. :/ (a lot of them NPEs)
Am I doing something wrong, or did you skip the testsuite?

The only thing I really threw out were the deprecated-tags from CompoundDataType and
LengthRangeProperty. This does need some further thought, nevertheless.
For CommonBorderPaddingBackground.setBorderInfo(), I agreed. So much even that I removed
setBorderInfo() and setPadding() completely, instead of merely deprecating.

For now, I'll offer a bit more info on the rationale behind the mutability of compounds, and an idea on
how to tackle it.
Compounds would be tricky in terms of canonicalization, because, in order to judge which compound
instance you need, you have to have all the components available (e.g.: for space-before you need
the .minimum, .maximum, .optimum, .conditionality and .precedence components).
This means that, currently, in order to correctly canonicalize a compound, we would have to wait until
all attributes for a given FO have been converted to properties. We cannot do so at the time the
compound is first created, since it will at that point only have the specified value of /one/ of the
components (the first one in the list)

The sole reason why they are mutable is that they are not created in one go, but rather, the base-
property is created once the first component is encountered in the attribute-list. Further on in the
process, as the other components are converted, the PropertyMakers need to be able to change the
base-property (through setComponent())

A possible solution I'm thinking of is to rewrite part of the convertAttributeToProperty-loop in
PropertyList. If we can make sure that compounds are always created as a single compound property
(instead of the multiple components each setting the component of a base-property), then they would
not need to be mutable at all, and the issue is rendered moot.
FWIW, I've always disliked a bit the fact that the attributes are simply looped over. The hacks a bit
higher up, in addAttributesToList(), for extracting column-number and font-size before all others, are
to me an indication that this should be revisited.

As I roughly see it ATM, a compound would be made/instantiated using a list of attributes, instead of
only one. As soon as one component is encountered, we would immediately scan the Attributes,
extracting all the other components, and can then pass them all into the compound's PropertyMaker. As
such, there is no more need for the setComponent() methods, since by converting one component's
attribute, we would immediately convert the whole compound.

Andreas L. Delmelle
added a comment - 25/Jun/07 16:39 Hi Richard,
I've finally found some time to finish reviewing your patch, and it looked good at first glance, apart
from some style issues (javadocs, bracing conditional branches even if there's only one statement,
blabla) and what I take to be some small typos (for example: in CommonHyphenation.equals(), you
compared the country, language and script properties to themselves instead of to their counterparts
from the other instance).
Unfortunately, when I ran the junit tests, there seems to be a lot of work before we can apply it as is... I
get 70 errors. :/ (a lot of them NPEs)
Am I doing something wrong, or did you skip the testsuite?
The only thing I really threw out were the deprecated-tags from CompoundDataType and
LengthRangeProperty. This does need some further thought, nevertheless.
For CommonBorderPaddingBackground.setBorderInfo(), I agreed. So much even that I removed
setBorderInfo() and setPadding() completely, instead of merely deprecating.
For now, I'll offer a bit more info on the rationale behind the mutability of compounds, and an idea on
how to tackle it.
Compounds would be tricky in terms of canonicalization, because, in order to judge which compound
instance you need, you have to have all the components available (e.g.: for space-before you need
the .minimum, .maximum, .optimum, .conditionality and .precedence components).
This means that, currently, in order to correctly canonicalize a compound, we would have to wait until
all attributes for a given FO have been converted to properties. We cannot do so at the time the
compound is first created, since it will at that point only have the specified value of /one/ of the
components (the first one in the list)
The sole reason why they are mutable is that they are not created in one go, but rather, the base-
property is created once the first component is encountered in the attribute-list. Further on in the
process, as the other components are converted, the PropertyMakers need to be able to change the
base-property (through setComponent())
A possible solution I'm thinking of is to rewrite part of the convertAttributeToProperty-loop in
PropertyList. If we can make sure that compounds are always created as a single compound property
(instead of the multiple components each setting the component of a base-property), then they would
not need to be mutable at all, and the issue is rendered moot.
FWIW, I've always disliked a bit the fact that the attributes are simply looped over. The hacks a bit
higher up, in addAttributesToList(), for extracting column-number and font-size before all others, are
to me an indication that this should be revisited.
As I roughly see it ATM, a compound would be made/instantiated using a list of attributes, instead of
only one. As soon as one component is encountered, we would immediately scan the Attributes,
extracting all the other components, and can then pass them all into the compound's PropertyMaker. As
such, there is no more need for the setComponent() methods, since by converting one component's
attribute, we would immediately convert the whole compound.
WDYT?
Andreas

Status update: something really weird seems to be going on here.
Continuing the tests with 'block-container_content_size_percentage.xml'.

I placed a breakpoint around where the error originates ( AbstractBaseLM, line 89 ), then on each break
checked the call stack, and noticed that for all nested block-containers except the first one, the stack
points back to BlockContainerLM line 218, which is inside a conditional block that should only be entered
if width="auto"

Andreas L. Delmelle
added a comment - 28/Jun/07 15:06 Status update: something really weird seems to be going on here.
Continuing the tests with 'block-container_content_size_percentage.xml'.
I placed a breakpoint around where the error originates ( AbstractBaseLM, line 89 ), then on each break
checked the call stack, and noticed that for all nested block-containers except the first one, the stack
points back to BlockContainerLM line 218, which is inside a conditional block that should only be entered
if width="auto"
The search continues...

More info:
Reverting the change in the usage of the PropertyCache by PropertyList solves 11 of the 70 errors I
initially encountered. The others are related to the Common* property bundles that include properties
that can be percentages (padding, margin...)

The idea is right, but the implementation is slightly in error. Most likely due to the inherent complexity
of the properties package...
Once you get it, it's really simple actually:
Every Property class has its own particular Maker that deals with converting an attribute value to the
correct type of Property instance. FOPropertyMapping initializes a cache of those makers, one for each
property defined in the XSL-FO Rec. (still limited to 1.0)
The PropertyList deals with the entire attribute list for a single FO. Given an Attributes instance, it
extracts a few properties first -font-size, column-number- because of possible dependencies in
other properties --em-lengths, from-table-column(). All the others are looped over: fetch Maker and
call Maker.make(), basically, which triggers the PropertyParser to parse the expression and return a
generic Property instance, possibly a compound or a list of Property instances...

Thinking some more about the general approach, the pseudo-singleton pattern by itself is not enough
here (private constructors and a getInstance()).
I'm thinking of also weaving some of it into the Maker-logic. Have the Makers bear the responsibility
that no necessary duplicates are missing (as is the case for the LengthBase of "50%" for instance, after
applying the patch)
Also, I'm going to set aside the idea about creating canonical Common* bundles FTM, apart from the
quite common "no border, no padding, no background" scenario. The padding length-conditionals and
BorderInfos are already difficult enough to handle. If the border, padding and background properties by
themselves are guaranteed to be canonical, then a lot of the 'duplicate' bundles will be nothing but
wrappers around a bunch of references to the very same BorderInfos, absolute padding-widths, etc.
That's already bound to save a lot.

There seems to be an inherent error, not only WRT percentages, but more generally any relative value. If
I get the drift of the patch correctly, it creates a canonical Property instance for the attribute:
width="2em"

Meaning: for a LengthProperty with an explicitly specified value of "2em"...? What if we then later on
encounter that same attribute value on another, very likely unrelated node?

Andreas L. Delmelle
added a comment - 29/Jun/07 12:19
More info:
Reverting the change in the usage of the PropertyCache by PropertyList solves 11 of the 70 errors I
initially encountered. The others are related to the Common* property bundles that include properties
that can be percentages (padding, margin...)
The idea is right, but the implementation is slightly in error. Most likely due to the inherent complexity
of the properties package...
Once you get it, it's really simple actually:
Every Property class has its own particular Maker that deals with converting an attribute value to the
correct type of Property instance. FOPropertyMapping initializes a cache of those makers, one for each
property defined in the XSL-FO Rec. (still limited to 1.0)
The PropertyList deals with the entire attribute list for a single FO. Given an Attributes instance, it
extracts a few properties first - font-size, column-number - because of possible dependencies in
other properties --em-lengths, from-table-column(). All the others are looped over: fetch Maker and
call Maker.make(), basically, which triggers the PropertyParser to parse the expression and return a
generic Property instance, possibly a compound or a list of Property instances...
Thinking some more about the general approach, the pseudo-singleton pattern by itself is not enough
here (private constructors and a getInstance()).
I'm thinking of also weaving some of it into the Maker-logic. Have the Makers bear the responsibility
that no necessary duplicates are missing (as is the case for the LengthBase of "50%" for instance, after
applying the patch)
Also, I'm going to set aside the idea about creating canonical Common* bundles FTM, apart from the
quite common "no border, no padding, no background" scenario. The padding length-conditionals and
BorderInfos are already difficult enough to handle. If the border, padding and background properties by
themselves are guaranteed to be canonical, then a lot of the 'duplicate' bundles will be nothing but
wrappers around a bunch of references to the very same BorderInfos, absolute padding-widths, etc.
That's already bound to save a lot.
There seems to be an inherent error, not only WRT percentages, but more generally any relative value. If
I get the drift of the patch correctly, it creates a canonical Property instance for the attribute:
width="2em"
Meaning: for a LengthProperty with an explicitly specified value of "2em"...? What if we then later on
encounter that same attribute value on another, very likely unrelated node?

Make the PropertyCache more than a simple wrapper around a WeakHashMap, and instead of giving
each Property type its own cache, make the Map multi-dimensional (or an array of WeakHashMaps?)
The base Map would be, for example, a mapping of Class objects to a WeakHashMap, which would then
contain the canonical instances for a particular Property type.

The PropertyMakers, in their turn, will use this cache whenever they receive a Property from the
PropertyParser that can be safely canonicalized (absolute lengths, enums, strings, ...). If the Property
they receive from the parser is a relative one, they bypass the cache and make sure to return the new
instance.

Andreas L. Delmelle
added a comment - 30/Jun/07 09:47
Some more brainstorming about correcting the ideas:
Make the PropertyCache more than a simple wrapper around a WeakHashMap, and instead of giving
each Property type its own cache, make the Map multi-dimensional (or an array of WeakHashMaps?)
The base Map would be, for example, a mapping of Class objects to a WeakHashMap, which would then
contain the canonical instances for a particular Property type.
Something like:
public final class PropertyCache {
private static Map typeCache = new HashMap();
private static Map propCache;
/**
Checks if the given property is present in the cache - if so, returns
a reference to the cached value. Otherwise the given object is added
to the cache and returned.
@param obj
@return the cached instance
*/
public static synchronized Property fetch(Property prop) {
Class propType = prop.getClass();
propCache = (Map) typeCache.get(propType);
if (propCache == null)
{
propCache = new WeakHashMap();
propCache.put(prop, prop);
typeCache.put(propType, propCache);
return prop;
}
else {
Property cacheEntry = (Property) propCache.get(prop);
if (cacheEntry != null)
{
return cacheEntry;
}
else
{
propCache.put(prop, prop);
return prop;
}
}
}
}
The PropertyMakers, in their turn, will use this cache whenever they receive a Property from the
PropertyParser that can be safely canonicalized (absolute lengths, enums, strings, ...). If the Property
they receive from the parser is a relative one, they bypass the cache and make sure to return the new
instance.

Andreas L. Delmelle
added a comment - 07/Jul/07 14:18 Basics of the patch have been applied, with some modifications:
removed usage of the cache by the PropertyList and Common*-bundles
applied caching to a restricted set of Property types that can be safely canonicalized
see: http://svn.apache.org/viewvc?view=rev&rev=554251
Closing this one. For any improvements, just create a new report.