From jdom at tuis.net Tue Sep 4 03:41:55 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 04 Sep 2012 06:41:55 -0400
Subject: [jdom-interest] Preparing for JDOM 2.0.3
Message-ID: <5045DAF3.7010108@tuis.net>
Hi All.
A few issues have been identified in JDOM over the past few weeks. When
the first issue was resolved (unable to serialize subclasses of Element
outside the org.jdom2 package) I promised to release 2.0.3 over this past
weekend, but a second (low priority) issue was identified (lack of
support for a specific JAXP factory).
Additionally, 'Canadian Wilf' showed interest in improving the
performance of the Verifier code. As a result, I have been working with
Wilf to get the Verifier code 'fast'.
Taken together, it all means that I have 'slipped' this release date...
Right now the performance changes have been completed successfully, with
the Verifier now running in about one third of the time it used to. This
speeds up parsing considerably. There is a wiki page documenting the
process here: https://github.com/hunterhacker/jdom/wiki/Verifier-Performance
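The wiki page above documents what was actually changed in the Verifier. Purely as an illustration of why character checking can be made cheap, here is a minimal table-driven check of XML 1.0 character data. The class CharTable is hypothetical and this is not JDOM's Verifier code; it assumes the text is already a Java String and checks only the Char production, not XML names.

```java
// Table-driven check of XML 1.0 character data (illustrative, not JDOM's Verifier).
final class CharTable {
    // One flag per UTF-16 code unit. Surrogate halves are marked legal here and
    // their pairing is checked separately in firstIllegal().
    private static final boolean[] LEGAL = new boolean[0x10000];

    static {
        LEGAL[0x9] = LEGAL[0xA] = LEGAL[0xD] = true;
        for (int c = 0x20; c <= 0xFFFD; c++) {
            LEGAL[c] = true; // 0xFFFE and 0xFFFF are left false
        }
    }

    /** Returns the index of the first illegal char, or -1 if the whole string is legal. */
    static int firstIllegal(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!LEGAL[c]) {
                return i;
            }
            if (Character.isHighSurrogate(c)) {
                // A high surrogate is only legal when followed by a low surrogate.
                if (i + 1 >= text.length() || !Character.isLowSurrogate(text.charAt(i + 1))) {
                    return i;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no preceding high surrogate
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegal("plain text")); // -1
        System.out.println(firstIllegal("bell\u0007")); // 4
    }
}
```

The design choice is to pay for the table once (roughly 64 KB on typical JVMs) so that the common case costs a single array lookup per char; surrogate pairs are the only case that needs a second look.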
I have just built the 'hotfix' package containing all fixes since JDOM
2.0.2 and posted it to github here:
https://github.com/hunterhacker/jdom/downloads
I intend to release the full 2.0.3 package on this coming weekend
(slipping the 2.0.3 release date by 1 week).
Thanks & Happy Coding
Rolf
From noel at peralex.com Thu Sep 6 06:40:35 2012
From: noel at peralex.com (Noel Grandin)
Date: Thu, 06 Sep 2012 15:40:35 +0200
Subject: [jdom-interest] Preparing for JDOM 2.0.3
In-Reply-To: <5045DAF3.7010108@tuis.net>
References: <5045DAF3.7010108@tuis.net>
Message-ID: <5048A7D3.7070701@peralex.com>
Very nice work!
Disclaimer: http://www.peralex.com/disclaimer.html
From jdom at tuis.net Fri Sep 7 11:48:01 2012
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 07 Sep 2012 14:48:01 -0400
Subject: [jdom-interest] Fwd: XML 1.1 -- Please stab me with a dull
knife and trample my dead body
In-Reply-To:
References:
<5049F11E.8050004@saxonica.com>
Message-ID: <82068eb04129387f7b45c8e8893644ab@tuis.net>
Hi Wilf.
You are getting your wires crossed..... In your mail you referenced parsed
and external entities. These have nothing to do with PCDATA (parsed
character data - regular XML text), and CDATA (unparsed character data -
<![CDATA[ ... ]]>)
Michael was answering your question based on the 'entities', whereas you
want the details on the 'PCDATA' and the 'CDATA'.
So, forget about the 'entity' references, and focus on the valid character
data for XML.
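The only difference between CDATA (character blocks between <![CDATA[ and ]]> ) and PCDATA (element 'text') is that the XML Parser will look for
'
With the correct escaping, all CDATA content can be expressed as PCDATA
content.
This does not help you, though, because not all Java 'char' characters are
valid Unicode characters, and thus not all chars are valid as either CDATA
or PCDATA.
In XML 1.0 this distinction was clear.
In XML 1.1 I am not certain how to interpret the difference between
'Chars' and 'RestrictedChars': http://www.w3.org/TR/xml11/#charsets
JDOM takes a 1.0 perspective on Characters... which may be a problem, but
it is not going to solve your issues even if it supports 1.1 chars.
Rolf
On Fri, 7 Sep 2012 08:45:33 -0700, Canadian Wilf
wrote: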
> Then what is the proper mode:
>
> Element e = new Element("foo")
>
> Should I do this:
>
> e.setText(string_of_sanitized_data_with_illegal_characters_escaped);
>
> or
>
> e.setText(any_text);
>
>
> Wilf
>
>
> On Fri, Sep 7, 2012 at 6:05 AM, Michael Kay wrote:
>
>> No, that's all wrong. The contents of an unparsed entity are always an
>> external resource, they are never part of a text or attribute node.
>> Parsed entities do become part of the content, but they must always use
>> the XML character set.
>>
>> Michael Kay
>> Saxonica
>>
>> On 07/09/2012 13:10, Canadian Wilf wrote:
>>
>> According to the xml 1.1 spec:
>>
>> 4 Physical Structures ...
>>> [Definition: An *unparsed entity* is a resource whose contents may or
>>> may not be text, and if text, may be other than XML. Each unparsed
>>> entity has an associated notation, identified by name. Beyond a
>>> requirement that an XML processor make the identifiers for the entity
>>> and notation available to the application, XML places no constraints
>>> on the contents of unparsed entities.]
>>
>>
>>
>> AND
>>
>>> Entities may be either parsed or unparsed. [Definition: The contents of
>>> a *parsed entity* are referred to as its replacement text; this text is
>>> considered an integral part of the document.]
>>
>>> [Definition: An *unparsed entity* is a resource whose contents may or may
>>> not be text, and if text, may be other than XML. Each unparsed entity
>>> has an associated notation, identified by name. Beyond a requirement
>>> that an XML processor make the identifiers for the entity and notation
>>> available to the application, XML places no constraints on the contents
>>> of unparsed entities.]
>>> Parsed entities are invoked by name using entity references; unparsed
>>> entities by name, given in the value of *ENTITY* or *ENTITIES*
>>> attributes.
>>
>>
>>
>> In the current JDOM version, Element method setText(string) and also
>> addContent(CDATA) refuse text that contains illegal characters. It is
>> treating the data provided as 'parsed' when it should by the spec be
>> treating it as free content.
>>
>> I understand:
>>
>> 1) The xml 1.1 spec defines a parsed entity as its 'replacement text'.
>>
>> 2) 'Replacement text' would refer to the actual textual makeup of a
>> serialized Element, not the data an Element holds in a Text content
>> element.
>>
>>
>> Then, if the above is true, the current implementation is actually wrong
>> to verify data.
>>
>> I propose that JDOM stop verifying data set as Element text and CDATA
>> and leave it to Xerces (or whatever) to make sure the document is
>> proper 1.1.
>>
>> Am I understanding everything correctly?
>>
>> Thoughts?
>>
>> ---------- Forwarded message ----------
>> From: Canadian Wilf
>> Date: Thu, Sep 6, 2012 at 9:52 PM
>> Subject: XML 1.1 -- Please stab me with a dull knife and trample my
>> dead body
>> To: jdom-interest at jdom.org
>>
>>
>> Hi All,
>>
>> I just learned that in order to safely use JDOM2, I will need to
>> sanitize my Element .setText(string) so that the parsed data does not
>> contain verboten characters under the XML 1.1 spec.
>>
>> I have an ascii processor and it needs to be able to use xml as a
>> document format. Unfortunately, not all ascii is allowed in an Element
>> text.
>>
>> Stab me with a dull knife and trample my dead body. But ..... please
>> please please don't make me sanitize all my data before putting it into
>> XML
>> Elements.
>>
>> 1) It makes my programming task much more cumbersome because I must
>> ensure not to feed any of the new verboten and doomed ascii/UTF-8
>> characters to store as xml text.
>>
>> 2) No one uses xml 1.1, do they?
>>
>> 3) It slows down the parsing (a very small amount) with all the element
>> text checking.
>>
>> Now that JDOM2 is xml 1.1 compatible, is there any turning back? Can
>> this be undone?
>>
>> Does everyone understand that their software will bust if data provided
>> as text is not adhering to the new standard?
>>
>> What about you? How do you deal with it when using the libraries?
>>
>> Wilf
>>
>>
>>
>> _______________________________________________
>> To control your jdom-interest membership:
>> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
From curoli at gmail.com Fri Sep 7 13:22:29 2012
From: curoli at gmail.com (Oliver Ruebenacker)
Date: Fri, 7 Sep 2012 16:22:29 -0400
Subject: [jdom-interest] Fwd: XML 1.1 -- Please stab me with a dull
knife and trample my dead body
In-Reply-To:
References:
<5049F11E.8050004@saxonica.com>
<82068eb04129387f7b45c8e8893644ab@tuis.net>
Message-ID:
Hello,
On Fri, Sep 7, 2012 at 3:17 PM, Canadian Wilf wrote:
> Let's focus on valid character data for xml. How to do this:
>
> String s = someRandomBytesNowAsString();
Java Strings are not actually sequences of bytes. Their chars are UTF-16
code units, if I remember correctly.
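Oliver's recollection is right: a java.lang.String is a sequence of UTF-16 code units (char values), and how many bytes it turns into depends on the charset used to encode it. A small sketch that makes the difference visible (Utf16Demo is a hypothetical name):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf16Demo {
    /** Returns {UTF-16 code units, Unicode code points, UTF-8 bytes, UTF-16BE bytes} for s. */
    static int[] sizes(String s) {
        return new int[] {
            s.length(),                                // counts char values (code units)
            s.codePointCount(0, s.length()),           // a surrogate pair counts once here
            s.getBytes(StandardCharsets.UTF_8).length,
            s.getBytes(StandardCharsets.UTF_16BE).length
        };
    }

    public static void main(String[] args) {
        // 'A', U+00E9 (e acute), and U+1F600, which needs a surrogate pair in UTF-16
        String s = "A\u00E9\uD83D\uDE00";
        System.out.println(Arrays.toString(sizes(s))); // [4, 3, 7, 8]
    }
}
```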
> Element e = new Element("random")
> e.setText(s) or e.addContent(new CDATA(s))
>
> Currently this will fail.
Sorry, you lost me here. How will this fail? Will it throw an
exception? Or will it otherwise do something undesired?
Maybe I'm missing something, but it sounds to me as if you are
referring to specs that apply to XML character streams and not to JDOM
objects.
Take care
Oliver
> ...which seems wrong because I should be able to
> send whatever data I want as text in xml content.
>
> What use is xml (1.0 or 1.1) if I cannot represent various data? Is the
> solution to make a custom escaper for my data?
>
> e.setText(encodeSpecial(s)) and decodeSpecial(e.getText())
>
> Crazy!
>
> Wilf
>
>
> On Fri, Sep 7, 2012 at 11:48 AM, Rolf Lear wrote:
>>
>>
>> Hi Wilf.
>>
>> You are getting your wires crossed..... In your mail you referenced parsed
>> and external entities. These have nothing to do with PCDATA (parsed
>> character data - regular XML text), and CDATA (unparsed character data -
>> <![CDATA[ ... ]]>)
>>
>> Michael was answering your question based on the 'entities', where as you
>> want the details on the 'PCDATA' and the 'CDATA'.
>>
>> So, forget about the 'entity' references, and focus on the valid character
>> data for XML.
>>
>> The only difference between CDATA (character blocks between <![CDATA[ and ]]> ) and PCDATA (element 'text'), is that the XML Parser will look for
>> '>
>> With the correct escaping, all CDATA content can be expressed as PCDATA
>> content.
>>
>> This does not help you though, because not all Java 'char' characters are
>> valid Unicode characters, and thus not all chars are valid as either CDATA
>> or PCDATA.
>>
>> In XML 1.0 this distinction was clear.
>>
>> In XML 1.1 I am not certain how to interpret the difference between
>> 'Chars' and 'RestrictedChars': http://www.w3.org/TR/xml11/#charsets
>>
>> JDOM takes a 1.0 perspective on Characters... which may be a problem, but
>> it is not going to solve your issues even if it supports 1.1 chars.
>>
>> Rolf
>>
>>
>>
>>
>> On Fri, 7 Sep 2012 08:45:33 -0700, Canadian Wilf
>> wrote:
--
Scientific Developer at PanGenX (http://www.pangenx.com)
"Stagnation and the search for truth are always opposites." - Nadezhda
Tolokonnikova
From bjorn at xowave.com Fri Sep 7 15:27:28 2012
From: bjorn at xowave.com (Bjorn Roche)
Date: Fri, 7 Sep 2012 18:27:28 -0400
Subject: [jdom-interest] Fwd: XML 1.1 -- Please stab me with a dull
knife and trample my dead body
In-Reply-To:
References:
<5049F11E.8050004@saxonica.com>
<82068eb04129387f7b45c8e8893644ab@tuis.net>
Message-ID: <7544681B-675D-481C-8FB9-017BC9FEC5CC@xowave.com>
On Sep 7, 2012, at 4:43 PM, Canadian Wilf wrote:
> I can do this:
>
> String random = new String(someRandomByte[])
Let me address this by pointing out a degenerate case. Strings in java are terminated by the null char (er, I think. Wow, it's been a while since I learned this insanely basic thing). If your someRandomBytes contains two consecutive zero bytes (= a single zero char), then the string "random" will obviously not be what you wanted, because it will end early -- if you are lucky. Another example is if the "someRandomByte" ends in the first half of a unicode codepoint. What happens then? So, yes you can construct a string from a byte array like you did here but please don't! RTFM: "The behavior of this constructor when the given bytes are not valid in the default charset is unspecified." Unspecified. As in "it might delete your hard drive, log on to facebook and unfriend your wife." That's what unspecified means, so those bytes need to be "sanitized" too.
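One way to avoid the unspecified behaviour Bjorn quotes is to decode with an explicit CharsetDecoder that reports malformed input instead of silently substituting characters. Note also that a Java String, unlike a C string, is not null-terminated: it can hold U+0000 like any other char, and it is XML, not Java, that has no way to represent it. A minimal sketch, assuming the bytes are meant to be UTF-8 (StrictDecode is a hypothetical helper):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecode {
    /** Decodes UTF-8, throwing instead of substituting a replacement char for malformed bytes. */
    static String decodeUtf8(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        try {
            // Each zero byte decodes to U+0000; the String keeps all four chars.
            System.out.println(decodeUtf8(new byte[] {'a', 0, 0, 'b'}).length()); // 4
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
        try {
            // 0xC3 starts a two-byte sequence, so ending the input there is malformed.
            decodeUtf8(new byte[] {'a', (byte) 0xC3});
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```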
If that's the kind of data you want to put in XML (raw, random-assed binary), use Base64!
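A minimal sketch of the Base64 approach using java.util.Base64 (Java 8 or later, so newer than this thread); BinaryText and its method names are hypothetical. The encoded text uses only ASCII letters, digits, '+', '/' and '=', all of which are legal XML character data in both 1.0 and 1.1:

```java
import java.util.Arrays;
import java.util.Base64;

final class BinaryText {
    /** Encodes arbitrary bytes as Base64, which uses only A-Z, a-z, 0-9, '+', '/' and '='. */
    static String toXmlSafeText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Reverses toXmlSafeText, recovering the original bytes exactly. */
    static byte[] fromXmlSafeText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Bytes that would be illegal (0x00, 0x01, 0x0B) or charset-dependent (0xFF) as raw text
        byte[] raw = {0x00, 0x01, 0x0B, (byte) 0xFF, 0x7F};
        String text = toXmlSafeText(raw);
        System.out.println(text);                                      // AAEL/38=
        System.out.println(Arrays.equals(raw, fromXmlSafeText(text))); // true
    }
}
```

With JDOM the round trip would be along the lines of e.setText(BinaryText.toXmlSafeText(bytes)) on the way in and BinaryText.fromXmlSafeText(e.getText()) on the way out: essentially the encodeSpecial/decodeSpecial pair Wilf describes, but using a standard encoding.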
> However, the string cannot be passed to the Text of an XML Element since it may contain illegal characters (<= 0X20 ascii, vertical tab, etc.) This will fail:
>
> new Element("test").setText(random)
>
> XOM and JDOM both restrict the access and will throw IllegalDataException if one of the characters (0x0-0xFFFF) is not in XML Unicode specs.
First off, I think maybe you should read this because we are not talking about 0x0 to 0xFFFF: http://www.joelonsoftware.com/articles/Unicode.html
Secondly, yes there are values that must be escaped in XML. For example < and > for obvious reasons, but the library does this for you. Then there are values you can't put into XML at all. These fall into other categories. "not valid in a string" (eg the NULL character usually used as a string terminator) is one. Yes, that's right, you can't put 0x00 in an XML string, 'cause you can't put it in a string! OMG! Stop the presses! I also find this annoying, and have been bitten by it (I think it was 0x17 or something), but that's life.
I agree, however, it would be nice to have some clarity on exactly what's allowed.
When in doubt, use Base64!
Or create sub elements for the weird chars, just like html does for, say, newlines: <br/>
bjorn
-----------------------------
Bjorn Roche
http://www.xonami.com
Audio Collaboration
http://blog.bjornroche.com
From jdom at tuis.net Fri Sep 7 16:29:15 2012
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 07 Sep 2012 19:29:15 -0400
Subject: [jdom-interest] Fwd: XML 1.1 -- Please stab me with a dull
knife and trample my dead body
In-Reply-To: <82068eb04129387f7b45c8e8893644ab@tuis.net>
References:
<5049F11E.8050004@saxonica.com>
<82068eb04129387f7b45c8e8893644ab@tuis.net>
Message-ID: <504A834B.5050706@tuis.net>
So, I have been studying up on the Chars and RestrictedChars in the
XML 1.1 spec.
My personal feeling is that the RestrictedChars mechanism for specifying
the document format is somewhat complicated, but I now believe I have
'grokked' it. It all boils down to these four constraints:
1. There are two sets of characters defined for XML 1.1:
Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
RestrictedChar is a subset of Char.
2. A valid XML *unparsed* document is defined as:
document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
3. prolog, element, and Misc are all (indirectly) constrained to 'Char'
based characters.
4. Character and entity references must resolve to data from the 'Char'
set... http://www.w3.org/TR/xml11/#sec-references
Based on the four statements above, it is apparent that a valid document
consists of a prolog (which may be empty), an element (which must
exist), followed by optional comments, PIs and whitespace. Further,
there are not allowed to be any restricted chars anywhere in the
*unparsed* document.
But a big difference between XML 1.0 and 1.1 is that the Char set
for 1.1 is larger than for 1.0 (it includes [#x1-#xD7FF] instead of 'just'
#x9 | #xA | #xD | [#x20-#xD7FF]).
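The Char and RestrictedChar productions quoted above translate directly into code-point predicates. A minimal Java sketch (XmlChars is a hypothetical class; the ranges are copied from the XML 1.0 and 1.1 productions):

```java
final class XmlChars {
    /** XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** XML 1.1: Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** XML 1.1: RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F] */
    static boolean isXml11RestrictedChar(int cp) {
        return (cp >= 0x1 && cp <= 0x8)
                || (cp >= 0xB && cp <= 0xC)
                || (cp >= 0xE && cp <= 0x1F)
                || (cp >= 0x7F && cp <= 0x84)
                || (cp >= 0x86 && cp <= 0x9F);
    }

    public static void main(String[] args) {
        // U+0001: not a 1.0 Char, a 1.1 Char, but restricted in 1.1
        System.out.println(isXml10Char(0x1) + " " + isXml11Char(0x1) + " " + isXml11RestrictedChar(0x1));
        // U+0085 (NEL): a Char in both versions, and not restricted in 1.1
        System.out.println(isXml10Char(0x85) + " " + isXml11Char(0x85) + " " + isXml11RestrictedChar(0x85));
    }
}
```

The first line prints "false true true" and the second "true true false": U+0001 shows the 1.1 change (it becomes a Char, but only a restricted one), while U+0085 is deliberately left out of RestrictedChar.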
So, XML 1.1 includes all the low-value control characters.... but, it
*Restricts* them from appearing *raw* in the unparsed document. It goes
even further, and it also restricts the following chars in the
*unparsed* document: [#x7F-#x84] | [#x86-#x9F].
In XML 1.1, though, you can use a numeric character reference to represent
these restricted chars.
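So a serializer writing XML 1.1 has more to do than escape '<' and '&': restricted characters have to go out as numeric character references, and anything outside Char cannot be written at all. A minimal sketch of such an escaper for element text (Xml11TextEscaper is hypothetical, not JDOM's XMLOutputter):

```java
final class Xml11TextEscaper {
    // XML 1.1 Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isChar(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF) || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1 RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
    static boolean isRestricted(int cp) {
        return (cp >= 0x1 && cp <= 0x8) || cp == 0xB || cp == 0xC
                || (cp >= 0xE && cp <= 0x1F) || (cp >= 0x7F && cp <= 0x84)
                || (cp >= 0x86 && cp <= 0x9F);
    }

    /**
     * Escapes a string for use as element content in an XML 1.1 document. Markup characters
     * become entity references, RestrictedChars become numeric character references, and code
     * points outside Char (including unpaired surrogates) are rejected. Line-end normalization
     * (#xD, #x85, #x2028) is not handled, so those characters may not survive a round trip.
     */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // an unpaired surrogate comes back as itself
            i += Character.charCount(cp);
            if (!isChar(cp)) {
                throw new IllegalArgumentException(
                        String.format("U+%04X cannot be represented in XML 1.1", cp));
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;"); // strictly required only in "]]>", escaped always here
            } else if (isRestricted(cp)) {
                out.append(String.format("&#x%X;", cp));
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a\u0001<b & c")); // a&#x1;&lt;b &amp; c
        try {
            escape("\u0000");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // U+0000 cannot be represented in XML 1.1
        }
    }
}
```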
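Unfortunately for you, Wilf, XML 1.1 still makes the following Java char
values illegal as XML characters: 0x0000, 0xFFFE-0xFFFF, and 0xD800-0xDFFF
(surrogates are only usable as a proper high/low pair, which together
encode one character in the #x10000-#x10FFFF range).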
JDOM 2.x follows JDOM 1.x and allows the set of characters defined for
XML 1.0.
This is likely a problem. Unfortunately, it is not easily possible for
JDOM to 'infer' whether it is working with an XML 1.0 or 1.1 document.
Perhaps this needs some thought.
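For a document that arrives as text, the version is at least stated in the XML declaration: XML 1.1 documents must begin with one that says version 1.1, and a document without a declaration is XML 1.0. A minimal sketch of sniffing that from already-decoded text (XmlVersionSniffer is hypothetical; a real parser reports this more robustly). It does not help with documents built in memory with new Element(...), which have no declaration at all:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlVersionSniffer {
    // Start of an XMLDecl: '<?xml' S 'version' Eq ('"' VersionNum '"' | "'" VersionNum "'"),
    // optionally preceded by a byte order mark that survived decoding.
    private static final Pattern DECL = Pattern.compile(
            "\uFEFF?<\\?xml[ \t\r\n]+version[ \t\r\n]*=[ \t\r\n]*([\"'])(1\\.[0-9]+)\\1");

    /** Returns the declared version, or "1.0" when the text has no XML declaration. */
    static String declaredVersion(String xml) {
        Matcher m = DECL.matcher(xml);
        return m.lookingAt() ? m.group(2) : "1.0";
    }

    public static void main(String[] args) {
        System.out.println(declaredVersion("<?xml version=\"1.1\"?><root/>")); // 1.1
        System.out.println(declaredVersion("<root/>"));                        // 1.0
    }
}
```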
Rolf
From jdom at tuis.net Tue Sep 11 03:59:29 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 11 Sep 2012 06:59:29 -0400
Subject: [jdom-interest] JDOM 2.0.3 released - special note for Maven users
Message-ID: <504F1991.1050807@tuis.net>
Hi all.
JDOM 2.0.3 is now available from the regular locations, unless you are a
Maven user, in which case it is not in the normal location! See the Maven
notes at the end....
The changes for 2.0.3 are as follows:
Bugs:
Fixes Issue 88 - makes subclasses of JDOM content serializable even
if they are not in the org.jdom2 package.
Fixes Issue 90 - fixes a false-positive check for Attributes. See
the issue for the details.
Features:
Fixes Issue 89 - extends the SAX processing in JDOM to allow
specific (named) JAXP factories to be used
Fixes Issue 91 - A performance improvement for AttributeList
Fixes Issue 92 - Performance improvements for Verifier
No issue, but includes performance improvements to the regular
ContentList.
Procedural:
Resolves Issue 87 - The name for the JDOM artifact in
maven-central: JDOM 2.x from now on will be released into the jdom2
artifact, instead of the jdom artifact.
Please download the package from:
https://github.com/downloads/hunterhacker/jdom/jdom-2.0.3.zip
Maven Users
===========
If you use maven to access your JDOM resources, please note that this
release was not made to the jdom artifact, but to the jdom2 artifact.
This has all sorts of implications, but I am assured that this is the
best way to reduce the headaches that were created when JDOM 2.0.0 was
first released.
Please see the notes on issue #87 to understand why this decision was
made:
https://github.com/hunterhacker/jdom/issues/87
In future, maven users should only reference JDOM 1.x versions from the
jdom artifact, and all JDOM 2.x versions should be referenced from the
jdom2 artifact.
Happy Coding
Rolf
From jdom at tuis.net Wed Sep 12 06:48:05 2012
From: jdom at tuis.net (Rolf Lear)
Date: Wed, 12 Sep 2012 09:48:05 -0400
Subject: [jdom-interest] Pending fix for issue #93
Message-ID:
Hi all.
Issue #93 was filed this morning, and I already have a fix out for
it....
https://github.com/hunterhacker/jdom/issues/93
This issue relates to using JDOM in a security-constrained environment (in
this case, an applet). The actual issue is that some of the JDOM code
references System.getProperties(), and some properties are not accessible
from Applets.
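The actual fix is in the issue linked above. As a general pattern for this kind of problem, code can read individual properties with System.getProperty(key) rather than the whole System.getProperties() table, and treat a SecurityException as 'use the default'. A minimal sketch (SafeProperties and the property names are hypothetical, and this is not necessarily how issue #93 was fixed):

```java
final class SafeProperties {
    /**
     * Reads one system property, returning the fallback when the property is unset or when a
     * SecurityManager (for example an applet sandbox) refuses access to it.
     */
    static String getProperty(String key, String fallback) {
        try {
            String value = System.getProperty(key);
            return value != null ? value : fallback;
        } catch (SecurityException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(getProperty("java.version", "unknown"));
        System.out.println(getProperty("example.hypothetical.setting", "default")); // default
    }
}
```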
This issue is contained within a very limited scope of JDOM usage, so it
should have no impact on regular JDOM users.
Still, you should probably be aware of it.
The issue has been fixed, and there is a hotfix package of JDOM with the
fix available on the github download site.
I will be scheduling a formal release of JDOM 2.0.4 for the October
timeframe unless something else comes up before that.
Thanks all
Rolf
From jdom at tuis.net Wed Sep 12 15:35:16 2012
From: jdom at tuis.net (Rolf Lear)
Date: Wed, 12 Sep 2012 18:35:16 -0400
Subject: [jdom-interest] HotFix packages on GitHub
Message-ID: <50510E24.2000207@tuis.net>
Hi all.
During the 2.x process I have uploaded a number of files to GitHub here:
https://github.com/hunterhacker/jdom/downloads
There are 'real' packages (2.0.0, 2.0.1, 2.0.2, and 2.0.3) as well as
real support files (jdom2-dev-jars.zip).
There are also a lot of 'low value' files, like the early beta versions
of 2.x, and the various *issue*.zip interim fix packages.
I intend to remove all except the 'current' issue packages, and I intend
to remove the BETA packages. This would leave just the important stuff
behind.
Can anyone think of any reason to keep these 'low value' files? They are
just taking up space..... aren't they?
Unless I hear otherwise, I will remove the cruft this coming weekend....
Rolf
From mike at saxonica.com Thu Sep 13 00:08:01 2012
From: mike at saxonica.com (Michael Kay)
Date: Thu, 13 Sep 2012 08:08:01 +0100
Subject: [jdom-interest] Performance measurements with Saxon
Message-ID: <50518651.4020309@saxonica.com>
JDOM2 is now working as an external object model for Saxon.
We've done some performance measurements which are summarised here:
http://dev.saxonica.com/blog/mike/2012/09/index.html#000194
These figures show that of all the external object models, JDOM2 now
comes second (to XOM) in the league. The Saxon driver for XOM is
probably the most carefully tuned of all the drivers, which may have
something to do with it; also, I believe that XOM added features
explicitly for Saxon's use, to make sorting of nodes into document order
more efficient.
A more detailed breakdown of the results for JDOM1 and JDOM2 is given
below. The first group of results are for JDOM1, the second group for
JDOM2. For each query in the XMark benchmark, they show the execution
time in seconds running against a 1 MB source document; the driver
executes each query repeatedly until 1000 iterations have run or 30
seconds have elapsed.
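The stopping rule described above (repeat each query until 1000 iterations or 30 seconds, whichever comes first) can be sketched as follows. QueryTimer and its parameters are hypothetical names; this illustrates the stopping rule only and is not the Saxon benchmark driver.

```java
/**
 * Sketch of the iteration policy: run a query repeatedly until either
 * maxIterations runs have completed or maxNanos of wall-clock time have
 * been used, then report the mean time per run. Not Saxon's driver.
 */
final class QueryTimer {
    static final class Result {
        final int iterations;
        final double avgMillis;

        Result(int iterations, double avgMillis) {
            this.iterations = iterations;
            this.avgMillis = avgMillis;
        }
    }

    static Result time(Runnable query, int maxIterations, long maxNanos) {
        long start = System.nanoTime();
        int iterations = 0;
        long elapsed;
        do {
            query.run();
            iterations++;
            elapsed = System.nanoTime() - start;
        } while (iterations < maxIterations && elapsed < maxNanos);
        // Convert nanoseconds to milliseconds and average over the runs made.
        return new Result(iterations, elapsed / 1e6 / iterations);
    }

    public static void main(String[] args) {
        // Stand-in workload; a real driver would execute an XMark query here.
        Result r = time(() -> Math.sqrt(42.0), 1000, 30_000_000_000L);
        System.out.printf("%d runs, %.4f ms/run%n", r.iterations, r.avgMillis);
    }
}
```

The do-while guarantees at least one run, so a query slower than the whole time budget still produces an average.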
There's a consistent speed-up from JDOM to JDOM2. In the cases where
the speed-up is greatest, however, it is partly due to improvements in
the Saxon "wrapper": instead of using our own general-purpose
implementation of the descendant axis, we now make use of
Parent.getDescendants().
In these measurements, JDOM2 has slightly lower memory requirements but
slightly higher tree-building time; but I wouldn't be 100% confident
that either figure is consistent.
Our intention is to release Saxon 9.5 (when it's ready) with support for
both JDOM and JDOM2.
Michael Kay
Saxonica
From jdom at tuis.net Thu Sep 13 06:19:40 2012
From: jdom at tuis.net (Rolf Lear)
Date: Thu, 13 Sep 2012 09:19:40 -0400
Subject: [jdom-interest] Performance measurements with Saxon
In-Reply-To: <50518651.4020309@saxonica.com>
References: <50518651.4020309@saxonica.com>
Message-ID: <9314207044bfed1bb22c2b905b38f336@tuis.net>
Hi Michael.
I looked at those results and I am really pleased that JDOM 2.x is so much
faster than JDOM 1.x on the query time (twice as fast as JDOM 1.x).
I focused on a number of areas in JDOM 2.x: memory footprint, iterator
performance, and parse time. It is really good to see that the memory and
iterator improvements are reflected in your 'independent' tests.
Of course, it's also instinctive to be competitive.... and, in that light,
I have to ask:
- is it possible you can point me to the code you are using for the test,
especially the 'wrapper layers'? I would like to inspect that code and
offer a 'second opinion' on whether the wrapper has room for improvement,
and also on whether JDOM can accommodate the Saxon logic more
efficiently... I am willing (eager) to spend some time ensuring that the
combination of JDOM and Saxon is as good as possible.
- can you give an indication of what the baseline time is for the TinyTree
query process? The ratios are good for comparing one model against another,
but creating the JDOM model takes 110ms less than XOM, and if the queries
are taking just a few ms, then it stands to reason that JDOM2 outperforms
XOM substantially for cases where the document is queried only a few
times. For example, if a query takes 5ms, then JDOM can query the document
22 times in the time it takes XOM to query it once....
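The estimate in the second question is simple division. A sketch, assuming (as the message does) that a query costs the same on both models; the 110 ms build saving and the 5 ms query are the illustrative figures from the message, not new measurements, and BuildSavings is a hypothetical name.

```java
/**
 * If model A builds a document buildSavingMs faster than model B, how many
 * queries of queryMs each can A run in the time it saved? Assumes equal
 * per-query cost on both models.
 */
final class BuildSavings {
    static int queriesInSavedTime(double buildSavingMs, double queryMs) {
        if (queryMs <= 0) {
            throw new IllegalArgumentException("queryMs must be positive");
        }
        // Only whole queries count, so round down.
        return (int) Math.floor(buildSavingMs / queryMs);
    }

    public static void main(String[] args) {
        System.out.println(queriesInSavedTime(110, 5)); // 110 / 5 = 22
    }
}
```

queriesInSavedTime(110, 5) returns 22, the figure quoted above; the advantage shrinks quickly as per-query cost grows.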
Finally, I already have a release of JDOM 2.0.4 scheduled for early
October. If it is possible to 'link up' with your Saxon team I think it is
worth working together so that I can have an even better combination of
JDOM 2.x and Saxon for release 9.5 of Saxon.... would that be possible? It
would also be great to get some feedback on the JDOM 2.x APIs and whether
the changes have made it easier (or harder) to integrate with Saxon.... a
'debriefing' would be nice.
Thanks for the feedback on the performance though, it's great to see
something independent.
Rolf
From mike at saxonica.com Thu Sep 13 07:28:12 2012
From: mike at saxonica.com (Michael Kay)
Date: Thu, 13 Sep 2012 15:28:12 +0100
Subject: [jdom-interest] Performance measurements with Saxon
In-Reply-To: <9314207044bfed1bb22c2b905b38f336@tuis.net>
References: <50518651.4020309@saxonica.com>
<9314207044bfed1bb22c2b905b38f336@tuis.net>
Message-ID: <5051ED7C.90901@saxonica.com>
O'Neil is working on some refactoring of the wrapper code at the moment;
he'll send you a copy when it's stable. We're trying to reduce
proliferation so that improvements to algorithms only need to be made once.
Generally these queries run far faster than the tree construction time.
In the table I posted, "build-time" is the time to build the model in ms
(say 177ms) and "avg" is the time to run the query in ms (0.04ms for the
simplest queries, about 30ms for the most expensive). So you are right
that if the model has to be built in order to run a single query or
transformation, the build time can be more important than the query
time. This is of course the scenario where lazy construction ought to
play a role.
(Most of the XMark queries are linear with document size assuming the
Saxon-EE optimizer is available; if I remember right only one is
quadratic. Of course with non-linear queries, the query time quickly
overtakes the build time as the document size grows.)
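The build-versus-query trade-off can be made concrete by adding the one-off build cost to n query runs. A sketch using the figures quoted in the reply (about 177 ms to build, 0.04 ms to about 30 ms per query); BuildShare is a hypothetical name, and the numbers are the message's, not new results.

```java
/**
 * Cost model: build the tree once, then run the same query n times.
 */
final class BuildShare {
    /** Total cost of one build plus n query runs. */
    static double totalMillis(double buildMs, double avgQueryMs, int n) {
        return buildMs + n * avgQueryMs;
    }

    /** Fraction of that total spent building the tree. */
    static double buildFraction(double buildMs, double avgQueryMs, int n) {
        return buildMs / totalMillis(buildMs, avgQueryMs, n);
    }

    public static void main(String[] args) {
        // Figures quoted in the message: ~177 ms build, 0.04 ms to ~30 ms per query.
        System.out.printf("cheap query:  %.4f%n", buildFraction(177, 0.04, 1));
        System.out.printf("costly query: %.4f%n", buildFraction(177, 30, 1));
    }
}
```

With n = 1, the build accounts for about 85.5% of the total even for the most expensive query (177 / 207), and that query's cumulative time only passes the build time after six runs (6 * 30 ms = 180 ms).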
In this test we wanted to test our own builders, so we are building the
tree programmatically rather than just invoking the parser; we haven't
tested how this build time compares with the "native" build using the
parser. The only case for using JDOM with Saxon in preference to using
the TinyTree is where the model is built programmatically by a previous
step in the processing pipeline, so this isn't an unreasonable thing to do.
Michael Kay
Saxonica
From jdom at tuis.net Sat Sep 15 10:58:18 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 15 Sep 2012 13:58:18 -0400
Subject: [jdom-interest] Enhance jdom by OSGi support
In-Reply-To: <50543246.9090806@gmx.net>
References: <50543246.9090806@gmx.net>
Message-ID: <5054C1BA.8080907@tuis.net>
Hi Benjamin.
An early issue was created in the JDOM 2.x process:
https://github.com/hunterhacker/jdom/issues/6
This has been resolved, and JDOM 2.x has no classes/files in anything
other than the org.jdom2.* namespace.
This should make it easier to make a bundle from JDOM 2.x.
That's the good news.
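Because JDOM 2.x keeps everything under org.jdom2.*, a bundler could check that property mechanically before writing Export-Package entries. A hypothetical pre-bundling check (not part of the JDOM build) that lists jar entries outside a package prefix, ignoring directory entries and META-INF/:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/**
 * Hypothetical check: report jar entries that fall outside a required
 * package prefix (e.g. "org/jdom2/").
 */
final class PackageScopeCheck {
    static List<String> outsidePrefix(Iterable<String> entryNames, String prefix) {
        List<String> offenders = new ArrayList<>();
        for (String name : entryNames) {
            if (name.endsWith("/") || name.startsWith("META-INF/")) {
                continue; // directories and jar metadata are not package content
            }
            if (!name.startsWith(prefix)) {
                offenders.add(name);
            }
        }
        return offenders;
    }

    static List<String> entryNames(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            // Collect eagerly so the stream is consumed before the jar closes.
            return jar.stream().map(JarEntry::getName).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PackageScopeCheck path/to/some.jar
        List<String> offenders = outsidePrefix(entryNames(args[0]), "org/jdom2/");
        System.out.println(offenders.isEmpty() ? "OK" : "Outside org/jdom2/: " + offenders);
    }
}
```

An empty result prints OK; anything else is a file a bundle manifest would have to account for separately.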
The bad news is that I know nothing about OSGi. I have no idea of what
it takes to support that model.
You mention Maven in your mail. At the moment the 'maven' word is a
swear word in my home.... While I am maintaining JDOM, it is not likely
that the code base will be converted to a maven build process. There are
some real reasons, and some emotional reasons, but fundamentally I
regret having committed to producing a JDOM artifact on maven-central.
If a maven build process for JDOM is a requirement of OSGi support then
it is a 'no-go' for me.
If maven is not required for creating a suitable OSGi system, then I
will consider putting in effort to make it work on the following
'conditions':
- there is some distinct reason why it is better for 'jdom' to create
the bundle rather than some third-party (as you have already pointed
out, other people seem to be making OSGi bundles for JDOM already...)
- there is an OSGi expert who has 'round-trip' experience in making OSGi
bundles who can take responsibility for the JDOM OSGi bundle
(responsibility for either 'doing it', or alternatively being a 'mentor'
for someone who 'does it', and then 'validates' the result). The expert
also has to be available for some time to answer any issues that may
come up.
- there is no need for maven in the JDOM build
- there is no need to change any signatures of the JDOM API
- there is a relatively easy system for testing the bundle to ensure it
works.
I have learned (from the maven-central artifact for JDOM) that there are
issues when trying to support some protocol/system that you don't
understand.
I do not know OSGi. I do not use it. I do not even know its benefits. I
am not equipped to produce it. I cannot even learn enough about it to
get to the point where I am expert enough to do it properly.
There needs to be a committed OSGi expert involved.
Rolf
On 15/09/2012 3:46 AM, Benjamin Graf wrote:
> Hi,
>
> is OSGi support still of any interest? Maybe have a look at
> https://github.com/apache/servicemix4-bundles/tree/trunk/jdom-2.0.2 to
> get the right manifest entries for jdom2. It might be useful to switch
> the whole project to maven and give it a standardized structure, and let
> all the magic be done by plugins (package type bundle).
>
> Any comments?
>
> Greets
> Benjamin