DTDs and XML: another "not well formed" question

I'm new to parsing/using xml but this project seemed reasonable to cut
my teeth on. I have a few dozen "articles" that are local
announcements of interest for my group's customers. They have a
simple format of a title, zero or more static or hyper-linked images
and one or more paragraphs of text.

A "Title" will hold plain text. The "Text"s will hold plain or mixed
content. The "Image"s will need to know about the hyper-link URL (if
any); the image source URL and possibly "height" and "width"
attributes.

Now, I have spent time searching this group and a couple of others
related to the scripting language and the XML parser I am using. I
*know* what my problem is... what I don't know is why I have it.

My XML parser chokes on the first "&" (ampersand) in the "link"
attribute of the "image" tag. I know that being "well-formed" means
the amps should be "quoted" but I thought that the "CDATA" bits in the
DTD meant that *ALL* characters are accepted in this context.

Is my DTD wrong for the XML I have? Is my parser/validator not
picking up on the DTD?

I know that I can pre-process the incoming XML file and change the
amps to the entity version (&amp;) but that feels wasteful if CDATA is
doing what I thought it should do.

All your ampersands need to be replaced with &amp;
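
If the XML is generated by a script, the escaping can be done at the
point where the attribute is written rather than as a separate
pre-processing pass. A minimal sketch in Python (an assumption, since
the thread does not name the scripting language), using the standard
library's xml.sax.saxutils; the example.org URL and the variable names
are made up for illustration:

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical link value containing raw ampersands.
link = "http://example.org/dbt.jsp?KEY=APPLAB&Volume=90&Issue=21"

# escape() replaces &, < and > for use in element content.
escaped_text = escape(link)

# quoteattr() escapes the value and also wraps it in quotes,
# so it can be dropped straight into an attribute position.
link_attr = quoteattr(link)

image_tag = f'<image src="/images/apl_cover-130.jpg" link={link_attr} />'
print(image_tag)
```

The parser turns each &amp; back into a plain "&" when the attribute
is read, so the application still sees the original URL.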
> src="/images/apl_cover-130.jpg" />
> <text> The cover for the <a href="http://scitation.aip.org/dbt/
>dbt.jsp?KEY=APPLAB&Volume=90&Issue=21">May

Your DTD doesn't say anything about <text> being allowed to contain
<a> elements. You'll need to change the declaration of <text> to allow
mixed content and add a declaration for <a>.
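
As a rough illustration of that change, here is a sketch of what such
a DTD might look like, embedded as an internal subset and fed to
Python's xml.etree.ElementTree. The original DTD is not shown in the
thread, so every element name, the content models and the example.org
URL are assumptions. Note that ElementTree only checks
well-formedness; it does not validate against the DTD, so a validating
parser is still needed to confirm validity.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD: <text> is mixed content that may contain <a>,
# and src/link/width/height are attributes of an empty <image>.
doc = """<!DOCTYPE article [
  <!ELEMENT article (title, image*, text+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT image EMPTY>
  <!ATTLIST image src    CDATA #REQUIRED
                  link   CDATA #IMPLIED
                  width  CDATA #IMPLIED
                  height CDATA #IMPLIED>
  <!ELEMENT text (#PCDATA | a)*>
  <!ELEMENT a (#PCDATA)>
  <!ATTLIST a href CDATA #REQUIRED>
]>
<article>
  <title>Cover story</title>
  <image src="/images/apl_cover-130.jpg" />
  <text>The cover for the <a href="http://example.org/dbt.jsp?KEY=APPLAB&amp;Volume=90&amp;Issue=21">May issue</a> is out.</text>
</article>"""

root = ET.fromstring(doc)  # well-formedness check only, no validation
href = root.find("text/a").get("href")
print(href)
```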
>My XML parser chokes on the first "&" (ampersand) in the "link"
>attribute of the "image" tag. I know that being "well-formed" means
>the amps should be "quoted" but I thought that the "CDATA" bits in the
>DTD meant that *ALL* characters are accepted in this context.

A CDATA marked section in text, such as <![CDATA[hello & goodbye]]>
has that effect. In attributes (whether declared as CDATA or something
else) you have to quote ampersands. There's no way around it.
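
The difference is easy to check with any XML parser. A small sketch
using Python's xml.etree.ElementTree (the thread does not say which
parser is in use, so this choice, the helper name is_well_formed and
the example.org URLs are assumptions):

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc):
    """Return True if the parser accepts doc, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Raw ampersand in an attribute value: rejected.
raw_attr = '<image link="http://example.org/x?a=1&b=2" />'
# Same value with the ampersand escaped: accepted.
escaped_attr = '<image link="http://example.org/x?a=1&amp;b=2" />'
# CDATA marked section in element content: raw ampersand is fine.
cdata_text = '<text><![CDATA[hello & goodbye]]></text>'

for doc in (raw_attr, escaped_attr, cdata_text):
    print(is_well_formed(doc), doc)

# The application sees the unescaped value after parsing.
print(ET.fromstring(escaped_attr).get("link"))
```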

The CDATA/PCDATA terminology is certainly confusing.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

> Are src, link, width and height subelements or attributes?
> You've listed them as child elements in the content model, and
> then declared them as attributes.

oh man, sorry. I've been spinning on this for a long time now and
have been making changes left and right to the dtd and xml file. I
pasted in an incorrect version. My intent is for src, link etc to be
attributes.
