SGML/XML Elements versus Attributes

When Should I Use Elements, and When Should I Use Attributes?

Introduction

A perennial question arising in the minds of SGML/XML DTD designers is whether to model and encode certain information using an element or, alternatively, using an attribute. For example, given some information about the 'title' of a work and the goal of encoding this information in markup, which of the two encodings is preferable, and what principles can be used to decide?

Using an element: <book><title>Perelandra</title>...</book>

Using an attribute: <book title="Perelandra">...</book>

In this case, of course, one would like to know what the designer envisions for ... in the book element type declaration.
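
To make the two encodings concrete, here is a minimal sketch of how a generic XML processor reaches the title in each case. It assumes Python's standard xml.etree.ElementTree module; the sample strings and the helper names title_from_element and title_from_attribute are hypothetical, and the "..." of the examples above is simply omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; the elided "..." content is left out.
AS_ELEMENT = "<book><title>Perelandra</title></book>"
AS_ATTRIBUTE = '<book title="Perelandra"/>'


def title_from_element(xml_text):
    """Return the title stored in a <title> child element, or None."""
    book = ET.fromstring(xml_text)
    return book.findtext("title")


def title_from_attribute(xml_text):
    """Return the title stored in a title="..." attribute, or None."""
    book = ET.fromstring(xml_text)
    return book.get("title")


print(title_from_element(AS_ELEMENT))
print(title_from_attribute(AS_ATTRIBUTE))
```

Either form is equally reachable by a program; the difference lies in which lookup the consumer must perform, so a change of encoding is a change of interface for every consumer.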

Experienced markup-language experts offer different opinions as to whether general principles can be given for choosing attributes over elements, and if so, what principles are most useful. Most agree that it's an "implementation decision," which reveals (arguably) that SGML/XML is not an ideal language for data modelling. Many also agree that the SGML model for an attribute (a flat string) is less than ideal, reflecting an unwarranted assumption about the essential difference between "content" and "not-content" (or between "data" and "meta-data"). The answer to the element/attribute question is more interesting and complicated now that (1) some XML documents lack a DTD or other schema, and (2) we have attribute renaming (XLink) and architectural forms processing (SGML Extended Facilities, Annex A of HyTime-1997) supported by Jade.

A popular answer based upon early HTML browser implementations is this: "If you use an attribute to encode some information, a browser won't display that information. . ." Oh, really? Whose browser? Using what style language? Both SGML and XML as metalanguages are designed to be processing-neutral; "browser/display" are processing-specific notions. Help anathematize assumptions that lead toward or acquiesce to proprietary, pre-defined application-level processing semantics in SGML/XML! That is precisely how we got the HTML mess in about 1992.

XML-DEV Thread. April 2003. Started with "how much 'meaning' should be placed into an element name, and how much (if any) should be 'filled out' by attributes..."

"D.3 Which should I use in my DTD, attributes or elements?" Q/A in The XML FAQ, edited by Peter Flynn. "... A lot will depend on what you want to do with the information and which bits of it are easiest accessed by each method. A rule of thumb for conventional text documents is that if the markup were all stripped away, the bare text should still be readable and usable, even if unformatted and inconvenient. For database output, however, or other machine-generated documents like e-commerce transactions, human reading may not be meaningful, so it is perfectly possible to have documents where all the data is in attributes, and the document contains no character data in content models at all..."

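The rule of thumb quoted from the XML FAQ (strip the markup and see whether the bare text is still usable) can be tried mechanically. The following is a rough sketch only, assuming Python's xml.etree.ElementTree; the function name bare_text and both sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def bare_text(xml_text):
    """Concatenate all character data, discarding tags and attributes."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


# Hypothetical samples: the same sentence with the title encoded two ways.
AS_ELEMENT = "<book><title>Perelandra</title><p> is a novel.</p></book>"
AS_ATTRIBUTE = '<book title="Perelandra"><p> is a novel.</p></book>'

print(repr(bare_text(AS_ELEMENT)))    # the title survives stripping
print(repr(bare_text(AS_ATTRIBUTE)))  # the title is lost with the markup
```

With the title held in an attribute, the stripped text loses it. Under this rule of thumb that suggests element content for conventional text documents, while for machine-generated data the test may simply not apply.
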
[March 04, 2004] "When to Use Elements Versus Attributes. Exploring the Oldest Question in XML Design." By Uche Ogbuji (Principal Consultant, Fourthought, Inc). From IBM developerWorks (March 04, 2004). "The oldest question asked by adopters of XML is when to use elements and when to use attributes in XML design. As with most design issues, this question rarely has absolute answers, but developers have also experienced a lack of very clear guidelines to help them make this decision. In this article, Uche Ogbuji offers a set of guiding principles for what to put in elements and what to put in attributes. Several frequently pondered questions of DTD design in SGML have followed the legacy to its offshoot, XML. Regardless of what XML schema language you use, you might find yourself asking: (1) When do I use elements and when do I use attributes for presenting bits of information? (2) When do I require an order for elements, and when do I just allow arbitrary order? (3) When do I use wrapper elements around sequences of similar elements? [...] The usual answer is 'No single answer is right, use your best judgment.' But this is not very helpful for those trying to find their feet with XML. True, even experts do not always agree on what to do in certain situations, but this is no reason not to offer basic guidelines for choosing between XML [elements] and attributes. [The author discusses general recommendations in terms of four principles]: core content, structured information, readability, element/attribute binding. None of the guidelines is meant to be absolute; use them as rules of thumb and feel free to break the rules whenever your particular needs require it..."

[March 03, 2000] "The role of attributes in context determinancy." By Martin Bryan (The SGML Centre). "One of the most commonly asked questions in the SGML/XML world relates to when you should use attributes rather than elements to store data. This paper suggests that one of the primary reasons for using attributes should be the need to control the contexts in which elements are processed." See also the HTML version. OpEd: Note (1) the recent posting of Henry S. Thompson (W3C 'XML Schema: Structures' Chief Editor; HCRC Language Technology Group, University of Edinburgh) on 'structured attributes'. "Why structured attributes? [...] There are a lot of contexts in which the first two properties of attributes are desirable [unordered, unique], but the third [unstructured] is a serious constraint. Here's a design sketch for adding structured attributes..." Many have observed that the SGML/XML notion of "Attribute" is broken at this juncture. That brokenness, in part, is what gives rise to the constant confusion about whether to use elements or attributes. Attribute values are often complex: they are elements/Objects, either "owned/contained" or referenced. They are not simply flat strings, but in the former case, we have no alternative but to model the Attribute using an SGML/XML element. Some Attributes can be modelled in XML markup as attributes and some cannot: that's the problem...
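
The three properties of attributes named in the Thompson quotation (unordered, unique, unstructured) can be observed with any conforming XML parser. The sketch below assumes Python's xml.etree.ElementTree; the sample documents, the element names surname and forenames, and all variable names are illustrative assumptions, not taken from any of the papers cited here.

```python
import xml.etree.ElementTree as ET

# Unique: an attribute name may appear only once per start-tag, so a
# second "author" attribute is a well-formedness error.
try:
    ET.fromstring('<book author="Lewis" author="Tolkien"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

# Unordered: attribute order is not significant to the parsed result.
first = ET.fromstring('<book title="Perelandra" year="1943"/>')
second = ET.fromstring('<book year="1943" title="Perelandra"/>')
same_attributes = first.attrib == second.attrib

# Unstructured: an attribute value is a flat string; any internal
# structure has to be modelled with child elements instead.
flat = ET.fromstring('<book author="Lewis, C. S."/>')
structured = ET.fromstring(
    "<book><author><surname>Lewis</surname>"
    "<forenames>C. S.</forenames></author></book>"
)

print(duplicate_accepted)                    # False
print(same_attributes)                       # True
print(type(flat.get("author")).__name__)     # str
print(structured.findtext("author/surname")) # Lewis
```

The last two lines show the constraint discussed above: once an "Attribute" in the modelling sense has parts of its own, it can be addressed part by part only when encoded as an element.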

[March 19, 2002] "Elements versus Attributes." By Gunther Stuhec (SAP). 18-March-2002. 17 pages. Published as one of four papers in "UBL NDR Position Papers." By Members of the UBL Naming and Design Rules Subcommittee (NDR SC). 16-March-2002. "A common cause of confusion, or at least uncertainty, in the design of a schemas is the choice between specifying parts of the document as elements or attributes... Elements are logical units of information in a schema. They represent information objects... Attributes are atomic, referentially transparent characteristics of an object that have no identity of their own. Generally this corresponds to primitive data types (e.g., Strings, Date, etc.). Taking a more logical view, an attribute names some characteristic of an object that models part of its internal state, and is not considered an object in its own right. That is, no other objects have relationships to an attribute of an object, but rather to the object itself... Is the content to be spell-checked? [If 'yes', use an element; if 'no', use an attribute]... The following diagram illustrates a way to find out how want to be an Element or an Attributes necessary to be define it... " [In terms of the Core Components Technical Specification:] "Component Content will be represented as an Element-Value; The Supplementary Components will be represented as Attributes." Note: most characterizations of elements and attributes presented in this paper and in previous treatises represent opinions about how one, arguably, ought best to model "content" in XML documents; in most cases, the judgments are arbitrary, as XML 1.0 itself does not make statements about the semantics, nor even about what will constitute "content" or "not-content" from the data modeling perspective.

[August 26, 2002] "Validation and Boolean Operations for Attribute-Element Constraints." By Haruo Hosoya (Kyoto University) and Makoto Murata (IBM Tokyo Research Laboratory). Paper prepared for presentation at the PLAN-X Workshop on Programming Language Technologies for XML, October 3, 2002, Pittsburgh, PA, USA. 28 pages. "Algorithms for validation and boolean operations play a crucial role in developing XML processing systems involving schemas. Although much effort has previously been made for treating elements, very few studies have paid attention to attributes. This paper presents a validation and boolean algorithms for Clark's attribute-element constraints. The kernel of Clark's proposal is a uniform and symmetric mechanism for representing constraints on elements and those on attributes. Although his mechanism has a prominent expressiveness and generality among other proposals, treating this is algorithmically challenging since naive approaches easily blow up even for typical inputs. To overcome this difficulty, we have developed (1) a two-phase validation algorithm that uses what we call attribute-element automata and (2) intersection and difference algorithms that proceed by a 'divide-and-conquer' strategy... It exploits that it is often the case that we can partition given constraint formulas into orthogonal subparts. In that case, we proceed the computation with the subparts separately and then combine the results. Although this technique cannot avoid an exponential explosion in the worst case, it appears to work well for practical inputs that we have seen... We have already implemented the validation and boolean algorithms in the XDuce language. For the examples that we have tried, the performance seems quite reasonable. We plan to collect and analyze data obtained from the experiment on the algorithms in the near future..." Source: Postscript.

[March 03, 2008] "Elements or attributes?" By John Cowan. Recycled Knowledge Blog. "Here's my contribution to the "elements vs. attributes" debate..." Ed note: John provides an excellent summary, ending with "Michael Kay says: 'Beginners always ask this question. Those with a little experience express their opinions passionately. Experts tell you there is no right answer'."