Does "one being a subset of another" mean that code in the first is
also syntactically correct and semantically the same as in the
second?

As in the sense of elementary set theory,

are HTML, XML and XHTML all different subsets of SGML?

do XML and HTML almost not intersect each other?

is XHTML a superset of both XML and HTML?

Can I expect some more concise and clear summation of the
differences in the purposes of the four and/or when to use which,
than the link above? I am really confused about the clear line between their intended purposes.

XML is not a single Markup Language. It is a metalanguage to let users design their own markup language.

I was wondering how to reconcile two claims: XML and HTML are both subsets of
SGML, yet HTML is a markup language while XML is not a markup
language but a metalanguage for designing markup languages?

Are SGML and XHTML both also metalanguages for designing markup
languages?

Both links mention that HTML is an application of SGML as well as a subset of SGML, and that XHTML is an application of XML. What is the difference between saying one language is an application
of another and saying one language is a subset of another?

3 Answers

HTML and XML are both markup languages (hence the *ML). XML is a generic markup language suitable for representing arbitrary data, while HTML is a specific markup language suitable only for representing web pages.

HTML and XHTML are both subsets only of SGML, except that XHTML has additional specifications so that it also validates as XML. Think of XML as XHTML's influential godfather.

Because of this relationship to SGML across all 3 of these languages, there are a lot of similarities, but they are all considered different languages. However, much of what defines these languages is their restrictions on SGML.

HTML restricts SGML by defining a list of tags that are allowed to be used.

XML restricts SGML by not allowing unclosed or empty start and end tags, and by forcing attributes to be explicit. XML also has a large number of additional restrictions that are not found in SGML.

XHTML restricts SGML with the tags from HTML (with some exclusions, such as frameset and others), and with the tag and entity restrictions from XML.
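To make the tag and attribute restrictions above concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The helper name `is_well_formed_xml` and the two sample strings are made up for illustration; the check only covers XML well-formedness, not validity against any HTML or XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style markup (illustrative): unclosed <br>, attribute with no value.
html_style = "<p>Hello<br>world <input disabled></p>"

# XHTML-style markup (illustrative): every tag closed, every attribute explicit.
xhtml_style = '<p>Hello<br/>world <input disabled="disabled"/></p>'

print(is_well_formed_xml(html_style))
print(is_well_formed_xml(xhtml_style))
```

The first string is the kind of markup HTML tolerates but an XML parser rejects; the second follows the XML rules that XHTML adopts, so the same parser accepts it.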

XML is not a metalanguage for defining markup languages. Really that's just SGML. XML is simply a data formatting markup language. Your quoted source is using technical terms imprecisely, which is why they are confusing.

Purposes

XML is for defining your own data format. If you wish to pass data between two systems, XML is often the way to do it.

If, for example, you needed to pass a sales order from your website to your billing system, you could create this XML payload:

Your website would then send that XML to your billing system, which could then parse the data from that XML.
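As a rough sketch of that round trip, assuming a purely hypothetical sales-order layout (the element and attribute names `order`, `customer`, `item`, `sku` and `quantity`, and all the values, are invented here and are not from any real billing system), the website side could build the payload and the billing side could read it back with Python's standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Website side: build a hypothetical sales-order payload.
order = ET.Element("order", attrib={"id": "1001"})
ET.SubElement(order, "customer").text = "Jane Doe"
item = ET.SubElement(order, "item", attrib={"sku": "ABC-1", "quantity": "2"})
item.text = "Widget"

# Serialize to a string that could be sent over the wire.
payload = ET.tostring(order, encoding="unicode")
print(payload)

# Billing side: parse the payload and pull the fields back out.
parsed = ET.fromstring(payload)
order_id = parsed.get("id")
customer = parsed.findtext("customer")
quantity = int(parsed.find("item").get("quantity"))
print(order_id, customer, quantity)
```

The point is only that both systems agree on the element names up front; XML itself does not dictate what those names are.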

XHTML and HTML are obviously just for web pages. XHTML's primary purpose is to remove a lot of the ambiguity that we had in previous years (decades) of web development. Back in the late 90s when I started, we were using HTML 3.2 which allowed for seriously sloppy code. HTML 4+ and XHTML try to remedy that by either strongly suggesting or enforcing explicit closing tags, explicit attributes, and disallowed tags, which makes it easier on both browsers and humans, and avoids unexpected differences in behaviour cross-browser.

Thanks! (1) Are both HTML and XML subsets of XHTML? (2) Is it correct that HTML is not a subset of XML and XML is not a subset of HTML? Do HTML and XML have a nonempty intersection, or are they completely disjoint?
–
Tim Jul 16 '11 at 10:25

(3) What is the difference between saying one language is an application of another and saying one language is a subset of another?
–
Tim Jul 16 '11 at 10:40

There are documents that conform with both XML and HTML; there are documents that conform with XML and not HTML, and there are documents that conform with HTML and not XML. So neither is a subset of the other, but they have a non-empty intersection.
–
Michael Kay Jul 16 '11 at 12:11

@Tim: (1) HTML, XML, and XHTML are not subsets of anything except SGML. They are all different. XML actually has just about nothing to do with HTML or XHTML...it serves a different purpose. XHTML can be parsed as both HTML and XML, but it's used only by browsers as HTML markup. HTML and XML both have a common ancestor of SGML, but are otherwise unrelated. For every intent, they are separate because SGML is so generic.
–
Jordan Jul 16 '11 at 17:27

Honestly I think you're diving too deeply into terminology with application vs subset. I don't think there's a distinction between those terms, or if there is, I doubt it's widely agreed on. Suffice it to say that XHTML borrows concepts from XML and is used as a strict subset of HTML. HTML came first. XHTML came afterwards.
–
Jordan Jul 16 '11 at 17:29

Thanks! (1) I was wondering how to understand the two seemingly conflicting facts: XML and HTML are both subsets of SGML, and HTML is a markup language while XML is not a markup language but a metalanguage for designing markup languages? (2) According to your reply, XHTML is a subset of XML. XHTML is a superset of HTML as "XHTML subsets HTML" quoted from one link in my post. So HTML is a subset of XML? I am not sure it is true.
–
Tim Jul 16 '11 at 2:46

HTML breaks too many rules to be XML. HTML is closer to SGML, I believe. HTML is loose with tags, and there is a fixed set of tag types. XHTML is just the XML version of HTML.
–
WalterJ89 Jul 16 '11 at 8:04

Thanks! Both links mention that HTML is an application of SGML as well as a subset of SGML, and that XHTML is an application of XML. What is the difference between saying one language is an application of another and saying one language is a subset of another?
–
Tim Jul 16 '11 at 10:42

Generally in the standards world, a "profile" of a standard is a selection of options that the standard offers: for example, if the standard allows documents to be encoded in UTF-8 or UTF-16, a profile of the standard might require them to be encoded in UTF-8. The term "subset" has a very similar meaning; though arguably the term "profile" is a little bit wider.

Thanks! (1) How about the meaning of and difference between "application", "subset" and "profile", as in Part 5 of my questions? (2) In "XHTML is the basis for a family of future document types that extend and subset HTML", does it mean XHTML is a subset of HTML or HTML is a subset of XHTML?
–
Tim Jul 16 '11 at 12:19