What is XML and how does it relate to XHTML
A reader of my site, Zach, asked for a better explanation of XML and XHTML. I thought about it a lot after he posted that comment and decided that it was a topic I could tackle.
The effectiveness of what I wrote up is debatable, but I think that overall it’s pretty nice. It’s loosely structured and kinda follows a line of thought, but you’d be hard pressed to find a beginning, middle, and end ;) If you find it useful then I’m profoundly happy. If you end up being more confused than when you started then let me know and I’ll do my best to figure out how to fix it up.
In a sentence, XML is a set of rules that describe a consistent way to format data. Nothing more, nothing less. The most famous file format built atop it is XHTML.
The idea behind XML is that with a consistent set of formatting rules and regulations, open data exchange is possible. Let me demonstrate the lowest levels of XML with some examples.
XML can represent ANYTHING; the rules are purposefully lax in order to accommodate this goal. Let’s describe recipes:
Grilled Cheese Bread Cheese Butter Butter each piece of bread and place into a hot skillet until golden brown. Place a piece of cheese between the two slices of bread and heat until cheese melts.
This XML document describes a single recipe. There are a few distinguishing features that stand out.
- All tags are delimited by angle brackets.
- <?xml version="1.0" ?> – This is the XML declaration. XML.com states that while it is not required, its presence explicitly identifies the document as an XML document and indicates the version of XML for which it was authored.
- Most elements in XML are wrappers: a start tag and an end tag with content between them. In fact my example used nothing but wrappers. There are, however, elements which are “empty.” Good examples are the <br /> and <hr /> tags in XHTML. They represent page formatting (a line break and a horizontal rule respectively), and as such do not contain any real content.
- If an element is not empty it MUST have both a start tag and an end tag, such as <html> and </html>.
- If an element is empty and written as a single tag, that tag must have a trailing forward slash. For example: <applause />
- Attributes are name=value pairs that occur inside start tags immediately after the element name. In my example I used an attribute with the name “type” and a value of “brief”. In XML all attribute values MUST be quoted.
- Comments can be included in XML documents. Comments begin with <!-- and end with -->. Comments can contain anything your heart desires except --.
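To see those rules in action, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. The markup in it is my own illustration: the element names (recipe, title, ingredient, instructions, photo) are assumptions, the empty photo element is added only to show the trailing-slash rule, and only the type="brief" attribute comes from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe document; the element names are illustrative assumptions.
RECIPE_XML = """<?xml version="1.0" ?>
<!-- Comments like this one are skipped by the parser -->
<recipe type="brief">
  <title>Grilled Cheese</title>
  <ingredient>Bread</ingredient>
  <ingredient>Cheese</ingredient>
  <ingredient>Butter</ingredient>
  <instructions>Butter each piece of bread.</instructions>
  <photo />
</recipe>
"""

# Parsing only succeeds if the document follows the XML rules.
root = ET.fromstring(RECIPE_XML)

print(root.tag)                                       # name of the root element
print(root.attrib["type"])                            # attribute value, without its quotes
print([i.text for i in root.findall("ingredient")])   # content of each wrapper element
print(root.find("photo").text)                        # an empty element has no content
```

If you delete a closing tag or the quotes around brief, the call to ET.fromstring raises a ParseError instead of returning a tree.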
Now that you’ve got a general idea of how XML documents can be structured, I’ll move on to the single defining idea that makes XML so powerful: the Document Type Definition, or DTD.
DTD
In my recipe example above it is pretty apparent to a human observer what my document is trying to convey. That is one of the biggest strengths of XML. With enough care and consideration, I can choose element names descriptive enough that humans can readily read and use an XML document themselves.
However, XML is most often used between computer applications to describe all sorts of documents, such as resumes, web sites, and stock quotes. Suppose you ran across an XML document like this (from XML.com):
Goodnight, Gracie
Say Gracie.goodnight,
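From a human perspective, this is a very nonsensical document. This is where DTDs come in. DTDs are documents which describe which elements and attributes a document may contain and how they may be nested. What the tags mean comes from the applications that agree to use the same DTD.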
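Python's standard library can parse XML but does not validate it against a DTD, so the sketch below is only a stand-in for the idea. It uses a hand-written table of which child elements each element may contain, plus a function that checks a document against that table. The table, the element names, and the function name check_structure are all hypothetical; a real DTD is written in its own declaration syntax and can express a good deal more.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: element name -> names of the children it may contain.
ALLOWED_CHILDREN = {
    "recipe": {"title", "ingredient", "instructions"},
    "title": set(),
    "ingredient": set(),
    "instructions": set(),
}

def check_structure(element):
    """Return a list of problems found in element and its descendants."""
    if element.tag not in ALLOWED_CHILDREN:
        return [f"unknown element <{element.tag}>"]
    problems = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            problems.extend(check_structure(child))
    return problems

# Both documents are well-formed XML, but only the first follows the rules.
good = ET.fromstring("<recipe><title>Grilled Cheese</title></recipe>")
bad = ET.fromstring("<recipe><title><ingredient>Bread</ingredient></title></recipe>")

print(check_structure(good))
print(check_structure(bad))
```

The point is that the parser is happy with both documents; it takes a separate set of rules, which is the job a DTD does, to say that an ingredient has no business inside a title.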
The most popular examples would be the HTML and XHTML DTDs. The X in XHTML stands for “Extensible” and signifies that XHTML documents are XML compliant, meaning they must follow all XML rules. This means that empty tags must have trailing slashes (<br /> as opposed to <br>), attribute values must be quoted, and the document must be “well-formed”. There are probably a few more qualifiers but these are the most important.
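A quick way to try these rules out is to hand small fragments to an XML parser and see which ones it accepts. This is a rough sketch using Python's xml.etree.ElementTree; the helper name is_well_formed and the sample fragments are my own, and the parser only checks the generic XML rules, not the XHTML DTD itself.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Empty tag with a trailing slash: accepted.
print(is_well_formed('<p>one<br />two</p>'))
# HTML-style <br> is never closed, so </p> arrives while br is still open: rejected.
print(is_well_formed('<p>one<br>two</p>'))
# Quoted attribute value: accepted.
print(is_well_formed('<img src="a.png" />'))
# Unquoted attribute value: rejected.
print(is_well_formed('<img src=a.png />'))
```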
What does well-formed mean?
- No attribute may appear more than once in the same start tag. This means you can’t have two “name” attributes in a tag like <form>.
- Non-empty tags must be properly nested. You must close your tags in the reverse order that you opened them. For example:
Correct: