HTML Versus XHTML
The argument about whether to use HTML 4.01 or XHTML 1 is one that comes up time and time again. Not so long ago, most people were advising the use of XHTML 1 almost without question, in the belief that it’s little more than a newer implementation of HTML and therefore somehow automatically the ‘better’ option (specifically, XHTML 1 is a reformulation of HTML 4 as an application of XML). However, many people who once recommended using XHTML 1 have since changed their minds on the topic, including some SitePoint authors who made very strong arguments as to why examples in their books should be presented in HTML 4.01 rather than XHTML 1. Others have decided to move on to using HTML5 instead, even though the W3C has not finalized this particular language’s specification. That said, it is clear now that HTML5 is where the Web is headed (XHTML 2, the proposed successor to XHTML 1 and XHTML 1.1, has effectively been canned, with the working group being disbanded at the end of 2009). Until such time as the HTML5 specification is solidified and has strong support across the major browsers, the safest option is to use HTML 4.01 or XHTML 1 (which from this point on we’ll refer to simply as HTML and XHTML without the version numbers).1
It seems that we need to clarify what HTML and XHTML are, what their differences are, and why one or the other should be used.
The first thing you should realize is that using HTML is not wrong as long as you specify that you’re using HTML with the appropriate doctype, and the HTML you use is valid for that doctype. If you want to use HTML 4.01, no one can stop you! Ignore anyone who tells you that XHTML is the only way to go, and that using HTML 4.01 is somehow backwards. That said, you should be aware of the differences between HTML and XHTML, as these may affect your choice of markup.
Main Differences Between XHTML and HTML
The following list details the main differences between XHTML and HTML. Most of them are related to syntax differences, although there are some less obvious variations that you may not be aware of:
- XHTML is more choosy than HTML: there are some elements that absolutely must appear in the XHTML markup, but which may be omitted if you’re using HTML 4 and earlier versions. These elements include the html, head, and body elements (although why you’d want to omit any of them is a mystery to me). In addition, every non-empty element you use in XHTML must have both an opening and closing tag (for example, you’d write <p>This is a paragraph</p> in XHTML, but <p>This is all you need in HTML, as no end tag is required).
- For empty elements (those that hold no content but refer to a resource of some kind, such as an img or meta element), the tag must have a trailing closing slash, like so: <img src="moo.jpg" alt="moo"/>. Evidently, this makes XHTML a little more verbose than HTML, but not to the extent that it has an adverse effect on the page weight.
- XHTML allows us to indicate any element as being
empty—for example, an empty paragraph can be expressed as
<p/>—but this isn’t valid when the page is served as text/html. To that end, you should restrict your use of this syntax to elements that are defined to be empty in the HTML specifications.
- In XHTML, all tags must be written in lowercase. In HTML, you can use capital letters for elements, lowercase letters for attributes, or whatever convention you like!
- All attribute values in XHTML must be contained in quotes (single or double, but usually double), hence <input type=submit name=cmdGo/> would be valid in HTML 4.01, but would be invalid in XHTML. To be valid, it would need to be <input type="submit" name="cmdGo"/>.
- In XHTML, all attributes must be expressed in attribute-name and attribute-value pairings, with quote marks surrounding the attribute value part.
- In HTML, some elements have attributes that do not appear to require a value, for example the checked attribute for checkbox input elements. I stressed the word “appear” because technically it’s the attribute name that’s omitted, not the value. These are known as Boolean attributes, and in HTML you could specify that a checkbox should be checked simply by typing <input type="checkbox" name="chkNewsletter" checked>. In XHTML, though, you must supply both an attribute and value, which results in seemingly needless repetition: <input type="checkbox" name="chkNewsletter" checked="checked"/>.
- In XHTML, the opening <html> tag requires an xmlns attribute (XML NameSpace) as follows: <html xmlns="http://www.w3.org/1999/xhtml">. However, strangely, if you omit it, the W3C validator doesn’t protest as it should.
- XHTML requires certain characters to appear as named entities. For example, you can’t use the & character on its own: it must be expressed using an HTML entity, &amp;.
- In XHTML, languages in the document must be expressed using the xml:lang attribute instead of lang.
- A MIME type must be declared appropriately in the HTTP headers, for example "application/xhtml+xml" (this is the best option) or "text/xml" (which isn’t recommended). The MIME type is set as a configuration option on the web server, which is usually Apache or IIS.
- DTDs don’t support the validation of mixed namespace documents very well.
- If you use XHTML and set the proper MIME type (see the
section below called Serving the Correct MIME Type), you’ll
encounter a small snag: Internet Explorer. At the time of writing,
this browser—which still holds the lion’s share of the market—is the
only one of the browsers tested for this reference that can’t handle a
document set with a MIME type of
"application/xhtml+xml". When IE encounters a page that contains this HTTP header, it doesn’t render the page on screen, but instead prompts the user to download or save the document.
- When you’re using XHTML, text encoding should be set within the XML declaration, not in the HTTP headers (although doing the latter is still allowed). A full tutorial on the thorny issues surrounding the ways of setting character encodings is available on the W3C site.
In addition to these points, there are a number of differences between the way that an XHTML document handles scripts and the way it handles style sheets, including:
- There are requirements for the way in which comments inside scripts should be handled; Lachlan Hunt covers this topic thoroughly in his blog entry, “HTML Comments in Scripts.”
- document.write() and document.writeln() do not work in XHTML.
- The innerHTML property is also ignored by some user agents.
- As XHTML is case sensitive, there can be an issue with element and attribute names in DOM methods. For example, mixed-case names such as onSubmit are invalid, while the all-lowercase onsubmit is valid.
The list above might discourage some newcomers
from learning the XHTML syntax; it does certainly appear that, at the very
least, XHTML requires more discipline and thought than HTML! However, one
advantage of learning XHTML syntax rather than the looser HTML syntax
is that if you stick to the rules I’ve outlined above, you’ll be creating
pages that render just as HTML would in the browser, but which also
validate as XHTML. The presence of some XHTML-specific attributes (namely
the xmlns attribute in the root element, and the use of
xml:lang rather than lang) does mean that you can’t simply change the
doctype of a valid
XHTML document back to an HTML doctype and have the page validate, though.
It will contain features that are not understood by, or accepted in, the
HTML doctype.
So, there’s nothing wrong with learning the HTML 4.01 syntax to begin with, and progressing to XHTML when you feel more comfortable doing so. The transition from HTML to XHTML doesn’t have to be a massive step, although some of the habits you’ll have picked up while you were marking up HTML documents will need to be unlearned to make this a successful transition. Learning HTML 4.01 is not a bad thing, and it doesn’t make you a lazy coder; it’s just different from XHTML.
Does XHTML Reduce Your Markup Toolset?
You may have heard or read that choosing
XHTML means that you can’t use certain presentational elements such as
u. But this
isn’t strictly true. You may use these elements in XHTML Transitional and
Frameset just as you could in HTML Transitional and Frameset—the
difference is that they’re not allowed in the Strict versions of these
markup languages. Hopefully that’s a myth busted!
If you do opt to use XHTML Strict (or HTML Strict), and you thereby lose these presentational elements, you’ll definitely need to rely on CSS to do the work of prettying things up; this approach also places just a little more emphasis on the use of more structurally orientated elements available in HTML and XHTML. But don’t be led to believe that XHTML is in some way more structural than HTML 4.01. You’re not going to be adding new structural features through your use of XHTML—headings, paragraphs, block quotes, and so on were all present in HTML 4.01.
Regardless of the flavor of markup you choose—HTML or XHTML—you
can easily mark up your page using a series of
span elements, style it entirely in CSS, validate it,
and still be left with a document that offers no apparent meaning or
structure about the content it contains. In short, the language is only as
good as the pair of hands responsible for crafting it, and thus XHTML
doesn’t guarantee a better end result!
Opting for HTML for Optimized Page Weights
One possible reason for using HTML 4.01 over XHTML (of any kind or level of strictness) might be that page size is a very important consideration. For example, you may be creating a page that needs to be downloaded over a restricted connection, perhaps to a mobile device of some kind. By using HTML 4.01, you’re able to reduce the markup by not using quote marks and not using closing tags where the spec indicates that they’re optional.
If you’re building your own personal web site, or you’re building a site for an organization that doesn’t have (or expect) huge amounts of traffic, the aim of achieving slightly leaner page weights probably won’t be a strong case for using HTML 4.01. However, if we’re talking about a site that receives a significant amount of traffic, the savings may well add up, so you might need to get your calculator out! For example, if your use of HTML 4.01 means that you can omit 100 bytes of characters from a given document (without those deletions having an adverse effect on the document’s presentation in the browser), and if that document receives one million hits a day, over the course of a month that saving will amount to almost 3GB of bandwidth. Now, this is just a hypothetical scenario, and this is but one page in a web site, but depending on the number of visitors your site attracts, a shaving of markup here, and a corner cut there—all the while ensuring that your page validates as HTML 4.01—may lead you to choose the slightly more svelte HTML rather than XHTML. However, for most site owners the savings will be negligible.
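The arithmetic behind that estimate can be sketched in a few lines of Python. The figures are the hypothetical ones from the example above, a 30-day month is assumed, and the function name is made up purely for illustration.

```python
def monthly_savings_bytes(bytes_saved_per_hit: int, hits_per_day: int, days: int = 30) -> int:
    """Bandwidth saved over a month, in bytes (assumes a 30-day month)."""
    return bytes_saved_per_hit * hits_per_day * days


# 100 bytes trimmed from a page that receives one million hits a day.
saved = monthly_savings_bytes(100, 1_000_000)

print(saved)                      # 3000000000 bytes
print(saved / 1000**3)            # 3.0 (decimal gigabytes)
print(round(saved / 1024**3, 2))  # 2.79 (binary gigabytes, hence "almost 3GB")
```

Plugging in your own traffic figures should show fairly quickly whether the saving is worth chasing.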
Serving the Correct MIME Type
If you intend to create a web page that can be treated as XML and parsed accordingly, you’d probably create that page in XHTML. You may also want to take this course of action for the purposes of including another XML-based technology such as MathML or CML (Chemical Markup Language) in your page. If you do find yourself needing to use those technologies, you’re almost certainly not a “typical” web page author, and as such, most of what follows won’t really concern you too much …
In order for your page to be interpreted as proper XML, you
must serve it with a MIME type of
"application/xhtml+xml" (normally, web pages are served
as "text/html"). Once you do so, you’ll have to be
very careful with the coding of your web page. One well-formedness
slip-up (for example, an unquoted attribute value, a non-symmetrical
opening and closing of tags, or an unclosed tag) and your web page won’t
render at all. Users will be presented with a fatal parsing error of some
kind, which will tell them that the page couldn’t be parsed or understood.
It’s very unforgiving!
Here’s a simple test that you can try for yourself. Create a simple HTML page using the markup below:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Page title</title>
  </head>
  <body>
    <h1>Hello world</h1>
    <p>This is a web page.</p>
  </body>
</html>
Now save this document with the file extension
.xhtml, rather than
.html. Next, open the file in Firefox, Opera, or Safari.
Is everything looking okay? Now, make a subtle change: amend the closing
</h1> tag (and only the closing tag) to
use an uppercase H. Refresh the page and see what happens. If everything’s
gone to plan, you’ll now be looking at a broken web page, similar to the
one shown in Figure 1.
What’s important about this exercise is that the behavior
displayed by these browsers when they open a local file that’s not well
formed, and has the
.xhtml extension, is exactly
the same as the error that they’d present if they encountered a malformed
page on the web that was served with the MIME type
"application/xhtml+xml". Bear in mind that even if you
take the utmost care with your own code, it only takes one poorly formed
user comment to do the necessary damage! I’m sure you can see what a
tricky problem this can be!
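The same all-or-nothing behavior can be reproduced without a browser. The following is a minimal sketch that feeds the test page to the XML parser in Python's standard library; the helper name is_well_formed is made up for illustration, and the check covers well-formedness only, not validity against the DTD.

```python
import xml.etree.ElementTree as ET

GOOD = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Page title</title></head>
<body>
<h1>Hello world</h1>
<p>This is a web page.</p>
</body>
</html>"""

# The same page with only the closing h1 tag changed to uppercase.
BAD = GOOD.replace("</h1>", "</H1>")


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False: <h1> is closed by a mismatched </H1>
```

A check along these lines could be run over user-submitted content before it is stored, so that a single bad comment does not take the whole page down.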
The argument against using XHTML is
basically this: if you’re not using XHTML for the purposes of creating an
XML-based web application of some kind, there’s no real reason to use
XHTML—you may as well stick to HTML. And if you’re intent on creating a
web page that validates as XHTML, but it is served with the
"text/html" MIME type, you won’t really reap any kind
of benefit either. So if you want to use XHTML, learn it properly and be
sure to understand the pitfalls. Otherwise, you may be better off with
HTML.
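If you do decide to serve XHTML as XML, the Internet Explorer problem described earlier means that some visitors will still need "text/html". One workaround that is sometimes used is to inspect the Accept header of each request and choose the MIME type accordingly. The sketch below is only an illustration under simplifying assumptions: the function name is hypothetical, the header values are made-up examples, and quality values (q=...) are ignored, which a real implementation would need to honor.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a MIME type for an XHTML page from the request's Accept header."""
    # Split "type/subtype;q=0.9, ..." into bare media types, dropping parameters.
    offered = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    if "application/xhtml+xml" in offered:
        return "application/xhtml+xml"
    # Fall back to text/html for clients that do not advertise XHTML support.
    return "text/html"


# Illustrative header values, not captured from any particular browser.
print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9"))
print(choose_content_type("image/gif, image/jpeg, */*"))
```

Bear in mind that a page delivered as "text/html" through this fallback is parsed as HTML, so the caveats above about serving XHTML with that MIME type still apply to those visitors.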
XHTML: Encouraging Good Practices?
I advise people to learn XHTML, not HTML, regardless
of whether the web page is going to be treated as an XML web application
of some kind or as a simple web page (for more on this thorny topic, see
the section entitled Serving the Correct MIME Type). By taking this
approach, you’re encouraged to nest elements properly, close all your tags
properly, and use quotes around all your attributes. This is my
preference, but I’m under no illusion as to the fact that if I serve one
of these web pages as
"application/xhtml+xml" and it
contains even a slight error, all my good work will end with the fatal
error mentioned above. That said, should I later wish to incorporate XML
features into my pages, I will have a good starting point to work
from.
Given that this is a reference, rather than a guide aimed at total beginners, you likely already know a certain amount about HTML and XHTML; you may feel more comfortable taking the same approach, and using the XHTML syntax. If you’re a beginner, however, you may prefer to start with HTML 4.01, but you should still follow the rules for that version of HTML.
1 This is a topic that I’ve had to address, as my beginners’ book on HTML and CSS, Build Your Own Web Site The Right Way Using HTML & CSS, used XHTML, while the SitePoint Forums members argued about which flavor of HTML should be used. This argument prompted more than a few people who bought my book as complete beginners to ask me directly, “Why do you recommend using XHTML while some people say not to use it?”
- Tue, 18 Mar 2008 02:35:53 GMT
When deciding whether to use HTML or XHTML, I believe the most important factor is the one stated by the page's author:
"The argument against using XHTML is basically this: if you are not using XHTML for the purposes of creating an XML-based web app of some kind, there is no real reason for using XHTML, you may as well stick to HTML. And if you are intent on creating a web page that validates as XHTML but it is served with the MIME-type of "text/html", then you’re not really reaping any kind of benefit either."
It'd be great to get away from the half-baked structure of HTML and go to a pure XML solution, but until MIME type "application/xhtml+xml" works on the vast majority of browsers that's not gonna happen for most websites. Meanwhile, pretending that your XHTML pages will render the same in "application/xhtml+xml" as they do in "text/html" isn't useful. For this reason I've switched back from XHTML to HTML, but I use XHTML coding rules when they don't violate HTML rules: such as lower case tags, using quotes around attribute values, closing all tags where possible (but no " />"), etc.
Also, and by experience, CSS doesn't always give the expected results when you don't properly close paired tags such as <p> and </p>. Another reason to follow many of the XHTML rules when coding HTML.
- Thu, 13 Mar 2008 09:49:41 GMT
To say that "<input type=submit name=cmdGo /> would be valid in HTML 4.01" can be misleading. Yes, it is valid but it doesn't mean the same thing as it would in XHTML. In HTML 4.01 that code is equivalent to <input type=submit name=cmdGo>> (an INPUT element followed by a greater-than character).
Attribute minimisation in SGML and HTML actually means it's the attribute *name* that's omitted. When you use <input checked> instead of <input checked=checked>, it's actually "checked=" that's omitted, not "=checked".
Ampersands must be escaped as a character entity reference or numeric character reference even in HTML (except in element types whose content type is CDATA, like SCRIPT and STYLE). The difference is that HTML UAs have error handling that usually allows sloppy coding, while XML UAs are required to abort at well-formedness errors.
XHTML 1.0 allows both 'lang' and 'xml:lang' attributes, and Appendix C says you should use both when serving the document as text/html. XHTML 1.1, on the other hand, deprecates the 'lang' attribute.
The HTML syntax rules are no less strict or stringent than those of XHTML. Valid HTML can be parsed unambiguously; the optional tags are not really needed. XHTML's syntax rules are more *consistent* and *simple*, but I think it's unfair to describe HTML syntax as "loose". You could just as well call it "efficient" and say XHTML syntax is "bloated".
It should also be clearly stated that serving XHTML as text/html means you are not using XHTML *at all*. You are using HTML with syntax errors, as far as user agents are concerned. In other words, you are *relying* on browser bugs and error handling.
Too many beginners believe they are actually using XHTML, believing the hype about how "modern" and "strict" it is, yet their pages would fail miserably if they were served as XHTML.
XHTML is not "HTML that looks like XML". It's XML with a predefined set of element types, and serving it as anything other than an application of XML renders it completely pointless from a technical point of view.