HTML vs XHTML: Comparing Two Parsing Modes

HTML5 has two parsing modes or syntaxes: HTML and XML. Which one the browser uses depends on whether the document is served with a Content-type: text/html header or a Content-type: application/xhtml+xml header. If it’s served as text/html, the following rules apply:

  • Start tags are not required for every element.
  • End tags are not required for every element.
  • Only void elements such as br, img, and link may be “self-closed” with />.
  • Tags and attributes are case-insensitive.
  • Attributes do not need to be quoted.
  • Some attributes may be empty (such as checked and disabled).
  • Special characters do not always have to be escaped as character entities.
  • The document must include an HTML5 DOCTYPE.

HTML Syntax

Let’s look at another HTML5 document.
<!DOCTYPE html>
  <html>
    <head>
      <meta charset=utf-8>
      <title>Hi</title>
      <!-- 
      This is an example of a comment.
      The lines below show how to include CSS 
      -->
      <link rel=stylesheet href=style.css type=text/css>
      <style>
        body{
          background: aliceblue;
        }
      </style>
    </head>
    <body>
     <p>
        <img src=flower.jpg alt=Flower>
        Isn't this a lovely flower?
     
      <p>
        Yes, that is a lovely flower. What kind is it?
       
      <script src=foo.js></script>
    </body>
</html>
Again, our first line is a DOCTYPE declaration. As with all HTML5 tags, it’s case-insensitive. If you don’t like reaching for Shift, you could type <!doctype html> instead. If you really enjoy using Caps Lock, you could also type <!DOCTYPE HTML>.

Next is the head element. The head element typically contains information about the document, such as its title or character set. In this example, our head element contains a meta element that defines the character set for this document. Including a character set is optional, but you should always set one and it’s recommended that you use UTF-8. Our head element also contains our document title (Hi). In most browsers, the text between the title tags is displayed at the top of the browser window or tab.

Comments in HTML are bits of text that aren’t rendered in the browser. They’re only viewable in the source code, and are typically used to leave notes to yourself or a coworker about the document. Some software programs that generate HTML code may also include comments. Comments may appear just about anywhere in an HTML document. Each one must start with <!-- and end with -->.

A document head may also contain link elements that point to external resources, as shown here. Resources may include style sheets, favicon images, or RSS feeds. We use the rel attribute to describe the relationship between our document and the one we’re linking to. In this case, we’re linking to a cascading style sheet, or CSS file. CSS is the stylesheet language that we use to describe the way a document looks rather than its structure.

We can also use a style element (delineated here by <style> and </style>) to include CSS in our file. Using a link element, however, lets us share the same style sheet file across multiple pages.

By the way, both meta and link are examples of void HTML elements; we could also self-close them using />. For example, <meta charset=utf-8> would become <meta charset=utf-8 />, but it isn’t necessary to do this.

“XHTML5”: HTML5’s XML Syntax

HTML5 can also be written using a stricter, XML-like syntax. You may remember from Chapter 1 that XHTML 1.0 was “a reformulation of HTML 4 as an XML 1.0 application.” That isn’t quite true of what is sometimes called “XHTML5”. XHTML5 is best understood as HTML5 that’s written and parsed using the syntax rules of XML and served with a Content-type: application/xhtml+xml response header. The following rules apply to “XHTML5”:
  • All elements must have a start tag.
  • Non-void elements with a start tag must have an end tag (p and li, for example).
  • Any element may be “self-closed” using />.
  • Tags and attributes are case-sensitive; HTML element and attribute names are written in lowercase.
  • Attribute values must be enclosed in quotes.
  • Empty attributes are forbidden (checked must instead be written with a value, as checked="checked" or checked="").
  • Special characters must be escaped using character entities.
Our html start tag also needs an xmlns (XML namespace) attribute. If we rewrite a shortened version of our document from above to use XML syntax, it would look like the example below.
<!DOCTYPE html>
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <meta charset="utf-8" />
      <title>Hi</title>
    </head>
    <body>
      <p>
        <img src="flower.jpg" alt="Flower" />
        Isn't this a lovely flower?
      </p>
      <script src="foo.js" />
    </body>
</html>
Here we’ve added the XML namespace with the xmlns attribute, so that an XML parser recognizes our elements as XHTML. We’ve also self-closed the tags for our empty or void elements, meta and img. According to the rules of XML and XHTML, all elements must be closed either with an end tag or by self-closing with an optional space, a slash, and a right-pointing angle bracket (/>).

In this example, we have also self-closed our script tag. We could also have used a normal end tag, as we’ve done with our other elements. The script element is a little bit of an oddball. You can embed scripting within your documents by placing it between script start and end tags. When you do this, you must include an end tag. However, you can also link to an external script file using a script tag and the src attribute. If you do so, and serve your pages as text/html, you must use a closing tag. If you serve your pages as application/xhtml+xml, you may also use the self-closing syntax.

Don’t forget: in order for the browser to parse this document according to XML/XHTML rules, our document must be sent from the server with a Content-type: application/xhtml+xml response header. In fact, including this header will trigger XHTML5 parsing in conforming browsers even if the DOCTYPE is missing.

As you may have realized, XML parsing rules are more persnickety. It’s much easier to use the text/html MIME type and its looser HTML syntax.
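
To make the script oddity concrete, here is a minimal sketch that contrasts the two forms. It reuses the foo.js file name from the example above; the comments describe how a conforming HTML parser is expected to treat each form.

```html
<!-- Parsed as intended under both text/html and application/xhtml+xml -->
<script src="foo.js"></script>

<!-- Well-formed XML only: an HTML parser ignores the "/" and treats
     everything that follows as script content until it finds an end tag -->
<script src="foo.js" />
```

If there’s any chance that a page will be served as text/html, the explicit end tag is the safer choice, since it behaves the same way in both parsing modes.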

Frequently Asked Questions (FAQs) about HTML and XHTML

What are the key differences between HTML and XHTML?

HTML (HyperText Markup Language) and XHTML (eXtensible HyperText Markup Language) are both markup languages used to create web pages. However, they have some key differences. HTML is a flexible language that allows for a certain degree of error, while XHTML is stricter and requires all elements to be properly closed and nested. XHTML also requires all attribute values to be quoted and all tags to be in lowercase. This makes XHTML more predictable and easier to debug, but it also means that it requires more attention to detail.

Why would I choose XHTML over HTML?

XHTML has several advantages over HTML. It is more predictable and easier to debug, which can save developers a lot of time and frustration. It also supports namespaces, which allows for the integration of other XML-based languages. This makes XHTML a more versatile and powerful language. However, it also requires more attention to detail and is less forgiving of errors.
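
As a rough illustration of namespaces, the sketch below embeds an SVG drawing in a document written in XML syntax. The title, paragraph text, and circle are made up for this example, and the page is assumed to be served as application/xhtml+xml.

```html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta charset="utf-8" />
    <title>Inline SVG</title>
  </head>
  <body>
    <p>A circle drawn with SVG:</p>
    <!-- The xmlns attribute switches this subtree to the SVG namespace -->
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
      <circle cx="50" cy="50" r="40" fill="aliceblue" stroke="black" />
    </svg>
  </body>
</html>
```

Each xmlns attribute tells the XML parser which vocabulary an element and its descendants belong to, which is how markup from different XML-based languages can share one document.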

Can I convert an HTML document to XHTML?

Yes, you can convert an HTML document to XHTML. Doing so involves ensuring that all elements are properly closed and nested, all attribute values are quoted, empty attributes are given values, special characters are escaped, and all tags are in lowercase. Tools such as HTML Tidy can automate much of this process.
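
Here is a small before-and-after sketch of such a conversion. The list content is invented for illustration; the first fragment is acceptable HTML syntax, and the second is the same fragment rewritten to satisfy XML rules.

```html
<!-- HTML syntax: optional end tags, unquoted and empty attributes -->
<ul>
  <li>Salt & pepper
  <li><input type=checkbox checked> Subscribe
</ul>

<!-- XML syntax: the same fragment, converted -->
<ul>
  <li>Salt &amp; pepper</li>
  <li><input type="checkbox" checked="checked" /> Subscribe</li>
</ul>
```

The converted fragment is also valid when parsed as text/html, so converting in this direction doesn’t stop the markup from working in HTML mode.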

Is XHTML compatible with all web browsers?

All modern web browsers can parse documents served as application/xhtml+xml. Older browsers may not: Internet Explorer 8 and earlier, for example, do not support that MIME type. If you need to support such browsers, you may want to stick with HTML.

What is the future of XHTML?

XHTML 2.0 was proposed as a successor to XHTML 1.0, but it was eventually abandoned in favor of HTML5. “XHTML5” is not a separate language under development; it is HTML5 written in XML syntax, as described above, so it gains new elements and APIs whenever HTML does.

What are the benefits of using XHTML?

XHTML offers several benefits. It is more predictable and easier to debug than HTML, which can save developers a lot of time and frustration. It also supports namespaces, which allows for the integration of other XML-based languages. This makes XHTML a more versatile and powerful language.

Is XHTML harder to learn than HTML?

XHTML is not necessarily harder to learn than HTML, but it does require more attention to detail. Because XHTML is stricter than HTML, it requires all elements to be properly closed and nested, all attribute values to be quoted, and all tags to be in lowercase. However, once you get the hang of these rules, XHTML can be just as easy to use as HTML.

Can I use JavaScript with XHTML?

Yes, you can use JavaScript with XHTML. Because an XHTML document is parsed as XML, inline script that contains characters such as < or & must either escape them or be wrapped in a CDATA section. External scripts loaded with the src attribute are unaffected.
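
As a minimal sketch, the inline script below is wrapped in a CDATA section so that an XML parser doesn’t treat its < and && as markup. The loop itself is a made-up example; the // in front of the CDATA markers keeps the script valid if the page is parsed as text/html instead.

```html
<script>
//<![CDATA[
  // Hypothetical example: "<" and "&&" would be errors in XML without CDATA
  for (var i = 0; i < 3 && i >= 0; i++) {
    console.log(i);
  }
//]]>
</script>
```

Moving the code into an external file and loading it with the src attribute avoids the issue altogether.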

What is the DOCTYPE declaration in XHTML?

In XHTML 1.0, the DOCTYPE declaration specifies the version of XHTML that the document is using, and it should appear before the html start tag. In “XHTML5”, the DOCTYPE is optional, because the Content-type header determines how the document is parsed; if you include one, it is simply <!DOCTYPE html>.

Can I use CSS with XHTML?

Yes, you can use CSS with XHTML, and style sheets work much as they do with HTML. One difference to be aware of: in documents parsed as XML, element names in CSS selectors are case-sensitive, so a selector such as P will not match a p element.

Adam Roberts

Adam is SitePoint's head of newsletters, who mainly writes Versioning, a daily newsletter covering everything new and interesting in the world of web development. He has a beard and will talk to you about beer and Star Wars, if you let him.
