HTML and XHTML Syntax

Writing valid HTML (or XHTML) is not a terribly difficult task once you know what the rules are, although the rules are slightly more stringent in XHTML than in HTML. The list below provides a quick reference to the rules that will ensure your markup is well-formed and valid. Note that there are other differences between HTML and XHTML which go beyond simple syntax requirements; those differences are covered in HTML Versus XHTML.

The Document Tree

A web page is, at its heart, little more than a collection of HTML elements—the defining structures that signify a paragraph, a table, a table cell, a quote, and so on. The element is created by writing an opening tag, and completed by writing a closing tag. In the case of a paragraph, you’d create a p element by typing <p>Content goes here</p>.

The elements in a web page are contained in a tree structure in which html is the root element that splits into the head and body elements (as explained in Basic Structure of a Web Page). An element may contain other nested elements, although which ones depends on the parent element; for example, a p element can contain span, em, or strong elements, among others. Where this occurs, the opening and closing tags must be symmetrical. If an opening p tag is followed by an opening em tag, the closing tags must appear in the reverse order, like so: <p>Content goes here, <em>and some of it needs emphasis</em> too</p>. If you were to type <p>Content goes here, <em>and some of it needs emphasis too</p></em>, you’d have created invalid markup.
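The nesting rule can be checked mechanically. The sketch below is only an illustration: it uses the XML parser from Python's standard library as a stand-in for a real validator, and the helper name is_well_formed is made up for this example. It tests well-formedness only, not full validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Closing tags appear in the reverse order of the opening tags.
nested = "<p>Content goes here, <em>and some of it needs emphasis</em> too</p>"
# The p element is closed before the em element it contains.
crossed = "<p>Content goes here, <em>and some of it needs emphasis too</p></em>"

print(is_well_formed(nested))
print(is_well_formed(crossed))
```

The first fragment parses, while the second is rejected because the p element is closed while em is still open.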

Case Sensitivity

In HTML, tag names are case insensitive, but in XHTML they’re case sensitive. As such, in HTML, you can write the markup in lowercase, mixed case, or uppercase letters. So <p>this is a paragraph</p> is valid, as is <P>this example</P>, and even <P>this markup would be valid</p>. In XHTML, however, you must use lowercase for markup: <p>This is a valid paragraph in XHTML</p>.
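To see the difference in practice, here is a small sketch that feeds the mixed-case paragraph to Python's standard-library HTML parser and XML parser. The class and function names (TagCollector, html_tags, xml_accepts) are invented for this illustration. Note that a generic XML parser only shows that P and p are different names; the requirement to use lowercase comes from the XHTML element definitions themselves.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Record the tag names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))


def html_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_accepts(markup):
    """Return True if an XML parser accepts the fragment."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


mixed = "<P>this markup would be valid</p>"

# The HTML parser folds tag names to lowercase, so P and p match.
print(html_tags(mixed))
# The XML parser treats P and p as two different element names.
print(xml_accepts(mixed))
print(xml_accepts("<p>This is a valid paragraph in XHTML</p>"))
```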

Opening and Closing Tags

In HTML, it’s possible to omit some closing tags (check each element’s reference to see whether an HTML closing tag is required), so this is valid markup: <p>This is my first paragraph.<p>This is my second paragraph.<p>And here’s the last one.

In XHTML, all elements must be closed. Hence the paragraph example above would need to be changed to: <p>This is my first paragraph.</p><p>This is my second paragraph.</p><p>And here’s the last one.</p>
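As a rough illustration of the two paragraph examples above, the sketch below counts start and end tags with Python's standard-library HTML parser, then asks the XML parser whether each version is well formed. The names TagCounter, count_tags, and xml_accepts are made up for this example, and the fragments are wrapped in a body element only because an XML document needs a single root.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCounter(HTMLParser):
    """Count the start and end tags that appear in the markup."""

    def __init__(self):
        super().__init__()
        self.starts = 0
        self.ends = 0

    def handle_starttag(self, tag, attrs):
        self.starts += 1

    def handle_endtag(self, tag):
        self.ends += 1


def count_tags(markup):
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return counter.starts, counter.ends


def xml_accepts(markup):
    """Return True if an XML parser accepts the fragment."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


html_style = ("<p>This is my first paragraph."
              "<p>This is my second paragraph."
              "<p>And here is the last one.")
xhtml_style = ("<p>This is my first paragraph.</p>"
               "<p>This is my second paragraph.</p>"
               "<p>And here is the last one.</p>")

print(count_tags(html_style))    # three start tags, no end tags
print(count_tags(xhtml_style))   # every start tag has an end tag
print(xml_accepts("<body>" + html_style + "</body>"))
print(xml_accepts("<body>" + xhtml_style + "</body>"))
```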

As well as letting you omit some closing tags, HTML allows you to omit start tags—but only on the html, head, body, and tbody elements. This is not a recommended practice, but is technically possible.

For empty elements such as img, XHTML (that is not served with the application/xhtml+xml MIME type) requires us to use the XML empty element syntax: <elementname attribute="attributevalue"/>

If you’re serving the document as application/xhtml+xml, it’s also valid to close an empty element using a start and end tag pair; for example, the img element can be written as <img></img>
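From an XML parser's point of view the two ways of closing an empty element are equivalent. The sketch below shows this with Python's standard-library XML parser; the helper same_element and the shortened alt text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# XML empty element syntax.
self_closed = ET.fromstring('<img src="/images/burj.jpg" alt="Burj Al Arab"/>')
# Separate start and end tags with nothing in between.
paired = ET.fromstring('<img src="/images/burj.jpg" alt="Burj Al Arab"></img>')


def same_element(a, b):
    """Compare tag name and attributes, and confirm neither has children."""
    return a.tag == b.tag and a.attrib == b.attrib and len(a) == len(b) == 0


def xml_accepts(markup):
    """Return True if an XML parser accepts the fragment."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(same_element(self_closed, paired))
# The HTML-style img tag with no closing syntax is not well-formed XML.
print(xml_accepts('<img src="/images/burj.jpg" alt="Burj Al Arab">'))
```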

Readability Considerations

A browser doesn’t care whether you use a single space to separate attributes, ten spaces, or even complete line breaks; it doesn’t matter, as long as some whitespace is present. As such, all of the examples below are perfectly acceptable (although the more whitespace you include, the larger your web page’s file size will be, because each whitespace character takes up an additional byte, so the first example is still preferable):

<img src="/images/burj.jpg" alt="Burj Al Arab, iconic hotel in Dubai" class="gallery"/>

<img src="/images/burj.jpg"
     alt="Burj Al Arab, iconic hotel in Dubai"
     class="gallery"/>

<img

     src="/images/burj.jpg"

     alt="Burj Al Arab, iconic hotel in Dubai"

     class="gallery"/>

In XHTML all attribute values must be quoted, so you’ll need to write class="gallery" rather than class=gallery. In HTML it’s valid to omit the quotes around some attribute values, though doing so may make the markup more difficult to read for developers revisiting it later (this is subjective, and depends on the developer). It’s simply easier always to add quotes than to remember in which scenarios attribute values require quotes in HTML, as the following piece of HTML demonstrates:

<a href=""> needs to be quoted because it contains a /
<a href=index.html> acceptable without quotes in HTML
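The rule you'd otherwise have to remember is that HTML 4.01 allows an unquoted value only when it consists of letters, digits, hyphens, periods, underscores, and colons. The sketch below encodes that rule in a hypothetical helper named can_omit_quotes, and uses Python's standard-library XML parser to show that XML-based markup rejects unquoted values altogether. The sample values are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# HTML 4.01 lets you drop the quotes only when the value is made up of
# letters, digits, hyphens, periods, underscores, and colons.
UNQUOTED_OK = re.compile(r"[A-Za-z0-9._:-]+")


def can_omit_quotes(value):
    """Return True if HTML 4.01 would allow this value without quotes."""
    return UNQUOTED_OK.fullmatch(value) is not None


def xml_accepts(markup):
    """Return True if an XML parser accepts the fragment."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(can_omit_quotes("index.html"))        # no disallowed characters
print(can_omit_quotes("/images/burj.jpg"))  # contains a /
print(xml_accepts('<a href=index.html>Home</a>'))
print(xml_accepts('<a href="index.html">Home</a>'))
```

Quoting every value means you never need to run a check like this in your head.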

Another reason why it’s a good idea always to quote your attributes, even if you’re using HTML 4.01, is that your HTML editor may be able to provide syntax coloring that makes the code even easier to scan through. Without the quotes, the software may not be able to identify the difference between elements, attributes, and attribute values. This fact is illustrated in Figure 1, which shows a comparison between quoted and unquoted syntax coloring in the Mac text editor TextMate.

Figure 1. Syntax coloring in TextMate (the second screenshot shows attributes quoted and syntax coloring taking effect)

Commenting Markup

You may add comments in your HTML, perhaps to make it clear where sections start or end, or to provide a note to remind yourself why you approached the creation of a page in a certain way. What you use comments for isn’t important, but the way that you craft a comment is important. The HTML comment looks like this: <!-- this is a comment -->. The syntax is derived from SGML, in which a markup declaration starts with <! and ends with >; the actual comment is, in effect, the text between the opening -- and the closing --. These hyphens tell the browser when to start ignoring text content, and when to start paying attention again. Because the double hyphen -- signifies the beginning and end of the comment, you should not use double hyphens anywhere inside a comment, even if you believe that your usage of these characters conforms to SGML rules. Single hyphens are allowed, however.

The markup below shows examples of good and bad HTML comments—see the remark associated with each example for more information:

<p>Take the next right.<!-- Look out for the
    signpost for 'Castle' --></p>
a valid comment

<p>Take the next right.<!-- Look out for -- Castle --></p>
not a valid comment; the double dashes in the middle could be
misinterpreted as the end of the comment

<p>Take the next right.<!-- Look out for -- -- Castle --></p>
a valid comment; 'Look out for' is one comment, 'Castle' is another

<p>Take the next right.<!--------------------
  This is just asking for trouble. Too
  many hyphens! --></p>
not a valid comment; don't use hyphens or <> characters to format comment text

<p <!-- class="lively" -->>Wowzers!</p>
It's not possible to comment out attributes inside an HTML element
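If you'd like to check existing markup against the advice above, a few lines of script will do. The sketch below is deliberately conservative: it flags any comment whose text contains a double hyphen, including the paired form that SGML technically permits. The function name risky_comments and the simple regular expression are assumptions for this example, not a complete comment parser.

```python
import re

# Capture everything between <!-- and the first --> that follows it.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def risky_comments(markup):
    """Return the text of each comment that contains a double hyphen."""
    return [body for body in COMMENT.findall(markup) if "--" in body]


good = """<p>Take the next right.<!-- Look out for the
    signpost for 'Castle' --></p>"""
bad = "<p>Take the next right.<!-- Look out for -- Castle --></p>"

print(risky_comments(good))
print(risky_comments(bad))
```

An empty list means no comment in the markup contains a double hyphen.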
