Doctypes

The doctype declaration, which should be the first item to appear in the source markup of any web page, is an instruction to the web browser (or other user agent) that identifies the version of the markup language in which the page is written. It refers to a known Document Type Definition, or DTD for short. The DTD sets out the rules and grammar for that flavor of markup, enabling the browser to render the content accordingly.

The doctype contains a lot of information, but you are unlikely to be tested on any of it in a job interview, so don’t worry if it seems too much to remember. Besides, most web authoring packages will insert a syntactically correct doctype for you, so there’s little chance of getting it wrong.

The doctype begins with the string <!DOCTYPE, which should be written in uppercase:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The next part, which reads html (for XHTML) or HTML, refers to the name of the root element for the document. This information is included for validation purposes, since the DTD itself doesn’t say which element is the root element in the document tree:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The PUBLIC statement informs the browser that the DTD is a publicly available resource. If you had decided that the various flavors of HTML or XHTML were lacking in some way, and you wanted to extend the language beyond the defined specifications, you could go to the effort of creating a custom DTD. This would allow you to define custom elements, and would enable your documents to validate according to that DTD; in this case, you’d change the PUBLIC value to SYSTEM. That said, I’ve never actually seen an author do this—most people live within the limitations of the defined HTML/XHTML specifications (or plug the gaps using Microformats). Here’s the PUBLIC statement:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The next section is known as the Public Identifier, and provides information about the owner or guardian of the DTD (in this case, the W3C). The Public Identifier, which is shown here, is not case sensitive. It also includes the level of the language that the DTD refers to (XHTML 1.0), and identifies the language of the DTD itself, which is not the same thing as the language of the web page’s content. This language is defined as English, or EN for short. Authors should not change this EN reference, regardless of the language contained in the web page.

Note that if the doctype contains the keyword SYSTEM, the Public Identifier section is omitted.

All of this information is highlighted in the short fragment below:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Together, these parts constitute the Formal Public Identifier (or FPI), which is the first quoted string in the doctype.
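
As a rough illustration of how the FPI breaks down, the Python sketch below splits it on its "//" separators. The function and field names are hypothetical, chosen only for this example. Reading the leading "-" as "unregistered owner" follows the usual SGML convention rather than anything stated above.

```python
from typing import NamedTuple


class FormalPublicIdentifier(NamedTuple):
    registration: str  # "-" is conventionally an unregistered owner, "+" a registered one
    owner: str         # owner or guardian of the DTD, e.g. "W3C"
    text_class: str    # kind of resource being identified, e.g. "DTD"
    description: str   # name and level of the language, e.g. "XHTML 1.0 Transitional"
    language: str      # language of the DTD itself (not of the page), e.g. "EN"


def split_fpi(fpi: str) -> FormalPublicIdentifier:
    """Split an FPI on its '//' separators into named fields."""
    parts = fpi.split("//")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '//'-separated fields, got {len(parts)}")
    registration, owner, class_and_description, language = parts
    # The third field holds the text class, a space, then the description.
    text_class, _, description = class_and_description.partition(" ")
    return FormalPublicIdentifier(registration, owner, text_class, description, language)


fpi = split_fpi("-//W3C//DTD XHTML 1.0 Transitional//EN")
print(fpi.owner, "|", fpi.description, "|", fpi.language)
# prints: W3C | XHTML 1.0 Transitional | EN
```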

Finally, the doctype includes a URL, known as the Formal System Identifier (FSI), which refers to the location of the DTD. If you want to really geek out, you can copy and paste the address into your web browser’s location bar and download a copy of the DTD, but be warned that it doesn’t make for light reading! Here’s the FSI:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

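To pull the pieces described above together, here is a minimal Python sketch that takes a doctype apart into its root element name, its PUBLIC or SYSTEM keyword, and its identifiers. The names `Doctype`, `DOCTYPE_RE`, and `parse_doctype` are assumptions made for this example, as is the `custom.dtd` file name. The regular expression handles only the simple forms shown in this section, so it is not a full SGML or XML parser.

```python
import re
from typing import NamedTuple, Optional


class Doctype(NamedTuple):
    root_element: str
    keyword: Optional[str]            # "PUBLIC", "SYSTEM", or None
    public_identifier: Optional[str]  # the FPI, if present
    system_identifier: Optional[str]  # the URL (FSI), if present


DOCTYPE_RE = re.compile(
    r"""<!DOCTYPE\s+
        (?P<root>[A-Za-z][\w.-]*)                  # root element name
        (?:\s+(?P<keyword>PUBLIC|SYSTEM)           # availability keyword
           \s+"(?P<first>[^"]*)"                   # first quoted string
           (?:\s+"(?P<second>[^"]*)")?             # optional second quoted string
        )?
        \s*>""",
    re.VERBOSE,
)


def parse_doctype(text: str) -> Doctype:
    """Return the parts of the first doctype declaration found in text."""
    match = DOCTYPE_RE.search(text)
    if match is None:
        raise ValueError("no doctype declaration found")
    keyword = match.group("keyword")
    first, second = match.group("first"), match.group("second")
    if keyword == "PUBLIC":
        public_id, system_id = first, second
    elif keyword == "SYSTEM":
        # With SYSTEM, the Public Identifier is omitted entirely.
        public_id, system_id = None, first
    else:
        public_id = system_id = None
    return Doctype(match.group("root"), keyword, public_id, system_id)


sample = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'''
doctype = parse_doctype(sample)
print(doctype.root_element)       # html
print(doctype.public_identifier)  # -//W3C//DTD XHTML 1.0 Transitional//EN

# A hypothetical custom DTD referenced with SYSTEM has no Public Identifier.
custom = parse_doctype('<!DOCTYPE html SYSTEM "custom.dtd">')
print(custom.public_identifier)   # None
```
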
Table 1 shows the doctypes available in the W3C recommendations.

Table 1. Available Doctypes

Doctype: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
Description: HTML 4.01 Strict allows the inclusion of structural and semantic markup, but not presentational or deprecated elements such as font. Framesets are not allowed.

Doctype: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
Description: HTML 4.01 Transitional allows the use of structural and semantic markup as well as presentational elements (such as font), which are deprecated in Strict. Framesets are not allowed. Authors should use the Strict DTD when possible, but may use the Transitional DTD when support for presentation attributes and elements is required.

Doctype: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
Description: HTML 4.01 Frameset applies the same rules as HTML 4.01 Transitional, but also allows the use of frameset content.

Doctype: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Description: XHTML 1.0 Strict, like HTML 4.01 Strict, allows the use of structural and semantic markup, but not presentational or deprecated elements (such as font). Framesets are not allowed. Unlike HTML 4.01, the markup must be written as well-formed XML.

Doctype: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Description: XHTML 1.0 Transitional, like HTML 4.01 Transitional, allows the use of structural and semantic markup as well as presentational elements (such as font), which are deprecated in Strict. Framesets are not allowed. Unlike HTML 4.01, the markup must be written as well-formed XML. Authors should use the Strict DTD when possible, but may use the Transitional DTD when support for presentation attributes and elements is required.

Doctype: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Description: XHTML 1.0 Frameset applies the same rules as XHTML 1.0 Transitional, but also allows the use of frameset content.

Doctype: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
Description: XHTML 1.1 is a reformulation of XHTML 1.0 Strict, so most of the same rules apply. However, 1.1 allows for modularization, which means that you can add modules (for example, to provide Ruby support for Chinese, Japanese, and Korean characters).

Doctype: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
Description: HTML 3.2 is an archaic doctype that’s no longer recommended for use (it’s included here for information only).

Doctype: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 3.0//EN">
Description: HTML 3.0 is an archaic doctype that’s no longer recommended for use (it’s included here for information only).

Doctype: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Level 2//EN">
Description: HTML 2.0 is an archaic doctype that’s no longer recommended for use (it’s included here for information only). Note that there are actually 12 variants of this old doctype, all of which can be found in Section 9.6 of RFC 1866.

Doctype Switching or Sniffing

The way in which a web browser renders a page’s content is often affected by the doctype that’s defined. Browsers use various modes to determine how to render a web page:
  • Quirks Mode

    In this mode, browsers violate normal web formatting specifications as a way to avoid the poor rendering (or “breaking,” to use the vernacular) of pages that have been written using practices that were commonplace in the late 1990s. The quirks differ from browser to browser. In Internet Explorer 6 and 7, Quirks Mode displays the document as if it were being viewed in IE version 5.5. In other browsers, Quirks Mode contains a selection of deviations that are taken from Almost Standards Mode (explained below).

  • Standards Mode

    In this mode, browsers attempt to give conforming documents an exact treatment according to the specification (but this is still dependent on the extent to which the standards are implemented in a given browser).

  • Almost Standards Mode

    Firefox, Safari, and Opera (version 7.5 and above) add a third mode, which is known as Almost Standards Mode. This mode implements the vertical sizing of table cells in a traditional fashion, rather than rigorously as defined in the CSS2 specification. (Internet Explorer versions 6 and 7 don’t need an Almost Standards Mode, because their respective Standards Modes don’t implement the vertical sizing of table cells rigorously according to the CSS2 specification.)

Depending on the doctype that’s defined, and the level of detail contained inside the doctype (for example, whether it does or doesn’t include a Public Identifier), different browsers trigger different modes from the list above. Doctype switching or sniffing refers to the task of swapping one doctype for another, or changing the level of detail in the doctype, in order to coax a browser to render in one of Quirks, Standards, or Almost Standards Modes.
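
The dependence of rendering mode on doctype can be sketched in Python as follows. This is a deliberately simplified rule set for a hypothetical browser that supports all three modes; the function name and the marker lists are assumptions made for illustration. Real browsers use longer lists and differ from one another, so treat this as a model of the idea and not a description of any one browser.

```python
from typing import Optional

# Assumption: these short marker lists are illustrative only,
# not any browser's real lookup table.
ARCHAIC_MARKERS = ("HTML 2.0", "HTML 3.0", "HTML 3.2")
TRANSITIONAL_MARKERS = ("Transitional//", "Frameset//")


def guess_rendering_mode(public_id: Optional[str],
                         system_id: Optional[str],
                         has_doctype: bool = True) -> str:
    """Return "quirks", "almost standards", or "standards" (simplified)."""
    if not has_doctype:
        return "quirks"                 # no doctype at all
    if public_id is None:
        return "standards"              # e.g. a doctype with no Public Identifier
    if any(marker in public_id for marker in ARCHAIC_MARKERS):
        return "quirks"                 # archaic doctypes
    if any(marker in public_id for marker in TRANSITIONAL_MARKERS):
        if "XHTML" in public_id:
            return "almost standards"
        # HTML 4.01 Transitional or Frameset: the level of detail matters,
        # so the presence of the URL changes the outcome.
        return "almost standards" if system_id else "quirks"
    return "standards"                  # Strict doctypes and XHTML 1.1


print(guess_rendering_mode("-//W3C//DTD HTML 4.01//EN",
                           "http://www.w3.org/TR/html4/strict.dtd"))
# prints: standards
print(guess_rendering_mode("-//W3C//DTD HTML 4.01 Transitional//EN", None))
# prints: quirks
```

In a browser without an Almost Standards Mode, such as Internet Explorer 6 or 7, the "almost standards" result would simply correspond to Standards Mode.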

Doctype sniffing was most frequently used to address rendering differences between Internet Explorer 6 and earlier versions of the browser, which calculated content widths differently when widths, padding, borders, and margins were applied in CSS. (This topic is not something we’ll cover in this HTML reference, but you can find out more in The Ultimate CSS Reference.) In Internet Explorer 6, depending on the doctype defined, one of two different rendering modes would be used to calculate these widths.

As an example, imagine that you specify the doctype as HTML 4.01 Strict, like so:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

In IE6, the doctype above will cause the browser to render in Standards Mode, which includes using the W3C method for box model calculations. However, you see an entirely different result if you use the following doctype:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

In this scenario, IE6 will use its old, incorrect, non-W3C method for making box model calculations (also known as the ‘broken box model’).
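
To make the difference concrete, the Python sketch below works through the arithmetic for a single box. It assumes equal left and right padding and border, in pixels, and ignores margins, which sit outside the box in both models. The function name `box_widths` is hypothetical.

```python
from typing import Tuple


def box_widths(width: int, padding: int, border: int, quirks: bool) -> Tuple[int, int]:
    """Return (content_width, visible_box_width) for a declared CSS width."""
    extras = 2 * (padding + border)  # left and right padding plus borders
    if quirks:
        # Old IE model: the declared width already includes padding and
        # border, so the content area shrinks to make room for them.
        return max(width - extras, 0), max(width, extras)
    # W3C model: the declared width is the content area, and padding
    # and border are added outside it.
    return width, width + extras


# width: 300px; padding: 20px; border: 5px
print(box_widths(300, 20, 5, quirks=False))  # (300, 350)
print(box_widths(300, 20, 5, quirks=True))   # (250, 300)
```

The same declared width therefore yields a box that is 50 pixels narrower in the old model, which is enough to break a fixed-width layout.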

Note that this is not the only difference between Quirks and Standards Mode; it’s just one example (but one that caused a great many problems because of the disastrous effect it could have on page layout).

For a complete reference of how different browsers behave when different doctypes are provided, refer to the chart at the foot of Henri Sivonen’s article “Activating Browser Modes with Doctype.”
