HTML and XHTML Syntax
Writing valid HTML (or XHTML) is not a terribly difficult task once you know what the rules are, although the rules are slightly more stringent in XHTML than in HTML. The list below provides a quick reference to the rules that will ensure your markup is well-formed and valid. Note that there are other differences between HTML and XHTML which go beyond simple syntax requirements; those differences are covered in HTML Versus XHTML.
The Document Tree
A web page is, at its heart, little more than a collection of HTML elements: the defining structures that signify a paragraph, a table, a table cell, a quote, and so on. An element is created by writing an opening tag, and completed by writing a closing tag. In the case of a paragraph, you’d create a p element by typing <p>Content goes here</p>.
The elements in a web page are contained in a tree structure in which html is the root element that splits into the head and body elements (as explained in Basic Structure of a Web Page). An element may contain other nested elements, although which ones very much depends on what the parent element is; for example, a p element can contain span, em, or strong elements, among others. Where nesting occurs, the opening and closing tags must be symmetrical. If an opening p tag is followed by an opening em tag, the closing tags must appear in the reverse order, like so: <p>Content goes here, <em>and some of it needs emphasis</em> too</p>. If you were to type <p>Content goes here, <em>and some of it needs emphasis too</p></em>, you’d have created invalid markup.
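The nesting rule can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as a stand-in for a strict XML parser; the helper name is_well_formed is hypothetical. It tests well-formedness only and does not validate against an HTML or XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as one well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Closing tags appear in the reverse order of the opening tags.
nested_ok = "<p>Content goes here, <em>and some of it needs emphasis</em> too</p>"
# The p element is closed before the em element that it contains.
nested_bad = "<p>Content goes here, <em>and some of it needs emphasis too</p></em>"

print(is_well_formed(nested_ok))
print(is_well_formed(nested_bad))
```

The first fragment parses and the second is rejected because the closing tags do not mirror the opening tags.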
Case Sensitivity
In HTML, tag names are case insensitive, but in XHTML they’re case sensitive. As such, in HTML, you can write the markup in lowercase, mixed case, or uppercase letters. So <p>this is a paragraph</p> is valid, as is <P>this example</P>, and even <P>this markup would be valid</p>. In XHTML, however, you must use lowercase for markup: <p>This is a valid paragraph in XHTML</p>.
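The difference in case handling can be observed with two parsers. This is a sketch only, assuming Python's html.parser as a lenient HTML parser and xml.etree.ElementTree as a strict XML parser; the names TagNames, html_tag_names, and xml_accepts are hypothetical. Note that <P>...</P> is still well-formed XML; it simply isn't the lowercase p element that XHTML defines, and a well-formedness check alone won't catch that.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagNames(HTMLParser):
    """Collect tag names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(("start", tag))

    def handle_endtag(self, tag):
        self.names.append(("end", tag))


def html_tag_names(markup):
    parser = TagNames()
    parser.feed(markup)
    parser.close()
    return parser.names


def xml_accepts(markup):
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


mixed = "<P>this markup would be valid</p>"

# The HTML parser folds both tag names to lowercase, so they match.
print(html_tag_names(mixed))
# The XML parser sees P and p as different names, so the tags do not match.
print(xml_accepts(mixed))
# Uppercase is preserved in XML: this element is named "P", not "p".
print(ET.fromstring("<P>this example</P>").tag)
```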
Opening and Closing Tags
In HTML, it’s possible to omit some closing tags (check each element’s reference to see whether an HTML closing tag is required), so this is valid markup:
<p>This is my first paragraph.<p>This is my second paragraph.<p>And here’s the last one.
In XHTML, all elements must be closed. Hence the paragraph example above would need to be changed to:
<p>This is my first paragraph.</p><p>This is my second paragraph.</p><p>And here’s the last one.</p>
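When converting HTML to XHTML, it can help to find where closing tags were omitted. The following is a rough sketch, assuming Python's html.parser module; the class TagCounter and the function unclosed_paragraphs are hypothetical names. It only counts p tags and is not a substitute for a validator.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening and closing p tags in a piece of markup."""

    def __init__(self):
        super().__init__()
        self.starts = 0
        self.ends = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.starts += 1

    def handle_endtag(self, tag):
        if tag == "p":
            self.ends += 1


def unclosed_paragraphs(markup: str) -> int:
    """Return how many p start tags have no matching end tag."""
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return counter.starts - counter.ends


html_style = (
    "<p>This is my first paragraph."
    "<p>This is my second paragraph."
    "<p>And here's the last one."
)
xhtml_style = (
    "<p>This is my first paragraph.</p>"
    "<p>This is my second paragraph.</p>"
    "<p>And here's the last one.</p>"
)

print(unclosed_paragraphs(html_style))   # 3
print(unclosed_paragraphs(xhtml_style))  # 0
```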
As well as letting you omit some closing tags, HTML allows you to omit start tags, but only on the html, head, body, and tbody elements. This is not a recommended practice, but is technically possible.
For empty elements such as img, XHTML requires us to use the XML empty element syntax: <elementname attribute="attributevalue"/>.
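To see how a strict XML parser treats the different ways of writing an empty element, here is a small sketch assuming Python's xml.etree.ElementTree; the helper parses and the shortened alt text are illustrative only. It looks at XML well-formedness alone and says nothing about how browsers handle a document served as text/html.

```python
import xml.etree.ElementTree as ET


def parses(markup: str) -> bool:
    """Return True if the markup is accepted by the XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# XML empty element syntax.
self_closed = ET.fromstring('<img src="/images/burj.jpg" alt="Burj Al Arab"/>')
# An explicit start tag and end tag with nothing in between.
explicit_close = ET.fromstring('<img src="/images/burj.jpg" alt="Burj Al Arab"></img>')

# Both forms produce an element with the same name and attributes.
same_element = (
    self_closed.tag == explicit_close.tag
    and self_closed.attrib == explicit_close.attrib
)
print(same_element)

# The HTML-style form, with no closing at all, is rejected as XML.
print(parses('<img src="/images/burj.jpg" alt="Burj Al Arab">'))
```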
Readability Considerations
A browser doesn’t care whether you use a single space to separate attributes, ten spaces, or even complete line breaks; it doesn’t matter, as long as some space is present. As such, all of the examples below are perfectly acceptable (although the more spaces you include, the larger your web page’s file size will be—each occurrence of whitespace takes up additional bytes—so the first example is still the most preferable):
<img src="/images/burj.jpg" alt="Burj Al Arab, iconic hotel in Dubai" class="gallery"/>

<img
  src="/images/burj.jpg"
  alt="Burj Al Arab, iconic hotel in Dubai"
  class="gallery"
/>

<img
  src="/images/burj.jpg"
  alt="Burj Al Arab, iconic hotel in Dubai"
  class="gallery"/>
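The claim that whitespace between attributes makes no difference can be confirmed with a parser. This is a minimal sketch, assuming Python's html.parser; AttrCollector and img_attrs are hypothetical names introduced for the illustration.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Remember the attributes of the last img tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> are also routed here by default.
        if tag == "img":
            self.attrs = attrs


def img_attrs(markup: str):
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attrs


one_line = (
    '<img src="/images/burj.jpg" '
    'alt="Burj Al Arab, iconic hotel in Dubai" class="gallery"/>'
)
multi_line = """<img
  src="/images/burj.jpg"
  alt="Burj Al Arab, iconic hotel in Dubai"
  class="gallery"
/>"""

# Same names and values in the same order, regardless of the spacing used.
print(img_attrs(one_line) == img_attrs(multi_line))
```

Only the whitespace between attributes is interchangeable; whitespace inside a quoted value is part of that value.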
In XHTML, all attribute values must be quoted, so you’ll need to write class="gallery" rather than class=gallery. In HTML, it’s valid to omit the quotes when the value contains only letters, digits, hyphens, periods, underscores, and colons, though unquoted values may make the markup more difficult to read for developers revisiting it later (this really depends on the developer; it’s a subjective thing). It’s simply easier always to add quotes than to have to remember in which scenarios attribute values require quotes in HTML, as the following piece of HTML demonstrates:
<a href="http://example.org"> needs to be quoted because it contains a /
<a href=index.html> acceptable without quotes in HTML
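The rule behind these two examples can be written as a one-line check. The sketch below assumes the HTML 4.01 rule that an unquoted value may contain only letters, digits, hyphens, periods, underscores, and colons; the function name needs_quotes is hypothetical.

```python
import re

# Characters HTML 4.01 permits in an unquoted attribute value:
# letters, digits, periods, underscores, colons, and hyphens.
UNQUOTED_OK = re.compile(r"[A-Za-z0-9._:-]+")


def needs_quotes(value: str) -> bool:
    """Return True if the value must be quoted under the HTML 4.01 rule."""
    return UNQUOTED_OK.fullmatch(value) is None


print(needs_quotes("index.html"))          # False
print(needs_quotes("http://example.org"))  # True, because of the slashes
print(needs_quotes("gallery"))             # False
```

In XHTML the check is unnecessary, since every value has to be quoted.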
Another reason why it’s a good idea always to quote your attributes, even if you’re using HTML 4.01, is that your HTML editor may be able to provide syntax coloring that makes the code even easier to scan through. Without the quotes, the software may not be able to identify the difference between elements, attributes, and attribute values. This fact is illustrated in Figure 1, which shows a comparison between quoted and unquoted syntax coloring in the Mac text editor TextMate.
Commenting Markup
You may add comments in your HTML, perhaps to make it clear where sections start or end, or to provide a note to remind yourself why you approached the creation of a page in a certain way. What you use comments for isn’t important, but the way that you craft a comment is. An HTML comment looks like this: <!-- this is a comment -->. The syntax is derived from SGML: the declaration starts with <! and ends with >, and the actual comment is, in effect, the text between the opening -- and the closing --. These hyphens tell the browser when to start ignoring text content, and when to start paying attention again. Because the -- characters signify the beginning and end of the comment, you should not use them anywhere inside a comment, even if you believe that your usage of these characters conforms to SGML rules. Note that you can’t use double hyphens inside XML comments at all, which is an even stronger reason not to get into the habit.
The markup below shows examples of good and bad HTML comments—see the remark associated with each example for more information:
<p>Take the next right.<!-- Look out for the signpost for 'Castle' --></p>
a valid comment
<p>Take the next right.<!-- Look out for -- Castle --></p>
not a valid comment; the double dashes in the middle could be misinterpreted as the end of the comment
<p>Take the next right.<!-- Look out for -- -- Castle --></p>
a valid comment; 'Look out for' is one comment, 'Castle' is another
<p>Take the next right.
<!---------------------------------
This is just asking for trouble. Too
many hyphens! --></p>
a valid comment; don't use hyphens or <> characters to format comment text
<p <!-- class="lively" -->>Wowzers!</p>
It's not possible to comment out attributes inside an HTML element
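A simple way to stay out of trouble is to check comment text before it goes into a page. The following is a sketch only: the function is_portable_comment is a hypothetical helper, and it applies the stricter XML rule (no double hyphen anywhere in the text, and no hyphen at the very end), so it also rejects some comments that SGML rules would accept.

```python
def is_portable_comment(text: str) -> bool:
    """Check the text that would sit between <!-- and -->.

    Follows the XML rule: the text must not contain "--" and must not
    end with "-", which would run into the closing delimiter.
    """
    return "--" not in text and not text.endswith("-")


# Plain text with no double hyphens is fine.
print(is_portable_comment(" Look out for the signpost for 'Castle' "))
# A double hyphen in the middle could be read as the end of the comment.
print(is_portable_comment(" Look out for -- Castle "))
# Accepted by SGML rules as two comments, but still rejected by this check.
print(is_portable_comment(" Look out for -- -- Castle "))
# A decorative run of hyphens is rejected as well.
print(is_portable_comment("-" * 31 + " Too many hyphens! "))
```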
User-contributed notes
ID: #12
Date: Wed, 26 Mar 2008 12:47:00 GMT
'For empty elements (such as the img), XHTML requires that the element uses the XML empty element syntax'
Not true. You can write <img...></img> if you like, provided you serve the document as an application of XML. The statement above only applies to Appendix C compatible pretend-XHTML.