While they are referred to in different ways by WHATWG, the development of HTML5 could be said to have three principles that distinguish the language from XHTML: pragmatism, simplification, and looseness.
The XHTML 1.0 Strict template for a basic page can be intimidating to look at, even overwhelming. By comparison, the doctype declaration for an HTML5 page is so simple you can type it out by hand:
<!DOCTYPE html>
“That’s it?” you might be saying. Yes, that’s it.
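For comparison, here is the XHTML 1.0 Strict doctype next to its HTML5 equivalent. Both declarations are reproduced from the respective specifications; the comments are mine, and the two are alternatives, not something to put in the same page.

```html
<!-- XHTML 1.0 Strict: a public identifier plus the URL of the DTD -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- HTML5: no version number and no DTD reference -->
<!DOCTYPE html>
```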
- “This can’t possibly validate, can it?”
It really does. Go ahead and use the full HTML5 template below in a page if you don’t believe me: validate it as you would any other page.
- “How does the validator know which version of HTML I’m using?”
In HTML5 the doctype no longer names a version at all: its main job is to keep browsers in standards mode. What matters is that elements are used consistently with the specification throughout the document. If you start to mix HTML5 tags with obsolete elements, or nest elements in the wrong order, the page becomes invalid. HTML5 extends the HTML language rather than replacing it, so nearly all the tags you have learned in XHTML are still valid and can be used in an HTML5 page, just as whatever comes down the pike after HTML5 will have to support the majority of HTML5 markup. Specifying which version of HTML you are using is therefore somewhat redundant.
The meta tag that sets UTF-8 encoding for the page has also been simplified (note that Dreamweaver CS5 gets this wrong by default):
<meta charset="utf-8">
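Side by side, the saving is obvious. The first line below is the long form typical of XHTML pages; the second is the HTML5 shorthand. Both declare UTF-8, but a page should use only one of them.

```html
<!-- XHTML-era long form: the encoding is buried inside a Content-Type value -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<!-- HTML5 shorthand: declares the same UTF-8 encoding -->
<meta charset="utf-8">
```

Whichever form you use, place it near the top of the head: the encoding declaration has to fall within the first 1024 bytes of the file.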
The code to link a style sheet has been simplified:
<link rel="stylesheet" href="styles.css">
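What has gone is mostly the type attribute. It was optional on a link even in XHTML 1.0, but it was conventionally included; HTML5 simply assumes a style sheet link points at CSS. The XHTML line below is a typical example rather than one taken from a specific page.

```html
<!-- Typical XHTML 1.0: explicit MIME type and a self-closing slash -->
<link rel="stylesheet" type="text/css" href="styles.css" />

<!-- HTML5: the CSS type is assumed, and the trailing slash is unnecessary -->
<link rel="stylesheet" href="styles.css">
```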
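The entire HTML5 page template is therefore:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>An HTML5 template</title>
<link rel="stylesheet" href="styles.css">
</head>
<body>
</body>
</html>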
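Technically, you could use HTML5 shortcuts to reduce this even further, since the <html>, <head>, and <body> tags are all optional:
<!DOCTYPE html>
<meta charset="utf-8">
<title>An HTML5 template</title>
<link rel="stylesheet" href="styles.css">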
XHTML – most especially the version of XHTML I have been teaching, XHTML 1.0 Strict – is very particular about the way in which it is written: all code is in lowercase, tags are always closed, etc. To me, this is a good thing: clear, concise rules are the hallmark of good governance and good code. However, learning rules is arduous, and making small mistakes in code can lead to big headaches in validation.
HTML5 frees up the rules: tags can be written in uppercase, lowercase, or mixed case. Most table elements, including <tr>, <td>, and <th>, do not need to be closed. Neither do several other elements, such as <li>, <p>, <body>, and <html>. (This is anathema to many developers, and I would strongly recommend always closing your elements unless there is a very strong argument against doing so. The best reason would be that you are trying to make your files as small as possible, in which case you should also be running your code through a minifier, removing all carriage returns, and making absolutely sure your images are optimized until they squeak.) In addition, some of these exceptions are tricky: you can avoid closing a paragraph if it is followed by one of 24 other elements, but not if the following tag is any one of over 100 others. Quotes around attribute values are also optional, provided the value is not empty and contains no spaces or any of the characters " ' = < > `.
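To make the end-tag rule concrete, here is a short fragment that leaves out every closing tag it legally can. It should validate inside an HTML5 page, although I would still write the closing tags myself; the content is invented for illustration.

```html
<!-- Valid HTML5: the parser supplies the missing end tags -->
<ul>
  <li>Pragmatism
  <li>Simplification
  <li>Looseness
</ul>

<p>This paragraph is closed automatically when the table starts.
<table>
  <tr><th>Principle <th>Example
  <tr><td>Simplification <td>The shorter doctype
  <tr><td>Looseness <td>Optional end tags
</table>
```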
id values can start with numerals.
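The remaining relaxations can be combined in a single fragment. Everything below is valid HTML5, though it is not how I would write it; the id and class values are invented for the example.

```html
<!-- Mixed-case tag names, unquoted attribute values, and an id that starts with a digit -->
<DIV id=2011-intro class=lead>
  <P>None of these attribute values contains a space, so no quotes are needed.</p>
</Div>

<!-- A value with a space in it still has to be quoted -->
<div class="lead wide">
  <p>Two class names, one quoted value.</p>
</div>
```

One caveat: CSS identifiers cannot begin with a digit, so an id like this has to be escaped in a selector (#\32 011-intro) or matched with an attribute selector ([id="2011-intro"]).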
Form elements no longer need to be wrapped in a <fieldset>, or even in a <form> element.
It is not a good idea – the example above lacks an
I still strongly recommend that you follow almost everything I have taught as good coding practice thus far (everything in lowercase, all tags closed, all attribute values inside double quotes, use of the <form> tag where appropriate), as it makes it far easier to move to stricter languages, and makes your code easier to both read and debug. But I can no longer say it is wrong to do otherwise.
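For reference, here is a form whose controls sit directly inside the <form> element with no <fieldset>, which HTML5 accepts, followed by the fuller markup I would still recommend. The action URL, ids, and field names are hypothetical.

```html
<!-- Accepted by HTML5: controls placed directly inside the form -->
<form action="/subscribe" method="post">
  <label for="email">Email</label>
  <input type="text" id="email" name="email">
  <input type="submit" value="Subscribe">
</form>

<!-- Recommended: related controls grouped in a fieldset with a legend -->
<form action="/subscribe" method="post">
  <fieldset>
    <legend>Newsletter</legend>
    <label for="email2">Email</label>
    <input type="text" id="email2" name="email">
    <input type="submit" value="Subscribe">
  </fieldset>
</form>
```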
A few elements were just a bad idea to begin with: frames and everything to do with them are obsolete in the HTML5 spec. Good riddance.
<iframe>, however, is supported in HTML5. I will talk more about iframes later.
Somewhat controversially, the <acronym> tag has been dropped. Now that every commercial browser supports the abbreviation tag (<abbr>), the <acronym> tag is somewhat redundant: the distinction between acronyms (abbreviations whose letters form a pronounceable word in themselves, such as SAIT and laser) and abbreviations (the general practice of making new words from the leading letters of a phrase) is lost on most people, and every acronym is ultimately an abbreviation anyway.
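In practice, anything you once marked up with <acronym> can switch to <abbr>, keeping the expansion in the title attribute. WHATWG makes a convenient example:

```html
<!-- XHTML 1.0: obsolete in HTML5 -->
<acronym title="Web Hypertext Application Technology Working Group">WHATWG</acronym>

<!-- HTML5 -->
<abbr title="Web Hypertext Application Technology Working Group">WHATWG</abbr>
```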
The <object> tag is pretty much redundant now in most instances (it has largely been superseded by new elements such as <audio> and <video>), but it is still supported.
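A typical replacement for a movie that would once have been embedded with <object> looks something like this sketch. The file name and dimensions are hypothetical; the content inside the element is displayed only by browsers that do not support <video>.

```html
<video src="intro.mp4" controls width="640" height="360">
  <!-- Fallback shown only by browsers without video support -->
  <p>Your browser does not support HTML5 video. <a href="intro.mp4">Download the clip</a> instead.</p>
</video>
```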
The <strong> and <em> elements remain, but <b> and <i> make a comeback in HTML5. To me, the distinction WHATWG makes between these elements is so fine as to be hair-thin: I would recommend that you continue to use <strong> and <em> as I have taught you.
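For the record, this is roughly how HTML5 divides the work: <strong> marks importance, <em> marks stress emphasis, <b> draws attention to text without adding importance (keywords, product names in a review), and <i> marks text in an alternate voice (technical terms, phrases from another language). The sentences below are invented for illustration.

```html
<p><strong>Warning:</strong> do not unplug the server.</p>
<p>I <em>told</em> you to close your tags.</p>
<p>This review covers the <b>Dreamweaver CS5</b> HTML5 templates.</p>
<p>The new tags have a certain <i lang="fr">je ne sais quoi</i>.</p>
```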
The <big> element is out, but <small> remains, now restricted to small print, such as the text of a legal disclaimer or warranty.
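Under that narrower definition, a typical use is a disclaimer or warranty note; the wording below is invented.

```html
<p><small>Warranty void if the seal is broken. Offer not valid in all regions.</small></p>
```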
In HTML5 you can now enclose multiple elements with a link:
<img src="dudley-storey.jpg" alt="">
<p>Learn more about Dudley Storey</p>
Technically, this markup would be invalid under XHTML, where an <a> element may contain only inline content: the image and the paragraph text would each have needed a separate link. In the real world most browsers supported the code anyway, so HTML5 supports it too, and makes it official.
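Written out in full, a block-level link of this kind looks like the following. The href value is a hypothetical page name.

```html
<!-- One link wrapping an image and a paragraph: valid in HTML5 -->
<a href="about-dudley.html">
  <img src="dudley-storey.jpg" alt="">
  <p>Learn more about Dudley Storey</p>
</a>
```

Because <a> takes on the content model of its parent, this works only where a paragraph would itself be allowed, for example directly inside <body> or a <div>, and the link must not contain other interactive elements such as buttons or nested links.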