Keith Cirkel

Software Cyber Shepherd

XHTML: Why everyone wants it but you shouldn't use it

Everyone is asking for it these days. Look at freelance boards, talk to clients, or browse job listings: clients and companies are requiring their developers to know XHTML. The problem is, XHTML isn't what everyone thinks it is, and that isn't going to change any time soon.

Why Everyone Wants XHTML

Back in the dark days of the web, developers were using HTML, and a lot were using it badly. HTML, like many languages, allows the user to make a real mess of it, and as W3Schools.com quite happily states: "Many pages on the internet contain 'bad' HTML."

Bad HTML

Technically, the HTML standard was a mess: tags didn't have to close, there was a whole range of useless tags which vacillated in and out of popularity, and there were lots of downright annoying tags. Things have slowly been fixed, and HTML 4.01 is a lot cleaner than its predecessors, but "back in the day" (circa 1995-7) no one could say HTML was a good language. Couple this with the plethora of horrible WYSIWYG HTML editors, the likes of Microsoft's FrontPage (shudder) and Dreamweaver, which frankly butcher any semblance of sensible mark-up, and you have a real recipe for disaster.

Birth of XHTML

HTML also uses a loose form of the SGML standard (XML didn't exist yet when HTML was created), which allowed all of this mess to happen. Around 1997 the W3C realised the failings of HTML, and in part of SGML, and began working on a beautiful subset of SGML called XML. XML is the simpler, more restricted (in a good way) offspring of SGML that was really needed to truly standardise machine-readable documentation. XML was, for all intents and purposes, the saviour of open documentation.

So, with this cool new tool, the W3C set to work on reformulating HTML using the XML standard. XHTML was born from this. XHTML took away pretty much all the crap that lurked in HTML. All those annoying and useless tags were deprecated, tags had to close properly, and because of its use of the much simpler XML format, it was much easier to maintain, implement and extend. So, back in 2000 when XHTML became an official standard, people rejoiced: we had a strict, properly typed, decent implementation of HTML. Thanks to its simplicity and strict rules, it could be rendered much faster than (bad) HTML, and it was easier to read thanks to the enforcement of lower case tags and proper closing. Who wouldn't want to use it?

Popularity rose for XHTML and developers quickly learnt the new standard. I admit I was one of the proponents too. XHTML was the buzzword of the 2000s, but it didn't turn out to be the panacea people had hyped it up to be…

HTML 4.01

The popularity of this new language meant a lot of the "cutting edge" developers didn't even notice the release of HTML 4.01 - a release which mirrored the decisions of XHTML, minus the, y'know, XML part. In fact, XHTML 1.0 and HTML 4.01 had an identical tag set, and could have pretty much identical looking code, minus a few slashes. What's more, HTML 4.01 was released as a standard a month earlier than XHTML. My feeling is that because these were released so close together, a big part of the improvement credited to XHTML was actually the improvement made to HTML in general.

Why you shouldn't use it

So by now we realise XHTML is pretty much HTML 4.01 but uses XML, which is faster, and simpler, and better. What gives? Why shouldn't you use XHTML then?

XHTML as HTML

Here's the kicker - XHTML kinda doesn't work all that well. See, for all the great work the W3C does to bring the web forward, the biggest browser vendors around at the time are the true judge of how far forward we go. Want proof? Just look at the current HTML5 Video debacle. In 2000, when Internet Explorer had 80-90% of the market, it was the ultimate decider on these matters. For whatever reasons Microsoft had, it released Internet Explorer 6 in 2001 without XHTML support. To rephrase that for emphasis, Internet Explorer 6 does not support XHTML. In fact, the first version of Internet Explorer to support XHTML is 9, which isn't released yet. Awesome.

In the extreme wisdom of web designers all over (that's sarcasm, and it includes me), XHTML documents were passed off as "text/html" so they could be rendered in Internet Explorer. This means all browsers see these documents as HTML, not XHTML. This is important because anything you expect to happen in XHTML that doesn't happen in HTML won't happen. Anything you expect not to happen in XHTML that does happen in HTML will, of course, happen. So to put it bluntly - virtually nobody uses XHTML; lots of people incorrectly use XHTML as HTML. Doing this is a really bad idea.

HTML is better

Thanks to the mess of HTML prior to 4.01, and the mess of XHTML as HTML, all the browsers today have a "quirks mode", which is a polite way of saying a way to deal with crap code. Quirks Mode tries to overcome the inadequacies of the HTML document it is rendering, which include typos, missing tags, missing attributes for tags, added slashes (think <br />) and lots of other idiosyncrasies. XHTML doesn't do this - because of its strict XML definitions, an invalid XHTML document won't render (see also W3C). This is a double-edged sword. It's good because it is programmatically correct - as the xhtml.com article states, you wouldn't expect a C++ or PHP application to run on bad code, so why XHTML? The problem is that it can be excruciatingly difficult to create valid XML code. Small pages are easy, but imagine a blog, where you have to ensure every post on that blog uses valid XHTML, and every comment inside every post does the same. Suddenly you have turned your life into a never-ending user support nightmare, having to validate (or at least check) every new post, and every page that gets a new comment.
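To make the blog-comment problem concrete, here is a minimal sketch in Python of the kind of check you would need to run on every comment before letting it into a real XHTML page. The function name is my own invention, and it uses the standard library's generic XML parser, so it only tests well-formedness (closed tags, quoted attributes, escaped ampersands), not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(fragment: str) -> bool:
    """Return True if `fragment` would parse as well-formed XML.

    Hypothetical helper for illustration only: it checks well-formedness,
    not validity against the XHTML DTD.
    """
    # Wrap the fragment in a dummy root element so that plain text or
    # several sibling elements can still be parsed as one document.
    wrapped = f"<div>{fragment}</div>"
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        # Unclosed tags, unquoted attributes, bare ampersands and the like
        # all end up here; an XML parser refuses the whole document.
        return False
    return True


if __name__ == "__main__":
    comments = [
        "<p>Nice post!</p>",
        "<p>First line<br>second line</p>",    # <br> is never closed
        "<p>First line<br />second line</p>",
        "<p class=intro>Unquoted attribute</p>",
    ]
    for comment in comments:
        print(is_well_formed_fragment(comment), comment)
```

An HTML parser would shrug off every one of those comments. An XML parser rejects two of the four, and in a real XHTML page a single rejected comment takes the whole page down with it.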

Don't use it!

So, history lessons aside, why shouldn't you use XHTML? Let's break it down:

  • XHTML is not supported by any released version of Internet Explorer. IE9 will be the first release to support it. Cutting your userbase by 90% is usually not a good idea.
  • XHTML rendered as HTML can introduce annoying problems & incompatibilities.
  • When you do actually use XHTML properly, it can break very easily. This won't change either - it is meant to by design.
  • Proper XHTML becomes difficult to maintain when users introduce new code all the time. Sure, there is lots you can put in place to prevent invalid code being introduced, but it all adds up.
  • Bad JavaScript will break XHTML documents. For example, any code that uses document.write(), which includes Google AdSense, by the way!

Not very compelling now, is it? But what about all the benefits of XHTML? Well, for the most part, the main benefits of XHTML carry over perfectly into HTML. I recommend you take the lessons XHTML provides, and put them to good use in HTML:

  • Always use lower case tag & attribute names. It just makes it so much more readable.
  • Quote your attribute values. Doing so helps a lot with syntax highlighting in your development environment.
  • Close your tags. You should do this anyway to ensure your HTML is given the best chance of rendering in all browsers.
  • Properly doctype your HTML. This plus closing your tags allows the browser to render your code in "No Quirks" mode, lowering the chance of browser incompatibilities.
  • Spend time validating your documents. Using the W3C validator service is a good way to make the best of HTML.

Still want XHTML?

If you're still adamant about using XHTML, then at least use it right. Use server-side user agent sniffing or, even better, content negotiation to detect whether you are serving to Internet Explorer (or other non-XML-able browsers). If so, serve an HTML document; otherwise serve XHTML with the "application/xml" or "application/xhtml+xml" MIME type. Remember, if you code proper HTML, and code it as well as you would XHTML, all you're missing out on is writing empty tags such as <br> as <br />, but what you gain is flexibility and graceful error handling.
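As a rough sketch of the content negotiation idea, the Python function below looks at the request's Accept header and only picks "application/xhtml+xml" when the browser lists it explicitly. The function name is made up for illustration, and the parsing is deliberately simplified: it ignores wildcards such as */* and does not rank the q-values against each other, so treat it as a starting point rather than a complete implementation of HTTP negotiation.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick the MIME type to serve, based on the request's Accept header.

    Hypothetical helper for illustration. Returns "application/xhtml+xml"
    only when the client lists it explicitly with a non-zero q-value,
    and falls back to "text/html" in every other case.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()

        # A media type with no q parameter has an implicit weight of 1.
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # Treat a malformed weight as "not acceptable".

        if media_type == "application/xhtml+xml" and q > 0:
            return "application/xhtml+xml"

    # Wildcards like */* deliberately do not count: a browser that only
    # sends */* has not told us it can actually handle XHTML.
    return "text/html"


if __name__ == "__main__":
    print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_content_type("*/*"))
```

Whichever type you choose, the response should also send a "Vary: Accept" header, so that caches don't hand the XHTML version to a browser that can't read it.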

Thoughts? Chat to me, @keithamus on Twitter