HTTP headers are wild

As part of a project I'm working on, I was doing research into HTTP headers. I couldn't really find any decent resources that summarised how headers differ between browsers, so I decided to just write my own. Then things… got out of hand.

This is big, so before you get into this make yourself a brew, sit down and relax… I was considering splitting this into two articles, but then I quickly realised doing that is pointless and annoying, and you all have attention spans longer than 5 paragraphs.

What are HTTP headers?

Pretty much every protocol you communicate with has "headers". They tell the server you're requesting information from what to do, where to get that information and how to get it. When your browser makes a request (usually to visit a web page) using the Hypertext Transfer Protocol, it sends a bunch of info to the server as part of the request. Mostly this is so the server can interpret the request and decide what to do with it. First it sends the request line, which includes the Method, the Path and the protocol version, to tell the server what you're asking for. The Method tells the server what to do with the resource. Most of the time this is GET, which tells the server to just grab the contents and not change anything. However, when you submit a form, the method may change to POST, telling the server that you're sending it a bunch of form fields. Then it sends a set of headers, such as Host, Date, Cookie and some more that we're going to look at. These give the server more information about the request, such as statistics, what type of body the request has, and what type of content the client expects back. If you want to know more about HTTP headers, Wikipedia has a great entry listing every standard HTTP header and what it does.

(Thanks to @m_strehl, teddyh and donavanm for setting me straight on this!)
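Here's a minimal Python sketch that makes this a bit more concrete. It shows what a request looks like on the wire and how a server might split it into the request line and headers. The sample request and the `parse_request` helper are illustrative assumptions, not part of the experiment. A real server also has to deal with request bodies, malformed input and much more.

```python
# A minimal sketch: split a raw HTTP/1.1 request into request line and headers.
# RAW_REQUEST and parse_request() are illustrative, not from the experiment.

RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: localhost:3000\r\n"
    "Connection: keep-alive\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "\r\n"
)


def parse_request(raw: str) -> tuple[str, str, str, dict[str, str]]:
    """Return (method, path, protocol, headers) for a raw request string."""
    head, _, _body = raw.partition("\r\n\r\n")  # a blank line ends the headers
    request_line, *header_lines = head.split("\r\n")
    method, path, protocol = request_line.split(" ", 2)
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return method, path, protocol, headers


method, path, protocol, headers = parse_request(RAW_REQUEST)
print(method, path, protocol)  # GET / HTTP/1.1
print(headers["host"])         # localhost:3000
```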

The experiment

I wanted to see how the headers each web browser sends differ. I knew the Accept header was different per browser, but I wanted to document this a bit better. So I set up an extremely simple web page, which lists all of the headers that have been sent to it in a table. I collated all of these and tried to spot the differences. For each browser I simply made a GET request to http://localhost:3000/, using BrowserStack's screenshot utility to grab them all at once. The code is pretty trivial, but also nasty.
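The original code isn't reproduced here. If you want to run a similar experiment, here is a rough Python stand-in that uses only the standard library's http.server. It is a sketch, not the code used for the experiment. `render_headers_table`, `HeaderEchoHandler` and `serve` are hypothetical names.

```python
# A rough stand-in for a header-echo page, standard library only.
# NOT the experiment's original code; all names here are hypothetical.
import html
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_headers_table(method: str, path: str, protocol: str,
                         headers: list[tuple[str, str]]) -> str:
    """Build an HTML table of the request line parts plus every header sent."""
    rows = [("Request Method", method), ("Request URI", path),
            ("Request protocol", protocol)] + headers
    cells = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(value)}</td></tr>"
        for name, value in rows
    )
    return f"<table><tr><th>Header Name</th><th>Value</th></tr>{cells}</table>"


class HeaderEchoHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # self.headers keeps the names as the browser sent them.
        page = render_headers_table(self.command, self.path,
                                    self.request_version,
                                    list(self.headers.items()))
        body = page.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 3000) -> None:
    """Serve the echo page on http://localhost:<port>/ until interrupted."""
    HTTPServer(("localhost", port), HeaderEchoHandler).serve_forever()
```

Call serve() and point each browser at http://localhost:3000/ to get a table of everything it sent.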

The Browsers

Long story short, I tested on a shedload of browsers. Here's the lowdown of each brand/platform and the browsers (or mobile devices) I tested on:

Platform	Browsers / devices tested
Apple iOS 5	- iPad 2
Apple iOS 5.1	- iPhone 4S - iPad 3
Apple iOS 6	- iPhone 4S - iPhone 5 - iPad 3 - iPad Mini
Apple iOS 7	- iPhone 5S - iPad 2 - iPad 3rd Gen
Samsung	- Galaxy GS - Galaxy SII - Galaxy SIII - Galaxy Nexus - Galaxy Note - Galaxy Note II - Galaxy Note 10.1 - Galaxy Tab - Galaxy Tab II
Motorola	- Razr - Droid 4 - Atrix HD - Droid Razr - Razr Maxx HD
Sony	- Xperia Tipo
Google	- Nexus 4 - Nexus 5 - Nexus 7
HTC	- One X - Wildfire - Evo 3D
Kindle	- Kindle Fire 2 - Kindle Fire HD 8.9
Internet Explorer	- 6 - 7 - 8 - 9 - 10 - 11
Firefox	- 3.6 - 13 - 25
Chrome	- 14 - 20 - 31
Opera	- 11.6 - 12
Safari	- 5.1 Win - 5.1 - 6 - 7

Note: Yes, I could have tested all 30 versions of Chrome and all 25 versions of Firefox. But the headers change very rarely in these browsers, and when they do it's mostly down to new features being added, such as WebP support in Chrome.

The stuff all browsers agree on

Luckily, all browsers managed to agree on most of the basic stuff. Every single browser correctly knew this request was a GET request. All browsers I tested made an HTTP/1.1 request, although I don't even know which browsers wouldn't these days.

All browsers sent the following consistently
Header Name	Value
Request Method	GET
Request URI	/
Request protocol	HTTP/1.1
Referer
Host	localhost:3000
Connection	keep-alive

The stuff they don't

Of course, in a world of many browsers, getting them to agree on everything is impossible. It did throw up some interesting results…

DNT

DNT variations
Value	Platforms
1	Internet Explorer 10
(not present)	All others

DNT or "Do not Track" is a naive attempt at getting servers and their operators to stop tracking users' browsing habits. If set to 1, the server should not track the browser. If set to 0, the server is welcome to. If not present, do whatever you want. Of course, ultimately the rule is actually "do whatever you want".

Microsoft famously announced that Internet Explorer 10 would come with DNT enabled by default. They then turned on a dime and made it opt-in in IE 11, mostly because advertisers started shouting at Microsoft and projects like Apache gave IE10's DNT the finger.

You can opt in to DNT in other browsers too: Firefox 9 and up, Safari 6, Opera 12 and Chrome 23.

TL;DR Conclusion

If the DNT header exists and is set to 1, then be nice and don't track people. Otherwise we'll end up with another ludicrous law about it. If it's not 1, or not set, go wild!
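In code, that rule of thumb is about one line. Here's a minimal Python sketch. `may_track` is a hypothetical helper, and it assumes the header names have already been lower-cased when the request was parsed.

```python
# A minimal sketch of "only DNT: 1 means don't track".
from collections.abc import Mapping


def may_track(headers: Mapping[str, str]) -> bool:
    """False only when a DNT header is present and set to 1."""
    # Assumes header names were lower-cased when the request was parsed.
    return headers.get("dnt", "").strip() != "1"
```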

Accept-Charset

Accept-Charset variations
Value	Platforms
utf-8, iso-8859-1, utf-16, *;q=0.7	Android Stock Browser (2.2 and up, as tested)
ISO-8859-1,utf-8;q=0.7,*;q=0.7	Firefox and Chrome on Windows
(not present)	All others

This is meant to be a way for browsers to tell you which charsets they can read. It's a holdover from the days when different OSes could only read a few charsets. The thing that bugs me about Accept-Charset is that in today's world it's mostly redundant, as UTF-8 (and sure, UTF-16) have become the universal computing charsets (cue hordes of nerds and file-system geeks raging over that statement).

The Firefox and Chrome on Windows header can be translated as "Latin-1 please, or failing that UTF-8 or just whatever you've got". Meanwhile, the Android Stock Browser is saying "UTF-8, Latin-1 or UTF-16 please. If you can't do that, just give me whatever". To give you an idea of what the heck ISO-8859-1 is: sometimes referred to as "Latin-1", it's an old crappy standard used pretty much just by Windows today, and not really any better than ASCII. Basically, not UTF-8. Yeah, cheers - UTF-8 it is!
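Those translations come from the q parameters, which turn each Accept-* header into a ranked wish list. Anything without a q counts as q=1, and higher wins. Here's a minimal Python sketch of parsing one. `parse_accept` is a hypothetical helper, and it skips edge cases such as quoted parameter values.

```python
# A minimal sketch: parse an Accept-* header into (value, q) pairs, best first.
# parse_accept() is a hypothetical helper, not a library API.


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split 'a, b;q=0.7' into [('a', 1.0), ('b', 0.7)], highest q first."""
    items = []
    for part in header.split(","):
        value, *params = (p.strip() for p in part.split(";"))
        if not value:
            continue  # tolerate empty entries such as a trailing comma
        q = 1.0  # a missing q parameter means q=1
        for param in params:
            key, _, raw = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # treat a garbled q-value as "not acceptable"
        items.append((value.lower(), q))
    # sorted() is stable, so entries with equal q keep the browser's order.
    return sorted(items, key=lambda item: item[1], reverse=True)
```

For the Firefox/Chrome value in the table, this returns ISO-8859-1 first, then UTF-8 and * tied at 0.7. The Android value gives UTF-8, ISO-8859-1 and UTF-16 tied at q=1, with everything else at 0.7.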

Firefox is actually seeking to remove Accept-Charset as it can be a way to fingerprint browsers. Chrome is removing Accept-Charset too.

TL;DR Conclusion

Ignore it. Serve UTF-8. No one cares.

Accept-Language

Accept-Language variations
Value	Platforms
en-US,en;q=0.8	Chrome
en-us,en;q=0.5	Firefox
en,en-US;q=0.9,ar;q=0.8,ca;q=0.7,cs;q=0.6,da;q=0.5,nl;q=0.4,el;q=0.3,fi;q=0.2,fr;q=0.1	Opera 12
en-US	Android Stock Browser
en-us	All others

Accept-Language is your browser's guess at what languages you read. Most English-speaking users will see values like these, while other locales see other stuff, so this header can be a bit… unpredictable. Its values are language tags: an ISO 639-1 language code, optionally followed by a region such as US or GB. You should abide by them if you can.

Most of these examples have only minor differences, except for Opera, which I'll get to. Firefox and Chrome are saying "U.S. English please; if you don't have that, then generic English please". Android and iPhone are true patriots, simply saying "U.S. English or nothin', thanks" (if you live in the UK then both of these are likely to actually be en-GB instead of en-US, which is fine).

Opera 12 then just gets weird on us. It says "Generic English please, or U.S. English. If not then uh… Arabic! If not then perhaps Catalan? Czech? If not then Danish, or if not that then Dutch. OK, perhaps Greek? Finnish? OK, if you have none of those can I have it in French please". Go home Opera, you're drunk.

TL;DR Conclusion

If you have a multilingual site, then use the Accept-Language header to decipher the language, but take it with a pinch of salt. Always give the user the option to change the language manually though!
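Here's a minimal Python sketch of that, assuming a site that offers British English, U.S. English and French. `SUPPORTED`, `DEFAULT` and `pick_language` are all hypothetical. Exact matches win first, then a match on the base language, then a default. Whatever it picks should only ever be a starting point the user can override.

```python
# A minimal sketch of picking a page language from Accept-Language.
# SUPPORTED, DEFAULT and pick_language() are hypothetical site settings/helpers.

SUPPORTED = ("en-gb", "en-us", "fr")
DEFAULT = "en-gb"


def pick_language(header: str, supported: tuple[str, ...] = SUPPORTED,
                  default: str = DEFAULT) -> str:
    """Return the best supported language tag for an Accept-Language value."""
    prefs = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag, params = tag.strip().lower(), params.strip()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if tag and q > 0:
            prefs.append((tag, q))
    prefs.sort(key=lambda pref: pref[1], reverse=True)  # stable: ties keep order

    for tag, _ in prefs:
        if tag == "*":
            return default  # "anything you like"
        if tag in supported:
            return tag  # exact match, e.g. en-us
        base = tag.split("-")[0]
        for lang in supported:
            if lang.split("-")[0] == base:
                return lang  # same language, other region, e.g. en-au -> en-gb
    return default
```

With Opera 12's header this lands on en-gb, because the bare en at q=1 matches the first English variant on offer.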

Accept-Encoding

Accept-Encoding variations
Value	Platforms
gzip	HTC Wildfire (Android 2.2), HTC Evo (Android 2.2)
gzip,deflate,sdch	Chrome, including Chrome on Android
gzip,deflate	Android Stock Browser, from at least 4.0 and up
gzip, deflate	All others

Accept-Encoding is a pretty easy one. Most browsers send the string "gzip, deflate", meaning "send it to me using gzip compression, or failing that, using zlib compression". Some others send "gzip,deflate" (note: no space after the comma), which is perfectly allowed in any Accept-* header.

Interestingly, the only two Android 2.2 devices tested sent just "gzip" as their value. I'd go so far as to say perhaps older Android devices don't support Deflate. Quite frankly that's fine, though, because Deflate sucks compared to Gzip.

The other interesting part is Chrome: it has a new alternative to Gzip and Deflate, called SDCH. Chrome has actually had this since its first release, and it stands for Shared Dictionary Compression over HTTP. It's way out of the scope of this article, but there's a nice article about SDCH from Adam Prescott. Having said that, don't use it. You can't if you're using HTTPS, and you're better off investing your time in getting SPDY to work.

Another bit of trivia: Chrome also supported bzip2 in versions 1 to 3 but dropped it in version 4, and I can't find out why. Also, Firefox plans to support lzma compression soon, which is even more bitchin' than gzip.

TL;DR Conclusion

If Accept-Encoding includes the string gzip, gzip it. Otherwise, send it uncompressed. Don't bother with Deflate, SDCH or bzip2 fancifulness, as you're wasting your time. Perhaps one day consider supporting lzma, for some utopian future when it becomes a thing.
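Here's a minimal Python sketch of that rule using the standard library's gzip module. `encode_body` is a hypothetical helper. It also honours an explicit gzip;q=0 refusal. It adds a Vary: Accept-Encoding header so caches don't hand a gzipped response to a client that didn't ask for one.

```python
# A minimal sketch of "if Accept-Encoding includes gzip, gzip it".
import gzip


def encode_body(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Return (payload, extra response headers) for the given Accept-Encoding."""
    # Vary tells caches the response differs by Accept-Encoding.
    headers = {"Vary": "Accept-Encoding"}
    for part in accept_encoding.split(","):
        coding, _, params = part.strip().partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        # "gzip;q=0" means the client explicitly refuses gzip.
        if coding.strip().lower() == "gzip" and q > 0:
            headers["Content-Encoding"] = "gzip"
            return gzip.compress(body), headers
    return body, headers
```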

Accept (The biggie)

Accept variations
Value	Platforms
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8	All others
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8	Chrome, including Chrome on Android
text/html, application/xhtml+xml, */*	IE9+
text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1	Opera 12
application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5	HTC Wildfire (Android 2.2), HTC Evo (Android 2.2)
image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, */*	IE8
image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, */*	IE6, IE7

The Accept header is great… when it works. It's meant to specify what the browser can read, in order of preference (like the rest of the Accept-* headers). In theory, as a service that consumes an API, I could ask for application/json,application/xml;q=0.9 and get back a JSON feed of a page. This (along with the other Accept-* headers) is the crux of what's known as Content Negotiation, one of the awesome fundamental principles of HTTP. I can even ask for cool parameters. For example, application/json;indent=4 should give me JSON indented by 4 spaces, if I'm lucky, or application/json;indent=0 to have it all on one line. It's up to the server to support this, but it's a cool feature if it does.
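Server-side, content negotiation boils down to scoring each format you can produce against the client's media ranges and picking the best. Here's a minimal Python sketch. `OFFERED`, `negotiate` and the helpers are hypothetical. Exact types beat type/*, which beats */*. Ties go to the server's own order. Any parameters other than q (like indent=4) are simply ignored.

```python
# A minimal sketch of server-driven content negotiation on the Accept header.
# OFFERED is a hypothetical server's formats, in its own preference order.
from typing import Optional

OFFERED = ("text/html", "application/json", "application/xml")


def parse_media_ranges(header: str) -> list[tuple[str, float]]:
    """Turn an Accept header into (media range, q) pairs; non-q params are dropped."""
    ranges = []
    for part in header.split(","):
        media, *params = (p.strip() for p in part.split(";"))
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media:
            ranges.append((media.lower(), q))
    return ranges


def quality(offer: str, ranges: list[tuple[str, float]]) -> float:
    """q-value for one offered type, taken from the most specific matching range."""
    offer_type = offer.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media, q in ranges:
        if media == offer:
            specificity = 2          # exact match, e.g. text/html
        elif media == offer_type + "/*":
            specificity = 1          # type wildcard, e.g. text/*
        elif media == "*/*":
            specificity = 0          # anything at all
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(header: str, offered: tuple[str, ...] = OFFERED) -> Optional[str]:
    """Pick the offered type with the highest q; earlier offers win ties."""
    ranges = parse_media_ranges(header)
    best, best_q = None, 0.0
    for offer in offered:
        q = quality(offer, ranges)
        if q > best_q:  # strict ">" keeps the server's order on ties
            best, best_q = offer, q
    return best
```

With Chrome's header from the table, this picks text/html. With the old Android header, it picks application/xml, because text/html only gets q=0.9 there.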

These headers also change when the browser makes a request from an <img>, <video>, <audio> or <script> element, or a stylesheet <link>. That sounds awful, but it's still pretty cool. The Accept header then becomes pertinent to the document being requested, which means you can do useful stuff like serve up the best image, video or audio format. Except most browsers just shove */* in and are done with it…

Most browsers do the right thing and say "HTML, XHTML or XML please, otherwise whatever you've got". Chrome is one exception here, also specifying image/webp, which is great. WebP is an emerging image format that not all browsers support, so you can simply look out for that MIME type in the Accept header and boom, you're away.
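Here's a deliberately naive Python sketch of that check. `pick_image` and the photo file names are hypothetical, and it ignores q-values entirely. If you do this, also send Vary: Accept so caches keep the WebP and JPEG variants apart.

```python
# A naive sketch: serve WebP only to browsers that advertise image/webp.
# pick_image() and the "photo" file names are hypothetical.


def pick_image(accept: str, stem: str = "photo") -> tuple[str, str]:
    """Return (filename, content type) for the best format this client accepts."""
    # Strip any ;q=... parameters and compare bare media types.
    media_types = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    if "image/webp" in media_types:
        return f"{stem}.webp", "image/webp"
    return f"{stem}.jpg", "image/jpeg"
```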

I actually prefer the newer IEs' version of this, though. It drops the application/xml bit, which is useful because application/xml usually means an RSS, Atom or some other XML API. That is going to be significantly different from an HTML document (to the user). Also, XML is awful. I wouldn't be sad to see application/xhtml+xml go either, because it's not 2001 any more.

Then we go a bit into crazy town. I've put these in order of stupidity, and as you can see - unsurprisingly - IE6-8 are right at the bottom of the pile. Opera 12 just adds loads of image formats, which isn't necessarily bad, but it does add extra bloat to every request.

Old Android devices are where it starts getting annoying. They specify application/xml before text/html, and give text/html a lower q-value (0.9). That means if you did true content negotiation, you'd hand them your XML document before your HTML document, which, as stated already, are very different things. So your server can no longer happily serve whatever format the browser asks for. It has to have its own priority list saying that HTML should be served before XML, if it's included.

Then we come to IE8, 7 and 6. IE8 says "I'll have a gif, jpeg, p(rogressive) jpeg, ClickOnce application, XPS document, XAML document or XBAP document please, or just whatever". If you're wondering what the heck all those X thingies are: XPS is Microsoft's answer to PDF that fell on its face, XAML is the markup for Silverlight (Microsoft's answer to Flash) and XBAP is XAML Browser Applications. As for application/x-ms-application, that's Microsoft ClickOnce, a way of letting users install applications with one click. Yay security! So basically "some images, and a bunch of failed Microsoft projects". IE 6/7 are no better; they ask for all of the same things, plus image/x-xbitmap.

For the astute reader, you may have noticed that IE8 and below kind of… forgot… to negotiate for any actual web documents - like, oh I don't know, text/html. That's right: amid the rampant proliferation of Microsoft brands, older Microsoft browsers don't actually ask for HTML documents by name. If you want your server to allow for proper content negotiation, you have to UA sniff for IE8 and below and stop them being dumb by injecting text/html at the beginning of their Accept header. (And you thought writing server-side code would free you from IE tomfoolery!) This does bring us nicely to…
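Here's a minimal Python sketch of that workaround. `fix_old_ie_accept` is a hypothetical helper. Its regex assumes IE6-8 identify themselves with MSIE 6., 7. or 8. in the User-Agent; IE11 dropped the MSIE token, so it isn't caught. Like all UA sniffing, treat it as fragile.

```python
# A minimal sketch of the IE8-and-below workaround: if the User-Agent looks
# like MSIE 6, 7 or 8, put text/html at the front of the Accept header.
import re

OLD_IE = re.compile(r"\bMSIE [678]\.")  # matches "MSIE 8.0" but not "MSIE 10.0"


def fix_old_ie_accept(user_agent: str, accept: str) -> str:
    """Prepend text/html to Accept for IE8 and below; leave everyone else alone."""
    if OLD_IE.search(user_agent) and "text/html" not in accept.lower():
        return "text/html," + accept
    return accept
```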

User-Agent (The rant)

I'm not going to put the table here, because this is different for every browser. Instead this one becomes a mini rant. It's worth noting though that the user agent string is perhaps the most ridiculous demonstration of abuse ever conceived:

Every browser except for Opera claims it's Mozilla/5.0 (Opera calls itself Opera/9.80)
Chrome, Safari and Opera claim they're just like Firefox ((KHTML, like Gecko)) as does IE11 (like Gecko at the end). Firefox says "will the real Firefox please stand up" (Gecko/20100101).
Some Android devices state their device (Nexus 5 Build/KRT16M) where others just spout nonsense (sdk Build/MR1, real3d Build/FRG22D)
Usefully, Windows browsers will tell you if they're 64-bit, with either WOW64 or Win64; x64, so you can, y'know… optimise your HTML and CSS for 64 bits. That's a thing, right?
Chrome claims it's both Chrome and Safari (Chrome/31.0.1650.57 Safari/537.36)
IE 11 changed its UA from including compatible; MSIE X.0 to stating Trident/7.0; rv:11.0, because Microsoft was actively trying to trip up UA sniffers.
Android phones call themselves Linux; Android X.Y;. While technically, yes, Android phones are Linux phones, I've yet to see a non-Linux Android phone!
Lots of Android and iOS devices include Mobile in their user agent, and that includes Google Glass. Most tablets don't have Mobile in their user agent string, though.

This is what I like to call "the magical User Agent dance". Browser vendors create a UA that "looks like" other browsers' UAs to fool the UA sniffers, so the UA sniffers get more robust, so the browser vendors make their UAs even more convoluted and insane.

The thing is I still think UAs are useful for gathering analytics about your users, and knowing which devices to test as a priority, but rather than having the foolishness that is Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36 which is 108 bytes listing SIX different browser technologies, we could have something sane like windows/6.2;chrome/31.0.1650.57 which still conveys all the necessary information (nay, more), but in 31 bytes, and less stupid. Or the even more ridiculous 134 byte Mozilla/5.0 (Linux; Android 4.4; Nexus 5 Build/KRT16M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.59 Mobile Safari/537.36 could become android/4.4;chrome/31.0.1650.59, again, 31 bytes. Of course then browsers couldn't wear each other's skin like some kind of creepy Masquerade ball. Even cURL has a disappointingly stupid UA (curl/7.21.0 (x86_64-apple-darwin10.2.0) libcurl/7.21.0 OpenSSL/1.0.0a zlib/1.2.5 libidn/1.19).
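For the byte-counters, here is a toy Python sketch that squashes the two example UAs into that os/version;browser/version shape. The format is only the proposal above, not any standard. `compact_ua` is hypothetical, and its regexes only cover Windows or Android plus Chrome.

```python
# A toy illustration of the proposed "os/version;browser/version" UA format.
# compact_ua() is hypothetical; its regexes only handle Windows/Android + Chrome.
import re
from typing import Optional

OS_PATTERNS = [("android", re.compile(r"Android (\d+(?:\.\d+)*)")),
               ("windows", re.compile(r"Windows NT (\d+(?:\.\d+)*)"))]
CHROME_PATTERN = re.compile(r"Chrome/([\d.]+)")

DESKTOP_UA = ("Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36")
NEXUS5_UA = ("Mozilla/5.0 (Linux; Android 4.4; Nexus 5 Build/KRT16M) "
             "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.59 "
             "Mobile Safari/537.36")


def compact_ua(ua: str) -> Optional[str]:
    """Reduce a full Chrome UA to 'os/version;chrome/version', or None."""
    for os_name, pattern in OS_PATTERNS:
        os_match = pattern.search(ua)
        if os_match:
            break
    else:
        return None  # an OS this toy doesn't know about
    chrome = CHROME_PATTERN.search(ua)
    if not chrome:
        return None
    return f"{os_name}/{os_match.group(1)};chrome/{chrome.group(1)}"


for ua in (DESKTOP_UA, NEXUS5_UA):
    short = compact_ua(ua)
    print(short, len(ua.encode()), "->", len(short.encode()), "bytes")
# windows/6.2;chrome/31.0.1650.57 108 -> 31 bytes
# android/4.4;chrome/31.0.1650.59 134 -> 31 bytes
```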

Closing thoughts

Still there? Great. Hopefully you realise now, like I did, that some HTTP headers are useful, and let you do cool stuff - but they are fraught with issues and politics, which is incredibly disappointing.