Sunday, September 20, 2009

Accept Headers: In The Wild

As web agents (including browsers) become more diverse, there is an increasing need to distinguish between their types. The User-Agent header can be used for this task, but it requires the server to know in advance every possible agent and its type. This is not feasible, because both the diversity and the quantity of agents are growing too quickly for any single registry to track.

According to the HTTP specification, the Accept header lists the media types an agent is willing to receive, so it can be used to determine the type of agent. For example:
• HTML browsers should include "text/html" within the Accept header,
• XHTML browsers include "application/xhtml+xml",
• RDF browsers include "application/rdf+xml",
• XSLT agents include "application/xml",
• PDF agents include "application/pdf",
• Office suites include "application/x-ms-application" or one of the "application/vnd.oasis.opendocument" types (such as "application/vnd.oasis.opendocument.spreadsheet"), and
• JavaScript libraries include "application/json".

This allows the server to better redirect the agent to an appropriate resource.
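As a rough illustration of the mapping above, the Python sketch below labels an agent by the media types its Accept header names explicitly. The names `AGENT_TYPES`, `OPENDOCUMENT_PREFIX`, and `agent_types` are made up for this example, and the parsing is deliberately naive: it ignores q-values and wildcards.

```python
# Hypothetical lookup table built from the list above.
AGENT_TYPES = {
    "text/html": "HTML browser",
    "application/xhtml+xml": "XHTML browser",  # registered media type for XHTML
    "application/rdf+xml": "RDF browser",
    "application/xml": "XSLT agent",
    "application/pdf": "PDF agent",
    "application/x-ms-application": "office suite",
    "application/json": "JavaScript library",
}

# OpenDocument types share this prefix (…opendocument.text, …opendocument.spreadsheet, etc.).
OPENDOCUMENT_PREFIX = "application/vnd.oasis.opendocument"


def agent_types(accept_header):
    """Return the agent types named explicitly in an Accept header, in header order."""
    found = []
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.9" and normalise case.
        media_type = item.split(";")[0].strip().lower()
        if media_type.startswith(OPENDOCUMENT_PREFIX):
            label = "office suite"
        else:
            label = AGENT_TYPES.get(media_type)
        if label and label not in found:
            found.append(label)
    return found
```

An agent that sends only */* yields an empty list here, which is exactly the "says nothing about the type of agent" case.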

Obviously, if a service only serves HTML browsers, it does not need to know the type of agent, as was the case in the Web 1.0 days when everything on the Web was HTML. However, as HTTP becomes a more popular protocol for non-HTML communication, distinguishing between types of agents is becoming important.

Consider the situation where an abstract information resource (like an order or an account) is identified by a URL. When the server receives a request for an abstract information resource, it needs to know which type of agent is requesting it, so it can redirect the agent to an appropriate representation. If the agent is an HTML browser, the server should redirect to an HTML page displaying the order or account information; if a JavaScript library, the server should redirect to a JSON dump of the order/account summary; if a PDF agent, the server should redirect to an order/account summary report; if an office suite, the server should redirect to a spreadsheet of the details.
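The redirect decision described above might be sketched in Python roughly as follows. Everything named here is an assumption made for illustration: the `/orders/42.*` paths, the `REPRESENTATIONS` table, and the helper names. The matching is also simplified, since it honours q-values but does not expand wildcards or apply the specification's specificity rules.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue  # empty or missing header
        q = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


# Hypothetical representations of a single order resource.
REPRESENTATIONS = {
    "text/html": "/orders/42.html",
    "application/json": "/orders/42.json",
    "application/pdf": "/orders/42.pdf",
    "application/vnd.ms-excel": "/orders/42.xls",
}


def redirect_target(accept_header, default="text/html"):
    """Pick the URL to redirect to, falling back to the HTML page."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type]
    return REPRESENTATIONS[default]
```

For example, `redirect_target("application/json")` returns the JSON URL, while an agent sending only `*/*` (or no Accept header at all) falls through to the HTML page.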

This works very well in theory, but because the Web was built with only HTML browsers in mind, most browsers don't properly implement the HTTP specification (because they don't have to). Even worse, most non-HTML agents either don't include an Accept header at all or send */*, which says nothing about the type of agent. Below are some of the default Accept headers from popular user agents on the web.

FF3.5 is an HTML and XHTML browser first, XML/XSLT agent second
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

IE8 is a media viewer (apparently)
image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/x-shockwave-flash, */*

IE8+office is a media viewer and office suite
image/gif, image/jpeg, image/pjpeg, application/x-ms-application,
application/vnd.ms-xpsdocument, application/xaml+xml,
application/x-ms-xbap, application/x-shockwave-flash,
application/x-silverlight-2-b2, application/x-silverlight,
application/vnd.ms-excel, application/vnd.ms-powerpoint,
application/msword, */*

Chrome3 is an XHTML and XML/XSLT agent first, HTML browser second, and text viewer third.
application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

Safari3 is an XHTML and XML/XSLT agent first, HTML browser second, and text viewer third.
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

Opera10 is an HTML and XHTML browser first, XML/XSLT agent second.
text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1

The MSN bot is an HTML browser, text viewer, XML client and application archiver.
text/html, text/plain, text/xml, application/*, Model/vnd.dwf, drawing/x-dwf

Google search bot is a jack of all agents, master of none
*/*

Yahoo search bot is a jack of all agents, master of none
*/*

AppleSyndication is a jack of all agents, master of none
*/*
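The "first, second, third" readings above come from the q-values in each header: a media range without an explicit q counts as q=1, and lower values mean lower preference. A small Python sketch of that grouping is shown here; the function name `preference_tiers` is invented for this example, and it assumes every q parameter is a well-formed number.

```python
from itertools import groupby


def preference_tiers(accept_header):
    """Group the media ranges of an Accept header into tiers, highest q first."""
    entries = []
    for item in accept_header.split(","):
        media_range, _, params = item.strip().partition(";")
        media_range = media_range.strip()
        if not media_range:
            continue  # skip empty items
        q = 1.0  # q defaults to 1 when absent
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)  # assumes a well-formed number
        entries.append((q, media_range))
    # Stable sort: ranges with the same q keep their header order.
    entries.sort(key=lambda e: e[0], reverse=True)
    return [
        (q, [media_range for _, media_range in group])
        for q, group in groupby(entries, key=lambda e: e[0])
    ]


chrome3 = ("application/xml,application/xhtml+xml,text/html;q=0.9,"
           "text/plain;q=0.8,image/png,*/*;q=0.5")
for q, media_ranges in preference_tiers(chrome3):
    print(q, media_ranges)
# 1.0 ['application/xml', 'application/xhtml+xml', 'image/png']
# 0.9 ['text/html']
# 0.8 ['text/plain']
# 0.5 ['*/*']
```

A header of just */* collapses to a single tier, which is why those agents are described as a jack of all agents.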

See Also:
Unacceptable Browser HTTP Accept Headers (Yes, You Safari and Internet Explorer)
WebKit Team Admits Error, Downplays Importance, Re: 'Unacceptable Browser HTTP Accept Headers'
