Thursday, February 10, 2011

HTML5 in the browser: HTML5 data communications


Cross-document messaging, WebSockets, and other HTML5 APIs bolster website and browser interactivity to create a faster, richer Web
From the beginning, Web users have had mixed feelings about the way their browser communicates. On one hand, the idea of a tightly controlled sandbox is appealing because it limits the damage a website may do to our personal data and to the Web as a whole. Without these controls, just clicking on a link could unleash viruses, worms, and worse.

On the other hand, programmers have always complained about the browser's restrictions, pointing out the ways they limit the services that might be made available. Every AJAX developer can easily identify one way they could make their code that much cooler and more awesome if only the browser would loosen the rules governing the sandbox, but just this once and only for their code.

HTML5 is here to change this view toward communication -- radically in some ways and slightly in others. The rules for communication are changing, and in most cases, the developers are getting their wish. The limits are loosening but with enough strictures intact to provide greater flexibility without really endangering anyone.
The models should be familiar to most programmers because they're largely extensions of ideas that are common and generally successful in other parts of the stack. Most developers of user interfaces, for instance, arrange for the buttons and sliders to send events back and forth to other parts of the code called listeners. The HTML5 team extended this idea by arranging for code from different websites to tunnel through the wall between the different sandboxes that would normally prevent them from communicating. The sandboxes aren't being merged; the browser is simply offering a tunnel that's used only if both sandboxes agree to communicate.
All the specs have a similar flavor. The old idea of forcing the code to live in a sandbox isn't going away. The specs are just grafting on a few simple, well-confined ways to relax the most difficult rules, and they borrow traditional approaches to do it. The sandboxes are growing well-guarded tentacles that link them to each other.
Cutting through complexity
The benefits to this should be obvious. Programmers have crafted a number of hacks to work around cross-site scripting and cross-site information fetching, and these add to the programming complexity and the network traffic. Many websites host proxies simply to get around these issues. The new HTML specs will let savvy programmers slice away at the layers with a machete, adding speed while cleaning up the code base.
The first place we'll start to see these new technologies appear will probably be in advertising. The most creative attempts to get us to part with our money already link together disparate parts of the pages. When the different blocks can communicate, these attempts will be even more clever. The aggregation sites that glue together widgets, RSS feeds, and other chunks of data will probably be next, although it's not clear that these need to offer the different building blocks a chance to communicate.
A good place to look for a preview might be the Yahoo Pipes site, which shows how people are creating interesting mashups from different feeds. The Yahoo site, of course, does most of the work on the server, whereas the HTML5 specs allow the client to take over some of these chores. Yahoo Pipes is filled with mashups that link RSS feeds to maps and other services. One, for instance, looks up the latest reviews from a movie site, then searches out the trailers from another.
Here is a tour of the major new HTML5 specs for opening up the communication between the layers.
From Web docs to Web apps
The original idea of the Web document was just that: a document with words and pictures in a rectangle. Sometimes the rectangles were subdivided into smaller rectangles with slightly different coloring, but still filled with words and pictures. It was an easy model when the job was merely distributing words and pictures.
After AJAX became popular and the document turned into software, some people wanted the rectangles to talk with each other. That was fine when all of the rectangles came from one source, but this dialog failed when the rectangles originated from different sources, as they do on most sites with advertising sold by companies like Google. The rectangles filled with content come from one site, and the rectangles filled with ads come from another.
The problem became even more confusing as Web developers started swapping widgets that made it easy for one website to include a small rectangle with content from another website. A blog, for instance, might want to include a widget with a weather forecast, movie times, sports scores, or all three.
The original idea of the Web was that the words, images, and JavaScript code from one source shouldn't fraternize with those from another source. Naturally, the website developers started to say, "This would be so much cooler if the widgets could talk to each other." Then the weather widget could display the forecast for the games in the sports widget or something similar.
The only solution was to run through some central server, but that was always a bit of a kludge. In many cases, it was simply impossible because the cross-domain scripting limitations imposed by the browser blocked the contact. AJAX programmers figured out more and better workarounds, like loading JavaScript from all of the widget sites, but these are still kludges and not very secure at all.
Cross-document messaging
The HTML5 team is pushing the idea of cross-document messaging or communications. This would let the different rectangles set up a communications path by creating listeners that wait dutifully for message events from other rectangles. There's no need to run through a central server. The code just packs up the messages and sends them to some listener. It's a little like one neighbor talking to another by tying the message to a rock and throwing it over a fence.
Good fences make good neighbors

The API does arrange for communication to be limited to specific domains. It's not possible to broadcast messages to all listeners who might want to receive them. They need to be targeted at windows of particular documents. For sustained communications, a specific channel can be created to act like a two-way pipe.
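As a rough sketch of the listener model described here, a receiving page can check where each message came from before acting on it. The origin names, the `ALLOWED_ORIGINS` list, and the `handleMessage` helper are hypothetical; only `addEventListener`, the `message` event's `origin` and `data` fields, and `postMessage` come from the browser API.

```javascript
// Hypothetical allowlist: the only origins this page agrees to hear from.
const ALLOWED_ORIGINS = ["https://weather.example", "https://sports.example"];

// Decide what to do with one message event. Returns the payload when the
// sender's origin is on the allowlist and null otherwise.
function handleMessage(event, allowedOrigins = ALLOWED_ORIGINS) {
  if (!allowedOrigins.includes(event.origin)) {
    return null;
  }
  return event.data;
}

// Browser-only wiring; skipped where no window object exists.
if (typeof window !== "undefined" && typeof window.addEventListener === "function") {
  window.addEventListener("message", (event) => {
    const payload = handleMessage(event);
    if (payload !== null) {
      console.log("Message from", event.origin, payload);
    }
  });
}

// The sending side targets one window and names the origin it expects, e.g.:
//   widgetFrame.contentWindow.postMessage({ city: "Boston" }, "https://weather.example");
```

Keeping the origin check in a small function of its own makes it easy to exercise without a browser, which matters because this check is the fence between the two sandboxes.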
The details aren't final by any means, and programmers should be wary. Although Chrome, Firefox, IE, Opera, and Safari all implement the feature and let you create the listener objects, the draft of the API spec contains a warning: "Implementors should be aware that this specification is not stable."
At this point, the paradigm is well understood because many UI programmers use a similar model to structure the way that an application communicates with itself and its many parts. The changes probably won't affect the basic model of listener and event, but the details are still being worked out. To see how your browser handles cross-document messaging, point it to my cross-document messaging test page.

Cross-origin resource sharing
Sending messages is not the only solution for sharing information between different websites. The cross-origin resource sharing API loosens the rule that confines AJAX calls to the home domain. A server can name the outside origins that are allowed to call it, and XMLHttpRequest calls from pages on those origins will just work -- at least, they should.
The information travels in HTTP response headers rather than in the document itself, which places it a bit out of reach of the average HTML coder.
A server might, for example, send headers that name InfoWorld.com as the allowed origin, PUT and DELETE as the allowed methods, and 10,000 seconds as the maximum age. Pages from InfoWorld.com could then put and delete data on that server, and the browser could reuse that answer for all of 10,000 seconds before asking again. The server, in essence, is giving another site's code permission to call it for extra data. The deadline caps how long a browser relies on an old grant, which may be useful when people inadvertently leave windows open.
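A minimal sketch of the response headers such a grant might use, built here as a plain object. The `buildCorsHeaders` helper and the exact origin string are assumptions for illustration; the three header names are the standard ones from the cross-origin resource sharing spec, and the values follow the InfoWorld.com example in the text.

```javascript
// Build the CORS response headers a server could attach to its replies.
// (Hypothetical helper; a real server would set these through its own API.)
function buildCorsHeaders({ origin, methods, maxAgeSeconds }) {
  return {
    // Which outside origin may call this server.
    "Access-Control-Allow-Origin": origin,
    // Which HTTP methods that origin may use.
    "Access-Control-Allow-Methods": methods.join(", "),
    // How long, in seconds, the browser may cache this permission.
    "Access-Control-Max-Age": String(maxAgeSeconds),
  };
}

const headers = buildCorsHeaders({
  origin: "http://infoworld.com", // assumed spelling of the origin
  methods: ["PUT", "DELETE"],
  maxAgeSeconds: 10000,
});

for (const [name, value] of Object.entries(headers)) {
  console.log(`${name}: ${value}`);
}
```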
WebSockets
When AJAX calls take a long time to complete, they traditionally fail with a time-out. This may be acceptable for basic tasks like collecting the latest headlines, but the eventual time-out makes it a bit trickier to implement interactive websites. Developers have traditionally worked around the problem by polling the server frequently.
The WebSocket API is an attempt to avoid all of the constant browser reconnections by digging deeper into the TCP stack to allow connections that stay up waiting for information to return. Once a WebSocket object is created, a function assigned to its onmessage field listens for new data. Data can also be sent to the server when necessary.
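The pattern might look something like the sketch below. The `openFeed` helper, the `wss://feeds.example` URL, and the `subscribe:headlines` message are hypothetical; `onopen`, `onmessage`, and `send` are the fields the WebSocket object provides. Passing the constructor in as a parameter is a choice made for this sketch so the function can be tried outside a browser.

```javascript
// Open a socket, ask for data once connected, and route every incoming
// message to the caller's callback.
function openFeed(url, onData, SocketImpl = globalThis.WebSocket) {
  const socket = new SocketImpl(url);

  // Send a (hypothetical) subscription request as soon as the link is up.
  socket.onopen = () => socket.send("subscribe:headlines");

  // Called each time the server pushes something down the open connection.
  socket.onmessage = (event) => onData(event.data);

  return socket;
}

// In a browser this could be used as:
//   openFeed("wss://feeds.example/headlines", (data) => console.log(data));
```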
Many of the browsers -- including Chrome, Firefox, Opera, and Safari -- already support WebSockets, but this doesn't mean it's easy to turn them on and start doing wonderful things. The servers must also be upgraded, and this is a topic for an entirely different article. Kaazing, for instance, is one Web server with a JavaScript engine for operating on the incoming communications.
Do the connections really stay open? Ha! This is the Web we're talking about, a world where ISPs still routinely promise "unlimited" data transfer over 25Mbps links -- honest, promise. Programmers need to assume that the connection will fail from time to time, even though it will stay open long enough to save the need for constant polling.
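One common way to plan for those failures is to reopen the socket after a growing pause. This is only a sketch under assumed numbers: `reconnectDelay`, `keepConnected`, the one-second base, and the 30-second cap are all hypothetical choices, not part of the WebSocket spec.

```javascript
// Pause before reconnect attempt number `attempt` (0-based): doubles each
// time and is capped at maxMs.
function reconnectDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Keep a connection alive by reopening it whenever it closes.
// SocketImpl and schedule are parameters so the logic can be tried
// without a browser or real timers.
function keepConnected(url, onData, SocketImpl = globalThis.WebSocket, schedule = setTimeout) {
  let attempt = 0;

  function connect() {
    const socket = new SocketImpl(url);
    // A successful connection resets the backoff.
    socket.onopen = () => { attempt = 0; };
    socket.onmessage = (event) => onData(event.data);
    // On close, wait a while and try again, waiting longer each time.
    socket.onclose = () => {
      schedule(connect, reconnectDelay(attempt));
      attempt += 1;
    };
  }

  connect();
}
```

Doubling the pause keeps a struggling server from being hammered by every browser at once, while the cap keeps the wait from growing without bound.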

Then there's the security issue. Both Google and Mozilla decided to shut off their implementation of WebSockets for the time being after researchers Lin-Shung Huang, Eric Y. Chen, Adam Barth, Eric Rescorla, and Collin Jackson found it was possible to fool the browser into caching fake data. They propose a more sophisticated mechanism, and the browser developers seem to be sticking with the idea. The code in Firefox, for instance, still works if you flip a secret configuration bit so that WebSockets can be used for testing. Once everyone regains their confidence, the window.WebSocket object will magically reappear.
Server-sent events
The number of options available to the modern HTML5 programmer can be a bit daunting. If an XMLHttpRequest fetches information from the server and WebSockets carry data in both directions, you might wonder if there's a way for the server to send information unilaterally. Naturally, there is such a plan, called server-sent events.
There's not much to the code. First, create an EventSource object pointing to a URL on the server. Second, register a function to process the events if and when they arrive. There's no need to manage an open socket yourself or to constantly poll a distant server. It's a spec that could save some battery power on handhelds.
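Those two steps might be sketched as follows. The `watchScores` helper and the `/scores` URL are hypothetical; `EventSource`, `onmessage`, and `onerror` are the names the spec defines. As with any sketch, the constructor is passed in here only so the function can be tried outside a browser.

```javascript
// Step 1: create the EventSource. Step 2: register a handler for events.
function watchScores(url, onEvent, SourceImpl = globalThis.EventSource) {
  const source = new SourceImpl(url);

  // Runs whenever the server pushes a new event down the stream.
  source.onmessage = (event) => onEvent(event.data);

  // Browsers normally try to reconnect on their own after an error,
  // so this handler only reports the interruption.
  source.onerror = () => console.warn("Event stream interrupted");

  return source;
}

// In a browser this could be used as:
//   watchScores("/scores", (data) => console.log("Update:", data));
```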
A faster, simpler Web
All of these ideas for richer communications among websites and browsers should be familiar and attractive to developers and ISPs alike. They reduce the need for extraneous message passing, and this alone should help cut down on some of the traffic on the Internet. Websites will seem a bit zippier.
However, the question of security still lingers. To most developers, the new specs should seem like baby steps that the browsers began taking long ago. What could possibly go wrong? The browser teams already shut down the WebSockets feature after some smart scientists found a sophisticated way to abuse it. The ideas may seem simple, but the implementations may have mistakes.
Such pitfalls raise the question of how much users can do about these new features. Unlike some of the newer ideas with a fancy HTML5 logo, most browsers don't offer the standard user any way to turn these communications features on or off. It may be possible to check on the number and size of local databases that a website is setting up -- another feature often considered part of HTML5 -- but there's no easy way to open up a preferences box and flip switches on any of these data communications features.
This will probably change if someone starts abusing them. There was no way to control pop-up windows until the advertisers made an annoyance of them -- then the pop-up blockers appeared. Will such controls be necessary? Although the standards seem simple and well thought through, hackers are surprisingly clever at chaining together several tiny slipups and turning them into a gaping hole. It may be good for everyone to adopt these ideas slowly while understanding the potential danger for fraud and malfeasance.
Note: This is the third article in a series devoted to the new features of HTML5. The first article, "HTML5 in the browser: Canvas, video, audio, and graphics," examined display options, including the <canvas> and <video> tags, Scalable Vector Graphics, and WebGL. The second article, "HTML5 in the browser: Local data storage," examined Web Storage, Web Database, and other APIs designed to transform Web pages into local applications. The next article will examine HTML5 forms.