
alex black’s blog

startup life in waterloo

30
Dec 2009

Be "of the internet"

In this post I want to persuade you to build web applications (or sites) that are "of the internet". By this I mean applications built to make the most of the inherent properties of the internet: linking between sites, bookmarking pages, caching, crawling, and viewing sites in different formats such as mobile or print.

Peter Sweeney at Primal Fusion opened my mind to this perspective. For some people I'm sure it's obvious :) But for me, a few years ago, it wasn't. As a result, my efforts to develop our first internal web API at Primal Fusion were definitely not "of the internet". It wasn't obvious to other people at Primal Fusion either: our first efforts at a consumer-facing web application broke key features of the internet such as bookmarking and the browser back button.

What does it mean to be "of the internet"?  For me it means:

  • To embrace official internet standards. All of the stuff that makes the web work, like HTTP, DNS, RSS etc., has been written up as standards that you can read about. It's amazing how much web development I did before ever reading one of these standards :) See the HTTP/1.1 request for comments (RFC).
  • To ensure accessibility. If your application runs on the internet, then consider your audience. They're not all running Internet Explorer 8 on 1600x1200 monitors. Some of them are running other browsers, some are search engines, some print your webpages, some are on mobile devices, and some are visually impaired.
  • To adopt and improve best practices of the internet. For example, use CSS, don't hard-code font sizes in a way that stops your grandma from increasing the text size so she can read your site, and use good caching practices to improve your site's performance.

Why should you care? By developing applications that are of the internet, rather than applications that break the internet, you:

  • Become a good net citizen
  • Improve the usability of your site
  • Improve the chances of Google and other search engines properly indexing and ranking your site
  • Save money by taking advantage of existing protocols and systems rather than building your own. For example, why invent your own authentication protocol when you can use HTTP Digest?

I was spurred to write this post because of some difficulty I'm having right now trying to get our application to respond nicely to URLs that are not found. We're developing our website in Lift, using Scala, and the recommended approach to handling a page not found (see here) is to redirect the user to a page telling them that the page they are looking for does not exist. This is the wrong approach because 1) it changes the URL on the user, preventing them from correcting a mistake in the URL or, say, copying it and emailing it to you, 2) it sends the wrong message to search engines: instead of letting them know this URL is not found, you're telling them it's just moved, and 3) it goes against the standards of the internet.

When a user-agent requests a page from your site and you can't find the page, you should respond with a 404 Not Found status code at the requested URL, rather than redirecting to an error page that returns 200 OK. Read Google's opinion on the soft 404 issue.
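
To make this concrete, here is a minimal sketch of answering an unknown URL in place. It is written in Python with the standard library's WSGI helpers rather than in Lift, so it illustrates the idea and is not the fix for our Lift application. The page store `PAGES` and the test helper `call` are made up for the example.

```python
import html
from wsgiref.util import setup_testing_defaults

# Hypothetical page store; a real site would look pages up elsewhere.
PAGES = {"/": "<h1>Home</h1>", "/about": "<h1>About</h1>"}


def app(environ, start_response):
    """Serve known pages with 200; answer unknown paths with 404 in place."""
    path = environ.get("PATH_INFO", "/")
    body = PAGES.get(path)
    if body is None:
        # No redirect: the requested URL stays in the address bar, and the
        # status line tells browsers and crawlers that the page is missing.
        status = "404 Not Found"
        body = "<h1>Page not found</h1><p>Nothing lives at %s</p>" % html.escape(path)
    else:
        status = "200 OK"
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def call(wsgi_app, path, method="GET"):
    """Invoke a WSGI app directly (no sockets); return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["REQUEST_METHOD"] = method
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling `call(app, "/no-such-page")` yields a `404 Not Found` status with no `Location` header, and the error page can still show the user the URL they typed so they can correct it.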

Here are a number of best practices that stand out in my mind:

  • Support the browser back button. This is especially difficult when developing ajax applications, but if the user pushes the back button, they should return to the previous page they were at. If this doesn't work on your site, then it's broken!
  • Support bookmarking and linking. If someone can view a page on your site, they should be able to make a link to it or bookmark it. On many sites, you can't do this, because the page being displayed is a result of some state that is not in the URL, so if you give the URL to someone else, they might see something different.
  • Use GET and POST appropriately. Only use GET for safe requests. Think about it, it's called GET. It should do nothing other than GET a webpage. If you are modifying a shopping cart, or deleting a record in the database, it shouldn't be done with a GET.
  • Develop RESTful APIs. All of the above suggestions apply to web APIs the same way they apply to websites. REST is an approach for APIs that embraces web standards.
  • Take advantage of caching. A key principle of the web is that pages and responses can be cached. Of course this only works well if you're already following web standards. If all your GET requests are safe (free of side effects), then you can easily put a web cache in front of your web server, and Google and the user's web browser can also safely cache responses to speed things up, ultimately saving you money on bandwidth and CPU time.
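
As a rough illustration of the caching point, the sketch below shows how a response to a safe GET can carry an `ETag` and a `Cache-Control` header, and how a repeat request with a matching `If-None-Match` value can be answered with an empty 304 Not Modified. The function name `cached_response` and its parameters are my own invention, and the comparison is simplified: real `If-None-Match` headers may list several tags or use weak validators.

```python
import hashlib


def cached_response(body: bytes, if_none_match=None, max_age=3600):
    """Return (status, headers, body) for a safe GET, honoring If-None-Match.

    Simplification: if_none_match is compared as a single exact tag.
    """
    # Derive a validator from the content so it changes whenever the body does.
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:16]
    headers = {
        "ETag": etag,
        "Cache-Control": "public, max-age=%d" % max_age,
    }
    if if_none_match == etag:
        # The client's copy is still current: send headers only, no body.
        return 304, headers, b""
    return 200, headers, body
```

A browser or intermediate cache that stored the first response can send the `ETag` back on its next request and get the 304 instead of the full page, which is where the bandwidth saving comes from.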

One last anecdote. Years ago Google unveiled the Google Web Accelerator product. The idea was that it would improve the experience people had browsing the web by pre-caching pages for users. One major problem they ran into was that many sites had not conformed to internet standards, and when Google sent them GET requests for pages, these sites performed actions with side effects such as deleting data. What these sites had done was build their applications in such a way that when a user clicked a button to, say, delete a product, the browser would send a GET request to a URL like http://yoursite.com/deleteProduct?product=1, when what they should have done is have the browser send a DELETE or POST request. GET is meant to be safe: a GET request should never change state on the server.
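
Here is a small sketch of the alternative, assuming a hypothetical resource at /products/<id> and an in-memory dict standing in for the database. The function `handle` is illustrative, not any framework's API. GET only reads, DELETE removes the product, and every other method is refused with 405 and an `Allow` header.

```python
def handle(method: str, product_id: int, products: dict):
    """Return (status, headers) for a request on /products/<product_id>."""
    if product_id not in products:
        return 404, {}
    if method in ("GET", "HEAD"):
        # Safe: reads only, so crawlers and prefetchers cannot cause damage.
        return 200, {}
    if method == "DELETE":
        # The state change happens only on an explicitly unsafe method.
        del products[product_id]
        return 204, {}
    # Any other method is not supported on this resource.
    return 405, {"Allow": "GET, HEAD, DELETE"}
```

With this shape, a prefetcher that follows every link it finds can only ever read a product. Plain HTML forms cannot send DELETE, so a site might route a POST to the same deletion logic instead; the point is only that it must not be a GET.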

I've only scratched the surface here, but I hope I've got across at a high level what it means to be of the internet, and I encourage you to make this perspective your own and use it to help guide your decisions when you're building your next internet application.

 

Dec 30, 2009
Tom said...
Alex, you know that it's technically impossible to support the back and forward buttons!
Dec 31, 2009
Alex Black said...
What about supporting just one of them at a time? :) I was reading "Coders at Work" last night, and Jamie Zawinski said that in order to support multiple platforms (he was referring to operating systems) you had to write the code to work on all of them from the start, that coming back later to add other platforms never worked. I think the same principle applies here, you have to support the back/forward buttons from the start, or you'll never catch up.

We're facing a couple questions like that at Snapsort right now, and I think the answer is you can't leave stuff like that until later, you need to build a complete system from the start, complete horizontally that is, and then add depth all the way across as you go.

Dec 31, 2009
Tom said...
Makes sense. Or you could use GWT which apparently has built-in support for this stuff now. All of Google Wave was built on GWT? Pretty impressive.
Dec 31, 2009
Alex Black said...
GWT does sound pretty good. More generally, you might just be saying: choose tools which make it easy to build applications that are 'of the internet' out of the box.
 