Be "of the internet"
In this post I want to persuade you to build web applications (or sites) that are "of the internet". By this I mean applications built to make the best use of the internet's inherent properties: linking between sites, bookmarking pages, caching, crawling, and viewing pages in different formats such as mobile or print.
Peter Sweeney at Primal Fusion opened my mind to this perspective. For some people I'm sure it's obvious :) But for me, a few years ago, it wasn't. As a result, my efforts to develop our first internal web API at Primal Fusion were definitely not "of the internet". It wasn't obvious to others at Primal Fusion either: our first consumer-facing web application broke key features of the internet such as bookmarking and the browser back button.
What does it mean to be "of the internet"? For me it means:
- To embrace official internet standards. Everything that makes the web work, like HTTP, DNS and RSS, has been written up as a standard that you can read. It's amazing how much web development I did before ever reading one of these standards :) See, for example, the HTTP/1.1 request for comments (RFC 2616).
- To ensure accessibility. If your application runs on the internet, consider your audience. They're not all running Internet Explorer 8 on 1600x1200 monitors. Some use other browsers, some are search engines, some print your pages, some are on mobile devices, and some are visually impaired.
- To adopt and improve on the internet's best practices. For example: use CSS; don't hard-code font sizes in a way that stops your grandma from enlarging the text so she can read your site; and use good caching practices to improve your site's performance.
Why should you care? By developing applications that are of the internet, rather than applications that break the internet, you:
- Become a good net citizen
- Improve the usability of your site
- Improve the chances of Google and other search engines properly indexing and ranking your site
- Save money by taking advantage of existing protocols and systems rather than building your own. For example, why invent your own authentication protocol when you can use HTTP Digest authentication?
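As an illustration of what an existing protocol already gives you, here is a minimal Python sketch of the response calculation in HTTP Digest authentication as specified in RFC 2617 (algorithm MD5, qop=auth). The point is that the password never crosses the wire, only a hash over it, the server's nonce and the request. The function names are hypothetical, the sketch omits nonce generation and replay tracking (which a real server needs), and RFC 7616 has since superseded RFC 2617 and adds SHA-256.

```python
import hashlib
import hmac


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(ha1, method, uri, nonce, nc, cnonce, qop="auth"):
    """The 'response' value of RFC 2617 Digest auth (algorithm=MD5, qop=auth)."""
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


def client_response(username, realm, password, method, uri, nonce, nc, cnonce):
    # The client folds its password into HA1; the password itself is never sent.
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    return digest_response(ha1, method, uri, nonce, nc, cnonce)


def server_verify(stored_ha1, method, uri, nonce, nc, cnonce, response):
    # The server can store HA1 instead of the plaintext password and recompute.
    # (Hypothetical helper: nonce freshness and nc replay checks are omitted.)
    expected = digest_response(stored_ha1, method, uri, nonce, nc, cnonce)
    return hmac.compare_digest(expected, response)
```

In practice you would lean on your web server's or framework's built-in Digest support where available rather than hand-rolling it; the sketch only shows that there is nothing to invent.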
I was spurred to write this post by some difficulty I'm having right now getting our application to respond properly to URLs that don't exist. We're developing our website in Lift, using Scala, and the recommended approach to handling a page that isn't found is to redirect the user to a page telling them that the page they are looking for does not exist. This is the wrong approach because 1) it changes the URL on the user, preventing them from correcting a mistake in it or, say, copying it and emailing it to you; 2) it sends the wrong message to search engines: instead of telling them this URL doesn't exist, you're telling them it has just moved; and 3) it goes against the standards of the internet.
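When a user agent requests a page from your site that you can't find, respond with a 404 Not Found status code rather than a 200 OK status code (which is what a redirect to an error page ultimately delivers). Google calls an error page served with a success status a "soft 404"; read Google's opinion on the soft 404 issue.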
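A minimal sketch of the alternative, written as a plain Python WSGI app rather than Lift (the page table and the request helper are hypothetical): an unknown path gets a 404 Not Found status with a helpful page at the very URL the user asked for, instead of a redirect to an error page.

```python
import html
from wsgiref.util import setup_testing_defaults

# Hypothetical pages; a real site would look these up in a database or on disk.
PAGES = {
    "/": "<h1>Home</h1>",
    "/about": "<h1>About us</h1>",
}


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    page = PAGES.get(path)
    if page is not None:
        status, body = "200 OK", page
    else:
        # No redirect: answer at the requested URL with a real 404 status, so
        # the user keeps the URL they typed and crawlers learn it doesn't exist.
        status = "404 Not Found"
        body = "<h1>Not found</h1><p>Nothing lives at %s.</p>" % html.escape(path)
    data = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]


def request(path):
    """Call the app in-process and return (status, body) for a GET of path."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")


if __name__ == "__main__":
    print(request("/about")[0])   # 200 OK
    print(request("/abuot")[0])   # 404 Not Found
```

The error page can still be friendly (suggest similar pages, offer a search box); the only thing that changes is that it is served with the honest status code.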
Here are a number of best practices that stand out in my mind:
- Support the browser back button. This is especially difficult in Ajax applications, but if the user presses the back button, they should return to the page they were previously on. If that doesn't work on your site, your site is broken!
- Support bookmarking and linking. If someone can view a page on your site, they should be able to link to it or bookmark it. On many sites you can't, because the page being displayed depends on state that isn't in the URL, so if you send the URL to someone else they might see something different.
- Use GET and POST appropriately. Only use GET for safe requests. Think about it: it's called GET. It should do nothing other than GET a web page. Modifying a shopping cart or deleting a record in the database must not be done with a GET.
- Develop RESTful APIs. All of the above suggestions apply to web APIs the same way they apply to websites. REST is an approach to API design that embraces web standards.
- Take advantage of caching. A key principle of the web is that pages and responses can be cached. Of course, this only works if you're already following web standards. If all your GET requests are safe (free of side effects, and therefore idempotent), you can easily put a web cache in front of your web server, and Google and users' browsers can also safely cache responses. That speeds things up and ultimately saves you money in bandwidth and CPU time.
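Two of these practices can be seen together in a small sketch, again a hypothetical Python WSGI app rather than Lift. The list's sort order lives in the query string, so the URL alone reproduces the view and can be bookmarked or shared; and because the GET is safe, the response carries an ETag and a Cache-Control header, letting browsers and caches revalidate with If-None-Match and receive a bodiless 304 Not Modified. The product data, the "sort" parameter and the max-age value are assumptions for illustration, and the If-None-Match parsing is simplified.

```python
import hashlib
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical catalogue; a real application would read this from a data store.
PRODUCTS = [("kettle", 30), ("toaster", 25), ("blender", 60)]


def etag_matches(if_none_match, etag):
    """Weak comparison of an If-None-Match header against our ETag (simplified parsing)."""
    if if_none_match is None:
        return False
    candidates = []
    for tag in if_none_match.split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            tag = tag[2:]
        candidates.append(tag)
    return "*" in candidates or etag in candidates


def product_list(environ, start_response):
    # All view state (here just the sort order) lives in the query string, so
    # the URL alone reproduces the view and can be bookmarked or shared.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    sort = query.get("sort", ["name"])[0]
    column = 1 if sort == "price" else 0
    rows = sorted(PRODUCTS, key=lambda product: product[column])
    body = "\n".join("%s: $%d" % row for row in rows).encode("utf-8")

    # A validator derived from the exact bytes of this representation.
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:16]
    headers = [("ETag", etag), ("Cache-Control", "public, max-age=60")]

    # Conditional GET: the client already holds this representation, so reply
    # 304 Not Modified with no body and save the bandwidth.
    if etag_matches(environ.get("HTTP_IF_NONE_MATCH"), etag):
        start_response("304 Not Modified", headers)
        return []

    headers += [("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def request(query="", if_none_match=None):
    """Call the app in-process; returns (status, headers dict, body text)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["QUERY_STRING"] = query
    if if_none_match is not None:
        environ["HTTP_IF_NONE_MATCH"] = if_none_match
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(product_list(environ, start_response))
    return captured["status"], captured["headers"], body.decode("utf-8")
```

A first request with "sort=price" gets 200 OK plus an ETag; repeating it with If-None-Match set to that ETag gets 304 Not Modified. None of this would be safe to cache if the same GET could also change data.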
One last anecdote. Years ago Google launched Google Web Accelerator. The idea was to improve people's browsing experience by pre-fetching and caching pages for them. One major problem Google ran into was that many sites didn't conform to internet standards, and when Google sent them GET requests for pages, these sites performed actions with side effects, such as deleting data. They had built their applications so that when a user clicked a button to, say, delete a product, the browser sent a GET request to a URL like http://yoursite.com/deleteProduct?product=1, when it should have sent a DELETE or POST request. GET is meant to be safe (and therefore idempotent): it should never change anything on the server.
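Here is a minimal sketch of the safe alternative, as a hypothetical Python WSGI app: instead of a /deleteProduct?product=1 action URL, each product is a resource at an assumed /products/<id> URL. GET only reads it, so a pre-fetcher can request it harmlessly; deletion requires DELETE (or POST, since a plain HTML form can't send DELETE); any other method gets 405 Method Not Allowed with an Allow header. The in-memory table and the call helper are illustrative assumptions.

```python
import re
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory table standing in for a database.
PRODUCTS = {1: "kettle", 2: "toaster"}
PRODUCT_URL = re.compile(r"^/products/(\d+)$")


def respond(start_response, status, text, extra_headers=()):
    data = text.encode("utf-8")
    headers = [("Content-Type", "text/plain; charset=utf-8"),
               ("Content-Length", str(len(data)))]
    start_response(status, headers + list(extra_headers))
    return [data]


def product_resource(environ, start_response):
    match = PRODUCT_URL.match(environ.get("PATH_INFO", ""))
    if match is None or int(match.group(1)) not in PRODUCTS:
        return respond(start_response, "404 Not Found", "no such product")
    product_id = int(match.group(1))
    method = environ["REQUEST_METHOD"]

    if method == "GET":
        # Safe: reading never changes state, so a pre-fetcher, crawler or cache
        # can issue this request as often as it likes.
        return respond(start_response, "200 OK", PRODUCTS[product_id])
    if method in ("DELETE", "POST"):
        # Unsafe: only these methods delete. POST is accepted too because a
        # plain HTML form cannot send DELETE.
        del PRODUCTS[product_id]
        return respond(start_response, "200 OK", "deleted " + str(product_id))
    # Anything else is refused, and the Allow header lists what is supported.
    return respond(start_response, "405 Method Not Allowed", "method not allowed",
                   [("Allow", "GET, DELETE, POST")])


def call(method, path):
    """Call the app in-process; returns (status, headers dict, body text)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(product_resource(environ, start_response))
    return captured["status"], captured["headers"], body.decode("utf-8")
```

With this design a pre-fetcher that follows every link on the page can GET /products/1 a hundred times and the kettle is still there; only a deliberate DELETE or form POST removes it.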
I've only scratched the surface here, but I hope I've conveyed at a high level what it means to be of the internet. I encourage you to make this perspective your own and let it guide your decisions when you build your next internet application.

