URL aesthetics

Posted by lec** on Thursday, January 10 2008 @ 06:10:49 GMT        
Keeping URLs logical and clean is a good practice
While you're working on a website, you are probably focused on its content and usability. You want the design to be pleasant, free of broken elements and such things. But don't forget the URLs! I suppose some webmasters are so focused on making everything else that URLs sometimes get neglected. The way you name your files and design your URLs will have a great impact on your site, since the URL (or rather, URI) is what identifies your document. Of course, the URL does have technical restrictions, and you may want to leave it imperfect in order to gain something. But making it ugly without any real reason (except your own laziness) is a very bad webmastering practice that you will regret. Remember these pointers, and you'll do well:

Keep capital letters out of the URLs
This is one huge mistake that I don't completely understand. It may have something to do with the fact that Windows platforms (and thus Windows servers too) ignore the case of a file name. On a Windows server, requesting index.html, Index.html or iNdeX.HTmL makes no difference whatsoever - the same file is delivered every time. However, that does not mean you shouldn't give a damn. A non-technical audience probably does not know this distinction, and will have to make a considerable, completely unnecessary effort to relay the address of your "how to prepare octopus salad" article to another person over the phone if its address contains Articles-other/Listing.aspx?articleID=1013, for instance. The Biodiversity Heritage Library about page capitalises the "A", making the link lead to About.aspx. Since it's a Windows server, even if the file IS called About.aspx, you can still make the URL all lowercase to make it easier to remember. Those who run Microsoft.com seem to forget or ignore this in too many instances to count. I have even seen this on non-Windows servers, with files such as ContactUs.php, but thankfully the number of sites serving such pages is very small.

This of course also applies to querystrings. A lot of forum software, such as Vanilla and even the hideously expensive Jive forums, uses capitalised querystrings, such as CommentID=x and threadID=x. There is absolutely no need for this. If you have to use a querystring, at least keep the parameters as short as possible, lowercase, and logical. vBulletin does it nicely: showthread.php?t=x. Whatever software the MSDN forums run (I suspect Community Server) does it hideously: ShowForum.aspx?ForumID=536&SiteID=1. Yuck! So in conclusion, there is never a need for you to use uppercase, regardless of which platform your server runs.
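To make the lowercase rule concrete, here is a minimal Python sketch of the normalisation I have in mind. The function name lowercase_url and the example.com address are my own inventions for illustration, and the sketch assumes your parameter values may be case-sensitive, so it only touches the host, the path and the parameter names. (It also lowercases any user:password part of the host, which I'm assuming you don't have in public links.)

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def lowercase_url(url: str) -> str:
    """Return url with a lowercase host, path and query parameter names."""
    parts = urlsplit(url)
    # Lowercase the parameter names only; the values may be case-sensitive data.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode([(name.lower(), value) for name, value in pairs])
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path.lower(), query, parts.fragment)
    )


print(lowercase_url("http://Example.com/Articles-other/Listing.aspx?articleID=1013"))
# prints http://example.com/articles-other/listing.aspx?articleid=1013
```

On a live site you would probably answer the mixed-case request with a permanent redirect to the lowercase address rather than serve both, but how you do that depends on your server.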

Avoid unnecessary querystrings
It's often a habit on smaller sites to be lazy and use the index page (usually index.php) to generate a header and footer and sandwich the content between them - content specified by querystring. I've seen page.php?id=contact, index.php?do=contact, even index.php?page=Contact. All of these are very ugly and terribly clumsy on the client side. If you absolutely have to have URLs like these internally, at least rewrite them with .htaccess or something, because exposing them to visitors is completely unacceptable.
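The rewriting idea boils down to a pattern match: the visitor sees a clean path, and the server quietly translates it into the querystring form the script expects. Here is a rough Python sketch of that translation. It is not an actual .htaccess rule; the pattern, the function name to_internal and the index.php?page= target are assumptions picked to match the examples above.

```python
import re
from typing import Optional

# Hypothetical rule: a single lowercase path segment names the page to load.
CLEAN_PATH = re.compile(r"^/([a-z0-9-]+)/?$")


def to_internal(path: str) -> Optional[str]:
    """Map a clean public path to the internal querystring URL.

    Returns None when the path does not match the rule.
    """
    match = CLEAN_PATH.match(path)
    if match is None:
        return None
    return f"/index.php?page={match.group(1)}"


print(to_internal("/contact"))  # prints /index.php?page=contact
print(to_internal("/Contact"))  # prints None (uppercase is rejected on purpose)
```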

Many sites today avoid the querystring like the plague, in the name of "SEO" - but often the replacement is even uglier than the querystringed version. Actually, search engines don't pay as much attention to the URL as to the structure of the document, so make sure you use all the important tags properly (like the heading elements h1-h6). Let's take a look at this one: /2007/12/06/new-version-of-some-product-released-2007-12-06-126013/. That cannot be any better than the original - let's analyse! You've got a year, a month and a day in there - do you really need to specify the day? It seems highly unlikely to me that you will publish two articles having identical names (or name-like identifiers) in the same month, so why not have just the year and the month? You might want to include the day just to make it clear when the article or whatever was written, but that's surplus information that you do not want to challenge your visitors into remembering. You could just as well have stuffed the whole article into the URL - then you wouldn't even need the page! It's about user friendliness, because web spiders and search engines won't mind if the URL looks like that, while your visitors may suffer.
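For comparison, here is one way the shorter form could be generated. This is only a sketch under my own assumptions: the function name article_path is made up, and the year/month/title layout is just the scheme argued for above, not the only sensible one.

```python
import re
from datetime import date


def article_path(title: str, published: date) -> str:
    """Build a /year/month/slug path from an article title and its date."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"/{published.year}/{published.month:02d}/{slug}"


print(article_path("New version of Some Product released!", date(2007, 12, 6)))
# prints /2007/12/new-version-of-some-product-released
```

The day, the repeated date and the numeric ID from the ugly example are simply never added.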

Doing URL rewriting in general is a good thing, though I don't think it's always compulsory. If you have a small querystring that does nothing other than provide crucial information to the underlying mechanism, it's not that bad (e.g. /topic?id=23). Of course, rewriting it to something that looks like /topic/23 would be even better. The querystring is a useful, though arguably search-engine-unfriendly, URL component. So in short, it's always a good idea to rewrite your URLs, but, again, I can't stress this enough - give your users, not quasi-SEO-optimisation, the highest priority. It's positively easy to make URLs short, logical and easy to remember. And search engines will be happy too.
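If you switch to the /topic/23 style later, old links in the /topic?id=23 style will still be floating around. A small Python sketch of how you might work out where to send them follows. The function name clean_topic_url is hypothetical, and it assumes topic ids are plain numbers.

```python
from urllib.parse import parse_qs, urlsplit


def clean_topic_url(url: str) -> str:
    """Turn /topic?id=N into /topic/N; return any other URL unchanged."""
    parts = urlsplit(url)
    ids = parse_qs(parts.query).get("id", [])
    # Only rewrite when there is exactly one id and it is numeric.
    if parts.path == "/topic" and len(ids) == 1 and ids[0].isdigit():
        return f"/topic/{ids[0]}"
    return url


print(clean_topic_url("/topic?id=23"))   # prints /topic/23
print(clean_topic_url("/topic?id=abc"))  # prints /topic?id=abc
```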

No unnecessary information
I've already noted this in the point above about querystrings. Any information that isn't strictly used to service the request does not belong in the URL. Don't repeat yourself either! MySpace is a broken website in many regards, but its URLs are close to criminal - it's a typical example of too much redundant information in the URLs. If it's profile.myspace.com, why the heck is it necessary to specify fuseaction=user.viewprofile? It's obviously because a) the developer who designed them was a moron, or b) the URLs were decent at the start, but the development was messy and the current ones are a result of multiple hacky solutions. Whichever it is, MySpace definitely isn't the only site out there that puts stupid redundant things into URLs. Anything that repeats or is not necessary to complete the request should never make it into a URL. I've even seen a disk drive letter in the URL on one site, as a part of some ASP.NET thumbnailer that even received a height and width which did not affect the output.
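As a rough illustration of trimming the fat, this Python sketch drops parameters that a site has decided are redundant. The REDUNDANT set, the function name strip_redundant, the index.cfm path and the friendid parameter in the example are all assumptions on my part, used only to show the idea.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters the host name already makes redundant.
REDUNDANT = {"fuseaction"}


def strip_redundant(url: str) -> str:
    """Return url without the query parameters named in REDUNDANT."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in REDUNDANT
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_redundant(
    "http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=123"
))
# prints http://profile.myspace.com/index.cfm?friendid=123
```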

Cgi-bin? Are you joking?
This is an occurrence that I feel is thankfully getting less and less common, though you will still notice some sites have directories called "cgi-bin". It was part of some weird crazy notion that compiled CGI binaries and Perl scripts have to go in these folders. With a little configuration, most servers can serve these files from any directory, not necessarily a cgi-bin.

This also applies to anything in the URL that hints at some underlying server-side processing mechanism. Folders named bin, isapi, php or similar (cgi-bin especially) are an unwelcome addition that means nothing to the visitor, and even wastes your bandwidth at a rate of 3-7 bytes per link. It is also completely unnecessary to have something I like to call "idiot directories" - basically directories called "site", "portfolio", "Member" (the list goes on) that you are redirected into when opening a website. Microsoft's former attempt at a community site, Hive.net (now shut down), redirected you to hive.net/Member/ when you opened hive.net. The software used was a customised version of Community Server, so I think they did have control over this aspect. It's OK to do this in only a few situations: if your site is split cleanly into several sub-sites which you wish to strictly separate, or when you aren't really redirecting to a real directory, but to a rewritten URL that contains the default language being used (e.g. microsoft.com -> microsoft.com/en/us/default.aspx). Another possibility is a temporary redirect while you are doing some work on the website, like redesigning, but don't want to take the whole site offline. Just make sure it really is temporary.

File extensions... meh
Removing the file extension of your web pages is a popular and justified action. Not only does it shorten the URL by several characters, but it also hides what scripting technology you may be using from plain view, thus letting you change it at any point without having to change the URLs. It's an important thing to do in my opinion, but of course it's sometimes not possible or not practical. Either way, if you have the opportunity, take it. Some servers can be configured to process files with no file extension differently, whereas in other cases, you can use .htaccess to remove the extension. A full rewrite of the extension and any querystring data is a wise choice, and may make search engines treat your content differently.
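Behind an extensionless URL, something still has to pick the real file. Here is a minimal Python sketch of that lookup. The EXTENSIONS tuple, the function name resolve and the "index" default are assumptions for illustration, not how any particular server does it.

```python
from typing import Iterable, Optional

# Hypothetical extensions to try, in order of preference.
EXTENSIONS = (".html", ".php")


def resolve(path: str, files: Iterable[str]) -> Optional[str]:
    """Find the file that an extensionless path such as /about refers to."""
    available = set(files)
    # An empty path means the site root, which maps to the index page.
    name = path.strip("/") or "index"
    for extension in EXTENSIONS:
        candidate = name + extension
        if candidate in available:
            return candidate
    return None


print(resolve("/about", ["about.php", "index.html"]))  # prints about.php
print(resolve("/", ["about.php", "index.html"]))       # prints index.html
```

Swap about.php for about.html one day and the public URL /about stays exactly the same, which is the whole point.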

.html vs .htm
Some websites have URLs containing .htm extensions instead of .html, corresponding to the type of file. It could be argued that the shorter the extension, the better (no extension - perfect), but in this case I'll make an exception out of personal taste. If you're making a choice, use .html instead of .htm. HTM files carry some negative "Windows 95/98, Internet Explorer 5" aura that I would prefer not to have around my site, thank you very much.

I sincerely hope you will use these guidelines (they're mostly common sense, really) and benefit from them when creating your website. If you've read this far and have acquired a feel for carving URLs, I can give myself a pat on the back since another webmaster or webmistress may choose to adopt a more logical URL solution for their (otherwise lovely, I'm sure) website.
chrisl

Dec 06 2009 @ 05:40:52
I use .xhtml instead of .html or .htm
On a local copy of a website, opening a .xhtml file forces the browser to treat it as application/xhtml+xml

.htm or .html on the other hand, behave like text/html

of course, if you are using invalid xhtml (or html), i guess using an extension that gives you text/html could be considered useful, but hell, just fix your markup to use correct xhtml ;)
SpaceMan

Aug 28 2009 @ 04:46:44
On my site, I used an .htaccess rewrite to map Content/Home to index.php?page=home. It's easy.
Sir Aaron

May 19 2009 @ 20:07:22
I agree with the capital letters and .htm/.html thing.
Kat^

Feb 24 2008 @ 04:34:21
Very helpful. :D