About URLs

The URL standard is a way to address resources on the Internet, where a resource can be (among other things) a file or directory-listing anywhere on the Internet or in the disk drives of the computer you're working on.

Roughly speaking, a URL specifies a file in any format (HTML, a picture file, an ASCII file, anything) by naming the machine it's on, the subdirectories it's in, and the name it has; a file can be obtained thru any protocol (http, gopher, anonymous ftp).
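
To make the "machine, subdirectories, name" breakdown concrete, here is a minimal sketch in Python using the standard library's urllib.parse. The describe() helper and its dictionary keys are made up for illustration; they are not part of any standard.

```python
from urllib.parse import urlsplit

def describe(url):
    """Split a URL into protocol, machine, subdirectories, and file name.

    Hypothetical helper, for illustration only.
    """
    parts = urlsplit(url)
    # Everything up to the last "/" in the path is the directory part;
    # whatever follows it is the name of the file.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "machine": parts.hostname,
        "directories": directory + "/",
        "name": filename,
    }

info = describe("http://www.csun.edu/ITR/library/images/dude2.gif")
print(info["protocol"])     # http
print(info["machine"])      # www.csun.edu
print(info["directories"])  # /ITR/library/images/
print(info["name"])         # dude2.gif
```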

Here are some examples and explanations:

http://www.csun.edu/~sburke1/sburke1.html
Using the http protocol, get the document sburke1.html in the public_html directory of user sburke1, on the Internet machine www.csun.edu
gopher://netcom.com/0info.txt
This refers to the document, retrieved using the gopher protocol, designated by gopher selector string 0info.txt in the gopherroot directory of the Internet machine netcom.com. Gopher URLs can be kind of peculiar, and there are a few things you might want to know about them.
mailto:sburke@cpan.org
This designates an Internet email address to send mail to
news:alt.rec.gardening
This is the current contents (relative to your news server) of a fascinating Usenet newsgroup
telnet://huey.csun.edu:23/
The name of a machine to log into. (23 being the port number)
mailto:joe%3dbob%25briggs@chainsaw.int,fifi@god.org
A URL with this link, if followed, will send email to the addresses joe=bob%briggs@chainsaw.int and fifi@god.org. Note that the % and = are %-encoded; it's very very unlikely you'll ever have to worry about %-encoding. Note that there is no space between the addresses; this is because there are never any spaces in URLs, ever ever ever.
Since mailto's are not universally supported, it is bad form to say <A HREF="mailto:me@nowhere.org">mail me, dood!</A>. Instead, say mail me, dood, at <A HREF="mailto:me@nowhere.org">me@nowhere.org</A> so people who can't (or don't want to) mail you thru their browser can copy your address from the browser window (or real-world store it on paper, post-it, Lite-Brite, Magic Slate, whatever) and paste it into their mailer program.
http://www.csun.edu/ITR/library/images/dude2.gif
This designates a GIF image file.
ftp://ftp.netcom.com/~eamon/test1.wav
This designates a WAV sound file.
file://localhost/c|/netscape/bookmark.htm
This local URL just may designate the bookmark file on whatever system you're using. See "About local URLs".
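
If you ever do need to look at %-encoding, the two-address mailto URL in the list above can be taken apart with Python's urllib.parse. This is only a sketch: the variable names are mine, and splitting on the comma before decoding is my assumption about the sensible order of operations, not something this page or a mail client promises.

```python
from urllib.parse import quote, unquote

url = "mailto:joe%3dbob%25briggs@chainsaw.int,fifi@god.org"

# Separate the scheme from the rest at the first colon.
scheme, _, rest = url.partition(":")

# Split on commas first, then decode each address, so that a
# %-encoded comma (%2C) inside an address would not split it.
addresses = [unquote(address) for address in rest.split(",")]
print(addresses)  # ['joe=bob%briggs@chainsaw.int', 'fifi@god.org']

# Going the other way: encode everything unsafe except the "@".
# quote() emits uppercase hex digits, which mean the same thing as lowercase.
print(quote("joe=bob%briggs@chainsaw.int", safe="@"))  # joe%3Dbob%25briggs@chainsaw.int
```
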
Now, a resource (a nice vague term, the "r" in URL) that a URL designates need not be a file per se, but can be a directory listing of files, output from a CGI script, any number of MIME objects (theoretically), or the designation of one or more Internet email addresses (as you saw above), or the designation of a telnet session (as you saw above) or tn3270 session.

Two other, esoteric, protocol schemes -- wais:// and prospero:// -- are supported in theory, but not in practice. No one seems to mind this.

The URL form news:newsgroup is popular, although not yet universally adopted; many browsers cannot support it, and others prefer/require different syntaxes.

Adding finger:address@host was suggested, but is not supported by any client I know. However, there is a very elegant workaround.

The final slash

Directory names accessed thru http should end in a /

Example:

http://www.csun.edu/ITR/
(The server may respond with either a directory listing, or the contents of the directory blocker file for that directory.)
I am often asked, is the final slash after the directory specification optional? Answer -- yes and no. If you jump to http://www.ling.nwu.edu/~sburke, the server will respond with a redirect code meaning "What you actually want is to be found at http://www.ling.nwu.edu/~sburke/". Your client will then ask for http://www.ling.nwu.edu/~sburke/, at which point the server will respond with the directory blocker file for that directory, or lacking such a file, a listing of all the files in that directory.

If you had told your client to jump to http://www.ling.nwu.edu/~sburke/ in the first place, it would have saved the useless initial exchange of packets. This may seem trivial if the server is 500 feet away from your client across a clear Ethernet network, but if you're in the USA and the server is in China, every packet can be a long wait.

What this means is that if you're making links to other people's pages in a document of yours, don't abbreviate the URLs by leaving the final /'s off the directory names; put them in.

Of course, this is not relevant if the URL ends in a filename (e.g., foo.html), nor is it relevant for ftp or gopher URLs.
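
If you keep your links in a list and want to tidy them up in bulk, the advice about the final slash can be sketched in Python like this. The add_final_slash() function is hypothetical, and it leans on a loud assumption: that a last path segment with no dot in it is a directory name. That guess will be wrong for some URLs, so treat the output as a suggestion to check, not as gospel.

```python
from urllib.parse import urlsplit, urlunsplit

def add_final_slash(url):
    """Append a final "/" to http URLs that look like directory names.

    Heuristic (an assumption, not a rule): a last path segment
    without a "." in it is taken to be a directory.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url  # not relevant for ftp or gopher URLs
    if parts.path.endswith("/"):
        return url  # the final slash is already there
    last_segment = parts.path.rpartition("/")[2]
    if "." in last_segment:
        return url  # looks like a filename, e.g. foo.html
    return urlunsplit(parts._replace(path=parts.path + "/"))

print(add_final_slash("http://www.ling.nwu.edu/~sburke"))
# http://www.ling.nwu.edu/~sburke/
print(add_final_slash("http://www.csun.edu/~sburke1/sburke1.html"))
# http://www.csun.edu/~sburke1/sburke1.html
```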

Anchors

A link can also be to an anchor partway thru an HTML document. You can leave off the protocol and directory specifications, if the file you're linking to is in the same directory as the file that's linking to it. A URL can also be a link to elsewhere in the current document: the URL "#stuff" jumps you to where in the current document there is a <A NAME="stuff">... </A>. (BTW, if you want text to be a link and an anchor, do it like this: <A NAME="stuff" HREF="foo.html">... </A>.)

These are called relative URLs. Relative URLs and anchors are very well explained in Beginner's Guide to URLs and the Strict Guide's section on URL Errors. (So I don't have to do it here, yay!) Refer to these documents.
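
For the curious, here is roughly how a client turns a relative URL or a bare anchor into a full URL, sketched with Python's urllib.parse. The base URL is the first example from the list near the top of this page; foo.html and #stuff are the placeholder names used above, not real documents.

```python
from urllib.parse import urldefrag, urljoin

# The document that contains the links.
base = "http://www.csun.edu/~sburke1/sburke1.html"

# A file in the same directory: protocol and directory are inherited.
print(urljoin(base, "foo.html"))
# http://www.csun.edu/~sburke1/foo.html

# An anchor partway thru that other file.
print(urljoin(base, "foo.html#stuff"))
# http://www.csun.edu/~sburke1/foo.html#stuff

# An anchor elsewhere in the current document.
print(urljoin(base, "#stuff"))
# http://www.csun.edu/~sburke1/sburke1.html#stuff

# Splitting the anchor back off: the document to fetch, and the anchor name.
document, anchor = urldefrag("http://www.csun.edu/~sburke1/foo.html#stuff")
print(document)  # http://www.csun.edu/~sburke1/foo.html
print(anchor)    # stuff
```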

The first word on URLs

The initial and pretty much complete specification of the URL scheme is to be found in RFC-1738.

I refer to it often.


28 Jan 1996