2.1 Writing Tools
Use any text editor to create an HTML or XHTML document, as long as it can save your work on disk in ASCII text file format. That's because even though documents include elaborate text layout and pictures, they're all just plain old ASCII text documents themselves. A fancier WYSIWYG editor or a translator for your favorite word processor are fine, too — although they may not support the many nonstandard features we discuss later in this book. You'll probably end up touching up the source text they produce, in any case.
While it's not needed to compose documents, you should have at least one version of a popular browser installed on your computer to view your work, preferably Netscape Navigator or Microsoft Internet Explorer. That's because, unless you use a special editor, the source document you compose won't look anything like what gets displayed by a browser, even though it's the same document. Make sure what your readers actually see is what you intended by viewing the document yourself with a browser. Besides, the popular ones are free over the Internet.
Also note that you don't need a connection to the Internet or the Web to write and view your HTML or XHTML documents. You can compose and view your documents stored on a hard drive or floppy disk that's attached to your computer. You can even navigate among your local documents with the languages' hyperlinking capabilities without ever being connected to the Internet, or any other network, for that matter. In fact, we recommend that you work locally to develop and thoroughly test your documents before you share them with others.
We strongly recommend, however, that you do get a connection to the Internet if you are serious about composing your own documents. You can download and view others' interesting web pages and see how they accomplished some interesting feature — good or bad. Learning by example is fun, too. (Reusing others' work, on the other hand, is often questionable, if not downright illegal.) An Internet connection is essential if you include in your work hyperlinks to other documents on the Internet.
2.2 A First HTML Document
It seems every programming language book ever written starts off with a simple example on how to display the message, "Hello, World!" Well, you won't see a "Hello, World!" example in this book. After all, this is a style guide for the new millennium. Instead, ours sends greetings to the World Wide Web:
Go ahead: type in the example HTML source on a fresh word-processing page and save it on your local disk as myfirst.html. Make sure you select to save it in ASCII format; word processor-specific file formats like Microsoft Word's .doc files save hidden characters that can confuse the browser software and disrupt your HTML document's display.
After saving myfirst.html (or myfirst.htm, if you are using archaic DOS- or Windows 3.11-based file-naming conventions) onto disk, start up your browser and locate and open the file from the program's File menu. Your screen should look like Figure 2-1.
Figure 2-1. A very simple HTML document
<title>My first HTML document</title>
<h2>My first HTML document</h2>
Hello, <i>World Wide Web!</i>
<!-- No "Hello, World" for us -->
<a href="http://www.ora.com">O'Reilly & Associates</a>
Composed with care by:
<cite>(insert your name here)</cite>
<br>©2000 and beyond
2.3.1 Start and End Tags
Most tags define and affect a discrete region of your document. The region begins where the tag and its attributes first appear in the source document (a.k.a. the start tag ) and continues until a corresponding end tag. An end tag is the tag's name preceded by a forward slash (/ ). For example, the end tag that matches the "start italicizing" <i> tag is </i>.
End tags never include attributes. In HTML, most tags, but not all, have an end tag. And, to make life a bit easier for HTML authors, the browser software often infers an end tag from surrounding and obvious context, so you needn't explicitly include some end tags in your source HTML document. (We tell you which are optional and which are never omitted when we describe each tag in later chapters.) Our simple example is missing an end tag that is so commonly inferred and hence not included in the source that some veteran HTML authors don't even know that it exists. Which one?
The XHTML standard is much more rigid, insisting that all tags have corresponding end tags. [Section 16.3.2] [Section 16.3.3]
2.4 HTML Skeleton
Notice, too, that our simple example HTML document starts and ends with <html> and </html> tags. These tags tell the browser that the entire document is composed in HTML. The HTML and XHTML standards require an <html> tag for compliant documents, but most browsers can detect and properly display HTML encoding in a text document that's missing this outermost structural tag. [<html>]
 XHTML documents also begin with the <html> tag, but they contain additional information to differentiate them from common HTML documents. See Chapter 16 for details.
<h3 align=right><a href="#21">top</a>&ly</h3>
Like our example, all HTML and XHTML documents have two main structures: a head and a body, each bounded in the source by respectively named start and end tags. You put information about the document in the head and the contents you want displayed in the browser's window inside the body. Except in rare cases, you'll spend most of your time working on your document's body content.  
A raw document with all its embedded tags can quickly become nearly unreadable, like computer-programming source code. We strongly recommend that you use comments to guide your composing eye.
Although it's part of your document, nothing in a comment, which goes between the special starting tag <!-- and ending tag --> comment delimiters, gets included in the browser display of your document. You see a comment in the source, as in our simple HTML example, but you don't see it on the display, as evidenced by our comment's absence in Figure 2-1. Anyone can download the source text of your documents and read the comments, though, so be careful what you write. [Section 3.5.3]
There are several different document header tags that you can use to define how a particular document fits into a document collection and into the larger scheme of the Web. Some nonstandard header tags even animate your document.
For most documents, however, the important header element is the title. Standards require that every HTML and XHTML document have a title, even though the currently popular browsers don't enforce that rule. Choose a meaningful title, one that instantly tells the reader what the document is about. Enclose yours, as we do for the title of our example, between the <title> and </title> tags in your document's header. The popular browsers typically display the title at the top of the document's window. [<title>]
184.108.40.206 Divisions, paragraphs, and line breaks
A browser takes the text in the body of your document and "flows" it onto the computer screen, disregarding any common carriage-return or line-feed characters in the source. The browser fills as much of each line of the display window as possible, beginning flush against the left margin, before stopping after the rightmost word and moving on to the next line. Resize the browser window, and the text reflows to fill the new space, indicating HTML's inherent flexibility.
Of course, readers would rebel if your text just ran on and on, so HTML and XHTML provide both explicit and implicit ways to control the basic structure of your document. The most rudimentary and common ways are with the division (<div>), paragraph (<p>), and line-break (<br>) tags. All break the text flow, which consequently restarts on a new line. The differences are that the <div> and <p> tags define an elemental region of the document and text, respectively, the contents of which you may specially align within the browser window, apply text styles to, and alter with other block-related features.
Without special alignment attributes, the <div> and <br> tags simply break a line of text and place subsequent characters on the next line. The <p> tag adds more vertical space after the line break than either the <div> or <br> tags. [Section 4.1.1] [Section 4.1.2] [Section 4.6.1]
By the way, the HTML standard includes end tags for the paragraph and division tags, but not for the line-break tag. Few authors ever include the paragraph end tag in their documents; the browser usually can figure out where one paragraph ends and another begins. Give yourself a star if you knew that </p> even exists.
 With XHTML, <br>'s start and end are between the same brackets: <br />. Browsers tend to be very forgiving and often ignore extraneous things, such as the forward slash in this case, so it's perfectly okay to get into the habit of adding that end-mark.
 The paragraph end tag is being used more commonly now that the popular browsers support the paragraph-alignment attribute.
Besides breaking your text into divisions and paragraphs, you can also organize your documents into sections with headings. Just as they do on this and other pages in this printed book, headings not only divide and entitle discrete passages of text, they also convey meaning visually. And headings readily lend themselves to machine-automated processing of your documents.
There are six heading tags, <h1> through <h6>, with corresponding end tags. Typically, the browser displays their contents in, respectively, very large to very small font sizes, and usually in boldface. The text inside the <h4> tag typically is the same size as the regular text. [Section 4.2.1]
The heading tags also break the current text flow, standing alone on lines and separated from surrounding text, even though there aren't any explicit paragraph or line-break tags before or after a heading.
220.127.116.11 Horizontal rules
Besides headings, HTML and XHTML provide horizontal rule lines that help delineate and separate the sections of your document.
When the browser encounters an <hr> tag in your document, it breaks the flow of text and draws a line across the display window on a new line. The flow of text resumes immediately below the rule. [Section 5.1.1]
 Similar to <br>, with XHTML the formal horizontal rule tag is <hr />.
While text may be the meat and bones of an HTML or XHTML document, the heart is hypertext. Hypertext gives users the ability to retrieve and display a different document in their own or someone else's collection simply by a click of the keyboard or mouse on an associated word or phrase (hyperlink ) in the document. Use these interactive hyperlinks to help readers easily navigate and find information in your own or others' collections of otherwise separate documents in a variety of formats, including multimedia, HTML, XHTML, other XML, and plain ASCII text. Hyperlinks literally bring the wealth of knowledge on the whole Internet to the tip of the mouse pointer.
To include a hyperlink to some other document in your own collection or on a server in Timbuktu, all you need to know is the document's unique address and how to drop an anchor into your document.
While it is hard to believe, given the millions, perhaps billions, of them out there, every document and resource on the Internet has a unique address, known as its uniform resource locator (URL; commonly pronounced "you-are-ell"). A URL consists of the document's name preceded by the hierarchy of directory names in which the file is stored (pathname), the Internet domain name of the server that hosts the file, and the software and manner by which the browser and the document's host server communicate to exchange the document (protocol ):
Here are some sample URLs:
The first example is an absolute or complete URL. It includes every part of the URL format: protocol, server, and the pathname of the document. While absolute URLs leave nothing to the imagination, they can lead to big headaches when you move documents to another directory or server. Fortunately, browsers also let you use relative URLs and automatically fill in any missing portions with respective parts from the current document's base URL. The second example is the simplest relative URL of all; with it, the browser assumes that the price_list.html document is located on the same server, in the same directory as the current document, and uses the same network protocol.
Relative URLs are also useful if you don't know a directory or document's name. The third URL example, for instance, points to kumquat.com's web home page. It leaves it up to the kumquat server to decide what file to send along. Typically, the server delivers the first file in the directory, one named index.html, or simply a listing of the directory's contents.
Although appearances may deceive, the last FTP example URL actually is absolute; it points directly at the contents of the /pub directory.
The anchor (<a>) tag is the HTML/XHTML feature for defining both the source and the destination of a hyperlink. You'll most often see and use the <a> tag with its href attribute to define a source hyperlink. The value of the href attribute is the URL of the destination.
 The nomenclature here is a bit unfortunate: the "anchor" tag should mark just a destination, not the jumping-off point of a hyperlink, too. You "drop anchor"; you don't jump off one. We won't even mention the atrociously confusing terminology the W3C uses for the various parts of a hyperlink, except to say that someone got things all "bass ackwards."
The contents of the source <a> tag — the words and/or images between it and its </a> end tag — is the portion of the document that is specially activated in the browser display and that users select to take a hyperlink. These anchor contents usually look different from the surrounding content (text in a different color or underlined, images with specially colored borders, or other effects), and the mouse-pointer icon changes when passed over them. The <a> tag contents, therefore, should be text or an image (icons are great) that explicitly or intuitively tells users where the hyperlink will take them. [Section 6.3.1]
For instance, the browser will specially display and change the mouse pointer when it passes over the "Kumquat Archive" text in the following example:
For more information on kumquats, visit our
If the user clicks the mouse button on that text, the browser automatically retrieves from the server www.kumquat.com a web (http:) page named archive.html, then displays it for the user.