So what, exactly, is XML? Its proper name is the Extensible Markup Language, abbreviated XML, and you use it to format and transfer data in an easy and consistent way (usually on the World Wide Web). XML is actually a specially designed subset of SGML (Standard Generalized Markup Language), simplified and targeted at the Web. HTML (Hypertext Markup Language) also derives from SGML (strictly speaking, it is an application of SGML rather than a subset), so if you're familiar with HTML, you already have a leg up on XML. For example, take a look at this HTML Web page:
The Best Web Page Ever!
This is the Best Web Page Ever!
Welcome to the best Web page ever!
Here, we use HTML tags to specify just how we want the text in the Web page displayed:
Markup Tags
Tags in both HTML and XML are text strings enclosed in angle brackets, < and >, and they are directives to the application reading in the HTML or XML document. For example, the first tag in the above Web page is <HTML>:
<--
.
.
.
This informs the Web browser that the document is written in HTML and so is a proper Web page. Next, we set up the Web page's head section with the <HEAD> tag and its title with the <TITLE> tag:
<--
The Best Web Page Ever! <--
<--
.
.
.
This sets the title of the Web page, as displayed in the browser (usually in the browser's title bar at the top); in this case, the (somewhat grandiose) title is "The Best Web Page Ever!".
For each start tag, like <HTML> or <HEAD> in this example, there is also an end tag, like </HTML> or </HEAD>. End tags enclose the same name as start tags with an added forward slash, "/"; for example, for the start tag <TITLE>, the end tag is </TITLE>. In this way, pairs of start and end tags can enclose regions of the Web page text, specifying how we want that text displayed.
Note: Not all start tags require end tags, but most do.
For example, in the body of the Web page (as specified with the <BODY> tag), we center text with the <CENTER> tag:
The Best Web Page Ever!
<--
This is the Best Web Page Ever!
<--
.
.
.
In addition, we display the page's title as a heading in a large font using the <H1> tag (HTML headings go from <H1>, the largest, to <H6>, the smallest), and make it bold with the <B> tag:
The Best Web Page Ever!
<--
This is the Best Web Page Ever! <--
<--
.
.
.
In this way, we use HTML to specify how to display the text and graphics in our Web page.
XML is different, although not completely different. Although HTML describes how to display the data in a Web page, we use XML to describe the data itself.
In other words, XML is most often used as a data-description language, allowing us to organize data into data structures, even complex data structures, if we so choose. You can tailor the data as you want it: the most attractive feature of XML is that you can create your own tags. This lets you structure the data in an XML document as you like. Let's take a look at an XML example to make this clearer.
An XML Example
In this example, grocery.xml, we'll record which grocery customers purchased what groceries and when, setting up an XML document to record that data. Let's see how this works now.
We start grocery.xml with a processing instruction (strictly speaking, the XML declaration) indicating that this document is an XML document and uses XML version 1.0:
<--
.
.
.
Processing instructions start with <? and end with ?>, and they are directives to the XML processor, the application that reads in and interprets our XML. In this case, we're indicating the XML version to the XML processor.
Next, we can start inventing grocery.xml's XML tags. We start with a <DOCUMENT> tag ("DOCUMENT" is our own choice for a tag; there is no predefined XML DOCUMENT tag) to show we're starting our XML document:
<--
.
.
.
This tag starts our new XML document. We can structure the XML document grocery.xml customer by customer, listing the groceries each customer has purchased between a start tag and an end tag that we create for that customer:
<--
.
.
.
<--
Next, let's store this customer's name. We can nest tags in XML to arbitrary depth; in this example, that means that we can set up a name section for each customer by creating a tag for the name and nesting it inside the customer's tags this way:
<--
.
.
.
<--
We can further store the customer's last and first names with two more tags of our own, nested inside the name section:
Edwards <--
Britta <--
.
.
.
Now we can store the date on which the customer purchased the groceries by setting up a tag for the date:
Edwards
Britta
April 17, 1998 <--
.
.
.
At this point, then, we're ready to store the grocery orders the customer made. We can do that with a new tag we create to hold the orders:
Edwards
Britta
April 17, 1998
<--
.
.
.
<--
We can store each item the customer ordered with its own item tag, nested inside the orders:
Edwards
Britta
April 17, 1998
- <--
.
.
.
<--
Let's say this customer ordered five cucumbers; we can store that data and the price paid with three new tags, one each for the product, the number ordered, and the price:
Edwards
Britta
April 17, 1998
-
Cucumber <--
5 <--
$1.25 <--
Let's also say that besides cucumbers, this customer also ordered lettuce. We can store that data as a new item this way:
Edwards
Britta
April 17, 1998
-
Cucumber
5
$1.25
- <--
Lettuce <--
2 <--
$.98 <--
<--
We can even store data for additional customers in the same XML document this way:
Edwards
Britta
April 17, 1998
-
Cucumber
5
$1.25
-
Lettuce
2
$.98
<--
<--
Thompson <--
Phoebe <--
<--
May 27, 1998 <--
<--
- <--
Banana <--
12 <--
$2.95 <--
<--
- <--
Apple <--
6 <--
$1.50 <--
<--
<--
<--
In this way, we've stored (and structured) the data we need to keep track of the customers by creating and using our own XML tags. We are free of the HTML restrictions that make us use a set of predefined tags.
However, as you can tell, XML is a free-form language. It will be up to us to make sense of the tags we're using, not some prewritten browser application. We'll see how to interpret XML throughout this book.
In XML, we make up our own tags to describe and structure the data we want to store and use. In larger documents, this can lead to a lot of confusion: how can we be sure we've set up our XML document correctly? That is, what if we omitted the end tag that closes the first customer's data above? A conforming XML processor will report that as an error, because the start and end tags no longer pair up. But what if we omit the person's last name and its start and end tags by mistake? What if we place tags in the wrong order? Such a document still has properly paired tags, so how will we know?
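As a minimal sketch of the first of these questions, the following program hands a parser two versions of a small grocery document, one complete and one with a customer end tag removed. It assumes the JAXP parser bundled with current JDKs (not the parsers used later in this book), and the tag names CUSTOMER, NAME, and DATE, as well as the class GroceryCheck, are hypothetical stand-ins chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class GroceryCheck {
    // Hypothetical tag names; only DOCUMENT is named in the text.
    static final String GOOD =
        "<?xml version=\"1.0\"?>"
        + "<DOCUMENT>"
        + "<CUSTOMER><NAME>Edwards</NAME><DATE>April 17, 1998</DATE></CUSTOMER>"
        + "<CUSTOMER><NAME>Thompson</NAME><DATE>May 27, 1998</DATE></CUSTOMER>"
        + "</DOCUMENT>";

    // The same data with the first customer's end tag removed.
    static final String BROKEN = GOOD.replaceFirst("</CUSTOMER>", "");

    // Returns the number of CUSTOMER elements, or -1 if the parser
    // rejects the text because its tags do not pair up.
    static int countCustomers(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("CUSTOMER").getLength();
        } catch (SAXException e) {
            return -1; // the parser reported an error in the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countCustomers(GOOD));
        System.out.println(countCustomers(BROKEN));
    }
}
```

The parser catches the unpaired tag on its own, but it has no way to notice a missing name or misplaced tags; checking rules of that kind needs something more.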
Document Type Declarations
It turns out that you can define what tags you will use in the XML document, what order they should go in, and what tags other tags can contain, and set up other rules, in the document's Document Type Definition, or DTD. The DTD is placed in (or referenced from) the document type declaration, <!DOCTYPE ...>, near the top of the document. A DTD is not strictly necessary in many XML documents, but to make sure your document is valid (that's the XML terminology), you should include a DTD so that the XML processor can check that the XML document obeys the rules you've set up. We'll see a lot more about DTDs in the next chapters, but here is a DTD for the document we've created, specifying which tags can contain what other tags and in what order, as well as what type of data tags can contain (#PCDATA stands for parsed character data, or text, and the * symbol means the item it refers to may be repeated zero or more times):
<--
<--
<--
<--
<--
<--
<--
<--
<--
<--
<--
]> <--
Edwards
Britta
April 17, 1998
-
Cucumber
5
$1.25
-
Lettuce
2
$.98
Thompson
Phoebe
May 27, 1998
-
Banana
12
$2.95
-
Apple
6
$1.50
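To see what checking against a DTD buys us, here is a small sketch that asks a validating parser whether a document obeys a cut-down DTD. It assumes the JAXP parser bundled with current JDKs; the element names (CUSTOMER, NAME, DATE), the simplified rules, and the class GroceryValidate are hypothetical and much smaller than the DTD above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class GroceryValidate {
    // A cut-down, hypothetical DTD: each CUSTOMER must hold a NAME, then a DATE.
    static final String DTD =
        "<!DOCTYPE DOCUMENT [\n"
        + "<!ELEMENT DOCUMENT (CUSTOMER)*>\n"
        + "<!ELEMENT CUSTOMER (NAME,DATE)>\n"
        + "<!ELEMENT NAME (#PCDATA)>\n"
        + "<!ELEMENT DATE (#PCDATA)>\n"
        + "]>\n";

    // Returns true if the document body obeys the DTD above.
    static boolean isValid(String body) {
        final boolean[] valid = { true };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // ask the parser to check the DTD rules
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Broken DTD rules are reported here as recoverable errors.
                public void error(SAXParseException e) { valid[0] = false; }
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            String xml = "<?xml version=\"1.0\"?>\n" + DTD + body;
            builder.parse(new InputSource(new StringReader(xml)));
            return valid[0];
        } catch (SAXException e) {
            return false; // the tags did not even pair up
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With these rules, a customer entry that leaves out the name is reported as invalid even though all of its tags pair up correctly.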
Now we've seen an XML document. Note that XML is so free-form that it's up to us to interpret it. Can you imagine having one of the major Internet browsers, like Internet Explorer, try to read in the above document? That browser would have no idea what to make of the tags we invented. What should it do with them? Should it display that data in a particular format? Should it store the data in a file? It's up to us to interpret the document ourselves (although there are parts of XML that are predefined, as we'll see, and which the major browsers do plan to support).
Parsing and Browsing XML
To interpret XML documents, we'll parse them (that is, dissect them into their logical structure). There are a number of XML parsers available that we can work with. Parsing an XML document breaks it up into its component elements. Then, however, it will be up to us to make sense of the result, because only we know what we want to do with the data in the tags we created.
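As a rough illustration of what "breaking a document up into its component elements" means, the sketch below parses a fragment and prints its structure as an indented outline. It assumes the JAXP parser bundled with current JDKs rather than the parsers introduced in the next chapter; the tag names and the class OutlineXml are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class OutlineXml {
    // Parses the text and returns one line per element or text item,
    // indented two spaces per nesting level.
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Visits a node and then its children, one level deeper.
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            indent(depth, out);
            out.append(node.getNodeName()).append('\n');
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                walk(c, depth + 1, out);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE
                && !node.getNodeValue().trim().isEmpty()) {
            indent(depth, out);
            out.append('"').append(node.getNodeValue().trim()).append("\"\n");
        }
    }

    static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) {
        // Hypothetical tag names for one grocery item.
        System.out.print(outline(
            "<ITEM><PRODUCT>Cucumber</PRODUCT><NUMBER>5</NUMBER></ITEM>"));
    }
}
```

The parser recovers the nesting, but nothing in the outline says what a product or a number means; that interpretation stays with our own code.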
Almost all XML parsers are written in Java these days, which means we'll use a lot of Java in this book. XML parsers come as prewritten Java code that we can place right into our applications. We'll make use of that code in our programs to read in XML documents and determine their structure. In the rest of this chapter, then, we'll come up to speed in Java, getting familiar with (or, if you already know Java, reviewing) the code we'll need to make XML parsers work. Then we'll be able to write programs in Java that load, parse, and process (i.e., display, interpret, or analyze, as indicated by the document's content) XML documents.
Note: If you're familiar with Java and the topics in this chapter, feel free to skip on to Chapter 2, "Working with XML."
Let's start now with Java, working through the Java we'll need for the rest of the book in this chapter. In the next chapter, we'll start parsing and making use of actual XML documents. Here, we'll use Java 1.02, the version that is the most widely used today (in practice, you can use any Java version, of course). To make use of the code in this book (and virtually every XML parser), then, make sure you have Java installed on your computer. You can download the Java Development Kit (JDK) from the Java Web site, http://www.javasoft.com; to install Java, follow the instructions that come with the JDK.
Our First Java Application: helloapp
Our first Java application will be a very simple one, only printing the greeting "Hello." This will get us started in Java, creating the type of Java applications we'll use throughout the book, so let's take a look at how this works now. In the next chapter, we'll see how to connect XML parsers to our Java applications.
Learning Java is fundamental to working with XML today, because almost all XML software uses Java, and you need to use Java to use that software. XML is designed for the Web, and Java is the most popular programming language used on the Web. We'll be using Java in all the chapters in this book (except Chapter 9 on XML stylesheets).
This first Java application will only print the greeting "Hello.", so we'll call it helloapp. Using a text editor or word processor (if you use a word processor, make sure you store Java programs in plain text format, without any special formatting characters), create the file helloapp.java. Note that a Java file's extension must be .java; if your editor can't create documents with that extension, save the file with the extension .jav and rename it later, giving it the extension .java.
To start, place this text in helloapp.java:
public class helloapp {
.
.
.
}
Here we declare the new Java class helloapp. And that brings us to the first question: what's a class? To answer that question, we'll take a quick look at object-oriented programming.
Object-Oriented Programming: Classes and Objects
Object-oriented programming (OOP) was originally invented to deal with larger programs, and it provides the programmer with a way of encapsulating code and data into easily conceptualized bundles called objects. This clears the program's general workspace of many unrelated data items and functions.
What Are Objects?
To understand how objects work, consider a refrigerator as an example. A refrigerator has many internal parts: regulators, a compressor, thermostats, and so on. Imagine how difficult it would be to use a refrigerator if you had to deal with all the parts of a refrigerator yourself, turning the compressor on and off, regulating the temperature, and so forth. What makes a refrigerator useful is that it performs all these operations itself, internally. What's left is an object with one easy-to-remember purpose: to refrigerate foods. You don't have to concern yourself with the internal operation of the refrigerator at all.
OOP works in much the same way; if you have functions, subroutines, and data that manage screen displays, you can put everything (functions, subroutines, and data) together into one object named, say, display. Then everything having to do with the display is in this object. Many functions will remain internal to this object, called by other internal functions, and because they're internal, they no longer clutter up the rest of the program. Internal data and functions are called private data and functions. Data and functions accessible from outside the object are called public data and functions. For example, if the display object had a function that the rest of the program could call, that function would be a public function of the display object.
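As a minimal sketch of the split between private and public parts, here is how the class behind such a display object might look in Java. The names Display, pad(), show(), and getLinesShown() are hypothetical, chosen only for illustration.

```java
// A hypothetical class for a display object.
class Display {
    // Private data: only code inside Display can read or change it.
    private int linesShown = 0;

    // Private function: an internal helper, hidden from the rest of the program.
    private String pad(String text) {
        return "[" + text + "]";
    }

    // Public function: the rest of the program calls this one.
    public String show(String text) {
        linesShown++;
        return pad(text);
    }

    // Public function that reports on the private data.
    public int getLinesShown() {
        return linesShown;
    }
}
```

Code outside the class can call show() and getLinesShown(), but an attempt to call pad() or to touch linesShown directly is rejected by the compiler.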
That gives us an overview of objects. Now what are classes?
What Are Classes?
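Classes are to objects as cookie cutters are to cookies. That is, a class is an object's type, much as int is the type of an integer variable. For example, if we have a class named dataClass with a public function named getData(), we can declare an object of that class named dataObject this way in Java: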
dataClass dataObject; <--
.
.
.
Note: As in C or C++, you end statements in Java with a semicolon, ;.
To actually create that new object, dataObject, we use the Java new operator this way (we'll see more about this process later in this chapter):
dataClass dataObject;
dataObject = new dataClass();
.
.
.
Now that the dataObject object has been created, we are free to use the getData() function in that object like this:
dataClass dataObject;
dataObject = new dataClass();
dataObject.getData();
.
.
.
An object's built-in functions are called its member methods, just as its built-in data (stored in variables, data structures, or other, internal, objects) is referred to as member data. You reach member data just as you reach member methods: if dataObject has a public data member named, say, dataInteger, we can reach that in Java this way:
dataClass dataObject;
dataObject = new dataClass();
dataObject.getData();
dataObject.dataInteger = 0;
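The listing above uses dataClass without showing its definition. As a sketch only, assuming that getData() simply returns a privately stored number, such a class might look like this (the field storedValue, its starting value, and the class DataDemo are hypothetical):

```java
// One possible definition of dataClass (an assumption; the text does not show it).
class dataClass {
    // Public member data: reachable from outside as dataObject.dataInteger.
    public int dataInteger;

    // Private member data (hypothetical): reachable only inside this class.
    private int storedValue = 42;

    // Public member method: reachable from outside as dataObject.getData().
    public int getData() {
        return storedValue;
    }
}

class DataDemo {
    public static void main(String[] args) {
        dataClass dataObject;            // declare the object
        dataObject = new dataClass();    // create it with the new operator
        dataObject.dataInteger = 7;      // reach the public member data
        System.out.println(dataObject.getData());
        System.out.println(dataObject.dataInteger);
    }
}
```

The dot operator works the same way for both kinds of member: dataObject.getData() calls a method, and dataObject.dataInteger reaches a variable.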
That completes our overview of classes and objects. The best way to learn about these programming concepts is to see them at work, so now that we have some of the terminology down, let's return to our Java example, helloapp.
Java Class Files
This is what our file helloapp.java looks like so far:
public class helloapp {
.
.
.
}
Here we're declaring a class named helloapp and making it public. When we compile this file, the Java compiler will create a class file named helloapp.class, and it is this class file that we'll actually run to see our program work. When we run it, Java will load this class and give it control, as we'll see.
Now we've declared the new class we need, but how do we make it do anything? It turns out that there is a special method in Java applications named the main() method, and when you run a Java application, Java runs this method first. The next step, then, is to add that method to our program.
The main() Method
You declare the main() method inside the class definition. The main() method does not return a value, so we set its return type to void:
public class helloapp {
    public static void main(String args[]) {
        .
        .
        .
    }
}
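Note that we enclose the body of the class definition and the body of the main() method in curly braces, { and }, and that we declare the main() method as static in Java applications (Java calls main() on the class itself, without first creating an object).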
Now we're ready to complete the program and write the actual code to display the text "Hello.". We do that with these lines:
public class helloapp {
    public static void main(String args[]) {
        System.out.println("Hello.");
    }
}
Here we use the System.out.println() method (println stands for print line) to print the line "Hello.". This line goes to the Java output console. In Windows 95, that means a DOS window. (We'll see all about windowed output in the next example.)
Our program is ready to run, and we'll see how to run it now.
Creating helloapp.class
The first step in running our helloapp program is to create the bytecode file, helloapp.class. Java bytecodes are special codes that the Java interpreter, called the Java Virtual Machine (JVM), reads and runs.
To create helloapp.class, we use the Java compiler, javac, which comes with the JDK; we pass helloapp.java to javac like this:
C:\>javac helloapp.java
This compiles helloapp.java and creates helloapp.class. Now we're ready to run the example; we can run this file and (finally) see our greeting.
Running helloapp.class
To run helloapp.class, we use the Java interpreter, java, which comes with the JDK. All we need to do is to type this line in the DOS session:
C:\>java helloapp
When the program runs, it creates the greeting "Hello." and displays it:
C:\>java helloapp
Hello.
And that's it: our first Java program is a success. However, it's a pretty modest program, and if that were the extent of our Java expertise, this would be a very short book. Let's start expanding our Java skills now by seeing how to work with and display a window.
Programming Java Applets
In our next example, let's display the string "Welcome to XML!" in a Web browser.
This will show us how to set up Java applets (our previous program was a Java application). Java applets are targeted at the Web and usually displayed in Web browsers. Applets will be useful for us in this book because they have a great deal of graphics capability built in, which will let us create and run XML browsers.
However, there is one consideration: because of security restrictions, Java applets displayed in Web browsers don't support file handling, which means they can't work with XML documents. To fix this problem, we'll set up our programs in this book as applications that display a window, and in that window, we'll display an applet to produce our graphics output. This is the standard way of working with graphical output in stand-alone (i.e., no browser involved) Java applications, so we will spend a little time seeing how applets work before proceeding. After we learn how to create applets, we'll see how to display them in their own window (that is, without using a Web browser) so we can use file handling and therefore read in and write out XML documents.
Let's name this new example applet, say, appl:
public class appl extends java.applet.Applet <--
{
}
When we compile this applet, the javac compiler will produce the file appl.class.
Note the keywords extends java.applet.Applet in the class declaration; java.applet.Applet refers to the Applet class of the Java package named java.applet (a Java package is a library of prewritten classes, ready for us to use).
By extending the Applet class, we are inheriting all the functionality of that class. Inheritance is one of the most important and essential characteristics of OOP, and it allows us to build classes on top of other classes. Using inheritance, we can use the prewritten Java classes to provide our own classes with a foundation full of resources that we can use. We'll see more about inheritance in this chapter when we create our own stand-alone window from the Frame class. (Frame refers to a window with a border you can use for resizing.)
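To make inheritance concrete without involving the Applet class itself, here is a small sketch with two classes of our own. BasicWindow stands in for a prewritten library class and XmlWindow for a class we build on top of it; both names, and their methods, are hypothetical.

```java
// A hypothetical base class, standing in for a prewritten library class.
class BasicWindow {
    private String title = "untitled";

    public void setTitle(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }
}

// XmlWindow inherits setTitle() and getTitle() from BasicWindow
// and adds one method of its own.
class XmlWindow extends BasicWindow {
    public String describe() {
        return "XML viewer: " + getTitle();
    }
}

class InheritDemo {
    public static void main(String[] args) {
        XmlWindow window = new XmlWindow();
        window.setTitle("grocery.xml");   // inherited method
        System.out.println(window.describe());
    }
}
```

XmlWindow never defines setTitle() or getTitle(), yet objects of that class support both, which is the same way our applet picks up the functionality of the class it extends.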
Now we're ready to display our text string "Welcome to XML!" as graphics in the applet.
Note: You might be surprised to see text like "Welcome to XML!" referred to as graphics. However, even text is just another graphical element in a windowing environment that uses a GUI.
Displaying Graphics in an Applet
We usually display graphics in an applet's paint() method, so we add a paint() method to our applet like this:
public class appl extends java.applet.Applet
{
    public void paint( Graphics g )
    {
        .
        .
        .
    }
}
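The Applet class already provides a paint() method through inheritance; by defining our own paint() method, we are overriding that method. Overriding methods like this is another important part of OOP.
Note that we are passed an object of the Graphics class, named g, in the paint() method. We will use this object to create our graphics display in the applet.
The Graphics class is part of the Java package named java.awt. To use the Graphics class, we import that class into our program like this: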
import java.awt.Graphics;
public class appl extends java.applet.Applet
{
    public void paint( Graphics g )
    {
        .
        .
        .
    }
}
We're ready to display our greeting as soon as we determine where in the applet's window we want to display it. The applet's coordinate system starts at the upper left; x increases to the right, and y increases downward.
Measurements in Java are in pixels, so we can display our string starting at, say, (60, 30) using the drawString() method of the Graphics object:
import java.awt.Graphics;
public class appl extends java.applet.Applet
{
    public void paint( Graphics g )
    {
        g.drawString( "Welcome to XML!", 60, 30 ); <--
    }
}
Note: The origin of displayed text strings, like (60, 30) in our example, refers to the left end of the text's baseline (roughly the lower-left corner of the text string) as it appears in the applet's window.
And that's it for our Java code. Use javac to compile appl.java and create appl.class now.
We're almost ready; the final step is to display our applet in a Web browser, and we'll need a Web page for that.
Creating a Web Page for Our Applet
We'll write the applet's Web page in HTML, starting with a heading and title like this:
HELLO
.
.
.
Now we embed our applet, .+, in the Web page with the HTML