Playing with Java and XML

Java 1.5 ships with a Xerces-based XML parser in the standard installation, reachable through JAXP (the Java API for XML Processing).  This example uses that bundled parser.

This document assumes you already know basic XML file structure and meaning.

Here is some XML.  It is deliberately formatted inconsistently so that, when we start working with it using JAXP, you can see how the layout of the text file and the placement of the tags affect how it is parsed and used.

The XML consists of a library of three books.  The first book is defined entirely on one line.  The second has two name tags for the single book tag, and the third looks more like a “standard” declaration for a book.

The data.xml file

<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book><name>Moby Dick</name></book>
    <book>
        <name>The Book of Mormon</name>
        <name>Another Testiment of Christ</name>
    </book>
    <book>
        <name>Grapes of Wrath</name>
    </book>
</library>

These two lines of code generate a DOM document from the data.xml file.

File dataFile = new File("data.xml");
Document domDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(dataFile);
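The same call chain can be split into named steps. The sketch below is only an illustration: it parses an in-memory string instead of data.xml so that it runs on its own, and the class name ParseSketch and the helper parse are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseSketch {

    // Hypothetical helper: parses XML text and wraps the checked
    // exceptions (ParserConfigurationException, SAXException, IOException).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<library><book><name>Moby Dick</name></book></library>");
        // The document element is the outermost tag of the XML.
        System.out.println(doc.getDocumentElement().getNodeName());
        // Number of book elements found anywhere in the document.
        System.out.println(doc.getElementsByTagName("book").getLength());
    }
}
```

Keeping the factory and builder in separate variables makes it easier to configure the factory later, if that is ever needed.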

A DOM document is the root from which all the rest of your processing will flow.  From it you can get a list, of type NodeList, that contains Node objects that represent all of the books in the document.

NodeList books = domDocument.getElementsByTagName("book");

With this list it is possible to iterate and retrieve nodes representing each book.  In order to understand the effect of the different ways book tags were placed in the XML file, let’s print out how many children each book has and what the children’s types are.  In the outer for loop we retrieve each book from the books list.  In the inner for loop each of the book’s children is retrieved and its node name, which stands in for its type here, is printed out.

for (int i = 0; i < books.getLength(); i++) {
    Element book = (Element) books.item(i);
    NodeList bookChildren = book.getChildNodes();
    System.out.println("num children of a book: " + bookChildren.getLength());
    for (int j = 0; j < bookChildren.getLength(); j++) {
        Node aChild = bookChildren.item(j);
        System.out.println("\t" + aChild.getNodeName());
    }
}

The result of running this code is seen below.  Notice in the first book that there is only one child node.  This is because there is only one tag, name, in the book and the entire book is declared on one line in the XML file.

Even though the second book declares only two name tags, the book itself has five child nodes.  The child nodes of type #text represent the newline characters and other whitespace characters, such as spaces and tabs, before, between, and after the declared name nodes.  The third book shows these same #text nodes, with one before and one after the declared name node.

The DOM standard requires this, since a tag can contain text as well as sub-tags, and the parser cannot assume that whitespace between tags is insignificant.

num children of a book: 1
name

num children of a book: 5
#text
name
#text
name
#text

num children of a book: 3
#text
name
#text
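One general way to skip the #text nodes seen in this output is to check each child's node type and keep only elements. The following is a minimal sketch under stated assumptions: the class ElementChildren and the helpers parseRoot and elementChildren are hypothetical names invented for this illustration, and the XML is parsed from a string so the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementChildren {

    // Hypothetical helper: parses XML text and returns its outermost element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException("Could not parse XML", e);
        }
    }

    // Returns only the direct children that are elements,
    // dropping #text nodes (and any other non-element nodes).
    static List<Element> elementChildren(Element parent) {
        List<Element> result = new ArrayList<Element>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same layout as the second book: newlines around each name tag.
        Element book = parseRoot("<book>\n<name>A</name>\n<name>B</name>\n</book>");
        System.out.println("all children: " + book.getChildNodes().getLength());
        System.out.println("element children: " + elementChildren(book).size());
    }
}
```

Unlike a lookup by tag name, this filter keeps every direct child element regardless of its name, which can be useful when a book has more than one kind of sub-tag.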

Because nodes representing the text before and after each child element are included, care must be taken so that you get only the nodes you want.  Since in this example we only want the name tags, we’ll request them all from the book and then iterate over them as we did when printing out their types.

We use the getElementsByTagName method of the book element to get all of the name elements beneath it.  Note that this method returns every descendant with that tag name, not only direct children.  This bypasses any #text type nodes.

for (int i = 0; i < books.getLength(); i++) {
    Element book = (Element) books.item(i);
    System.out.println("Book " + (i + 1));
    NodeList names = book.getElementsByTagName("name");
    for (int k = 0; k < names.getLength(); k++) {
        Element name = (Element) names.item(k);
        System.out.println("\tname" + (k + 1) + ": " + name.getFirstChild().getNodeValue());
    }
}

Once we have a name element we can print out its content.  To do this, remember that conforming parsers are required to place any text found inside a tag in a #text node.  Because of this we have to get the first child of the name element, which lets us print the value of the #text node using its getNodeValue method.

Here is the output of running this portion of the code.

Book 1
    name1: Moby Dick
Book 2
    name1: The Book of Mormon
    name2: Another Testiment of Christ
Book 3
    name1: Grapes of Wrath
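One caveat with name.getFirstChild().getNodeValue() is that an empty tag such as <name/> has no children, so getFirstChild() returns null and the call chain fails. The sketch below shows one possible guard and, as an alternative, the DOM Level 3 method getTextContent, which is available on Node in Java 1.5 and later. The class NameText and the helpers parseRoot and firstChildText are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NameText {

    // Hypothetical helper: parses XML text and returns its outermost element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException("Could not parse XML", e);
        }
    }

    // Returns the value of the first child if it is a #text node,
    // or an empty string when the element is empty or starts with a sub-tag.
    static String firstChildText(Element element) {
        Node first = element.getFirstChild();
        if (first == null || first.getNodeType() != Node.TEXT_NODE) {
            return "";
        }
        return first.getNodeValue();
    }

    public static void main(String[] args) {
        Element book = parseRoot("<book><name>Moby Dick</name><name/></book>");
        NodeList names = book.getElementsByTagName("name");
        for (int k = 0; k < names.getLength(); k++) {
            Element name = (Element) names.item(k);
            // Guarded first-child lookup, then the getTextContent alternative.
            System.out.println("[" + firstChildText(name) + "] [" + name.getTextContent() + "]");
        }
    }
}
```

The two approaches differ when a tag mixes text and sub-tags: getTextContent concatenates the text of all descendants, while the first-child lookup returns only the leading #text node.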


Here is the code for the entire example.  Comment or uncomment the two code segments to see the two different types of results.

package edu.byui.examples;

import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DataParser {

    public static void main(String[] args) {
        File dataFile = new File("data.xml");
        try {
            // parse using builder to get DOM representation of the XML file
            Document domDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(dataFile);
            NodeList books = domDocument.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                NodeList bookChildren = book.getChildNodes();

                // un-comment this section to list out the node types
                /*System.out.println("num children of a book: " + bookChildren.getLength());
                for (int j = 0; j < bookChildren.getLength(); j++) {
                    Node aChild = bookChildren.item(j);
                    System.out.println("\t" + aChild.getNodeName());
                }*/

                // comment out this section to hide the node values
                System.out.println("Book " + (i + 1));
                NodeList names = book.getElementsByTagName("name");
                for (int k = 0; k < names.getLength(); k++) {
                    Element name = (Element) names.item(k);
                    System.out.println("\tname" + (k + 1) + ": " + name.getFirstChild().getNodeValue());
                }
            }
        } catch (DOMException e) {
            // a DOM operation failed, e.g. while reading a node value
            e.printStackTrace();
        } catch (ParserConfigurationException e) {
            // the parser could not be created with the requested configuration
            e.printStackTrace();
        } catch (SAXException e) {
            // the file is not well-formed XML
            e.printStackTrace();
        } catch (IOException e) {
            // the file could not be read
            e.printStackTrace();
        }
    }
}


One comment

  1. Brother Barney,

    Thanks so much for these examples. I’m currently an intern at USAA for the summer of ’09. They have given me a project that requires me to create web applications using Java under the Wicket framework. I’m required to export XML documents for graph processing of throttle data between servers. I can still remember learning this in class. Thanks for your diligence :)


