12 XML DOM and JAVA

Dr M. Vijayalakshmi

XML DOM Parsers

An XML parser reads an XML document and builds a DOM tree that a program can access and manipulate. Java exposes the W3C DOM specification in a separate package, org.w3c.dom. This package provides the interfaces, with their methods and properties, defined by the DOM specification. Java implements these interfaces, and the implementation is used to navigate and manipulate the DOM tree.

DOM Parsers

Greeting.xml

<?xml version="1.0" ?>

<greeting> Hello World! </greeting>

[Figure 12.1: Greeting.xml parsed into an Element node and a Text node]

Figure 12.1 shows how the XML file ‘Greeting.xml’ is parsed into an Element node and a Text node. The <greeting> node is identified as an Element node and the text ‘Hello World!’ as a Text node.

Creating Document

To navigate and manipulate an XML file, we need access to the DOM tree of the XML document. The steps for creating a Document object from the sample XML document are given below.

Step 1: Import the XML-related packages.

The packages to be imported are:

import javax.xml.parsers.*;
import org.w3c.dom.*;

Step 2: Create an instance of the parser. A parser is a DocumentBuilder obtained from a DocumentBuilderFactory.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();

Step 3: Parse the XML document to create a Document object. DocumentBuilder provides the following overloaded parse methods:

Document parse(InputStream in)
Document parse(InputStream in, String base)
Document parse(String uri)
Document parse(File xmlFile)

For example, the parse(String uri) overload creates a Document object from an XML file:

Document doc = parser.parse("greeting.xml");

Once the Document object has been created for the given XML file, the root element can be extracted, and the required operations can then be performed on the XML document by calling the corresponding methods.
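As a rough sketch of steps 1 to 3 together, the following class parses XML held in a String through the parse(InputStream) overload, which avoids depending on a file on disk. The class name ParseFromStream and its parse helper are assumptions made for illustration and are not part of the module's examples; checked exceptions are wrapped in an unchecked one only to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ParseFromStream {
    // Hypothetical helper: runs steps 2 and 3 on XML held in a String.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser = factory.newDocumentBuilder();
            InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            return parser.parse(in); // the parse(InputStream) overload
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<?xml version=\"1.0\"?><greeting> Hello World! </greeting>");
        // The root element is now reachable from the Document object.
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Parsing from a stream is also the natural choice when the XML arrives over a network connection or is bundled as a resource, where no file path is available.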

Navigating DOM Tree

A node in the DOM can be reached by starting from the root node and following structural relationships to other nodes. In addition, the Document object offers two methods for locating nodes directly.

The getElementById() method of the Document object accesses the element with a particular ID. The getElementsByTagName() method of the Document object accesses all element nodes with the specified tag name.

Using root node

The W3C defines a property, documentElement, on the Document object that refers to the root element of the XML document.

In Java, the method getDocumentElement() is called on the Document object to retrieve the root node, which is an Element node.

Element root = doc.getDocumentElement();

The following code snippet retrieves the name of the root node and prints it. The method getNodeName() returns the name of a node.

String name = root.getNodeName();
System.out.println(name); // prints "greeting"

The following code snippet retrieves the text content of the element. The method getFirstChild() returns the first child of the root node, which here is the Text node. Once the node is retrieved, its value can be printed using the method getNodeValue().

Text node = (Text) root.getFirstChild();
String txt = node.getNodeValue();
System.out.println(txt); // prints " Hello World! ", including the spaces inside the tags
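The cast to Text above assumes that the root element has a Text node as its first child. A slightly more defensive sketch is shown here; the class RootText and its helper methods are hypothetical names, and the XML is parsed from a String so that the sketch is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class RootText {
    // Hypothetical helper: parses XML held in a String.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the value of the root's first child if it is a Text node, else "".
    static String rootText(Document doc) {
        Node first = doc.getDocumentElement().getFirstChild();
        if (first != null && first.getNodeType() == Node.TEXT_NODE) {
            return first.getNodeValue();
        }
        return "";
    }

    public static void main(String[] args) {
        Document doc = parse("<greeting> Hello World! </greeting>");
        String txt = rootText(doc);
        System.out.println("[" + txt + "]");        // raw value keeps the surrounding spaces
        System.out.println("[" + txt.trim() + "]"); // trimmed value
    }
}
```

Checking the node type before reading the value means that an empty root such as <greeting/> yields an empty string instead of a NullPointerException.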

Getting all child nodes

The XML document ‘questions.xml’ is defined below. It describes a question paper in which each question has an id and text content.

<?xml version="1.0"?>
<question-paper>
<question id="q1">What is DOM?</question>
<question id="q2">What are Leaves?</question>
</question-paper>

Figure 12.3 shows how this XML file can be viewed as a DOM tree. The root node is <question-paper>. Each <question> node is a child of the root node and has an attribute ‘id’ and one child node, the Text node.

Example 1 – Getting all child nodes

This program shows how to get all the child nodes of the root and print their values. First we create a Document object using the parse method of the DocumentBuilder object, then call getDocumentElement() to retrieve the root node. Once the root node is retrieved, we can navigate the tree. To retrieve all the child nodes, call getChildNodes() on the root, which returns a NodeList. A NodeList is an ordered, indexable collection of nodes, so getLength() gives the number of children of the root. A for loop reads every child with the item() method of the NodeList. For every node retrieved, getNodeType() returns the type of the node. If the node is an ELEMENT_NODE, we find its first child, the Text node, using getFirstChild() and print that node's value using getNodeValue().

import javax.xml.parsers.*;
import org.w3c.dom.*;

class GetQuestion
{
    public static void main(String args[])
    {
        try
        {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser = factory.newDocumentBuilder();
            Document doc = parser.parse("questions.xml");
            Element root = doc.getDocumentElement();
            NodeList children = root.getChildNodes(); // get all children of root
            System.out.println(children.getLength());
            // Getting all child nodes
            for (int i = 0; i < children.getLength(); i++)
            {
                Node node = children.item(i);
                // only Element children are printed; other node types are skipped
                if (node.getNodeType() == Node.ELEMENT_NODE)
                    System.out.println(node.getFirstChild().getNodeValue());
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

Figure 12.4 shows the output of getting all child nodes of the root. The root node <question-paper> is retrieved, and its children include the <question> nodes. For every <question> node, the value of its child Text node is printed.
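One detail worth checking is why the program tests for ELEMENT_NODE: when the file is indented, the line breaks between the <question> elements are themselves Text nodes and are counted by getLength(). The sketch below counts the root's children by node type; the class ChildKinds, its helpers, and the inline indented copy of the XML are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class ChildKinds {
    // Assumed, indented copy of questions.xml (no DTD).
    static final String XML =
          "<question-paper>\n"
        + "  <question id=\"q1\">What is DOM?</question>\n"
        + "  <question id=\"q2\">What are Leaves?</question>\n"
        + "</question-paper>";

    // Hypothetical helper: parses XML held in a String.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Counts the direct children of parent that have the given node type.
    static int countChildren(Node parent, short nodeType) {
        NodeList children = parent.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == nodeType) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node root = parse(XML).getDocumentElement();
        System.out.println("all children: " + root.getChildNodes().getLength());
        System.out.println("elements:     " + countChildren(root, Node.ELEMENT_NODE));
        System.out.println("text nodes:   " + countChildren(root, Node.TEXT_NODE));
    }
}
```

With this input the whitespace before, between, and after the two elements accounts for the extra Text nodes, which is why the total is larger than the number of questions.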

Using getElementsByTagName

A list of elements with a given tag name can be obtained using the method getElementsByTagName(). In the given XML file, the <question> elements are obtained by passing the element name as the parameter to getElementsByTagName().

NodeList children = doc.getElementsByTagName("question");

This method can also be invoked on the root element.

NodeList children = root.getElementsByTagName("question");

The text of each question element can then be printed with the same procedure as in the previous example.

for(int i=0; i<children.getLength(); i++)

{

Node node=children.item(i);

System.out.println(node.getFirstChild().getNodeValue());

}
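The loop above can be wrapped into a small helper that returns the question texts as a list. This is only a sketch: the class QuestionTexts and its methods are hypothetical, the XML is supplied as a String, and getTextContent() is used in place of getFirstChild().getNodeValue() so that an empty element does not cause a NullPointerException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class QuestionTexts {
    // Hypothetical helper: parses XML held in a String.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Collects the text of every <question> element, in document order.
    static List<String> questionTexts(Document doc) {
        NodeList questions = doc.getElementsByTagName("question");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < questions.getLength(); i++) {
            texts.add(questions.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) {
        Document doc = parse(
              "<question-paper>"
            + "<question id=\"q1\">What is DOM?</question>"
            + "<question id=\"q2\">What are Leaves?</question>"
            + "</question-paper>");
        for (String text : questionTexts(doc)) {
            System.out.println(text);
        }
    }
}
```

Because getElementsByTagName() returns only Element nodes, no getNodeType() check is needed inside the loop.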

Using getElementById

The method getElementById() of the Document object returns the element with a specified ID. The parser must know which attribute is of type ID, so the document below declares the ‘id’ attribute as an ID in its DTD.

<?xml version="1.0"?>
<!DOCTYPE question-paper [
<!ELEMENT question-paper (question+)>
<!ELEMENT question (#PCDATA)>
<!ATTLIST question id ID #REQUIRED>
]>
<question-paper>
<question id="q1">What is DOM?</question>
<question id="q2">What are leaves?</question>
</question-paper>

To retrieve an element by ID, we call getElementById(), passing the value of the ID.

Element e = doc.getElementById("q1");
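The effect of the ATTLIST declaration can be checked with a short experiment. In the sketch below (class IdLookup and its helpers are hypothetical names, and the XML is parsed from Strings), the same lookup is run on a document with the DTD and on one without it; with the JDK's default parser the second lookup is expected to find nothing, because no attribute has been declared to be of type ID.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class IdLookup {
    // Internal DTD subset declaring 'id' as an attribute of type ID.
    static final String DTD =
          "<!DOCTYPE question-paper [\n"
        + "<!ELEMENT question-paper (question+)>\n"
        + "<!ELEMENT question (#PCDATA)>\n"
        + "<!ATTLIST question id ID #REQUIRED>\n"
        + "]>\n";

    static final String BODY =
          "<question-paper>"
        + "<question id=\"q1\">What is DOM?</question>"
        + "<question id=\"q2\">What are leaves?</question>"
        + "</question-paper>";

    // Hypothetical helper: parses XML held in a String.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the text of the element with the given ID, or null if none is found.
    static String textOf(String xml, String id) {
        Element e = parse(xml).getElementById(id);
        return (e == null) ? null : e.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println("with DTD:    " + textOf(DTD + BODY, "q1"));
        System.out.println("without DTD: " + textOf(BODY, "q1"));
    }
}
```

When a DTD is not available, getElementsByTagName() combined with getAttribute("id") is a simple alternative way to find the element.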

Getting attributes of an element

All attributes of an element are obtained using the following method:

NamedNodeMap getAttributes()

The value of a specific attribute is obtained using the following method of the element node:

String getAttribute(String attributeName)

This method is defined on the Element interface, not on Node. If the variable is declared as a Node, typecast it to Element first and then call getAttribute():

String value = ((Element) node).getAttribute("id");

If the variable is already declared as an Element, the method can be invoked directly without typecasting:

String value = e.getAttribute("id");
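As a sketch of getAttributes(), the class below copies every attribute of an element into a map. The class AttributeDump, its helpers, and the extra ‘marks’ attribute are assumptions for illustration only. Note also that getAttribute() returns an empty string, not null, when the element has no attribute with the given name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class AttributeDump {
    // Hypothetical helper: parses XML held in a String.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Copies all attributes of the element into a map sorted by attribute name.
    static Map<String, String> attributesOf(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        Map<String, String> result = new TreeMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i); // each item is an Attr node
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // 'marks' is a made-up attribute used only to show more than one entry.
        Element q = parse("<question id=\"q1\" marks=\"5\">What is DOM?</question>")
                .getDocumentElement();
        System.out.println(attributesOf(q));
        System.out.println("[" + q.getAttribute("missing") + "]"); // empty string for an absent attribute
    }
}
```

The map is sorted by name here because a NamedNodeMap does not promise any particular order for its items.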

Example 2 – Getting Elements by ID.

As explained earlier, first we create a Document object using the parse method of the DocumentBuilder object. We call getElementById("q1") to retrieve the Element node whose ID is ‘q1’. Once the Element node is retrieved, we pass the attribute name to getAttribute() to retrieve the attribute's value and print it, which prints the question number ‘q1’. We then get the first child of this Element node, the Text node, and print its value, ‘What is DOM?’.

import javax.xml.parsers.*;
import org.w3c.dom.*;

public class GetElementById {
    public static void main(String args[])
    {
        try
        {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser = factory.newDocumentBuilder();
            Document doc = parser.parse("question1.xml");
            Element e = doc.getElementById("q1");
            String value = e.getAttribute("id");
            System.out.print(value + "."); // prints the question no
            System.out.println(e.getFirstChild().getNodeValue());
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
    }
}

Figure 12.5 shows the output of the above example code, which retrieves the Element using the method getElementById().

Example 3 – Getting Elements by TagName

As explained earlier, first we create a Document object using the parse method on the DocumentBuilder object. We can retrieve the list of elements having a tag name ‘question’ using the method getElementsByTagName(). Then we can use a for loop to navigate each element and print the attribute value by passing the attribute name to the getAttribute() method. We then print the Node value of each element.

import javax.xml.parsers.*;
import org.w3c.dom.*;

public class GetAttribute {
    public static void main(String args[])
    {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser = factory.newDocumentBuilder();
            Document doc = parser.parse("questions.xml");
            Element root = doc.getDocumentElement();
            NodeList children = root.getElementsByTagName("question");
            for (int i = 0; i < children.getLength(); i++)
            {
                Node node = children.item(i);
                String value = ((Element) node).getAttribute("id");
                System.out.print(value + ".");
                System.out.println(node.getFirstChild().getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Viewing DOM

A DOM tree can be transformed back into an XML document, which can be displayed on the screen or stored in a file. This helps us to visualize and verify the DOM tree after adding or deleting nodes. This is done by creating a Transformer object and a DOMSource object. The task of the Transformer object is to transform the specified source to the specified result. We call transform(source, result) on the Transformer object to transform the DOM source to an output stream. An example that displays the result of the transformation on the standard output is given below.

Example 4: Viewing DOM

import javax.xml.parsers.*;
import org.w3c.dom.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

public class view {
    public static void main(String args[])
    {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser = factory.newDocumentBuilder();
            Document doc = parser.parse("questions.xml");
            Element root = doc.getDocumentElement();
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            DOMSource source = new DOMSource(root);            // what to write
            StreamResult result = new StreamResult(System.out); // where to write it
            transformer.transform(source, result);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

Figure 12.7 shows the output of viewing the XML file.
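The same Transformer can write to any java.io.Writer, which is convenient when the XML text is needed as a String. In this sketch the class DomToString and its methods are hypothetical; OutputKeys.OMIT_XML_DECLARATION is a standard output property that suppresses the <?xml ...?> line.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class DomToString {
    // Hypothetical helper: parses XML held in a String.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Serializes the node and its subtree to XML text, without the XML declaration.
    static String toXml(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize DOM", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<greeting>Hi</greeting>");
        System.out.println(toXml(doc.getDocumentElement()));
    }
}
```

Holding the result in a String makes it easy to compare the tree before and after a change, for example in a unit test.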

Manipulating DOM Tree

With the XML DOM, Java can access and change all the elements of an XML document. A DOM tree can be manipulated by adding or deleting nodes. We can also set attributes on an element node.

Creating a node

A Text node is created using the createTextNode() method of the Document node. The method signature is

Text createTextNode(String text)

Text txt = doc.createTextNode("What is DTD?");

Once created, the Text node has to be attached to an Element node, so we also need to create an Element node. The method signature is

Element createElement(String elementName)

Element e = doc.createElement("question");

Setting an attribute

The attribute of an element is set using the setAttribute() method of the element node. The method signature is

void setAttribute(String attributeName, String attributeValue)

e.setAttribute("id", "q3");

Adding a node

The Text node can now be attached to the Element node e using the appendChild() method.

e.appendChild(txt);

The Element node e is then attached to the root element.

root.appendChild(e);

Example 5: Creating Nodes and Adding Nodes

As explained earlier, first we create a Document object using the parse method of the DocumentBuilder object. Then we create a Transformer object and a DOMSource object to view the XML file before adding elements. We then create a Text node by passing the text value to createTextNode("What is DTD?"). Next we create an Element node by passing the element name ‘question’ to createElement("question"). We append the Text node to the Element node using appendChild(txt), and set the attribute named ‘id’ to the value ‘q3’ using setAttribute("id", "q3"). Finally, we append the new Element node to the root node of the XML DOM using root.appendChild(e).

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

public class AppendQuestion {
    public static void main(String args[])
    {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser = factory.newDocumentBuilder();
            Document doc = parser.parse("questions.xml");
            Element root = doc.getDocumentElement();
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            DOMSource source = new DOMSource(root);
            StreamResult result = new StreamResult(System.out);
            System.out.println("Before addition");
            transformer.transform(source, result);
            // create the new question and attach it to the root
            Text txt = doc.createTextNode("What is DTD?");
            Element e = doc.createElement("question");
            e.appendChild(txt);
            e.setAttribute("id", "q3");
            root.appendChild(e);
            System.out.println("\n After Addition");
            transformer.transform(source, result);
            // also store the modified tree in a file
            FileOutputStream fout = new FileOutputStream(new File("out.xml"));
            StreamResult result1 = new StreamResult(fout);
            transformer.transform(source, result1);
            fout.close();
        } catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

The output of the above example code is shown in Figure 12.8.
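Deleting a node, mentioned at the start of this section, follows the same pattern in reverse: the node is detached from its parent with removeChild(). The sketch below removes the <question> element with a given id; the class RemoveQuestion and its helpers are hypothetical, and the XML is parsed from a String. The loop runs backwards because the NodeList returned by getElementsByTagName() is live and shrinks as nodes are removed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class RemoveQuestion {
    // Hypothetical helper: parses XML held in a String.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Removes every <question> whose 'id' attribute equals id; returns how many were removed.
    static int removeById(Document doc, String id) {
        NodeList questions = doc.getElementsByTagName("question");
        int removed = 0;
        // Iterate backwards: the list is live, so removals shift later indexes.
        for (int i = questions.getLength() - 1; i >= 0; i--) {
            Element question = (Element) questions.item(i);
            if (id.equals(question.getAttribute("id"))) {
                question.getParentNode().removeChild(question);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse(
              "<question-paper>"
            + "<question id=\"q1\">What is DOM?</question>"
            + "<question id=\"q2\">What are Leaves?</question>"
            + "</question-paper>");
        System.out.println("removed: " + removeById(doc, "q1"));
        System.out.println("left:    " + doc.getElementsByTagName("question").getLength());
    }
}
```

The removed node still exists as an object after removeChild(); it is simply no longer part of the tree, so it could be appended elsewhere if needed.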

Creating Clone

A copy of a node is obtained using the cloneNode() method. Passing true makes a deep copy that includes the node's children.

Node aCopy = questions.item(0).cloneNode(true);

Then set the attribute on the copied node:

((Element) aCopy).setAttribute("id", "q3");

Replace the text of its Text node using

((Text) aCopy.getFirstChild()).replaceWholeText("What is XML?");

Append the cloned node to the root of the XML DOM.

root.appendChild(aCopy);
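The boolean argument of cloneNode() decides how much is copied. As a sketch (class CloneKinds and its helpers are hypothetical, with the XML parsed from a String): a deep clone carries the Text child along, while a shallow clone copies only the element and its attributes. In both cases the copy has no parent until it is appended.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class CloneKinds {
    // Hypothetical helper: parses XML held in a String.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Deep-copies the element and gives the copy a new 'id' value.
    static Element cloneWithId(Element original, String newId) {
        Element copy = (Element) original.cloneNode(true); // true = copy children too
        copy.setAttribute("id", newId);
        return copy;
    }

    public static void main(String[] args) {
        Element q1 = parse("<question id=\"q1\">What is DOM?</question>").getDocumentElement();

        Node deep = q1.cloneNode(true);     // element, attributes, and Text child
        Node shallow = q1.cloneNode(false); // element and attributes only

        System.out.println("deep has children:    " + deep.hasChildNodes());
        System.out.println("shallow has children: " + shallow.hasChildNodes());
        System.out.println("copy id: " + cloneWithId(q1, "q3").getAttribute("id"));
        System.out.println("original id: " + q1.getAttribute("id"));
    }
}
```

Changing the copy does not affect the original element, which is why the id can be reset on the clone before it is appended to the tree.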

Example 6: Creating Clone Node and Appending

import javax.xml.parsers.*;
import org.w3c.dom.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

public class CopyQuestion {
    public static void main(String args[])
    {
        try
        {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser = factory.newDocumentBuilder();
            Document doc = parser.parse("questions.xml");
            Element root = doc.getDocumentElement();
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            DOMSource source = new DOMSource(root);
            StreamResult result = new StreamResult(System.out);
            System.out.println("Before addition");
            transformer.transform(source, result);
            // deep-copy the first question, then change its id and text
            NodeList questions = doc.getElementsByTagName("question");
            Node aCopy = questions.item(0).cloneNode(true);
            ((Element) aCopy).setAttribute("id", "q3");
            ((Text) aCopy.getFirstChild()).replaceWholeText("What is XML?");
            root.appendChild(aCopy);
            System.out.println("\n After Addition");
            transformer.transform(source, result);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

The output of the above example code is shown in Figure 12.9.

Summary

This module discussed how to navigate and manipulate the XML DOM using Java. It also explored, with examples, how to view the DOM and how to manipulate it by creating nodes, cloning nodes, and adding them to the DOM tree.
