XML Parsing with JAVA with Example Java Programs and Examples with Output

Writing XML documents is very straightforward, but reading them is not nearly as

simple. Fortunately, we can use an XML parser to read the document for us. The

XML parser exposes the contents of an XML document through an API. A client

application reads an XML document through this API. As well as reading the

document and providing the contents to the client application, the parser also checks

the document for well-formedness and (optionally) validity. If it finds an error, it

informs the client application.
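
As a small illustration of the error reporting described above, the following sketch feeds one well-formed and one malformed snippet to the standard JAXP DOM parser. The class name WellFormednessCheck and the method check are made up for this illustration, and installing a DefaultHandler as the error handler is just one way to keep the parser from printing its own diagnostics.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original programs.
class WellFormednessCheck {

    /** Returns null if the text is well-formed XML, otherwise a short error description. */
    static String check(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // the parser tells the client where the problem is
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<Office><Employee/></Office>")); // well-formed
        System.out.println(check("<Office><Employee></Office>"));  // Employee is never closed
    }
}
```

The first call reports no error; the second returns the parser's message about the unclosed element.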

An XML parser is a software module that reads XML documents and provides access to
their content. A tree-based parser generates a structured tree and returns it to the
calling program, such as a browser. In that sense an XML parser is similar to a
processor that determines the structure and properties of the data, which the
application can then turn into a display form. The XML parser for Java runs on any
platform where there is a Java virtual machine. It is sometimes called XML4J. It has
an interface which allows you to take a string of XML-formatted text, pick out the
XML tags, and use them to extract the tagged information.

Among the various XML parsers, the two most widely used are the SAX parser and the
DOM parser. Here is a brief description of each.

SAX

SAX, the Simple API for XML, is the gold standard of XML APIs. It is the most

complete and correct by far. Given a fully validating parser that supports all its

optional features, there is very little you can’t do with it. It has one or two holes, but

they're really off in the weeds of the XML specifications, and you have to look pretty

hard to find them. SAX is an event driven API. The SAX classes and interfaces model

the parser, the stream from which the document is read, and the client application

receiving data from the parser. However, no class models the XML document itself.

Instead the parser feeds content to the client application through a callback interface,

much like the ones used in Swing and the AWT. This makes SAX very fast and very

memory efficient (since it doesn’t have to store the entire document in memory).

However, SAX programs can be harder to design and code because you normally

need to develop your own data structures to hold the content from the document.

SAX works best when your processing is fairly local; that is, when all the information

you need to use is close together in the document. For example, you might process

one element at a time. Applications that require access to the entire document at once

in order to take useful action would be better served by one of the tree-based APIs

like DOM or JDOM. Finally, because SAX is so efficient, it’s the only real choice for

truly huge XML documents. Of course, “truly huge” has to be defined relative to

available memory. However, if the documents you're processing are in the gigabyte

range, you really have no choice but to use SAX.
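
To make the callback model concrete, here is a minimal sketch that records the order in which a SAX parser reports start and end tags. The class name SaxEventTrace and the method trace are invented for this illustration; the parser and handler classes are the standard JAXP and SAX ones. Checked parser exceptions are wrapped in an unchecked one only to keep the sketch short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original programs.
class SaxEventTrace {

    /** Parses the text and returns the start/end events in the order the parser fired them. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [start Office, start Employee, start Name, end Name, end Employee, end Office]
        System.out.println(trace("<Office><Employee><Name>Deb</Name></Employee></Office>"));
    }
}
```

Note that the handler never sees a document object; it only sees one event at a time, which is why the application has to keep whatever state it needs.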

DOM

DOM, the Document Object Model, is a fairly complex API that models an XML

document as a tree. Unlike SAX, DOM is a read-write API. It can both parse existing

XML documents and create new ones. Each XML document is represented as a

Document object. Documents are searched, queried, and updated by invoking methods

on this Document object and the objects it contains. This makes DOM much more

convenient when random access to widely separated parts of the original document is

required. However, it is quite memory intensive compared to SAX, and not nearly as

well suited to streaming applications.
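
The read-write nature of DOM can be sketched in a few lines: parse a document, add a node to the tree, then query the updated tree. The class name DomReadWrite and the method parseAndAppend are hypothetical, and the element names simply mirror the employee example used later in this document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original programs.
class DomReadWrite {

    /** Parses the text, appends one Employee element with the given name, and returns the tree. */
    static Document parseAndAppend(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element employee = doc.createElement("Employee");
            Element nameEl = doc.createElement("Name");
            nameEl.setTextContent(name);
            employee.appendChild(nameEl);
            // attach the new subtree under the root element
            doc.getDocumentElement().appendChild(employee);
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseAndAppend(
                "<Office><Employee><Name>Deb</Name></Employee></Office>", "Rishin");
        // random access: query the whole tree, including the node just added
        NodeList names = doc.getElementsByTagName("Name");
        for (int i = 0; i < names.getLength(); i++) {
            System.out.println(names.item(i).getTextContent());
        }
    }
}
```

Running it prints Deb and then Rishin, showing that the appended node is part of the same in-memory tree as the parsed content.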

The rest of this document includes examples of XML parsing in Java using both of
these parsers.

XML Parsing with JAVA

I would like to start with an example of how to parse an XML file, create Java
objects from it, and manipulate them.

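The idea here is to parse the employees.xml file with content as below

<?xml version="1.0" encoding="UTF-8"?>
<Office>
  <Employee type="permanent">
    <Name>Debamalya</Name>
    <Id>235960</Id>
    <Age>25</Age>
  </Employee>
  <Employee type="contract">
    <Name>Rishin</Name>
    <Id>3675</Id>
    <Age>24</Age>
  </Employee>
  <Employee type="permanent">
    <Name>Debalina</Name>
    <Id>3676</Id>
    <Age>28</Age>
  </Employee>
</Office>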

From the parsed content create a list of Employee objects and print it to the console.

The output would be something like

Employee Details - Name:Debamalya, Type:permanent, Id:235960, Age:25.

Employee Details - Name:Rishin, Type:contract, Id:3675, Age:24.

Employee Details - Name:Debalina, Type:permanent, Id:3676, Age:28.

I will start with a DOM parser to parse the XML file, create Employee value objects,
and add them to a list. To check that we parsed the file correctly, we iterate through
the list and print the employees' data to the console. Later we will see how to
implement the same thing using a SAX parser.

In a real-world situation you might get an XML file from a third-party vendor, which
you need to parse in order to update your database.

Using DOM Parser:

The program DomParserExample.java uses the DOM API.

The steps are

• Get a document builder using document builder factory and parse the xml file

to create a DOM object.

• Get a list of employee elements from the DOM.

• For each employee element get the id, name, age and type. Create an

employee value object and add it to the list.

• At the end iterate through the list and print the employees to verify we parsed

it right.

a) Getting a document builder

private void parseXmlFile(){ 
 DocumentBuilderFactory dbf = 
   DocumentBuilderFactory.newInstance(); 

 try { 
  //Using factory get an instance of document builder 
  DocumentBuilder db = dbf.newDocumentBuilder(); 
  //parse using builder to get DOM representation of the XML file 
  dom = db.parse("employees.xml"); 

 }catch(ParserConfigurationException pce) { 
  pce.printStackTrace(); 
 }catch(SAXException se) { 
  se.printStackTrace(); 
 }catch(IOException ioe) { 
  ioe.printStackTrace(); 
 } 
}

b) Get a list of employee elements

Get the rootElement from the DOM object. From the root element get all employee

elements. Iterate through each employee element to load the data.

private void parseDocument(){ 
 //get the root element 
 Element docEle = dom.getDocumentElement(); 

 //get a nodelist of elements 
 NodeList nl = 
   docEle.getElementsByTagName("Employee"); 
 if(nl != null && nl.getLength() > 0) { 
  for(int i = 0 ; i < nl.getLength();i++) { 

   //get the employee element 
   Element el = (Element)nl.item(i); 

   //get the Employee object 
   Employee e = getEmployee(el); 

   //add it to list 
   myEmpls.add(e); 
  } 
 } 
}

c) Reading in data from each employee.

/** 
 * I take an employee element and read the values in, create 
 * an Employee object and return it 
 */ 
private Employee getEmployee(Element empEl) { 

 //for each employee element get the text or int values of name, id and age 
 String name = getTextValue(empEl,"Name"); 
 int id = getIntValue(empEl,"Id"); 
 int age = getIntValue(empEl,"Age"); 

 String type = empEl.getAttribute("type"); 

 //Create a new Employee with the value read from the xml nodes 
 Employee e = new Employee(name,id,age,type); 

 return e; 
} 

/** 
 * I take an xml element and the tag name, look for the tag and 
 * get the text content, 
 * i.e. for the <Employee><Name>Deb</Name></Employee> xml snippet, if 
 * the Element points to the Employee node and tagName is 
 * 'Name', I will return Deb 
 */ 
private String getTextValue(Element ele, String tagName) { 
 String textVal = null; 
 NodeList nl = ele.getElementsByTagName(tagName); 
 if(nl != null && nl.getLength() > 0) { 
  Element el = (Element)nl.item(0); 
  textVal = el.getFirstChild().getNodeValue(); 
 } 

 return textVal; 
} 

/** 
 * Calls getTextValue and returns a int value 
 */ 
private int getIntValue(Element ele, String tagName) { 
 //in production application you would catch the exception 
 return Integer.parseInt(getTextValue(ele,tagName)); 
}
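
The helpers above assume that every tag is present, non-empty, and numeric where expected. As the comment in getIntValue hints, production code would guard against that. One possible shape for more defensive helpers is sketched below; the names SafeDomAccess, textOf, intOf, and root, and the choice of returning a fallback value, are assumptions made for this illustration rather than part of DomParserExample.java.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helpers, not part of the original programs.
class SafeDomAccess {

    /** Returns the trimmed text of the first descendant named tagName, or null if there is none. */
    static String textOf(Element parent, String tagName) {
        NodeList nl = parent.getElementsByTagName(tagName);
        if (nl.getLength() == 0) {
            return null;
        }
        // getTextContent() yields "" for an empty element instead of failing
        return nl.item(0).getTextContent().trim();
    }

    /** Like textOf, but parsed as an int; returns fallback if the tag is missing or not a number. */
    static int intOf(Element parent, String tagName, int fallback) {
        String text = textOf(parent, tagName);
        if (text == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /** Demo helper: parses a snippet and returns its root element. */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element emp = root(
                "<Employee type=\"contract\"><Name>Rishin</Name><Id>3675</Id><Age/></Employee>");
        System.out.println(textOf(emp, "Name"));      // Rishin
        System.out.println(intOf(emp, "Id", -1));     // 3675
        System.out.println(intOf(emp, "Age", -1));    // -1, the element is empty
        System.out.println(intOf(emp, "Salary", -1)); // -1, there is no such tag
    }
}
```

Whether a silent fallback or an exception is the better reaction to bad input depends on the application; the sketch only shows where the checks would go.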

d) Iterating and printing.

private void printData(){ 
 System.out.println("No of Employees '" + myEmpls.size() + 
   "'."); 
 Iterator it = myEmpls.iterator(); 
 while(it.hasNext()) { 
  System.out.println(it.next().toString()); 
 } 
}
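
The Employee value object used by these fragments is not listed in this document. A minimal version consistent with the calls made in the examples (the four-argument constructor here, and the no-argument constructor plus setters used in the SAX version) and with the sample output might look like the sketch below. The field layout and the exact toString format are inferred from that output and should be read as assumptions.

```java
// A guess at Employee.java, reconstructed from how the examples use it.
class Employee {

    private String name;
    private int id;
    private int age;
    private String type;

    /** Used when the fields are filled in one by one through the setters. */
    public Employee() {
    }

    /** Used when all values are known up front. */
    public Employee(String name, int id, int age, String type) {
        this.name = name;
        this.id = id;
        this.age = age;
        this.type = type;
    }

    public void setName(String name) { this.name = name; }

    public void setId(int id) { this.id = id; }

    public void setAge(int age) { this.age = age; }

    public void setType(String type) { this.type = type; }

    /** Formats the employee the way the sample output shows it. */
    @Override
    public String toString() {
        return "Employee Details - Name:" + name + ", Type:" + type
                + ", Id:" + id + ", Age:" + age + ".";
    }

    public static void main(String[] args) {
        // prints Employee Details - Name:Debamalya, Type:permanent, Id:235960, Age:25.
        System.out.println(new Employee("Debamalya", 235960, 25, "permanent"));
    }
}
```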

Using SAX Parser:

The program SAXParserExample.java parses an XML document and prints it on the
console.

SAX parsing is event based. Every time the SAX parser encounters a tag while reading
an XML document, it calls the corresponding handler method.

When it encounters a Start Tag it calls this method

public void startElement(String uri,..

When it encounters an End Tag it calls this method

public void endElement(String uri,...

Like the DOM example this program also parses the xml file, creates a list of

employees and prints it to the console. The steps involved are

• Create a Sax parser and parse the xml

• In the event handler create the employee object

• Print out the data

The class extends DefaultHandler to listen for callback events. We are interested in
the start element, end element, and character events.

In the start event, if the element is Employee we create a new instance of the
Employee object, and if the element is Name, Id, or Age we initialize the character
buffer to get the text value.

In the end event, if the node is Employee we know we are at the end of the employee
node, and we add the Employee object to the list. If it is any other node, such as
Name, Id, or Age, we call the corresponding method (setName, setId, or setAge) on the
Employee object. JavaBean classes can be used for this. In the character event we
store the data in a temporary string variable.

a) Create a Sax Parser and parse the xml

private void parseDocument() { 

 //get a factory 
 SAXParserFactory spf = 
   SAXParserFactory.newInstance(); 
 try { 

  //get a new instance of parser 
  SAXParser sp = spf.newSAXParser(); 

  //parse the file and also register this class for call backs 
  sp.parse("employees.xml", this); 

 }catch(SAXException se) { 
  se.printStackTrace(); 
 }catch(ParserConfigurationException pce) { 
  pce.printStackTrace(); 
 }catch (IOException ie) { 
  ie.printStackTrace(); 
 } 
}

b) In the event handlers create the Employee object and call the corresponding setter

methods.

public void startElement(String uri, String localName, String 
  qName, Attributes attributes) throws SAXException { 
 //reset 
 tempVal = ""; 
 if(qName.equalsIgnoreCase("Employee")) { 
  //create a new instance of employee 
  tempEmp = new Employee(); 
  tempEmp.setType(attributes.getValue("type")); 
 } 
} 

public void characters(char[] ch, int start, int length) 
  throws SAXException { 
 //the parser may deliver one text node in several chunks, so append 
 tempVal += new String(ch,start,length); 
} 

public void endElement(String uri, String localName, 
  String qName) throws SAXException { 

 if(qName.equalsIgnoreCase("Employee")) { 
  //add it to the list 
  myEmpls.add(tempEmp); 

 }else if (qName.equalsIgnoreCase("Name")) { 
  tempEmp.setName(tempVal); 
 }else if (qName.equalsIgnoreCase("Id")) { 
  tempEmp.setId(Integer.parseInt(tempVal)); 
 }else if (qName.equalsIgnoreCase("Age")) { 
  tempEmp.setAge(Integer.parseInt(tempVal)); 
 } 

}
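
For readers who want to try the handlers without the surrounding class, here is a self-contained sketch of the same idea that runs on an in-memory document. It is not the original SAXParserExample.java: the class name SaxEmployeeReader, the use of a StringBuilder to collect text, and the plain strings it produces instead of Employee objects are all choices made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original programs.
class SaxEmployeeReader extends DefaultHandler {

    private final List<String> employees = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String type;
    private String name;
    private int id;
    private int age;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting text for this element
        if (qName.equals("Employee")) {
            // new employee: forget the previous one's values
            type = attributes.getValue("type");
            name = null;
            id = 0;
            age = 0;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "Name": name = value; break;
            case "Id": id = Integer.parseInt(value); break;
            case "Age": age = Integer.parseInt(value); break;
            case "Employee":
                employees.add("Name:" + name + ", Type:" + type + ", Id:" + id + ", Age:" + age);
                break;
            default: break;
        }
    }

    /** Parses the text and returns one summary line per Employee element. */
    static List<String> read(String xml) {
        SaxEmployeeReader handler = new SaxEmployeeReader();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.employees;
    }

    public static void main(String[] args) {
        String xml = "<Office>"
                + "<Employee type=\"permanent\"><Name>Debamalya</Name><Id>235960</Id><Age>25</Age></Employee>"
                + "<Employee type=\"contract\"><Name>Rishin</Name><Id>3675</Id><Age>24</Age></Employee>"
                + "</Office>";
        for (String line : read(xml)) {
            System.out.println(line);
        }
    }
}
```

The handler has to carry its own state (the current type, name, id, and age) between callbacks, which is the extra design work that SAX asks for in exchange for its low memory use.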

c) Iterating and printing.

private void printData(){ 
 
 System.out.println("No of Employees '" + myEmpls.size() + 
"'."); 
 
  Iterator it = myEmpls.iterator(); 
  while(it.hasNext()) { 
   System.out.println(it.next().toString()); 
  } 
 }

Writing XML with Java

The previous programs illustrated how to parse an existing XML file using both the
SAX and DOM parsers. Generating an XML file from scratch is a different story; for
instance, you might want to generate an XML file from data extracted from a database.
To keep the example simple, the program XMLCreatorExample.java generates XML from a
list preloaded with hard-coded data. The output will be the file book.xml with the
following content.

<?xml version="1.0" encoding="UTF-8"?>
<Books>
  <Book Subject="Java 1.5">
    <Author>Kathy Sierra .. etc</Author>
    <Title>Head First Java</Title>
  </Book>
  <Book Subject="Java Architect">
    <Author>Kathy Sierra .. etc</Author>
    <Title>Head First Design Patterns</Title>
  </Book>
</Books>

The steps involved are

• Load Data

• Get an instance of Document object using document builder factory

• Create the root element Books

• For each item in the list create a Book element and attach it to Books element

• Serialize DOM to FileOutputStream to generate the xml file "book.xml".

a) Load Data.

/** 
 * Add a list of books to the list 
 * In a production system you might populate the list from a db
 */ 
private void loadData(){ 
 myData.add(new Book("Head First Java", 
   "Kathy Sierra .. etc","Java 1.5")); 
 myData.add(new Book("Head First Design Patterns", 
   "Kathy Sierra .. etc","Java Architect")); 
}

b) Getting an instance of DOM.

/** 
 * Using JAXP in an implementation-independent manner, create a document object 
 * with which we create an xml tree in memory 
 */ 
private void createDocument() { 

 //get an instance of factory 
 DocumentBuilderFactory dbf = 
   DocumentBuilderFactory.newInstance(); 
 try { 
  //get an instance of builder 
  DocumentBuilder db = dbf.newDocumentBuilder(); 

  //create an instance of DOM 
  dom = db.newDocument(); 

 }catch(ParserConfigurationException pce) { 
  //dump it 
  System.out.println("Error while trying to instantiate DocumentBuilder " + pce); 
  System.exit(1); 
 } 

}

c) Create the root element Books.

/** 
 * The real workhorse which creates the XML structure 
 */ 
private void createDOMTree(){ 

 //create the root element  

 Element rootEle = dom.createElement("Books"); 
 dom.appendChild(rootEle); 

 //No enhanced for 
 Iterator it  = myData.iterator(); 
 while(it.hasNext()) { 
  Book b = (Book)it.next(); 
  //For each Book object  create element and attach it to root 
  Element bookEle = createBookElement(b); 
  rootEle.appendChild(bookEle); 
 } 

}

d) Creating a book element.

/** 
 * Helper method which creates a XML element 
 * @param b The book for which we need to create an xml representation 
 * @return XML element snippet representing a book 
 */ 
private Element createBookElement(Book b){ 
 Element bookEle = dom.createElement("Book"); 
 bookEle.setAttribute("Subject", b.getSubject()); 
 //create author element and author text node and attach it to bookElement 
 Element authEle = dom.createElement("Author"); 
 Text authText = dom.createTextNode(b.getAuthor()); 
 authEle.appendChild(authText); 
 bookEle.appendChild(authEle); 
 //create title element and title text node and attach it to bookElement 
 Element titleEle = dom.createElement("Title"); 
 Text titleText = dom.createTextNode(b.getTitle()); 
 titleEle.appendChild(titleText); 
 bookEle.appendChild(titleEle); 
 return bookEle; 
}

e) Serialize DOM to FileOutputStream to generate the xml file "book.xml".

/** 
 * This method uses Xerces specific classes 
 * prints the XML document to file. 
 */ 
private void printToFile(){ 
 try 
 { 
  //print 
  OutputFormat format = new OutputFormat(dom); 
  format.setIndenting(true); 
  //to generate output to console use this serializer 
  //XMLSerializer serializer = new XMLSerializer(System.out, format); 

  //to generate a file output use FileOutputStream instead of System.out 
  XMLSerializer serializer = new XMLSerializer( 
    new FileOutputStream(new File("book.xml")), 
    format); 
  serializer.serialize(dom); 
 } catch(IOException ie) { 
  ie.printStackTrace(); 
 } 
}

Note:

The Xerces internal classes OutputFormat and XMLSerializer are in different packages,
depending on the distribution.

In JDK 1.5 with built in Xerces parser they are under

com.sun.org.apache.xml.internal.serialize.OutputFormat

com.sun.org.apache.xml.internal.serialize.XMLSerializer

In Xerces 2.7.1 which we are using to run these examples they are under

org.apache.xml.serialize.XMLSerializer

org.apache.xml.serialize.OutputFormat

We are using Xerces 2.7.1 with JDK 1.4 and JDK 1.3, because the default parser in
JDK 1.4 is Crimson and JDK 1.3 has no built-in parser.

Also remember that it is not advisable to use parser-implementation-specific classes
like OutputFormat and XMLSerializer: they are only available in Xerces, and if you
switch to another parser in the future you may have to rewrite your code.
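
One implementation-independent alternative is the identity Transformer from the standard javax.xml.transform package, which can serialize a DOM tree without any Xerces-specific classes. The sketch below is not taken from XMLCreatorExample.java; the class name PortableSerializer and the methods toXml and sampleBooks are invented for this illustration, and the exact indentation of the output may vary between JDK versions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the original programs.
class PortableSerializer {

    /** Serializes a DOM tree to indented XML text using only standard JAXP classes. */
    static String toXml(Document dom) {
        try {
            // a Transformer created without a stylesheet copies its input unchanged
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    /** Builds a one-book document shaped like the book.xml example. */
    static Document sampleBooks() {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element books = dom.createElement("Books");
        dom.appendChild(books);
        Element book = dom.createElement("Book");
        book.setAttribute("Subject", "Java 1.5");
        Element title = dom.createElement("Title");
        title.setTextContent("Head First Java");
        book.appendChild(title);
        books.appendChild(book);
        return dom;
    }

    public static void main(String[] args) {
        System.out.println(toXml(sampleBooks()));
    }
}
```

To write a file instead of a string, the StreamResult can be constructed from a java.io.File, for example new StreamResult(new File("book.xml")).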

Another example of writing XML is the following document, which contains the first 10
Fibonacci numbers.

<?xml version="1.0"?>
<Fibonacci_Numbers>
<fibonacci>1</fibonacci>
<fibonacci>1</fibonacci>
<fibonacci>2</fibonacci>
<fibonacci>3</fibonacci>
<fibonacci>5</fibonacci>
<fibonacci>8</fibonacci>
<fibonacci>13</fibonacci>
<fibonacci>21</fibonacci>
<fibonacci>34</fibonacci>
<fibonacci>55</fibonacci>
</Fibonacci_Numbers>

To produce this, just add string literals for the <fibonacci> and </fibonacci> tags

inside the print statements, as well as a few extra print statements to produce the XML

declaration and the root element start- and end-tags. XML documents are just text,

and you can output them any way you’d output any other text document. The

program FibonacciXML.java does this.

import java.math.BigInteger;

public class FibonacciXML {
 public static void main(String[] args) {
  BigInteger low = BigInteger.ONE;
  BigInteger high = BigInteger.ONE;
  //XML declaration and root element start-tag
  System.out.println("<?xml version=\"1.0\"?>");
  System.out.println("<Fibonacci_Numbers>");
  for (int i = 0; i < 10; i++) {
   //one fibonacci element per number
   System.out.print("<fibonacci>");
   System.out.print(low);
   System.out.println("</fibonacci>");
   BigInteger temp = high;
   high = high.add(low);
   low = temp;
  }
  //root element end-tag
  System.out.println("</Fibonacci_Numbers>");
 }
}
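
Printing tags around values works here because numbers never contain markup characters. If the same approach were used for arbitrary text, characters such as & and < would have to be escaped first, or the output would no longer be well-formed. A minimal sketch of such a helper follows; the class name XmlText and the method escape are hypothetical, and only the characters most commonly escaped in element content and attribute values are covered.

```java
// Hypothetical helper, not part of the original programs.
class XmlText {

    /** Replaces the characters that have special meaning in XML with entity references. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints <Title>Tom &amp; Jerry &lt;2&gt;</Title>
        System.out.println("<Title>" + escape("Tom & Jerry <2>") + "</Title>");
    }
}
```

When a document is built through the DOM API instead, as in XMLCreatorExample, the serializer takes care of this escaping.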

Running Programs in JAVA

The instructions to compile and run these programs vary, based on the JDK that you

are using. This is due to the way the XML parser is bundled with various Java

distributions. These instructions are for Windows OS. For Unix or Linux OS you just

need to change the folder paths accordingly. Xerces parser is bundled with the JDK

1.5 distribution. So you need not download the parser separately.

Running DomParserExample

1. Place DomParserExample.java, Employee.java, employees.xml to

c:\xercesTest

2. Go to command prompt and type

cd c:\xercesTest

3. To compile, type

javac -classpath . DomParserExample.java

4. To run, type

java -classpath . DomParserExample

Running SAXParserExample

1. Place SAXParserExample.java, Employee.java, employees.xml to

c:\xercesTest

2. Go to command prompt and type

cd c:\xercesTest

3. To compile, type

javac -classpath . SAXParserExample.java

4. To run, type

java -classpath . SAXParserExample

Running XMLCreatorExample

1. Place XMLCreatorExample.java, Book.java to c:\xercesTest

2. Go to command prompt and type

cd c:\xercesTest

3. To compile, type

javac -classpath . XMLCreatorExample.java

4. To run, type

java -classpath . XMLCreatorExample

Running FibonacciXML

1. Place FibonacciXML.java to c:\xercesTest

2. Go to command prompt and type

cd c:\xercesTest

3. To compile, type

javac -classpath . FibonacciXML.java

4. To run, type

java -classpath . FibonacciXML

Comparison

Both SAX and DOM have their advantages and disadvantages; which one to use depends
on the requirement.

SAX:

• Parses node by node
• Doesn’t store the XML in memory
• We can’t insert or delete a node
• Traverses top to bottom

DOM:

• Stores the entire XML document in memory before processing
• Occupies more memory
• We can insert or delete nodes
• Traverses in any direction

If we only need to find a node and don’t need to insert or delete anything, we can go
with SAX; otherwise we can use the DOM parser, provided we have enough memory.

Conclusion

I hope this document helps beginners write code that extracts data from XML. XML is
one of the most widely used formats for storing data, and Java provides the most
commonly used parsers. In real-life situations, we receive XML documents from a
third-party source, and they need to be parsed and their data stored in databases.
These needs can be met easily using the DOM or SAX XML parsers in Java.

We can have a JMS-configured system in which XML documents are received in an
automated way (this can be done using an MDB), and the same documents can be parsed
using Java parsers. The parser code can be scheduled to run automatically using a
.ksh script. The parsed values can be stored in Oracle databases using simple JDBC
code. In most IT projects the XML documents are huge and complex. In such cases a
DOM parser is often not practical, and a SAX parser is used instead. Although DOM
parsers are easier to code against, SAX parsers are more commonly used in real-life
systems.
