How to create generic XML parser using SAX

Hi All,

Today I am going to show you how to create a generic XML parser using SAX in Java.

The idea is to pass in the names of the nodes and get their values back. I will be using the SAX parser for this, along with Guava, a third-party library from Google. One extra feature of this code is that it can also collect the names of nodes that match a pattern (in this code, a name prefix).

Here is the code for the generic XML parser.

GenericXMLParserSAX.java

package xmlparser.sax;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.ListMultimap;

public class GenericXMLParserSAX extends DefaultHandler {

	private ListMultimap<String, String> listMultimap = ArrayListMultimap.create();
	String tempCharacter;	
	private String[] startElements;
	private String[] endElements;

	public void setStartElements(String[] startElements) {
		this.startElements = startElements;
	}

	public String[] getStartElements() {
		return startElements;
	}

	public void setEndElements(String[] endElements) {
		this.endElements = endElements;
	}

	public String[] getEndElements() {
		return endElements;
	}

	public void parseDocument(String xml, String[] startElements, String[] endElements) {
		setStartElements(startElements);
		// Reuse the two-argument overload for the actual parsing.
		parseDocument(xml, endElements);
	}

	public void parseDocument(String xml, String[] endElements) {
		setEndElements(endElements);

		SAXParserFactory spf = SAXParserFactory.newInstance();
		try {
			SAXParser sp = spf.newSAXParser();
			// Encode explicitly as UTF-8 instead of relying on the platform default charset.
			InputStream inputStream = new ByteArrayInputStream(xml.getBytes(java.nio.charset.StandardCharsets.UTF_8));
			sp.parse(inputStream, this);
		} catch (SAXException | ParserConfigurationException | IOException e) {
			e.printStackTrace();
		}
	}

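	@Override
	public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
		// Reset the text buffer so each element starts with empty content.
		tempCharacter = "";
		String[] startElements = getStartElements();

		if (startElements != null) {
			for (int i = 0; i < startElements.length; i++) {
				// Prefix match: "check" matches "checkpoint_01", "checkpoint_02", ...
				if (qName.startsWith(startElements[i])) {
					listMultimap.put(startElements[i], qName);
				}
			}
		}
	}

	@Override
	public void characters(char[] ch, int start, int length) throws SAXException {
		// SAX may deliver one element's text in several chunks, so append instead of overwriting.
		tempCharacter += new String(ch, start, length);
	}

	@Override
	public void endElement(String uri, String localName, String qName) throws SAXException {
		String[] endElements = getEndElements();

		if (endElements != null) {
			for (int i = 0; i < endElements.length; i++) {
				if (qName.equalsIgnoreCase(endElements[i])) {
					listMultimap.put(endElements[i], tempCharacter);
				}
			}
		}
		// Clear the buffer so text of this element does not leak into the next one.
		tempCharacter = "";
	}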

	public ListMultimap<String, String> multiSetResult() {		
		return listMultimap;
	}
}

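The idea is to override the startElement(), characters() and endElement() methods and store the values in a collection called ListMultimap, provided by Google's Guava library. A ListMultimap can hold several values under the same key, so repeated nodes such as <id> all end up in one list.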
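SAX parsers are allowed to deliver the text of one element in more than one characters() call, so a handler is safest when it accumulates text per element. Below is a minimal sketch of the same collecting idea using only the JDK, with a Map of Lists standing in for Guava's ListMultimap. The class name TextCollector and its collect() method are hypothetical and are not part of the code above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical JDK-only variant: collects the text of selected leaf elements.
class TextCollector extends DefaultHandler {

	private final Set<String> wanted;
	private final Map<String, List<String>> values = new LinkedHashMap<>();
	private final StringBuilder text = new StringBuilder();

	TextCollector(String... names) {
		this.wanted = new HashSet<>(Arrays.asList(names));
	}

	@Override
	public void startElement(String uri, String localName, String qName, Attributes attributes) {
		text.setLength(0); // start with an empty buffer for every element
	}

	@Override
	public void characters(char[] ch, int start, int length) {
		text.append(ch, start, length); // may be called several times per element
	}

	@Override
	public void endElement(String uri, String localName, String qName) {
		if (wanted.contains(qName)) {
			// One key can hold many values, like a ListMultimap.
			values.computeIfAbsent(qName, k -> new ArrayList<>()).add(text.toString());
		}
		text.setLength(0);
	}

	// Parses the XML string and returns the collected values per element name.
	static Map<String, List<String>> collect(String xml, String... names) {
		TextCollector handler = new TextCollector(names);
		try {
			SAXParserFactory.newInstance().newSAXParser()
					.parse(new InputSource(new StringReader(xml)), handler);
		} catch (Exception e) {
			throw new IllegalStateException("Could not parse XML", e);
		}
		return handler.values;
	}
}
```

Calling TextCollector.collect(xml, "id", "city") would return a map whose "id" entry lists every <id> value in document order. Whether dropping the Guava dependency is worth it depends on the project; the ListMultimap version is shorter to read.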

Look at the overloaded parseDocument() method. The three-argument version is for when you also want the names of nodes that match a pattern; note that the match is a simple prefix test (startsWith), not a full regular expression. The two-argument version only collects the values for the node names passed in the String[] parameter.
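If real regular expressions are wanted for the node names, one option is java.util.regex.Pattern. The sketch below only illustrates how such matching could look; it is an assumption, not what GenericXMLParserSAX does, and the names NameMatcher and matchingNames are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: filters element names with a regular expression.
class NameMatcher {

	// Returns the names that match the regular expression in full.
	static List<String> matchingNames(String regex, List<String> elementNames) {
		Pattern pattern = Pattern.compile(regex); // compile once, reuse for every name
		List<String> matches = new ArrayList<>();
		for (String name : elementNames) {
			if (pattern.matcher(name).matches()) {
				matches.add(name);
			}
		}
		return matches;
	}
}
```

Inside a handler, the same test, pattern.matcher(qName).matches(), could take the place of qName.startsWith(...). A pattern such as checkpoint_\d+ would then accept checkpoint_01 but reject a name like checklist, which the prefix "check" would let through.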

Now let’s see the usage.

TestParser.java

package xmlparser.sax;

import com.google.common.collect.ListMultimap;

public class TestParser {

	public static void main(String[] args) {		

		String xml = "<response><item><checkpoint_01><id>x_1</id><city>city_name_a</city><province>province_name_a</province><country>country_name_a</country></checkpoint_01><checkpoint_02><id>x_2</id><city>city_name_b</city><province>province_name_b</province><country>country_name_b</country></checkpoint_02><checkpoint_03><id>x_1</id><city>city_name_c</city><province>province_name_c</province><country>country_name_c</country></checkpoint_03></item></response>";

		// Compare string content with isEmpty(); == only compares references.
		if (xml == null || xml.isEmpty()) {
			System.out.println("Nothing");
			return;
		}

		GenericXMLParserSAX genericXMLParserSAX = new GenericXMLParserSAX();
		String[] startElement = {"check"};
		String[] endElement = {"id", "city", "province"};

		genericXMLParserSAX.parseDocument(xml, startElement, endElement);

		ListMultimap<String, String> xmlData  = genericXMLParserSAX.multiSetResult();

		System.out.println(xmlData.toString());
		System.out.println("<<================");

		for(int i = 0; i < xmlData.get("check").size(); i++) {			
			System.out.println(xmlData.get("check").get(i) + " = id => " + xmlData.get("id").get(i));
			System.out.println(xmlData.get("check").get(i) + " = city => " + xmlData.get("city").get(i));
			System.out.println(xmlData.get("check").get(i) + " = province => " + xmlData.get("province").get(i));			
		}

		System.out.println("==================>>");
	}
}

That’s it guys.

I hope this saves you a lot of time. There is no need to map the XML to a POJO and then retrieve the values; just create a String[] with the names of the nodes and pass it to the parseDocument() method.

I am going to start a project and upload this code to GitHub. Everyone is welcome to contribute and make it more efficient.

UPDATE: The code is now on GitHub: https://github.com/niteshapte/generic-xml-parser

That’s it for today guys.

Criticism and suggestions are very much welcome.

Have a nice day ahead.

 


