Hi All,
Today, I am going to show you how to create a generic XML parser using SAX in Java.
The idea is to pass the names of the nodes and get their values. I will be using the SAX parser for this, along with Guava, a third-party library from Google. One extra feature of this code is that it can also collect the names of nodes that match a pattern. Note that the code checks whether a node name starts with a given prefix (String.startsWith()), so it is a simple prefix match rather than a full regular expression.
Here is the code for the generic XML parser.
GenericXMLParserSAX.java
package xmlparser.sax;

import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.ListMultimap;

public class GenericXMLParserSAX extends DefaultHandler {

    // Results: each requested name maps to every matching node name or node value, in document order
    private final ListMultimap<String, String> listMultimap = ArrayListMultimap.create();

    // Text of the current element. SAX may deliver one text node in several
    // characters() calls, so the chunks are appended rather than overwritten.
    private final StringBuilder tempCharacter = new StringBuilder();

    private String[] startElements;
    private String[] endElements;

    public void setStartElements(String[] startElements) {
        this.startElements = startElements;
    }

    public String[] getStartElements() {
        return startElements;
    }

    public void setEndElements(String[] endElements) {
        this.endElements = endElements;
    }

    public String[] getEndElements() {
        return endElements;
    }

    // Collects the names of nodes starting with any prefix in startElements,
    // plus the values of the nodes named in endElements.
    public void parseDocument(String xml, String[] startElements, String[] endElements) {
        setStartElements(startElements);
        setEndElements(endElements);
        SAXParserFactory spf = SAXParserFactory.newInstance();
        try {
            SAXParser sp = spf.newSAXParser();
            // Parse from a Reader so the result does not depend on the platform default charset
            sp.parse(new InputSource(new StringReader(xml)), this);
        } catch (SAXException | ParserConfigurationException | IOException e) {
            e.printStackTrace();
        }
    }

    // Collects only the values of the nodes named in endElements.
    public void parseDocument(String xml, String[] endElements) {
        parseDocument(xml, null, endElements);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        tempCharacter.setLength(0); // new element: start collecting its text from scratch
        String[] startElements = getStartElements();
        if (startElements != null) {
            for (String prefix : startElements) {
                // Simple prefix match via startsWith(), not a regular expression
                if (qName.startsWith(prefix)) {
                    listMultimap.put(prefix, qName);
                }
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        tempCharacter.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String[] endElements = getEndElements();
        if (endElements != null) {
            for (String name : endElements) {
                if (qName.equalsIgnoreCase(name)) {
                    listMultimap.put(name, tempCharacter.toString());
                }
            }
        }
        tempCharacter.setLength(0);
    }

    public ListMultimap<String, String> multiSetResult() {
        return listMultimap;
    }
}
The idea is to override the startElement() and endElement() methods (plus characters(), which receives the text between them) and store the values in a ListMultimap, a collection provided by Google's Guava.
Note the overloaded parseDocument() method. The first version collects the names of nodes that start with one of the prefixes in startElements, in addition to the values of the nodes named in endElements. The second version only collects the values of the nodes whose names are passed in the String[] parameter.
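The callback pattern just described can be sketched with the JDK alone. In this minimal sketch a plain Map<String, List<String>> stands in for Guava's ListMultimap, and the class name SaxValueCollector and its collect() helper are hypothetical, not part of the original project. It also shows why characters() should append to a buffer instead of overwriting a field: a SAX parser is allowed to deliver a single text node in several chunks. Unlike the original, this sketch matches node names exactly (case-sensitive).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stdlib-only sketch; a Map of Lists stands in for Guava's ListMultimap
class SaxValueCollector extends DefaultHandler {

    // Node name -> all of its text values, in document order
    private final Map<String, List<String>> values = new LinkedHashMap<>();
    // Node names whose values we want (exact, case-sensitive match)
    private final Set<String> wanted;
    // Text of the current element; SAX may split one text node across several characters() calls
    private final StringBuilder text = new StringBuilder();

    SaxValueCollector(String... nodeNames) {
        this.wanted = new HashSet<>(Arrays.asList(nodeNames));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // a new element begins: drop any earlier text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // append instead of overwriting
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (wanted.contains(qName)) {
            values.computeIfAbsent(qName, k -> new ArrayList<>()).add(text.toString());
        }
        text.setLength(0);
    }

    // Parses the XML string and returns the collected values
    static Map<String, List<String>> collect(String xml, String... nodeNames) {
        SaxValueCollector handler = new SaxValueCollector(nodeNames);
        try {
            // A Reader-based InputSource avoids depending on the platform default charset
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        String xml = "<item>"
                + "<checkpoint_01><id>x_1</id><city>city_name_a</city></checkpoint_01>"
                + "<checkpoint_02><id>x_2</id><city>city_name_b</city></checkpoint_02>"
                + "</item>";
        System.out.println(collect(xml, "id", "city"));
    }
}
```

A value such as a&amp;b is often reported in more than one characters() call; because the chunks are appended, the collected value is still the complete string a&b.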
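If you need real regular expressions rather than a prefix, one possible variation (a sketch, not the code in this post or on GitHub) is to compile a java.util.regex.Pattern once and test each element name in startElement(). The class RegexNodeNameCollector and its collect() helper are hypothetical names, and the sketch only collects node names, so it runs on the JDK without Guava.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collects element names that fully match a regular expression
class RegexNodeNameCollector extends DefaultHandler {

    private final Pattern pattern;                      // compiled once, reused for every element
    private final List<String> matches = new ArrayList<>();

    RegexNodeNameCollector(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // matches() requires the whole name to fit the regex; Matcher.find() would give a "contains" match
        if (pattern.matcher(qName).matches()) {
            matches.add(qName);
        }
    }

    static List<String> collect(String xml, String regex) {
        RegexNodeNameCollector handler = new RegexNodeNameCollector(regex);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.matches;
    }

    public static void main(String[] args) {
        String xml = "<item><checkpoint_01/><checkpoint_02/><check/><checkpoint_x/></item>";
        // Only names made of "checkpoint_" followed by one or more digits
        System.out.println(collect(xml, "checkpoint_\\d+"));
    }
}
```

With the pattern checkpoint_\d+ the sketch keeps checkpoint_01 and checkpoint_02 and skips check and checkpoint_x, which a plain startsWith("check") would have accepted.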
Now let’s see the usage.
TestParser.java
package xmlparser.sax;

import com.google.common.collect.ListMultimap;

public class TestParser {

    public static void main(String[] args) {
        String xml = "<response><item>"
                + "<checkpoint_01><id>x_1</id><city>city_name_a</city><province>province_name_a</province><country>country_name_a</country></checkpoint_01>"
                + "<checkpoint_02><id>x_2</id><city>city_name_b</city><province>province_name_b</province><country>country_name_b</country></checkpoint_02>"
                + "<checkpoint_03><id>x_1</id><city>city_name_c</city><province>province_name_c</province><country>country_name_c</country></checkpoint_03>"
                + "</item></response>";

        // Compare string content with isEmpty() (== compares references) and stop if there is nothing to parse
        if (xml == null || xml.isEmpty()) {
            System.out.println("Nothing");
            return;
        }

        GenericXMLParserSAX genericXMLParserSAX = new GenericXMLParserSAX();
        String[] startElement = {"check"};                  // node-name prefix to look for
        String[] endElement = {"id", "city", "province"};   // nodes whose values we want
        genericXMLParserSAX.parseDocument(xml, startElement, endElement);

        ListMultimap<String, String> xmlData = genericXMLParserSAX.multiSetResult();
        System.out.println(xmlData.toString());

        System.out.println("<<================");
        // Assumes every checkpoint has exactly one id, city and province, so index i lines up across keys
        for (int i = 0; i < xmlData.get("check").size(); i++) {
            String node = xmlData.get("check").get(i);
            System.out.println(node + " = id => " + xmlData.get("id").get(i));
            System.out.println(node + " = city => " + xmlData.get("city").get(i));
            System.out.println(node + " = province => " + xmlData.get("province").get(i));
        }
        System.out.println("==================>>");
    }
}
That’s it guys.
I hope this will save a lot of time. There is no need to map the XML to POJOs and then retrieve the values: just create a String[] with the names of the nodes and pass it to the parseDocument() method.
I am going to start a project and upload this code to GitHub. Everyone is welcome to contribute to it and make it more efficient.
UPDATE: The project is now on GitHub: https://github.com/niteshapte/generic-xml-parser. Check it out.
That’s it for today guys.
Criticism and suggestions are very welcome.
Have a nice day ahead.
Cool, did you already create the github project?
Hi Rogier. Thanks. Not yet, but I will create it next week and let you know once it is up.
Hi Rogier,
I have created the GitHub project. Here is the link – https://github.com/niteshapte/generic-xml-parser
Thanks; will look at it :).
Could you please tell me the start and end elements for the following XMLs (so that your code can be applied)?
IT
333
Delhi
_______
XML:
_______
GreenHorn
444
XMLs are:
————–
Ajay444
IT333Gurgaon
Hi Greenhorn,
Please post your XML using the code tag.
As I understand it, this code does not return the attribute values of any node.
Any idea how to get these values with the generic code? 🙂
Hi Rahul,
As of now, it doesn’t. But if you have an idea how to do it, you are welcome to fork it on GitHub and extend the functionality.
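For readers with the same question as Rahul: SAX passes attribute values to startElement() through its Attributes parameter (endElement() never sees them), so one possible extension is to record them there. This is a minimal sketch, not code from the GitHub project; the class name SaxAttributeCollector, its collect() helper and the "element@attribute" key format are hypothetical, and a plain Map<String, List<String>> stands in for Guava's ListMultimap so it runs on the JDK alone.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collects attribute values keyed as "element@attribute"
class SaxAttributeCollector extends DefaultHandler {

    // e.g. "city@code" -> [DEL, GGN], in document order
    private final Map<String, List<String>> attributeValues = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Attributes are only available here, in startElement()
        for (int i = 0; i < attributes.getLength(); i++) {
            String key = qName + "@" + attributes.getQName(i);
            attributeValues.computeIfAbsent(key, k -> new ArrayList<>()).add(attributes.getValue(i));
        }
    }

    static Map<String, List<String>> collect(String xml) {
        SaxAttributeCollector handler = new SaxAttributeCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.attributeValues;
    }

    public static void main(String[] args) {
        String xml = "<item><city code=\"DEL\">Delhi</city>"
                + "<city code=\"GGN\" zone=\"north\">Gurgaon</city></item>";
        System.out.println(collect(xml));
    }
}
```

The same loop could be dropped into GenericXMLParserSAX.startElement() to put attribute values into its ListMultimap alongside the node values.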