The Rasmus Torkel XML API

Table of Contents

1. Coding philosophy
2. An XML read example
3. An XML write example
4. XML without files
5. Exception examples
5.1. Syntax error
5.2. Wrong root node
5.3. Missing name field
5.4. Non-integer where integer was expected
6. Validation of returned data
7. Not supported: References to external files
8. Nested XML
9. Set validation for returned types
10. Name Spaces
10.1. TagNodeId class for combining relative name and name space
10.2. The "any" name space
11. Attributes
11.1. Name Spaces in Attributes
12. Ordered versus unordered retrieval
13. Subnodes or text but not both option
14. Character sets
14.1. Character sets when reading
14.1.1. Illegal name start
14.1.2. Node name character outside name set
14.1.3. Attribute name character outside name set
14.1.4. Text character outside value set
14.1.5. Text character outside value set - control character
14.2. Character sets when writing
14.2.1. Writing a permitted character outside core set
14.2.2. Writing a disallowed character
15. Object Oriented programming and this API - the basics
15.0.1. Basic object oriented writing to XML
15.0.2. Basic object oriented reading from XML
16. Object Oriented programming and this API - reading and writing XML files with one line of code
17. Object Oriented programming and this API - inheritance
17.1. Specifying the class in the attribute classOfObject
17.1.1. Exceptions
17.1.2. Abstract class at the root of inheritance hierarchy
17.2. Inheritance by natural tag node id
17.2.1. Exceptions
17.2.2. Debugging the inheritance hierarchy
17.3. XSD choice style inheritance
17.4. Disabling inheritance
18. XML Heading ignored

1. Coding philosophy

Operations that are performed all the time should be possible with one line of code (without nesting and chaining of method invocations). And if there is a problem in an XML file, that problem should be easy to pinpoint. This means firstly that the exceptions generated by the XML library are useful and secondly that coders using the library can easily generate exceptions containing the location in the file where the problem occurred.

This API is intended to meet these objectives and so to be easy to use.

2. An XML read example

Consider an XML file with contents as below:
<sample>
    <name>Rasmus Torkel</name>
    <numberOfFingers>10</numberOfFingers>
    <file>c:/users/Torkel/Rasmus/xml_samples/simple.xml</file>
</sample>
Let's suppose this file is subject to the following conventions:

What we have above is something very simple with four different nodes, and the rules for each of the nodes are such as might be encountered in typical XML. Therefore, we should be able to process these four nodes with not much more than four simple lines of code (assuming we already have a java.io.File object for the XML file). We do need to verify that we don't have any surplus nodes and close the file, so we need five lines:
TagNode sampleNode = XmlReader.xmlFileToRoot(xmlFile, XmlReadOptions.DEFAULT, "sample");
String name = sampleNode.nextTextFieldE("name");
int numberOfFingers = sampleNode.nextIntFieldD("numberOfFingers", 10);
File file = (File)sampleNode.nextStringMappedFieldN("file", File.class);
sampleNode.verifyNoMoreChildren();
A few points about the above code:

The above code is only a small sample of what can be done. The library can also process attributes, nodes with name spaces and deeply nested XML. Many more types are supported, including the reading of enum instances (which also requires reflection).

In this example, we specified that we expected a "sample" root node. There are also functions for opening XML which don't assume that the name of the root node is known.
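To put the five lines above into perspective, here is a rough sketch of a comparable read written against the JDK's built-in DOM parser instead of this API. It is only an illustration: the class DomReadComparison and its helpers are made up for this sketch, and it assumes that name is mandatory, that numberOfFingers defaults to 10 and that file may be absent, which is what the E, D and N suffixes in the code above appear to indicate. The sketch does not enforce node order, does not reject surplus nodes and does not report line and character positions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical comparison class; it uses only the JDK, not the Rasmus Torkel API.
final class DomReadComparison {

    // Parses the XML string and checks the name of the root element.
    static Element parseRoot(String xml, String expectedRootName) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        if (!root.getNodeName().equals(expectedRootName)) {
            throw new IllegalStateException(
                "Got " + root.getNodeName() + ", expected " + expectedRootName);
        }
        return root;
    }

    // Returns the text of the first direct child element with the given name, or null.
    static String childText(Element parent, String childName) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && child.getNodeName().equals(childName)) {
                return child.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<sample><name>Rasmus Torkel</name>"
                   + "<numberOfFingers>10</numberOfFingers></sample>";
        Element sample = parseRoot(xml, "sample");

        String name = childText(sample, "name");
        if (name == null) {
            throw new IllegalStateException("name is missing under sample");
        }
        String fingersText = childText(sample, "numberOfFingers");
        int numberOfFingers = (fingersText == null) ? 10 : Integer.parseInt(fingersText.trim());
        String file = childText(sample, "file"); // optional, so null is acceptable here

        System.out.println(name + ", " + numberOfFingers + ", " + file);
    }
}
```

Even this stripped-down version needs a helper method and explicit null handling for every field, which is the kind of repetition the one-line-per-field philosophy is meant to avoid.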

3. An XML write example

The below code sample shows how to write such data.
XmlSink xmlSink = new XmlSink(outputFile);
xmlSink.startNode("sample");
xmlSink.sinkSimpleNode("name", name);
xmlSink.sinkSimpleNode("numberOfFingers", numberOfFingers);
xmlSink.sinkSimpleNode("file", file);
xmlSink.closeNode();
A few points about the above code:

4. XML without files

XML can be read from and written to strings. This is how you get the reading of an XML String started:
TagNode rootNode = XmlReader.xmlStringToRoot(xmlString, "XML String Demo", XmlReadOptions.DEFAULT, "sample");
One point on the above: the second parameter gives a context string which will find its way into exception messages if there are any exceptions. The method that we saw earlier for opening an XML file did not have this parameter because it generates its context string from the file name. Versions of the xmlFileToRoot functions that have an explicit context parameter also exist.

Writing XML to a String is even easier to set up. See below:
XmlSink sink = new XmlSink();
After writing XML, you simply call the toString function:
String xmlString = sink.toString();
There are also readerToRoot functions which take XML from java.io.Reader objects.

5. Exception examples

We give here just a few examples of XML which is either syntactically incorrect or which fails the validation implied by the calls the invoker makes. All of the examples are based on the XML reading code we have already seen. All exceptions thrown when reading with this API are instances of subclasses of the class rasmus_torkel.xml_basic.read.exception.ReadException, which is a RuntimeException.

5.1. Syntax error

Here is the XML:
<<sample>
  <name>Rasmus Torkel</name>
  <numberOfFingers>10</numberOfFingers>
  <file>.\data\xx.txt</file>
</sample>
And here is the exception:
rasmus_torkel.xml_basic.read.exception.XmlSyntaxException: demo: Unrecognised < construct at line 1, char 1, following character '<' is not compatible with XML tag

5.2. Wrong root node

Here is the XML:
<example>
  <name>Rasmus Torkel</name>
  <numberOfFingers>10</numberOfFingers>
  <file>.\data\xx.txt</file>
</example>
And here is the exception:
rasmus_torkel.xml_basic.read.exception.XmlWrongNodeException: demo: Got example without name space at line 1, char 1, expected sample without name space

5.3. Missing name field

Here is the XML:
<sample>
  <numberOfFingers>10</numberOfFingers>
  <file>.\data\xx.txt</file>
</sample>
And here is the exception:
rasmus_torkel.xml_basic.read.exception.XmlWrongNodeException: demo: Attempted to read name without name space under sample, starting at line 1, char 1 but got numberOfFingers without name space at line 2, char 3

5.4. Non-integer where integer was expected

Here is the XML:
<sample>
  <name>Rasmus Torkel</name>
  <numberOfFingers>ten</numberOfFingers>
  <file>.\data\xx.txt</file>
</sample>
And here is the exception:
rasmus_torkel.xml_basic.read.exception.XmlTextFieldDoesNotMatchTypeException: demo: numberOfFingers, starting at line 3, char 3 contains text ten which can not be converted to int

6. Validation of returned data

Let's have another look at some of the original retrieval code:
String name = sampleNode.nextTextFieldE("name");
int numberOfFingers = sampleNode.nextIntFieldD("numberOfFingers", 10);

Let's say we wanted to restrict the number of fingers to values from 0 to 14. We could do it like this:

if (numberOfFingers < 0 || numberOfFingers > 14)
{
    throw new RuntimeException(
        sampleNode + " contains numberOfFingers field which has value " +
        numberOfFingers + " which is not in the range from 0 to 14");
}
Bad values then lead to exceptions like this:
java.lang.RuntimeException: sample, starting at line 1, char 1 contains numberOfFingers field which has value -1 which is not in the range from 0 to 14

This is actually not too bad. We don't have a variable for the node containing the number of fingers so we can't mention it in the error message. But we have the sample node and it appears in the error message with location. That location comes from the _textPos field of the node.

We can do better. We define some sets like this:

StringPatternSet nameSet = new StringPatternSet("^[A-Z][a-z]+ [A-Z][a-z]+$");
IntRange numberOfFingersRange = new IntRange(0, 14);

While we were at it, we defined what kind of names we allow. We then change the retrieval code to refer to those sets:

String name = sampleNode.nextTextFieldE("name", nameSet);
int numberOfFingers = sampleNode.nextIntFieldD("numberOfFingers", 10, numberOfFingersRange);

Bad values then lead to exceptions like this:

rasmus_torkel.xml_basic.read.exception.XmlTextFieldNotInSetException: demo: numberOfFingers, starting at line 3, char 3 contains int -1 which is not in IntSet 0 to 14
and
rasmus_torkel.xml_basic.read.exception.XmlTextFieldNotInSetException: demo: Node name, starting at line 2, char 3 contains text Rasmus which is not in StringSet /^[A-Z][a-z]+ [A-Z][a-z]+$/
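
To see why the text Rasmus is rejected while Rasmus Torkel is accepted, it can help to try the same pattern and range outside the API. The sketch below uses only java.util.regex from the JDK. It assumes that StringPatternSet applies the pattern the way java.util.regex would and that IntRange is inclusive at both ends, as the hand-written check above suggests. The class NamePatternDemo is made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the set checks; it does not use the Rasmus Torkel API.
final class NamePatternDemo {

    // Same pattern as in the StringPatternSet above: two capitalised words separated by one space.
    static final Pattern NAME = Pattern.compile("^[A-Z][a-z]+ [A-Z][a-z]+$");

    static boolean isAcceptedName(String text) {
        return NAME.matcher(text).find();
    }

    // Inclusive range check, mirroring the 0 to 14 range for numberOfFingers.
    static boolean isInRange(int value, int min, int max) {
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptedName("Rasmus Torkel"));
        System.out.println(isAcceptedName("Rasmus"));
        System.out.println(isInRange(-1, 0, 14));
    }
}
```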

7. Not supported: References to external files

In XML it is possible to declare what kind of data is expected using XML DTD and XML Schema files. This API has a different validation philosophy which is based on the invoker specifying what is expected in the Java calls and the API validating against that. Therefore this API does not support referencing external files for validation. Declarations of such files are simply ignored.

In fact all references to external files are ignored.

8. Nested XML

Suppose in our XML example, we want to break the name up into firstName and surname. We would end up with XML like this:
<sample>
  <name>
    <firstName>Rasmus</firstName>
    <surname>Torkel</surname>
  </name>
  <numberOfFingers>10</numberOfFingers>
  <file>.\data\xx.txt</file>
</sample>
We would write it like this:
XmlSink xmlSink = new XmlSink();
xmlSink.startNode("sample");
xmlSink.startNode("name");
xmlSink.sinkSimpleNode("firstName", firstName);
xmlSink.sinkSimpleNode("surname", surname);
xmlSink.closeNode();
xmlSink.sinkSimpleNode("numberOfFingers", numberOfFingers);
xmlSink.sinkSimpleNode("file", file);
xmlSink.closeNode();
String xmlString = xmlSink.toString();
And we read it like this:
TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.DEFAULT, "sample");
TagNode nameNode = sampleNode.nextChildE("name");
String firstName = nameNode.nextTextFieldE("firstName");
String surname = nameNode.nextTextFieldE("surname");
nameNode.verifyNoMoreChildren();
int numberOfFingers = sampleNode.nextIntFieldD("numberOfFingers", 10);
File file = (File)sampleNode.nextStringMappedFieldN("file", File.class);
sampleNode.verifyNoMoreChildren();
The root node is written the same way as any other node. The reading differs slightly because the root node obviously has to be obtained differently.

9. Set validation for returned types

Let's go back to some of our original XML reading example:
String name = sampleNode.nextTextFieldE("name");
int numberOfFingers = sampleNode.nextIntFieldD("numberOfFingers", 10);

Let's say we wanted to allow only numbers from 0 to 14 for the number of fingers. We might enforce the range with the same explicit check as in section 6, throwing a RuntimeException whose message includes sampleNode.

Bad values then lead to exceptions like this:

java.lang.RuntimeException: sample, starting at line 1, char 1 contains numberOfFingers field which has value -1 which is not in the range from 0 to 14
Reading with set validation

This is actually not too bad. Since we don't have a variable for the node containing the number of fingers, we can't mention it in the error message. But at least we have a handle on the sample node and when we call toString on it, we get the location within the file. That location comes from the _textPos field of the node.

But we can do better. We define some validation.

StringPatternSet nameSet = new StringPatternSet("^[A-Z][a-z]+ [A-Z][a-z]+$");
IntRange numberOfFingersRange = new IntRange(0, 14);

Apart from the range for number of fingers, we defined some name validation while we were at it. Then we retrieve fields like this:

String name = sampleNode.nextTextFieldE("name", nameSet);
int numberOfFingers = sampleNode.nextIntFieldD("numberOfFingers", 10, numberOfFingersRange);
Bad values then lead to exceptions like this:
rasmus_torkel.xml_basic.read.exception.XmlTextFieldNotInSetException: demo: numberOfFingers, starting at line 3, char 3 contains int -1 which is not in IntSet 0 to 14
and
rasmus_torkel.xml_basic.read.exception.XmlTextFieldNotInSetException: demo: Node name, starting at line 2, char 3 contains text Rasmus which is not in StringSet /^[A-Z][a-z]+ [A-Z][a-z]+$/

10. Name Spaces

This API supports XML name spaces. XML name space variables are created from the preferred prefix and the name space name. We write our original sample data like this:
XmlNameSpace ns = new XmlNameSpace("x", "http://rasmustorkel.com/ns1/");
XmlSink xmlSink = new XmlSink();
xmlSink.startNode("sample", ns);
xmlSink.sinkSimpleNode("name", ns, name);
xmlSink.sinkSimpleNode("numberOfFingers", ns, numberOfFingers);
xmlSink.sinkSimpleNode("file", ns, file);
xmlSink.closeNode();
String xmlString = xmlSink.toString();
This leads to XML like this:
<x:sample xmlns:x="http://rasmustorkel.com/ns1/">
  <x:name>Rasmus Torkel</x:name>
  <x:numberOfFingers>10</x:numberOfFingers>
  <x:file>.\data\xx.txt</x:file>
</x:sample>
And this is how you read it:
XmlNameSpace ns = new XmlNameSpace("y", "http://rasmustorkel.com/ns1/");
TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.DEFAULT, "sample", ns);
String name = sampleNode.nextTextFieldE("name", ns);
int numberOfFingers = sampleNode.nextIntFieldD("numberOfFingers", ns, 10);
File file = (File)sampleNode.nextStringMappedFieldN("file", ns, File.class);
sampleNode.verifyNoMoreChildren();
Note that the name space in the read example has a different prefix to the write example. This is to illustrate that the prefix you specify for your name space variable does not matter when you read XML. That's because the prefix is defined in the name space declaration in the XML.
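
The point that the prefix is irrelevant on reading is a property of XML name spaces in general, not just of this API. The sketch below demonstrates it with the JDK's name-space-aware DOM parser: two documents that differ only in prefix yield the same name space name and the same relative name. The class PrefixIndependenceDemo is made up for this illustration and does not use this API.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class built on the JDK DOM parser only.
final class PrefixIndependenceDemo {

    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getNamespaceURI and getLocalName
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Two elements identify the same kind of node if name space name and relative name agree.
    static boolean sameIdentity(Element a, Element b) {
        return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
                && a.getLocalName().equals(b.getLocalName());
    }

    public static void main(String[] args) {
        Element withX = parseRoot("<x:sample xmlns:x=\"http://rasmustorkel.com/ns1/\"/>");
        Element withY = parseRoot("<y:sample xmlns:y=\"http://rasmustorkel.com/ns1/\"/>");
        System.out.println("prefixes: " + withX.getPrefix() + " and " + withY.getPrefix());
        System.out.println("same identity: " + sameIdentity(withX, withY));
    }
}
```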

Even when writing, the prefix only indicates a preference which will be observed when no name spaces with the same prefix are used in the same document. Otherwise prefixes are changed.

XmlNameSpace ns1 = new XmlNameSpace("x", "http://rasmustorkel.com/ns1/");
XmlNameSpace ns2 = new XmlNameSpace("x", "http://rasmustorkel.com/ns2/");
XmlSink xmlSink = new XmlSink();
xmlSink.startNode("sample");
xmlSink.sinkSimpleNode("name", ns1, name);
xmlSink.sinkSimpleNode("numberOfFingers", ns2, numberOfFingers);
xmlSink.sinkSimpleNode("file", file);
xmlSink.closeNode();
String xmlString = xmlSink.toString();
This leads to XML like this:
<sample>
  <x:name xmlns:x="http://rasmustorkel.com/ns1/">Rasmus Torkel</x:name>
  <x_:numberOfFingers xmlns:x_="http://rasmustorkel.com/ns2/">10</x_:numberOfFingers>
  <file>.\data\xx.txt</file>
</sample>
In the XML above, for the second name space, an underscore has been added to the prefix.

Strictly speaking, there was no name space clash as the scope of the name space is only for the node declaring it and its contents, so it would have been legal to use the same prefix for both name spaces. However, this API does not use the same prefix for different name spaces in the document, whether there is a clash or not.
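
The prefix behaviour described above can be pictured with a small allocation routine. The following is a minimal sketch, not the API's actual implementation: the class PrefixAllocator and its method are hypothetical, and the assumption that further clashes keep appending underscores goes beyond what the example output shows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of document-wide prefix allocation; not taken from the API source.
final class PrefixAllocator {

    // Maps each name space name to the prefix actually used in the document.
    private final Map<String, String> prefixByNameSpaceName = new LinkedHashMap<>();

    // Returns the prefix for the name space, honouring the preference when it is still free.
    String prefixFor(String preferredPrefix, String nameSpaceName) {
        String existing = prefixByNameSpaceName.get(nameSpaceName);
        if (existing != null) {
            return existing; // the same name space always keeps its prefix
        }
        String candidate = preferredPrefix;
        while (prefixByNameSpaceName.containsValue(candidate)) {
            candidate = candidate + "_"; // assumption: keep appending underscores until unique
        }
        prefixByNameSpaceName.put(nameSpaceName, candidate);
        return candidate;
    }

    public static void main(String[] args) {
        PrefixAllocator allocator = new PrefixAllocator();
        System.out.println(allocator.prefixFor("x", "http://rasmustorkel.com/ns1/"));
        System.out.println(allocator.prefixFor("x", "http://rasmustorkel.com/ns2/"));
    }
}
```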

10.1. TagNodeId class for combining relative name and name space

As relative name and name space together identify a node it makes sense for there to be a class that holds them. This class is called TagNodeId. We can then declare our node identifiers up front:
public static final XmlNameSpace NS_SAMPLE = new XmlNameSpace("x", "http://rasmustorkel.com/ns1/");
public static final TagNodeId ID_SAMPLE =            new TagNodeId("sample", NS_SAMPLE);
public static final TagNodeId ID_NAME =              new TagNodeId("name", NS_SAMPLE);
public static final TagNodeId ID_NUMBER_OF_FINGERS = new TagNodeId("numberOfFingers", NS_SAMPLE);
public static final TagNodeId ID_FILE =              new TagNodeId("file", NS_SAMPLE);
Here are our sample data again:
<x:sample xmlns:x="http://rasmustorkel.com/ns1/">
  <x:name>Rasmus Torkel</x:name>
  <x:numberOfFingers>10</x:numberOfFingers>
  <x:file>.\data\xx.txt</x:file>
</x:sample>
This is how you read the sample data, having declared the node identifiers:
TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.DEFAULT, ID_SAMPLE);
String name = sampleNode.nextTextFieldE(ID_NAME);
int numberOfFingers = sampleNode.nextIntFieldD(ID_NUMBER_OF_FINGERS, 10);
File file = (File)sampleNode.nextStringMappedFieldN(ID_FILE, File.class);
sampleNode.verifyNoMoreChildren();
And this is how you write the sample data:
XmlSink xmlSink = new XmlSink();
xmlSink.startNode(ID_SAMPLE);
xmlSink.sinkSimpleNode(ID_NAME, name);
xmlSink.sinkSimpleNode(ID_NUMBER_OF_FINGERS, numberOfFingers);
xmlSink.sinkSimpleNode(ID_FILE, file);
xmlSink.closeNode();
String xmlString = xmlSink.toString();
It's OK to declare TagNodeId objects with a null name space. That just means an identifier with no name space.

10.2. The "any" name space

When you specify a null name space for the node you are reading you are specifying that you want a node that has no name space.

But you can also specify that you want a node in any (or no) name space. In other words, you only match the relative name and you don't care what, if any, name space the node is in. For that you supply the dummy name space XmlNameSpace.ANY which is defined like this:

public static final XmlNameSpace ANY =
    new XmlNameSpace(
            "any",
            "Not_a_real_namespace__pass_into_read_methods_to_indicate_namespace_doesnt_matter");

11. Attributes

Here is how we might express our sample data if we were going to use attributes rather than child nodes:
<sample
  name="Rasmus Torkel"
  numberOfFingers="10"
  file=".\data\xx.txt"/>
We write this kind of XML like this:
XmlSink xmlSink = new XmlSink();
xmlSink.startNode("sample");
xmlSink.sinkAttribute("name", name);
xmlSink.sinkAttribute("numberOfFingers", numberOfFingers);
xmlSink.sinkAttribute("file", file);
xmlSink.closeTagAndNode();
String xmlString = xmlSink.toString();
And we read it like this:
TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.DEFAULT, "sample");
String name = sampleNode.attributeValueE("name");
int numberOfFingers = sampleNode.attributeIntD("numberOfFingers", 10);
File file = (File)sampleNode.attributeStringMappedN("file", File.class);
sampleNode.verifyNoMoreChildren();

11.1. Name Spaces in Attributes

Just like node names, attribute names can also be qualified by name spaces. This feature is little-known and rarely used, so this API has no convenience functions for it. Consequently if you need to manipulate attributes with name spaces, the code is going to be somewhat less pretty. Here is how we might express our sample data:
<sample
  xmlns:x="http://rasmustorkel.com/ns1/"
  x:name="Rasmus Torkel"
  x:numberOfFingers="10"
  x:file=".\data\xx.txt"/>
And this is how we write it:
XmlNameSpace ns = new XmlNameSpace("x", "http://rasmustorkel.com/ns1/");
XmlSink xmlSink = new XmlSink();
xmlSink.startNode("sample");
xmlSink.sinkAttribute(new XmlAttribute("name", ns, name));
xmlSink.sinkAttribute(new XmlAttribute("numberOfFingers", ns, String.valueOf(numberOfFingers)));
xmlSink.sinkAttribute(new XmlAttribute("file", ns, String.valueOf(file)));
xmlSink.closeTagAndNode();
String xmlString = xmlSink.toString();
We read it like this:
XmlNameSpace ns = new XmlNameSpace("x", "http://rasmustorkel.com/ns1/");
TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.DEFAULT, "sample");
String name = sampleNode.attributeValueN("name", ns);
String numberOfFingersStr = sampleNode.attributeValueN("numberOfFingers", ns);
String fileStr = sampleNode.attributeValueN("file", ns);
sampleNode.verifyNoMoreChildren();
You will have to do your own handling of missing fields and your own conversion to the correct types.
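
As an illustration of that extra work, here is a minimal sketch of the conversions you might apply to the strings retrieved above. It assumes that attributeValueN returns null when the attribute is absent, which is what the N suffix appears to indicate. The class AttributeConversionSketch and its methods are hypothetical helpers, not part of the API.

```java
import java.io.File;

// Hypothetical helpers for converting raw attribute strings; not part of the API.
final class AttributeConversionSketch {

    // Returns the default when the attribute was missing, otherwise parses the text as an int.
    static int intOrDefault(String rawValue, int defaultValue, String attributeName) {
        if (rawValue == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(rawValue.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                attributeName + " contains text " + rawValue
                + " which can not be converted to int", e);
        }
    }

    // Returns null when the attribute was missing, otherwise wraps the text in a File.
    static File fileOrNull(String rawValue) {
        return (rawValue == null) ? null : new File(rawValue);
    }

    public static void main(String[] args) {
        System.out.println(intOrDefault(null, 10, "numberOfFingers"));
        System.out.println(intOrDefault("9", 10, "numberOfFingers"));
        System.out.println(fileOrNull(null));
    }
}
```

With helpers like these, the three strings read above would be converted with calls such as intOrDefault(numberOfFingersStr, 10, "numberOfFingers") and fileOrNull(fileStr).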

12. Ordered versus unordered retrieval

In all the code samples so far, other than the ones involving attributes, we have presumed that the order in which nodes occur is fixed. This API also allows unordered retrieval. However, if you can commit to only doing ordered retrieval, which is what you are doing when you specify the default read options, then this API will null out subnodes after processing. This allows garbage collection to recycle the memory used to hold XML nodes before the XML file is fully scanned. Obviously for small XML inputs, this does not matter, but if you are processing files containing hundreds of megabytes, then it's worth considering. Our original sample data, with order of subnodes changed, might look like this:
<sample>
  <numberOfFingers>10</numberOfFingers>
  <file>.\data\xx.txt</file>
  <name>Rasmus Torkel</name>
</sample>
And not knowing the order of the subnodes, this is the code to read it:
TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.ANY_ORDER_RETRIEVAL, "sample");
String name = sampleNode.anyOrderTextFieldE("name");
int numberOfFingers = sampleNode.anyOrderIntFieldD("numberOfFingers", 10);
File file = (File)sampleNode.anyOrderStringMappedFieldN("file", File.class);
sampleNode.verifyNoOtherChildren("name", "numberOfFingers", "file");
Note the changed read options. The difference between XmlReadOptions.ANY_ORDER_RETRIEVAL and XmlReadOptions.DEFAULT is the field below, which is true for DEFAULT and false for ANY_ORDER_RETRIEVAL:
public final boolean _orderedRetrieval;
Below is the variation of the previous retrieval code with the default read options:
TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.DEFAULT, "sample");
String name = sampleNode.anyOrderTextFieldE("name");
int numberOfFingers = sampleNode.anyOrderIntFieldD("numberOfFingers", 10);
File file = (File)sampleNode.anyOrderStringMappedFieldN("file", File.class);
sampleNode.verifyNoOtherChildren("name", "numberOfFingers", "file");
If you were to run this code on the sample data, you would get this exception:
rasmus_torkel.xml_basic.read.exception.UnorderedRetrievalException: demo: anyOrder operation invoked with ordered retrieval read option

You can still work through the nodes in an ordered manner, taking each child as it comes, but there aren't convenience functions to support this, so the code is a little cumbersome. The parts for handling missing data and for converting to the desired types are omitted.

TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.DEFAULT, "sample");
String name = null;
String numberOfFingersStr = null;
String fileStr = null;
while (sampleNode.peepChildN() != null)
{
    TagNode childNode = sampleNode.nextChildE();
    TagNodeId id = childNode._id;
    if (id._nameSpace != null)
    {
        throw new RuntimeException(childNode + " has name space");
    }
    if (id._relativeName.equals("name"))
    {
        if (name != null)
        {
            throw new RuntimeException(childNode + " is the second name node");
        }
        name = childNode.onlyText();
    }
    else if (id._relativeName.equals("numberOfFingers"))
    {
        if (numberOfFingersStr != null)
        {
            throw new RuntimeException(childNode + " is the second numberOfFingers node");
        }
        numberOfFingersStr = childNode.onlyText();
    }
    else if (id._relativeName.equals("file"))
    {
        if (fileStr != null)
        {
            throw new RuntimeException(childNode + " is the second file node");
        }
        fileStr = childNode.onlyText();
    }
    else
    {
        throw new RuntimeException(childNode + " has unexpected name");
    }
}
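
The repeated duplicate checks in the loop above follow one pattern, which can be factored out. The sketch below shows that pattern on plain name and text pairs so that it runs without the API. The classes ChildCollectorSketch and Child are hypothetical, and child text is assumed to be non-null.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: collects child texts by name, rejecting duplicates and unknown names.
final class ChildCollectorSketch {

    // Minimal stand-in for a child node: just a relative name and its text.
    static final class Child {
        final String name;
        final String text;

        Child(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    static Map<String, String> collect(List<Child> children, Set<String> expectedNames) {
        Map<String, String> textByName = new HashMap<>();
        for (Child child : children) {
            if (!expectedNames.contains(child.name)) {
                throw new IllegalStateException(child.name + " has unexpected name");
            }
            // putIfAbsent returns the earlier text if this name was already seen
            if (textByName.putIfAbsent(child.name, child.text) != null) {
                throw new IllegalStateException(child.name + " is the second " + child.name + " node");
            }
        }
        return textByName;
    }

    public static void main(String[] args) {
        List<Child> children = Arrays.asList(
            new Child("numberOfFingers", "10"),
            new Child("file", ".\\data\\xx.txt"),
            new Child("name", "Rasmus Torkel"));
        Set<String> expected = new HashSet<>(Arrays.asList("name", "numberOfFingers", "file"));
        System.out.println(collect(children, expected).get("name"));
    }
}
```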

13. Subnodes or text but not both option

By default, the API validates when reading that each node contains only subnodes or non-whitespace text but not both. That appears to be quite a common pattern in XML schemas. Suppose we create our sample data with extra text in between like this:
XmlSink xmlSink = new XmlSink();
xmlSink.startNode("sample");
xmlSink.sinkText("aa");
xmlSink.sinkSimpleNode("name", name);
xmlSink.sinkText("bb");
xmlSink.sinkSimpleNode("numberOfFingers", numberOfFingers);
xmlSink.sinkText("cc");
xmlSink.sinkSimpleNode("file", file);
xmlSink.sinkText("dd");
xmlSink.closeNode();
String xmlString = xmlSink.toString();
System.out.println(xmlString);
We would end up with XML like this:
<sample>aa
  <name>Rasmus Torkel</name>
  bb<numberOfFingers>10</numberOfFingers>
  cc<file>.\data\xx.txt</file>
  dd</sample>
Here is our XML reading code again:
TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.DEFAULT, "sample");
String name = sampleNode.nextTextFieldE("name");
int numberOfFingers = sampleNode.nextIntFieldD("numberOfFingers", 10);
File file = (File)sampleNode.nextStringMappedFieldN("file", File.class);
sampleNode.verifyNoMoreChildren();
If we were to run this code over the XML with extra text, we would get an exception like this:
rasmus_torkel.xml_basic.read.exception.XmlTextAndChildException: demo: sample, starting at line 1, char 1 contains both direct non-whitespace text at line 1, char 9 and child node at line 2, char 3 in contravention of the read options

The read options have the following field:

public final boolean _enforceChildrenXorText;
This field is true in XmlReadOptions.DEFAULT. XmlReadOptions.ALLOW_SUBNODES_AND_TEXT has this field as false and is otherwise identical to DEFAULT. The reading code then becomes:
TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.ALLOW_SUBNODES_AND_TEXT, "sample");
String name = sampleNode.nextTextFieldE("name");
int numberOfFingers = sampleNode.nextIntFieldD("numberOfFingers", 10);
File file = (File)sampleNode.nextStringMappedFieldN("file", File.class);
sampleNode.verifyNoMoreChildren();
This will correctly read the XML. If we want to read the extra text the code becomes:
TagNode sampleNode = XmlReader.xmlStringToRoot(xmlString, "demo", XmlReadOptions.ALLOW_SUBNODES_AND_TEXT, "sample");
String text1 = sampleNode.nextTextN();
String name = sampleNode.nextTextFieldE("name");
String text2 = sampleNode.nextTextN();
int numberOfFingers = sampleNode.nextIntFieldD("numberOfFingers", 10);
String text3 = sampleNode.nextTextN();
File file = (File)sampleNode.nextStringMappedFieldN("file", File.class);
String text4 = sampleNode.nextTextN();
sampleNode.verifyNoMoreChildren();
Here is some code to print the extra text we just retrieved:
System.out.println("text1 = \"" + text1 + "\"");
System.out.println("text2 = \"" + text2 + "\"");
System.out.println("text3 = \"" + text3 + "\"");
System.out.println("text4 = \"" + text4 + "\"");
This produced output like this:
text1 = "aa
  "
text2 = "
  bb"
text3 = "
  cc"
text4 = "
  dd"
We got some extra white space in our retrieved text. This extra white space is actually in the XML so there is nothing wrong with the sample code for reading or the API code for reading. The phenomenon is due to a deliberate decision to build the XML writing parts of the API such that it formats the XML beautifully. This requires the API to automatically insert extra white space.
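
If the formatting white space is not wanted, the invoker can strip it after retrieval. The sketch below uses only String.trim from the JDK. It assumes that nextTextN returns null when there is no text, which is what the N suffix appears to indicate. The class RetrievedTextCleaner is made up for this illustration.

```java
// Hypothetical helper for stripping formatting white space from retrieved text.
final class RetrievedTextCleaner {

    // Removes leading and trailing white space; passes null through unchanged.
    static String clean(String retrievedText) {
        return (retrievedText == null) ? null : retrievedText.trim();
    }

    public static void main(String[] args) {
        System.out.println("text1 = \"" + clean("aa\n  ") + "\"");
        System.out.println("text2 = \"" + clean("\n  bb") + "\"");
    }
}
```

Note that trimming also removes any leading or trailing white space that was part of the original text, so it is only appropriate where such white space carries no meaning.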

14. Character sets

When it comes to character sets, this API is written to a philosophy which is somewhat at odds with the XML standard. Don't be alarmed: it will be no problem to enforce the XML standard when reading and to safeguard against invalid characters being written.

What is a valid character? In this API, this is defined by the read options and the write options that the invoker supplies to the API rather than by the XML standard. There is only one 16 bit character absolutely excluded and that is the null character. Also excluded are characters that take more than 16 bits because they don't fit into the Java primitive character type. By default, the character set that this API allows is the same as for XML 1.1.
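
The 16 bit limit comes from Java itself: a char holds one UTF-16 code unit, so characters above U+FFFF occupy two chars in a String. The short sketch below shows this with standard JDK calls. The class SupplementaryCharDemo is made up for this illustration.

```java
// Demonstrates that some Unicode characters do not fit into a single Java char.
final class SupplementaryCharDemo {

    // True if the code point can be stored in one 16 bit Java char.
    static boolean fitsInOneChar(int codePoint) {
        return Character.charCount(codePoint) == 1;
    }

    public static void main(String[] args) {
        int latinA = 'A';
        int emoji = 0x1F600; // a code point above U+FFFF
        System.out.println(fitsInOneChar(latinA));
        System.out.println(fitsInOneChar(emoji));

        // The String holds one character but needs two chars (a surrogate pair) to store it.
        String emojiString = new String(Character.toChars(emoji));
        System.out.println(emojiString.length());
    }
}
```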

Value characters are the characters used in attribute values and in text enclosed by node start and end tags. Depending on the options, this API is not so strict with value characters, but it strictly enforces the XML standard for node and attribute names. That is greatly helped by the fact that the XML 1.0 standard and XML 1.1 standard are identical for name characters.

We specify character sets using the class rasmus_torkel.set.chars.CharSet which is part of this API. It has a number of subclasses and quite a few CharSet instances are predefined in the class DefinedCharSets.

14.1. Character sets when reading

Let's have a look at the constructor to XmlReadOptions.
public
XmlReadOptions(boolean enforceChildrenOrText,
               boolean orderedRetrieval,
               CharSet nameCharSet,
               CharSet valueCharSet)
Of interest here are the character sets supplied. nameCharSet is the set of characters that the invoker allows for node and attribute names. When this API is used to read XML it will throw an exception if the name is not compliant with the XML standard and it will throw an exception if the name uses characters outside nameCharSet.

We are going to use the following from DefinedCharSets in our examples:

public static final RangeCharSet LATIN_LOWER_CASE_LETTERS =
    new RangeCharSet('a', 'z', "Latin_Lower_Case_Letters");
If we have an existing XmlReadOptions object, we can create new XmlReadOptions objects with a different name CharSet or value CharSet or both like this:
XmlReadOptions optionsLower = XmlReadOptions.DEFAULT.diffCharSet(DefinedCharSets.LATIN_LOWER_CASE_LETTERS);
XmlReadOptions optionsNamesLower = XmlReadOptions.DEFAULT.diffNameCharSet(DefinedCharSets.LATIN_LOWER_CASE_LETTERS);
XmlReadOptions optionsValuesLower = XmlReadOptions.DEFAULT.diffValueCharSet(DefinedCharSets.LATIN_LOWER_CASE_LETTERS);
Let's have some examples of what happens when we invoke the following code with readOptions set to optionsLower:
TagNode node = XmlReader.xmlStringToRoot(xmlString, "Unit Test", readOptions);
System.out.println("nodeName = " + node._id._relativeName);
System.out.println("attributeName = " + node.attribute(0)._name);
System.out.println("attributeValue = " + node.attribute(0)._value);
System.out.println("value = " + node.onlyText());
Take note of the exception type. When the XML standard for names is violated, we get an XmlSyntaxException. But when a character is encountered which violates the read options, we get an XmlCharSetException.

14.1.1. Illegal name start

Here is the XML:
<1Root aname="avalue">text</1Root>
And here is the exception:
rasmus_torkel.xml_basic.read.exception.XmlSyntaxException: Unit Test: Illegal tag name after < at line 1, char 1, following character '1'/x31 is not a legal XML name starter

14.1.2. Node name character outside name set

Here is the XML:
<rooT aname="avalue">text</rooT>
And here is the exception:
rasmus_torkel.xml_basic.read.exception.XmlCharSetException: Unit Test: Name rooT of node at line 1, char 1 has character 'T'/x54 in position 4 which is not in name char set Latin_Lower_Case_Letters which is specified in read options

14.1.3. Attribute name character outside name set

Here is the XML:
<root Aname="avalue">text</root>
And here is the exception:
rasmus_torkel.xml_basic.read.exception.XmlCharSetException: Unit Test: Attribute name Aname of node at line 1, char 1 has character 'A'/x41 in position 1 which is not in name char set Latin_Lower_Case_Letters which is specified in read options

14.1.4. Text character outside value set

Here is the XML:
<root aname="avalue">texT</root>
And here is the exception:
rasmus_torkel.xml_basic.read.exception.XmlCharSetException: Unit Test: text at line 1, char 22 has character 'T'/x54 which is not in value char set Latin_Lower_Case_Letters which is specified in read options, position within text is 4

14.1.5. Text character outside value set - control character

We obviously should not put control characters into error messages. Let's see how this is handled. Here is the XML:
<root aname="avalue">control&#xA;</root>
And here is the exception:
rasmus_torkel.xml_basic.read.exception.XmlCharSetException: Unit Test: text at line 1, char 22 has character Line Feed/xA which is not in value char set Latin_Lower_Case_Letters which is specified in read options, position within text is 8

Note however, that at the time of writing, there are many characters that have not yet been classified for this API into characters that can be embedded and characters which should be named. For such characters only the hex code will be in the message.
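
The three cases just described (characters that can be embedded, characters that should be named, and characters not yet classified) can be sketched as follows. This is not the API's code: the class CharDescriberSketch is hypothetical, the output format is modelled on the exception messages above, and the names other than Line Feed as well as the bare hex format for unclassified characters are assumptions.

```java
import java.util.Map;

// Hypothetical sketch of describing a character safely in an error message.
final class CharDescriberSketch {

    // Characters which should be named rather than embedded (names other than Line Feed are assumed).
    private static final Map<Character, String> NAMED = Map.of(
        '\n', "Line Feed",
        '\r', "Carriage Return",
        '\t', "Tab");

    static String describe(char c) {
        String hex = String.format("x%X", (int) c);
        String name = NAMED.get(c);
        if (name != null) {
            return name + "/" + hex;        // named character, e.g. Line Feed/xA
        }
        if (c > ' ' && c < 0x7F) {
            return "'" + c + "'/" + hex;    // printable ASCII can be embedded, e.g. 'T'/x54
        }
        return hex;                          // unclassified: hex code only
    }

    public static void main(String[] args) {
        System.out.println(describe('T'));
        System.out.println(describe('\n'));
        System.out.println(describe('\u0001'));
    }
}
```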

14.2. Character sets when writing

For writing, we have up to three sets of characters we can specify. There is the set of characters for names. For value characters, we have two sets: the core characters and the set of all characters we allow. With the exception of special XML characters, core characters are written as is without any special encoding. The other value characters are encoded using the XML hex encoding. You don't have to worry about XML special characters. Those are always properly handled, even if they are in the core set. Also, the API uses white space characters for formatting, specifically for line ends and for indentation. Which white space characters are used is specified in the options. But the point is that even if those characters are outside the core set, and even if they are outside the set of all allowable characters, it will not cause any problem for the formatting, because the formatting aspects of the API completely ignore those sets.

Let's look at the constructor for rasmus_torkel.xml_basic.write.XmlWriteOptions:

public
XmlWriteOptions(CharSet                  nameCharSet,
                CharEncodeOptions        encodeOptions,
                AttributeLineOptions     attributeLineOptions,
                boolean                  attributesSortingOn,
                IndentingLineSinkOptions indentingLineSinkOptions)
Right now we are interested in the first two parameters (the other three give you flexibility about how you want your XML to be formatted). The first one is obviously the set we enforce for node and attribute names. To understand the second one, we have a look at the constructor for rasmus_torkel.text.encode.CharEncodeOptions. Note that the concept of giving special encoding treatment to non-core characters is not XML specific, so XML is not mentioned in the package name.
public
CharEncodeOptions(CharSet           coreChars,
                  CharSet           allChars)
And there are the two sets for values. By default, the core set consists of those ASCII characters which have a glyph (symbol) associated with them, plus plain space. And the set of all allowable chars consists by default of the XML 1.1 characters.

Let's look at some examples of values being written with sinkText where the encode options are defined, somewhat unrealistically, as below:

new CharEncodeOptions(DefinedCharSets.LATIN_LOWER_CASE_LETTERS, DefinedCharSets.LATIN_UPPER_CASE_LETTERS);

14.2.1. Writing a permitted character outside core set

String is "Value". Below is what the text will look like in the XML:
&#x56;alue
x56 is of course the hex code for 'V'. It was hex-encoded because we only had lower case letters in the core set.

14.2.2. Writing a disallowed character

String is "the_value". Here is the exception:
rasmus_torkel.text.encode.UnsupportedCharacterEncodeException: Character '_'/x5F at position 4 of between-XML-tag text the_value is not in value char set Latin_Letters
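The two-set idea can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the library's implementation: the class ValueEncodeSketch, the use of IntPredicate for the sets, the IllegalArgumentException and the choice to check the set of all characters before anything else are all made up for the example. The sketch uses lower case letters as the core set and all Latin letters as the allowed set.

```java
import java.util.function.IntPredicate;

// Hypothetical stand-in for the core set / all set idea; not CharEncodeOptions itself.
final class ValueEncodeSketch
{
    private final IntPredicate _coreChars;
    private final IntPredicate _allChars;

    ValueEncodeSketch(IntPredicate coreChars, IntPredicate allChars)
    {
        _coreChars = coreChars;
        _allChars = allChars;
    }

    // Encoder with lower case letters as core set and all Latin letters as allowed set.
    static ValueEncodeSketch demoEncoder()
    {
        IntPredicate lower = c -> c >= 'a' && c <= 'z';
        IntPredicate letters = c -> lower.test(c) || (c >= 'A' && c <= 'Z');
        return new ValueEncodeSketch(lower, letters);
    }

    // Handles characters of the Basic Multilingual Plane only, to keep the sketch short.
    String encode(String text)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++)
        {
            char c = text.charAt(i);
            String hex = Integer.toHexString(c).toUpperCase();
            if (!_allChars.test(c))
            {
                throw new IllegalArgumentException(
                        "Character '" + c + "'/x" + hex + " at position " + (i + 1)
                        + " of text " + text + " is not in the set of allowed characters");
            }
            if (c == '<' || c == '>' || c == '&')
            {
                // XML special characters are always escaped, even if they are core characters.
                sb.append(c == '<' ? "&lt;" : c == '>' ? "&gt;" : "&amp;");
            }
            else if (_coreChars.test(c))
            {
                sb.append(c);                               // core character: written as is
            }
            else
            {
                sb.append("&#x").append(hex).append(';');   // allowed but not core: hex-encoded
            }
        }
        return sb.toString();
    }
}
```

Calling ValueEncodeSketch.demoEncoder().encode("Value") yields &#x56;alue, while encode("the_value") throws because '_' at position 4 is not in the allowed set.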

15. Object Oriented programming and this API - the basics

So far, we have been looking at how to read and write bits and pieces of data from and to XML. Now we are starting to take a more Object Oriented approach, and we start looking at how to equip a class such that objects of that class can be easily written to and read from XML.

15.0.1. Basic object oriented writing to XML

For a class to be easily writable to XML using this API, the class should implement the interface rasmus_torkel.xml_basic.write.XmlSinkWritable. There are two methods to implement:
public TagNodeId
naturalTagNodeId();
and
public void
mostToXml(XmlSink      xmlSink,
          String       relativeName,
          XmlNameSpace nameSpace);
When implementing mostToXml, we write the object to xmlSink except that we don't call startNode on the node that represents the object and we don't call closeNode on that node either. We don't start the node, because that gives us maximum flexibility with the tag. We don't close the node because objects of subclasses might need to write some extra stuff to the sink. If the invoker has no need for flexibility with the start tag, the start tag comes from the naturalTagNodeId method.

Here is an example of what a class might look like before we add XML capability to it. It is simplified somewhat in that we don't do any error handling.

public class PersonName
{
    public  final String   _firstName;
    private final String[] _middleNames;
    public  final String   _surname;
    
    public
    PersonName(String   firstName,
               String   surname)
    {
        this(firstName, (String[])null, surname);
    }
    
    public
    PersonName(String   firstName,
               String   middleName,
               String   surname)
    {
        this(firstName, new String[]{middleName}, surname);
    }
    
    public
    PersonName(String   firstName,
               String[] middleNames,
               String   surname)
    {
        _firstName = firstName;
        _middleNames = middleNames;
        _surname = surname;
    }
}
To enhance it for writing, we make it implement XmlSinkWritable like this:
public class PersonName implements XmlSinkWritable
Then we satisfy the interface by adding code like this:
public static final TagNodeId NATURAL_TAG_NODE_ID = new TagNodeId("personName");
and
@Override
public TagNodeId naturalTagNodeId()
{
    return NATURAL_TAG_NODE_ID;
}

@Override
public void
mostToXml(XmlSink      xmlSink,
          String       relativeName,
          XmlNameSpace nameSpace)
{
    xmlSink.sinkSimpleNode("firstName", _firstName);
    xmlSink.sinkToStringArrayNode("middleNames", "middleName", _middleNames);
    xmlSink.sinkSimpleNode("surname", _surname);
}

In the code above, sinkToStringArrayNode generates XML simple nodes by calling toString on the array elements.

Here is some sample code for declaring and writing a PersonName:

PersonName name = new PersonName("Alfred", new String[]{"Bernhard", "Carlos"}, "Dreyfuss");
XmlSink xmlSink = new XmlSink();
xmlSink.sinkNode(name);
String xml = xmlSink.toString();
And here is the XML thus produced:
<personName>
  <firstName>Alfred</firstName>
  <middleNames>
    <middleName>Bernhard</middleName>
    <middleName>Carlos</middleName>
  </middleNames>
  <surname>Dreyfuss</surname>
</personName>
As the invocation did not supply any alternative tag information, we ended up with what we specified in the naturalTagNodeId method which is "personName". We could specify our own tag information in various ways. Here is some more sample code:
PersonName name = new PersonName("Alfred", new String[]{"Bernhard", "Carlos"}, "Dreyfuss");
XmlNameSpace nameSpace = new XmlNameSpace("x", "http://rasmustorkel.com/ns1/");
TagNodeId tagNodeId = new TagNodeId("xname", nameSpace);
XmlSink xmlSink = new XmlSink();
xmlSink.sinkNode(tagNodeId, name);
String xml = xmlSink.toString();
And here is the XML:
<x:xname xmlns:x="http://rasmustorkel.com/ns1/">
  <firstName>Alfred</firstName>
  <middleNames>
    <middleName>Bernhard</middleName>
    <middleName>Carlos</middleName>
  </middleNames>
  <surname>Dreyfuss</surname>
</x:xname>
We can easily write arrays as well. Here is some sample code:
PersonName[] personNames =
{
        new PersonName("Alexander", "Armstrong"),
        new PersonName("Bianca", "Bella", "Brown"),
        new PersonName("Christopher", new String[]{"Carlos", "Claus"}, "Cooper"),
};
XmlSink xmlSink = new XmlSink();
xmlSink.sinkArrayNode("personNames", personNames);
String xml = xmlSink.toString();
This leads to the following XML:
<personNames>
  <personName>
    <firstName>Alexander</firstName>
    <surname>Armstrong</surname>
  </personName>
  <personName>
    <firstName>Bianca</firstName>
    <middleNames>
      <middleName>Bella</middleName>
    </middleNames>
    <surname>Brown</surname>
  </personName>
  <personName>
    <firstName>Christopher</firstName>
    <middleNames>
      <middleName>Carlos</middleName>
      <middleName>Claus</middleName>
    </middleNames>
    <surname>Cooper</surname>
  </personName>
</personNames>
In the previous example, we needed to supply tag information for the array node. Tagging information is not required for the elements because we have a natural tag node id to refer to.

15.0.2. Basic object oriented reading from XML

This API contains the abstract generic factory class rasmus_torkel.xml_basic.read.factory.MainXmlObjectFactory. We go back to the PersonName class that we had earlier and we add the following code:
    
public static final MainXmlObjectFactory<PersonName> FROM_XML_FACTORY = new MainXmlObjectFactory<PersonName>(NATURAL_TAG_NODE_ID)
{
    @Override
    public PersonName extractFromNode(TagNode node)
    {
        return new PersonName(node);
    }
};
and
public
PersonName(TagNode node)
{
    _firstName = node.nextTextFieldE("firstName");
    _middleNames =
            XmlFactoriesForStandardTypes.STRING_FACTORY.nextArrayFromParentN(
                    node, "middleNames", "middleName", StringUtil.ARRAY_MAKER);
    _surname = node.nextTextFieldE("surname");
}
It is recommended practice to declare the factory in the class for which it creates objects, to give it the name "FROM_XML_FACTORY" and to make it a public static final. If you do that, it is easier to support inheritance hierarchies when reading XML. Of course, you can't always do that, for instance when the class comes from elsewhere and you can't change it. The standard String class is a good example, and the constructor above reads an array of strings using a string factory that is declared elsewhere. But more on inheritance later. We also see a reference to StringUtil.ARRAY_MAKER. What that means will become clear when we cover array creation.

One more crucial point. Do not call node.verifyNoMoreChildren on the node being extracted, either in the extractFromNode factory function or in any code invoked by it. verifyNoMoreChildren should not be called twice on the same node, and when a node is retrieved through the factory, the factory will call it. This design decision facilitates support for inheritance, as we will see a little later.

We are ready to start reading PersonName objects. Previously we generated the XML below:

<personName>
  <firstName>Alfred</firstName>
  <middleNames>
    <middleName>Bernhard</middleName>
    <middleName>Carlos</middleName>
  </middleNames>
  <surname>Dreyfuss</surname>
</personName>
We can regenerate the person object with the code below:
TagNode rootNode =
    XmlReader.xmlStringToRoot(xml, "demo", XmlReadOptions.DEFAULT, PersonName.NATURAL_TAG_NODE_ID);
PersonName name2 = PersonName.FROM_XML_FACTORY.extractFromNode(rootNode);
rootNode.verifyNoMoreChildren();

We will shortly see a much better way of doing this than the above code.

When objects are nested inside other objects, retrieval is a little different and we will see examples of that later.

We also generated XML for an array of PersonName objects earlier. Java Generics do not naturally support the creation of objects or arrays of the generic type. So if we want to read arrays of a class, we need to supply a concrete subclass of the abstract generic class rasmus_torkel.misc.ArrayMaker. To do this, we need to implement the following method:

public abstract T[]
newArray(int size);
Here is how we declare an ArrayMaker for PersonName:
public static final ArrayMaker<PersonName> ARRAY_MAKER = new ArrayMaker<PersonName>()
{
    @Override
    public PersonName[]
    newArray(int size)
    {
        return new PersonName[size];
    }
};
Below is the code for regenerating the array we wrote earlier where elements had the natural tag node id:
TagNode rootNode =
        XmlReader.xmlStringToRoot(xml, "demo", XmlReadOptions.DEFAULT, "personNames");
PersonName[] personNames2 =
        PersonName.FROM_XML_FACTORY.extractArrayFromNode(rootNode, PersonName.ARRAY_MAKER);

If you prefer not to create ArrayMaker objects and to work with lists instead, it's no problem. In fact, the above array extraction function reads a list and then makes an array:

public T[]
extractArrayFromNode(TagNode       arrayNode,
                     ArrayMaker<T> arrayMaker)
{
    ArrayList list = extractListFromNode(arrayNode);
    return arrayMaker.fromList(list);
}
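To see why an array maker is needed, here is a self-contained sketch of the pattern. The classes ArrayMakerSketch and ArrayMakerDemo are hypothetical and only mimic the shape described above; in particular, implementing fromList with List.toArray is an assumption about how such a method could work, not a statement about the library's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-creation of the array maker idea; not rasmus_torkel.misc.ArrayMaker.
abstract class ArrayMakerSketch<T>
{
    // "new T[size]" does not compile in Java, so each concrete type supplies its own array.
    abstract T[] newArray(int size);

    // Copies a list into an array of the right runtime type.
    T[] fromList(List<T> list)
    {
        return list.toArray(newArray(list.size()));
    }
}

final class ArrayMakerDemo
{
    static final ArrayMakerSketch<String> STRING_ARRAY_MAKER = new ArrayMakerSketch<String>()
    {
        @Override
        String[] newArray(int size)
        {
            return new String[size];
        }
    };

    // Builds a list and converts it to a String[] through the array maker.
    static String[] demo()
    {
        List<String> list = new ArrayList<>();
        list.add("Bernhard");
        list.add("Carlos");
        return STRING_ARRAY_MAKER.fromList(list);
    }
}
```

The result of ArrayMakerDemo.demo() is a genuine String[] rather than an Object[], which is what a generic method cannot produce without help from the caller.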

16. Object Oriented programming and this API - reading and writing XML files with one line of code

If we have a factory for a class to read it from XML and if the class implements XmlSinkWritable, then we can write an XML file or string for objects of that class in one line of code. And we can also read the string or file and regenerate the object with one line of code.

Or to put it another way, if as a general rule, we equip Java types that need to be written to XML (other than simple types) with factories and have them implement XmlSinkWritable, the actual reading and writing of an XML string or file will always be able to be done in one line of code.

Here is a method each on the factories to read an XML string or an XML file in one line.

public T
stringToObject(String              xmlString,
               String              context,
               XmlReadOptions      options)
    
public T
fileToObject(File                xmlFile,
             XmlReadOptions      options)

Here is a method each on XmlSink to convert an object to an XML string or file.

public static String
objectToString(XmlSinkWritable object,
               XmlWriteOptions options)

public static void
objectToFile(XmlSinkWritable object,
             File            xmlFile,
             XmlWriteOptions options)

For each of the above four methods, another version exists which does not have an options parameter and uses defaults. Writing and reading an object would look something like this:

PersonName name = new PersonName("Rasmus", "Torkel");
String xmlString = XmlSink.objectToString(name);
PersonName name2 = PersonName.FROM_XML_FACTORY.stringToObject(xmlString, "demo");

We also have functions for writing and reading arrays of objects. There are more of them because we need to specify tagging information for the array and we can do this in different ways. Let's just see one example of writing and reading an array in one line each:

PersonName[] names = new PersonName[]
{
        new PersonName("Alexander", "Armstrong"),
        new PersonName("Bianca", "Bella", "Brown"),
};
String xmlString = XmlSink.arrayToString("array", names);
PersonName[] names2 = PersonName.FROM_XML_FACTORY.stringToArray(xmlString, "demo", PersonName.ARRAY_MAKER, "array");

17. Object Oriented programming and this API - inheritance

If we are representing inheritance hierarchies of objects in XML, we have situations where we have to read a node without knowing beforehand the exact class to instantiate. How do we determine the class to instantiate? The API supports three different approaches, each with its advantages and disadvantages.

17.1. Specifying the class in the attribute classOfObject

This approach uses a special Rasmus Torkel convention, which is to specify the class in the attribute classOfObject. So this approach will not work if your XML format has been designed by somebody else who is unaware of this convention. However, if you are designing your XML from scratch, and if you have control over the classes in the inheritance hierarchy, then this approach is the most straightforward. Don't worry about this approach being too Java-centric. Java is not the only language that has packages. And packages can also be mapped to C++ name spaces.

We earlier recommended that the factory be declared in the class for which it creates objects, that it be called FROM_XML_FACTORY and that it be a public static final. For this approach, this is absolutely essential. The API needs to get from the class name to the factory and it relies on this convention and a little bit of reflection trickery. But don't worry about the overhead of invoking reflection. Once factories are looked up, this API caches them.
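The lookup from class name to factory can be pictured roughly as below. This is only a sketch of the general technique (reflection on a conventionally named public static field, plus a cache). The classes FactoryLookupSketch and SampleWithFactory, the use of Object as the factory type and the IllegalStateException are assumptions for the example; the library's actual lookup code is not shown in this document.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of convention-based factory lookup; not the library's code.
final class FactoryLookupSketch
{
    // Factories found so far, keyed by class name, so reflection runs once per class.
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    static Object factoryForClassName(String className)
    {
        return CACHE.computeIfAbsent(className, FactoryLookupSketch::lookUp);
    }

    private static Object lookUp(String className)
    {
        try
        {
            Class<?> clazz = Class.forName(className);
            Field field = clazz.getField("FROM_XML_FACTORY"); // finds public fields only
            if (!Modifier.isStatic(field.getModifiers()))
            {
                throw new IllegalStateException("FROM_XML_FACTORY of " + className + " is not static");
            }
            Object factory = field.get(null);                 // null target: static field
            if (factory == null)
            {
                throw new IllegalStateException("FROM_XML_FACTORY of " + className + " is null");
            }
            return factory;
        }
        catch (ReflectiveOperationException e)
        {
            // Covers a class that cannot be loaded and a missing or inaccessible field.
            throw new IllegalStateException("No usable factory for " + className, e);
        }
    }
}

// Sample class following the naming convention; the factory is a plain Object here.
final class SampleWithFactory
{
    public static final Object FROM_XML_FACTORY = new Object();
}
```

A second call with the same class name is served from the cache, so the reflection cost is paid once per class. The failure branches correspond loosely to the kinds of problems listed in the exceptions section that follows.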

XmlSink has a special function to write the attribute declaring the class for the object you are writing:

public void
sinkClassOfObjectAttribute(Object object)

Let's see some sample code. Here are the edited classes for employee, manager and ceo and also for some data used by the class for employee. We have already seen PersonName.

public enum Gender
{
    MALE("male"),
    FEMALE("female");
    
    public final String _label;
    
    private
    Gender(String label)
    {
        _label = label;
    }
    
    @Override
    public String
    toString()
    {
        return _label;
    }
}
public class Employee implements XmlSinkWritable
{

    public final long       _id;
    public final long       _bossId;
    public final PersonName _name;
    public final Date2      _birthDate;
    public final Date2      _joinDate;
    public final int        _salary;
    public final Gender     _gender;
    
    public static final TagNodeId NATURAL_TAG_NODE_ID = new TagNodeId("employee");
    public static final TagNodeId NATURAL_ARRAY_TAG_NODE_ID = new TagNodeId("employees");
    
    public static final MainXmlObjectFactory<Employee> FROM_XML_FACTORY = new MainXmlObjectFactory<Employee>(NATURAL_TAG_NODE_ID)
    {
        @Override
        public Employee extractFromNode(TagNode node)
        {
            return new Employee(node);
        }
    };
    
    public static final ArrayMaker<Employee> ARRAY_MAKER = new ArrayMaker<Employee>()
    {
        @Override
        public Employee[]
        newArray(int size)
        {
            return new Employee[size];
        }
    };
    
    public
    Employee(TagNode node)
    {
        this(node, true);
    }
    
    protected
    Employee(TagNode node,
             boolean isBossIdExpected)
    {
        _id = node.nextLongFieldE("id");
        _bossId = bossIdFromNode(node, isBossIdExpected);
        _name = PersonName.FROM_XML_FACTORY.nextOnParentE(node, "name");
        _birthDate = YmdHyphenPaddedDateXmlFactory.INSTANCE.nextFieldE(node, "birthDate");
        _joinDate = YmdHyphenPaddedDateXmlFactory.INSTANCE.nextFieldE(node, "joinDate");
        _salary = node.nextIntFieldE("salary");
        _gender = (Gender)node.nextEnumFieldE("gender", Gender.class);
    }
    
    public
    Employee(long       id,
             long       bossId,
             PersonName name,
             Date2      birthDate,
             Date2      joinDate,
             int        salary,
             Gender     gender)
    {
        _id = id;
        _bossId = bossId;
        _name = name;
        _birthDate = birthDate;
        _joinDate = joinDate;
        _salary = salary;
        _gender = gender;
    }
    
    private static long
    bossIdFromNode(TagNode node,
                   boolean isBossIdExpected)
    {
        if (isBossIdExpected)
        {
            return node.nextLongFieldE("bossId");
        }
        else
        {
            return -1;
        }
    }
    
    public String
    toString()
    {
        return _id + "/" + _name;
    }

    @Override
    public TagNodeId naturalTagNodeId()
    {
        return NATURAL_TAG_NODE_ID;
    }
    
    @Override
    public void
    mostToXml(XmlSink      xmlSink,
              String       relativeName,
              XmlNameSpace nameSpace)
    {
        xmlSink.sinkClassOfObjectAttribute(this);
        xmlSink.sinkSimpleNode("id", _id);
        if (_bossId != -1)
        {
            xmlSink.sinkSimpleNode("bossId", _bossId);
        }
        xmlSink.sinkNode("name", _name);
        xmlSink.sinkSimpleNode("birthDate", _birthDate);
        xmlSink.sinkSimpleNode("joinDate", _joinDate);
        xmlSink.sinkSimpleNode("salary", _salary);
        xmlSink.sinkSimpleNode("gender", _gender);
    }
}
public class Manager extends Employee
{
    public final Date2 _managementStartDate;
    
    public static final TagNodeId NATURAL_TAG_NODE_ID = new TagNodeId("manager");
    
    public static final MainXmlObjectFactory<Manager> FROM_XML_FACTORY = new MainXmlObjectFactory<Manager>(NATURAL_TAG_NODE_ID)
    {
        @Override
        public Manager extractFromNode(TagNode node)
        {
            return new Manager(node);
        }
    };
    
    public
    Manager(TagNode node)
    {
        this(node, true);
    }
    
    protected
    Manager(TagNode node,
            boolean isBossIdExpected)
    {
        super(node,
              isBossIdExpected);
        _managementStartDate = YmdHyphenPaddedDateXmlFactory.INSTANCE.nextFieldE(node, "managementStartDate");
    }
    
    public
    Manager(long       id,
            long       bossId,
            PersonName name,
            Date2      birthDate,
            Date2      joinDate,
            int        salary,
            Gender     gender,
            Date2      managementStartDate)
    {
        super(id, bossId, name, birthDate, joinDate, salary, gender);
        _managementStartDate = managementStartDate;
    }

    @Override
    public TagNodeId
    naturalTagNodeId()
    {
        return NATURAL_TAG_NODE_ID;
    }
    
    @Override
    public void
    mostToXml(XmlSink      xmlSink,
              String       relativeName,
              XmlNameSpace nameSpace)
    {
        super.mostToXml(xmlSink, relativeName, nameSpace);
        xmlSink.sinkSimpleNode("managementStartDate", _managementStartDate);
    }
}
public class Ceo extends Manager
{
    public final Date2 _ceoStartDate;
    
    public static final TagNodeId NATURAL_TAG_NODE_ID = new TagNodeId("ceo");
    
    public static final MainXmlObjectFactory<Ceo> FROM_XML_FACTORY = new MainXmlObjectFactory<Ceo>(NATURAL_TAG_NODE_ID)
    {
        @Override
        public Ceo extractFromNode(TagNode node)
        {
            return new Ceo(node);
        }
    };
    
    public
    Ceo(TagNode node)
    {
        super(node, false);
        _ceoStartDate = YmdHyphenPaddedDateXmlFactory.INSTANCE.nextFieldE(node, "ceoStartDate");
    }
    
    public
    Ceo(long       id,
        PersonName name,
        Date2      birthDate,
        Date2      joinDate,
        int        salary,
        Gender     gender,
        Date2      managementStartDate,
        Date2      ceoStartDate)
    {
        super(id, -1, name, birthDate, joinDate, salary, gender, managementStartDate);
        _ceoStartDate = ceoStartDate;
    }

    @Override
    public TagNodeId naturalTagNodeId()
    {
        return NATURAL_TAG_NODE_ID;
    }

    @Override
    public void
    mostToXml(XmlSink      xmlSink,
              String       relativeName,
              XmlNameSpace nameSpace)
    {
        super.mostToXml(xmlSink, relativeName, nameSpace);
        xmlSink.sinkSimpleNode("ceoStartDate", _ceoStartDate);
    }
}
public class Company implements XmlSinkWritable
{
    public final String     _name;
    public final Employee[] _employees;
    
    public static final TagNodeId NATURAL_TAG_NODE_ID = new TagNodeId("company");
    public static final MainXmlObjectFactory<Company> FROM_XML_FACTORY = new MainXmlObjectFactory<Company>(NATURAL_TAG_NODE_ID)
    {
        @Override
        public Company
        extractFromNode(TagNode node)
        {
            return new Company(node);
        }
    };
    
    public
    Company(TagNode node)
    {
        _name = node.nextTextFieldE("name");
        _employees = Employee.FROM_XML_FACTORY.nextArrayFromParentE(node, Employee.NATURAL_ARRAY_TAG_NODE_ID, Employee.ARRAY_MAKER);
    }
    
    public
    Company(String     name,
            Employee[] employees)
    {
        
        _name = name;
        _employees = employees;
    }

    @Override
    public TagNodeId
    naturalTagNodeId()
    {
        return NATURAL_TAG_NODE_ID;
    }

    @Override
    public void
    mostToXml(XmlSink      xmlSink,
              String       relativeName,
              XmlNameSpace nameSpace)
    {
        xmlSink.sinkClassOfObjectAttribute(this);
        xmlSink.sinkSimpleNode("name", _name);
        xmlSink.sinkArrayNode(Employee.NATURAL_ARRAY_TAG_NODE_ID, _employees);
    }
}
A few comments on the above code with respect to inheritance. We can now see the benefits of not having the constructors call verifyNoMoreChildren. The Employee constructor is (sometimes) called by the Manager constructor, which expects a few more fields.

Similarly, we can now see why mostToXml must not close the node representing the object being written. In our example, the subclasses Manager and Ceo write additional fields after what Employee writes.
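The reasoning about leaving the node open can be reduced to a small stand-alone sketch. Everything in it is hypothetical: EmployeeSketch, ManagerSketch and NodeWriterSketch write to a StringBuilder instead of an XmlSink and produce unformatted XML, purely to show who opens and closes the node.

```java
// Hypothetical miniature of the "write most of the node" idea; not the library's classes.
class EmployeeSketch
{
    final long _id;

    EmployeeSketch(long id)
    {
        _id = id;
    }

    String tagName()
    {
        return "employee";
    }

    // Writes the content only; the caller is responsible for the start and end tags.
    void mostToXml(StringBuilder sink)
    {
        sink.append("<id>").append(_id).append("</id>");
    }
}

class ManagerSketch extends EmployeeSketch
{
    final String _managementStartDate;

    ManagerSketch(long id, String managementStartDate)
    {
        super(id);
        _managementStartDate = managementStartDate;
    }

    @Override
    String tagName()
    {
        return "manager";
    }

    @Override
    void mostToXml(StringBuilder sink)
    {
        super.mostToXml(sink); // possible only because the superclass left the node open
        sink.append("<managementStartDate>").append(_managementStartDate)
            .append("</managementStartDate>");
    }
}

final class NodeWriterSketch
{
    // Plays the role of the sink: it opens the node, delegates, then closes the node.
    static String write(EmployeeSketch object)
    {
        StringBuilder sink = new StringBuilder();
        sink.append('<').append(object.tagName()).append('>');
        object.mostToXml(sink);
        sink.append("</").append(object.tagName()).append('>');
        return sink.toString();
    }
}
```

If EmployeeSketch closed its own node, ManagerSketch would have no place to append managementStartDate. The same argument applies on the reading side to verifyNoMoreChildren.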

Let's declare some objects:

Ceo ceo = new Ceo(
        1001,
        new PersonName("Alexander", "Armstrong"),
        new Date2(1961, 1, 11, YmdHyphenPaddedDateFactory.INSTANCE),
        new Date2(1991, 7, 12, YmdHyphenPaddedDateFactory.INSTANCE),
        105000,
        Gender.MALE,
        new Date2(1991, 7, 12, YmdHyphenPaddedDateFactory.INSTANCE),
        new Date2(1995, 7, 13, YmdHyphenPaddedDateFactory.INSTANCE));
Manager manager1 = new Manager(
        1203,
        ceo._id,
        new PersonName("Bianca", "Bella", "Brown"),
        new Date2(1966, 2, 21, YmdHyphenPaddedDateFactory.INSTANCE),
        new Date2(1993, 8, 23, YmdHyphenPaddedDateFactory.INSTANCE),
        80000,
        Gender.FEMALE,
        new Date2(1993, 8, 23, YmdHyphenPaddedDateFactory.INSTANCE));
Employee employee11 = new Employee(
        1203,
        manager1._id,
        new PersonName("Christopher", new String[]{"Carlos", "Claus"}, "Cooper"),
        new Date2(1972, 4, 4, YmdHyphenPaddedDateFactory.INSTANCE),
        new Date2(1997, 12, 1, YmdHyphenPaddedDateFactory.INSTANCE),
        100000,
        Gender.MALE);
Employee employee12 = new Employee(
        1204,
        manager1._id,
        new PersonName("Dora", "Daisy", "Davidson"),
        new Date2(1971, 3, 26, YmdHyphenPaddedDateFactory.INSTANCE),
        new Date2(1995, 9, 27, YmdHyphenPaddedDateFactory.INSTANCE),
        100000,
        Gender.MALE);
Employee[] employees = new Employee[]
{
        ceo,
        manager1,
        employee11,
        employee12
};
Company company = new Company("ABC Software", employees);
We can convert our company object to XML like this:
String xmlString = XmlSink.objectToString(company);
Then we end up with XML like this:
<company classOfObject="rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.Company">
  <name>ABC Software</name>
  <employees>
    <ceo classOfObject="rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.Ceo">
      <id>1001</id>
      <name>
        <firstName>Alexander</firstName>
        <surname>Armstrong</surname>
      </name>
      <birthDate>1961/01/11</birthDate>
      <joinDate>1991/07/12</joinDate>
      <salary>105000</salary>
      <gender>male</gender>
      <managementStartDate>1991/07/12</managementStartDate>
      <ceoStartDate>1995/07/13</ceoStartDate>
    </ceo>
    <manager classOfObject="rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.Manager">
      <id>1203</id>
      <bossId>1001</bossId>
      <name>
        <firstName>Bianca</firstName>
        <middleNames>
          <middleName>Bella</middleName>
        </middleNames>
        <surname>Brown</surname>
      </name>
      <birthDate>1966/02/21</birthDate>
      <joinDate>1993/08/23</joinDate>
      <salary>80000</salary>
      <gender>female</gender>
      <managementStartDate>1993/08/23</managementStartDate>
    </manager>
    <employee classOfObject="rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.Employee">
      <id>1203</id>
      <bossId>1203</bossId>
      <name>
        <firstName>Christopher</firstName>
        <middleNames>
          <middleName>Carlos</middleName>
          <middleName>Claus</middleName>
        </middleNames>
        <surname>Cooper</surname>
      </name>
      <birthDate>1972/04/04</birthDate>
      <joinDate>1997/12/01</joinDate>
      <salary>100000</salary>
      <gender>male</gender>
    </employee>
    <employee classOfObject="rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.Employee">
      <id>1204</id>
      <bossId>1203</bossId>
      <name>
        <firstName>Dora</firstName>
        <middleNames>
          <middleName>Daisy</middleName>
        </middleNames>
        <surname>Davidson</surname>
      </name>
      <birthDate>1971/03/26</birthDate>
      <joinDate>1995/09/27</joinDate>
      <salary>100000</salary>
      <gender>male</gender>
    </employee>
  </employees>
</company>
We can then regenerate a company object which is identical to the one we started with using code like this:
Company company2 = Company.FROM_XML_FACTORY.stringToObject(xmlString, "demo");

Whenever we declare the class according to convention, the factory will not refer to the tag node identifier to determine the exact class.

In the example above, the classOfObject attribute is also written for Company, which has no subclasses. This isn't really essential. But if there is any possibility of having to accommodate subclasses later, and it is not a simple type, then it is probably a good idea. For the PersonName class, the classOfObject attribute was not written because the class is recycled from a pre-inheritance example.

17.1.1. Exceptions

Here are examples of exceptions that can occur when there are problems with factories:
rasmus_torkel.xml_basic.read.exception.XmlTypedNodeException: unit test: employee, starting at line 1, char 12 specifies class rasmus_torkel.test.xml_basic.oo.PersonName in attribute classOfObject but the class is not a subclass of the supported type for this factory which is rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.Employee
rasmus_torkel.xml_basic.read.exception.XmlTypedNodeException: unit test: employee, starting at line 1, char 12 specifies class john_smith.employee.Employee in attribute classOfObject but the class could not be loaded: java.lang.ClassNotFoundException: john_smith.employee.Employee
rasmus_torkel.xml_basic.read.exception.XmlTypedNodeException: unit test: employee, starting at line 1, char 12 specifies class rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.Gender in attribute classOfObject but the class does not have a FROM_XML_FACTORY field
rasmus_torkel.xml_basic.read.exception.XmlTypedNodeException: unit test: employee, starting at line 1, char 12 specifies class rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.ClassWithWrongTypeFactory in attribute classOfObject but class of FROM_XML_FACTORY is java.lang.Integer which not a subclass of rasmus_torkel.xml_basic.read.factory.MainXmlObjectFactory
rasmus_torkel.xml_basic.read.exception.XmlTypedNodeException: unit test: employee, starting at line 1, char 12 specifies class rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.ClassWithNonStaticFactory in attribute classOfObject but the FROM_XML_FACTORY field of the class is not static
rasmus_torkel.xml_basic.read.exception.XmlTypedNodeException: unit test: employee, starting at line 1, char 12 specifies class rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.ClassWithNullFactory in attribute classOfObject but the value of the FROM_XML_FACTORY field of the class is null
rasmus_torkel.xml_basic.read.exception.XmlTypedNodeException: unit test: employee, starting at line 1, char 12 specifies class rasmus_torkel.test.xml_basic.oo.inheritance.class_of_object.ClassWithPrivateFactory in attribute classOfObject but the class could not be loaded: java.lang.RuntimeException: Failed to ascertain runtime class to be supported by factory of class rasmus_torkel.xml_basic.read.factory.MainXmlObjectFactory

17.1.2. Abstract class at the root of inheritance hierarchy

If the class at the root of an inheritance hierarchy is abstract, you still need a factory which serves as a starting point and from which the correct subclass factory will be found, but the factory obviously can't instantiate objects of that exact type. In such a situation, the implementation of extractFromNode should throw an exception. This is best done like this:
public static final MainXmlObjectFactory<A> FROM_XML_FACTORY = new MainXmlObjectFactory<A>((TagNodeId)null)
{
    @Override
    public A
    extractFromNode(TagNode node)
    {
        throw makeExtractAbstractException();
    }
};

17.2. Inheritance by natural tag node id

Another mechanism is to rely on the tag node id of the node to identify the subclass. This approach relies on the natural tag node identifier being unique within an inheritance hierarchy. The initial factory at the root of the inheritance hierarchy traverses the factories of the subclasses to match their natural tag node identifiers against the tag node identifier of the tag node representing the object.

But there is a problem: there is no way of obtaining the subclasses of a class at runtime. That makes sense, because Java only loads classes on demand. The way this API works around this problem is as follows: when a MainXmlObjectFactory is constructed, it tries to find the MainXmlObjectFactory of its immediate superclass and register itself with it. How does a factory find the factory of its superclass? By using reflection trickery and relying on the convention that the MainXmlObjectFactory is declared as a public static final field named FROM_XML_FACTORY within the class for which it creates objects.
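The self-registration idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the library's implementation: SketchFactory, SketchBase, SketchDerived and RegistrationSketch are hypothetical names, and only the FROM_XML_FACTORY field name is taken from the convention described above.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a factory; NOT the library's MainXmlObjectFactory.
class SketchFactory {
    final Class<?> target;
    final List<SketchFactory> subFactories = new ArrayList<>();

    SketchFactory(Class<?> target) {
        this.target = target;
        // Look for a factory on the immediate superclass and register with it.
        SketchFactory parent = findFactory(target.getSuperclass());
        if (parent != null) {
            parent.subFactories.add(this);
        }
    }

    // Returns the value of a static FROM_XML_FACTORY field declared directly
    // on clazz, or null if the class does not follow the convention.
    static SketchFactory findFactory(Class<?> clazz) {
        if (clazz == null) {
            return null;
        }
        try {
            Field field = clazz.getDeclaredField("FROM_XML_FACTORY");
            if (!Modifier.isStatic(field.getModifiers())) {
                return null;
            }
            Object value = field.get(null);
            return (value instanceof SketchFactory) ? (SketchFactory) value : null;
        } catch (NoSuchFieldException | IllegalAccessException e) {
            return null;
        }
    }
}

// Hypothetical data classes that follow the field naming convention.
class SketchBase {
    public static final SketchFactory FROM_XML_FACTORY = new SketchFactory(SketchBase.class);
}

class SketchDerived extends SketchBase {
    public static final SketchFactory FROM_XML_FACTORY = new SketchFactory(SketchDerived.class);
}

class RegistrationSketch {
    public static void main(String[] args) {
        // Touching the subclass field initializes SketchDerived, whose factory
        // registers itself with the factory of SketchBase.
        SketchFactory derived = SketchDerived.FROM_XML_FACTORY;
        System.out.println(SketchBase.FROM_XML_FACTORY.subFactories.contains(derived));
    }
}
```

Because Java initializes a superclass before its subclass, the factory of SketchBase already exists by the time the factory of SketchDerived goes looking for it.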

Also, if the classes in an inheritance hierarchy aren't loaded already, you need to force them to load before you start reading XML involving those classes. One way to do this is to call the forceLoad method on the factory:

MainXmlObjectFactory.forceLoad(Aa.class);
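To see why forcing a load matters, the sketch below uses only the standard library to show that merely mentioning a class literal does not run the class's static initializers, while Class.forName with the initialize flag set does. LoadLog, LazyLeaf and ForceLoadSketch are hypothetical names, and forceInitialize is an assumption about one way such a helper could work, not the actual implementation of forceLoad.

```java
import java.util.ArrayList;
import java.util.List;

// Records which classes have run their static initializers.
class LoadLog {
    static final List<String> INITIALIZED = new ArrayList<>();
}

// Hypothetical class whose static initializer stands in for factory creation.
class LazyLeaf {
    static {
        LoadLog.INITIALIZED.add("LazyLeaf");
    }
}

class ForceLoadSketch {
    // Hypothetical helper: asks the JVM to initialize the given class.
    static void forceInitialize(Class<?> clazz) {
        try {
            Class.forName(clazz.getName(), true, clazz.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Evaluating a class literal does not trigger static initialization.
        Class<?> literal = LazyLeaf.class;
        System.out.println("after class literal: " + LoadLog.INITIALIZED.contains("LazyLeaf"));

        // Explicit initialization runs the static block exactly once.
        forceInitialize(literal);
        System.out.println("after forceInitialize: " + LoadLog.INITIALIZED.contains("LazyLeaf"));
    }
}
```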
So let's declare a few classes to read and write. We can't reuse the employee classes that we used previously because they write themselves with the classOfObject attribute, and the presence of that attribute would trigger loading by class name, while we want to see what happens when we don't have the classOfObject attribute. So we need to declare brand new classes.
public abstract class A implements XmlSinkWritable
{
    public static final MainXmlObjectFactory<A> FROM_XML_FACTORY = new MainXmlObjectFactory<A>((TagNodeId)null)
    {
        @Override
        public A
        extractFromNode(TagNode node)
        {
            throw makeExtractAbstractException();
        }
    };
    
    public static final ArrayMaker<A> ARRAY_MAKER = new ArrayMaker<A>()
    {
        @Override
        public A[]
        newArray(int size)
        {
            return new A[size];
        }
    };
    
    public final int _aId;
    
    public
    A(TagNode node)
    {
        _aId = node.nextIntFieldE("aId");
    }
    
    public
    A(int aId)
    {
        _aId = aId;
    }

    @Override
    public void
    mostToXml(XmlSink      xmlSink,
              String       relativeName,
              XmlNameSpace nameSpace)
    {
        xmlSink.sinkSimpleNode("aId", _aId);
    }
}
public class Aa extends A
{
    public static final TagNodeId NATURAL_TAG_NODE_ID = new TagNodeId("aa");
    public static final MainXmlObjectFactory<Aa> FROM_XML_FACTORY = new MainXmlObjectFactory<Aa>(NATURAL_TAG_NODE_ID)
    {
        @Override
        public Aa
        extractFromNode(TagNode node)
        {
            return new Aa(node);
        }
    };
    
    public final int _aaId;
    
    public
    Aa(TagNode node)
    {
        super(node);
        _aaId = node.nextIntFieldE("aaId");
    }

    public
    Aa(int aId,
       int aaId)
    {
        super(aId);
        _aaId = aaId;
    }

    @Override
    public TagNodeId
    naturalTagNodeId()
    {
        return NATURAL_TAG_NODE_ID;
    }

    @Override
    public void
    mostToXml(XmlSink      xmlSink,
              String       relativeName,
              XmlNameSpace nameSpace)
    {
        super.mostToXml(xmlSink, relativeName, nameSpace);
        xmlSink.sinkSimpleNode("aaId", _aaId);
    }
}
public class Aaa extends Aa
{
    public static final TagNodeId NATURAL_TAG_NODE_ID = new TagNodeId("aaa");
    public static final MainXmlObjectFactory<Aaa> FROM_XML_FACTORY = new MainXmlObjectFactory<Aaa>(NATURAL_TAG_NODE_ID)
    {
        @Override
        public Aaa
        extractFromNode(TagNode node)
        {
            return new Aaa(node);
        }
    };
    
    public final int _aaaId;
    
    public
    Aaa(TagNode node)
    {
        super(node);
        _aaaId = node.nextIntFieldE("aaaId");
    }

    public
    Aaa(int aId,
        int aaId,
        int aaaId)
    {
        super(aId, aaId);
        _aaaId = aaaId;
    }

    @Override
    public TagNodeId
    naturalTagNodeId()
    {
        return NATURAL_TAG_NODE_ID;
    }

    @Override
    public void
    mostToXml(XmlSink      xmlSink,
              String       relativeName,
              XmlNameSpace nameSpace)
    {
        super.mostToXml(xmlSink, relativeName, nameSpace);
        xmlSink.sinkSimpleNode("aaaId", _aaaId);
    }
}
We have two more subclasses of A, called Ab and Ac which are analogous to Aa. We also have a class B which is analogous to the class for A, except that it is not abstract.

So we declare some objects:

A[] as = new A[]{new Aa(1001, 11001), new Ab(1002, 12002), new Aaa(1003, 11003, 11103), new Ac(1004, 13004)};
And this is how we write them:
String xmlString = XmlSink.arrayToString("as", as);
In the above code, we did not specify any tagging information for the elements. That's critical to this approach. We need the natural tag node id for each element because we are identifying the exact subclass by the tag node id. Below is the XML we just generated:
<as>
  <aa>
    <aId>1001</aId>
    <aaId>11001</aaId>
  </aa>
  <ab>
    <aId>1002</aId>
    <abId>12002</abId>
  </ab>
  <aaa>
    <aId>1003</aId>
    <aaId>11003</aaId>
    <aaaId>11103</aaaId>
  </aaa>
  <ac>
    <aId>1004</aId>
    <acId>13004</acId>
  </ac>
</as>
We can regenerate the objects using the code below:
A[] as2 = A.FROM_XML_FACTORY.stringToArray(xmlString, "demo", A.ARRAY_MAKER, "as");

17.2.1. Exceptions

For the inheritance by natural tag node id approach to work, we need to have exactly one loaded class in the inheritance hierarchy whose natural tag node id matches the encountered tag node id. If we don't find one, we will see an exception such as this one:
Got exception as expected: rasmus_torkel.xml_basic.read.exception.XmlWrongNodeException: demo: tag node id of b, starting at line 2, char 3 does not match natural tag node id for any loaded subclass of rasmus_torkel.test.xml_basic.oo.inheritance.A

If we find multiple matching subclasses, we will see an exception such as this one:

Got exception as expected: java.lang.RuntimeException: Trying to find MainXmlObjectFactory in hierarchy for rasmus_torkel.test.xml_basic.oo.inheritance.A where naturalId has relativeName aa and nameSpace null but there are multiple matching classes: rasmus_torkel.test.xml_basic.oo.inheritance.Aa, rasmus_torkel.test.xml_basic.oo.inheritance.AaExtra
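The "exactly one match" rule behind both exceptions can be illustrated with a small tree search. This is a simplified sketch, assuming the hierarchy is held as a tree of nodes keyed by a plain tag name; HierarchyNode and findUnique are hypothetical names, and the exception types are chosen for the sketch rather than taken from the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node in a tree of registered classes.
class HierarchyNode {
    final String className;
    final String tagName; // null for a class without a natural tag name
    final List<HierarchyNode> children = new ArrayList<>();

    HierarchyNode(String className, String tagName) {
        this.className = className;
        this.tagName = tagName;
    }

    HierarchyNode add(HierarchyNode child) {
        children.add(child);
        return this;
    }

    // Collects this node and all descendants whose tag name equals wanted.
    private void collectMatches(String wanted, List<HierarchyNode> out) {
        if (wanted.equals(tagName)) {
            out.add(this);
        }
        for (HierarchyNode child : children) {
            child.collectMatches(wanted, out);
        }
    }

    // Succeeds only if exactly one class in the branch matches the tag name.
    HierarchyNode findUnique(String wanted) {
        List<HierarchyNode> matches = new ArrayList<>();
        collectMatches(wanted, matches);
        if (matches.isEmpty()) {
            throw new IllegalArgumentException("no registered class matches tag " + wanted);
        }
        if (matches.size() > 1) {
            throw new IllegalStateException("multiple registered classes match tag " + wanted);
        }
        return matches.get(0);
    }
}
```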

17.2.2. Debugging the inheritance hierarchy

This API doesn't do any kind of logging because it is intended to work in all sorts of environments. However, MainXmlObjectFactory does generate inheritance reports which you can log to whatever logger you work with on your project.

One kind of report that we support is the branch report, which covers the initial class and everything below it in the inheritance hierarchy. Here is how we get a report for class A:

String report = A.FROM_XML_FACTORY.makeInheritanceBranchReport();

If we call the above code before loading anything else, we will get a report like this:

Inheritance branch report
  rasmus_torkel.test.xml_basic.oo.inheritance.A, abstract, root: super class is java.lang.Object
We don't see any of the subclasses. Let's force the load of class Aaa:
MainXmlObjectFactory.forceLoad(Aaa.class);

If we generate another inheritance branch report of class A, we get this:

Inheritance branch report
  rasmus_torkel.test.xml_basic.oo.inheritance.A, abstract, root: super class is java.lang.Object
    rasmus_torkel.test.xml_basic.oo.inheritance.Aa, relativeName = aa
      rasmus_torkel.test.xml_basic.oo.inheritance.Aaa, relativeName = aaa

We see that class Aa was also loaded. That is a consequence of the factory for Aaa registering itself with the factory for Aa. Let's generate a report for a whole bunch of classes like this:

Class[] classes = {Aaa.class, Ab.class, Ac.class, B.class, String.class};
String report = MainXmlObjectFactory.makeInheritanceReport(classes);

We have the String class in there as well to see what happens when we include classes that don't follow our convention. This is the report:

Inheritance report for class array
  rasmus_torkel.test.xml_basic.oo.inheritance.A, abstract, root: super class is java.lang.Object
    rasmus_torkel.test.xml_basic.oo.inheritance.Aa, relativeName = aa
      rasmus_torkel.test.xml_basic.oo.inheritance.Aaa, relativeName = aaa
    rasmus_torkel.test.xml_basic.oo.inheritance.Ab, relativeName = ab
    rasmus_torkel.test.xml_basic.oo.inheritance.Ac, relativeName = ac
  rasmus_torkel.test.xml_basic.oo.inheritance.B, relativeName = b, root: super class is java.lang.Object
  java.lang.String, no factory: java.lang.String does not have a FROM_XML_FACTORY field

What happened here is that the API found all the roots for the various classes specified and displayed the inheritance hierarchies for those roots. For the String class which does not follow our convention, we see a sensible line telling us the reason. The API also makes sure that all the classes that were specified are actually loaded. So if we now make another branch report for class A, we get this:

Inheritance branch report
  rasmus_torkel.test.xml_basic.oo.inheritance.A, abstract, root: super class is java.lang.Object
    rasmus_torkel.test.xml_basic.oo.inheritance.Aa, relativeName = aa
      rasmus_torkel.test.xml_basic.oo.inheritance.Aaa, relativeName = aaa
    rasmus_torkel.test.xml_basic.oo.inheritance.Ab, relativeName = ab
    rasmus_torkel.test.xml_basic.oo.inheritance.Ac, relativeName = ac

17.3. XSD choice style inheritance

This style of inheritance is inspired by how choices work in XML Schema Definitions. It does not use any sort of reflection but it does require you to be more explicit when setting it up. The idea is that you supply an array of options, each of which maps a tag node id to a factory for creating an object. You also specify the class which is at the root of the inheritance hierarchy. One extra thing you can do with this approach is to have multiple options for the same type but with different tag node identifiers and possibly different factories.

For this, we are going to reuse the same data classes that we used for natural tag node inheritance plus one additional one:

public class X implements XmlSinkWritable
{
    public static XmlObjectFactory<X> FROM_XML_FACTORY = new XmlObjectFactory<X>(X.class)
    {
        @Override
        public X extractFromNode(TagNode node)
        {
            return new X(node);
        }
    };
    
    public final String    _role;
    public final int       _xId;
    public final TagNodeId _tagNodeId;
    
    public
    X(TagNode node)
    {
        _role = node._id._relativeName;
        _xId = node.nextIntFieldE("xId");
        _tagNodeId = node._id;
    }
    
    public
    X(String role,
      int    xId)
    {
        _role = role;
        _xId = xId;
        _tagNodeId = new TagNodeId(_role);
    }

    @Override
    public TagNodeId
    naturalTagNodeId()
    {
        return _tagNodeId;
    }

    @Override
    public void
    mostToXml(XmlSink      xmlSink,
              String       relativeName,
              XmlNameSpace nameSpace)
    {
        xmlSink.sinkSimpleNode("xId", _xId);
    }
}

The main difference between class X and the other sample classes is that it does not have a constant natural tag node id but has one which is related to its role field. Let's declare some data and a factory:

Aa aa1 = new Aa(1001, 11001);
B  b1  = new B(2005);
X  x1  = new X("xenia", 9005);
X  x2  = new X("xerxes", 9006);
XmlSinkWritable[] objects = new XmlSinkWritable[]{aa1, b1, x1, x2};
ChoiceXmlObjectFactory.Option optionAa = new ChoiceXmlObjectFactory.Option(Aa.FROM_XML_FACTORY);
ChoiceXmlObjectFactory.Option optionB = new ChoiceXmlObjectFactory.Option(B.FROM_XML_FACTORY);
ChoiceXmlObjectFactory.Option optionX1 = new ChoiceXmlObjectFactory.Option(x1._tagNodeId, X.FROM_XML_FACTORY);
ChoiceXmlObjectFactory.Option optionX2 = new ChoiceXmlObjectFactory.Option(x2._tagNodeId, X.FROM_XML_FACTORY);
ChoiceXmlObjectFactory<XmlSinkWritable> factory =
        new ChoiceXmlObjectFactory<XmlSinkWritable>(
                XmlSinkWritable.class,
                new ChoiceXmlObjectFactory.Option[]{optionAa, optionB, optionX1, optionX2});

We have two options for X, each with its own tag node identifier ("xenia" and "xerxes"). For the Aa and B options, there was no need to specify an identifier, because the Option inner class has a special constructor for MainXmlObjectFactory which uses the _naturalId field of the factory. We could have specified a different identifier if we had wanted to. The first parameter in the ChoiceXmlObjectFactory constructor is the generic type at the root of the inheritance hierarchy for the factory. When we write our objects the usual way, we get XML like this:

<elements>
  <aa>
    <aId>1001</aId>
    <aaId>11001</aaId>
  </aa>
  <b>
    <bId>2005</bId>
  </b>
  <xenia>
    <xId>9005</xId>
  </xenia>
  <xerxes>
    <xId>9006</xId>
  </xerxes>
</elements>

We regenerate our objects in pretty much the usual way:

XmlSinkWritable[] objects2 = factory.stringToArray(xmlString, "demo", XmlSinkWritable.ARRAY_MAKER, "elements");
The main difference from the other inheritance approaches is that this approach is very explicit in how we define inheritance. So we don't have to follow any convention for declaring the factory, and in this case our factory was just a local variable. Even the factories in the options don't have to follow the convention, although they did in our example.
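Stripped of the XML details, the choice approach amounts to an explicit lookup table from tag name to factory. The following is a minimal sketch of that idea in plain Java, assuming string tag names and factories that build an object from a piece of text; ChoiceSketch, option and create are hypothetical names and do not mirror the ChoiceXmlObjectFactory API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical explicit tag-to-factory table; no reflection involved.
class ChoiceSketch<T> {
    private final Class<T> rootType;
    private final Map<String, Function<String, ? extends T>> options = new LinkedHashMap<>();

    ChoiceSketch(Class<T> rootType) {
        this.rootType = rootType;
    }

    // Registers one option; each tag name may be used only once.
    ChoiceSketch<T> option(String tagName, Function<String, ? extends T> factory) {
        if (options.putIfAbsent(tagName, factory) != null) {
            throw new IllegalArgumentException("duplicate option for tag " + tagName);
        }
        return this;
    }

    // Picks the factory registered for the tag name and builds the object.
    T create(String tagName, String text) {
        Function<String, ? extends T> factory = options.get(tagName);
        if (factory == null) {
            throw new IllegalArgumentException("no option for tag " + tagName);
        }
        return rootType.cast(factory.apply(text));
    }
}

class ChoiceSketchDemo {
    public static void main(String[] args) {
        // Two tag names share the same type and factory, a third uses another.
        ChoiceSketch<Number> choice = new ChoiceSketch<Number>(Number.class)
                .option("count", Integer::valueOf)
                .option("total", Integer::valueOf)
                .option("ratio", Double::valueOf);
        System.out.println(choice.create("count", "42"));
        System.out.println(choice.create("ratio", "0.5"));
    }
}
```

As in the example with "xenia" and "xerxes", the table allows several tag names to lead to the same type, which is something the natural tag node id approach cannot express.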

17.4. Disabling inheritance

The first factory class we saw in our discussion on object oriented reading was MainXmlObjectFactory. But that is actually a subclass of XmlObjectFactory, which is the base class. MainXmlObjectFactory adds the support for inheritance that is required both for the classOfObject approach and for the natural tag node id approach (but not for XSD choice style inheritance). So if you want a factory that creates objects of one exact class, extend XmlObjectFactory. Your factory needs to implement the same extractFromNode method as factories that extend MainXmlObjectFactory.

18. XML Heading ignored

XML files often have a heading line like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
This API completely ignores such headings. The reasons are as follows: