XPB4J User Guide
Version 0.8
Author: Pankaj Kumar
e-mail: pankaj_kumar@acm.org
Date: October 3 2001
Introduction
XML Processing Benchmark for Java (XPB4J) is a Java-based performance measurement and comparison program for
XML processing software. Here, XML processing means any XML operation, such as parsing, transformation, validation,
encryption/decryption or custom access/manipulation, or any combination of these, applied to one or more
XML files and/or byte streams.
Specific examples of such processing include:
- validation of an XML file against a specified XML schema file;
- creation/verification of digital signature as per XML Digital Signature standard;
- validation of XML content as per a given set of business rules;
- merging two or more XML files as per a specified set of rules using XSLT stylesheet or otherwise;
- creating in-memory objects from XML content and vice-versa as per specified data binding rules.
XPB4J doesn't define any benchmark standard; it simply defines a framework to execute and measure
performance characteristics of Processing Activities. It also includes code to do specific
processing. If the same operation can be performed
with different Processing Methods (say, using different parsing APIs such as SAX, DOM, JDOM or
Pull Parser API), then the performance characteristics of these can be measured and compared. One could
also use different parsers and/or transformers and compare the results for the same processing method.
I wrote XPB4J primarily to
- learn about different XML processing APIs;
- enable myself and my fellow programmers to experiment with different ways of doing the same
processing and understand the trade-offs and hence help us make better design and deployment choices;
- track evolution of XML processing software with respect to their performance characteristics.
I have exercised XPB4J for a specific processing activity which I call XStat Processing. This
processing essentially gathers certain statistical information from the input XML document. The different
processing methods used are:
- SAX -- linear scan using a JAXP compliant SAX parser;
- DOM -- building W3C DOM structure using a JAXP compliant DOM parser;
- JDOM -- building JDOM structure using JDOM software;
- PULL -- linear scan using a Pull Parser;
- XSLT -- using Java extension functions in an XSL stylesheet and using an XSLT transformer; and
- COCOON -- using a Cocoon transformer in Cocoon Processing Framework.
You can find my observations and conclusions under section
XStat Measurements. You could also run XPB4J on your machine with your
favourite parser/transformer and your typical input and observe the results.
If your interest is in finding out performance and memory usage of your own custom processing, you can
write your own classes using XPB4J Framework to invoke your processing and
collect the relevant data.
The rest of this guide is organized under the following sections:
Note: The directory paths and execution script names in this document use the MS WINDOWS convention. Their
UNIX equivalents can be derived simply by replacing \ with / in path names and .bat
with .sh in script names.
Note: This version of XPB4J contains script files for MS WINDOWS platform only.
Download XPB4J distribution file xpb4j-0.8.zip from
http://www.pankaj-k.net/xpb4j and unzip it in a suitable
directory. This should create directory xpb4j-0.8, also referred to as the
base directory, and place all the binaries, scripts, documents and sources at appropriate places.
The distribution includes the following third-party jar files in subdirectory xpb4j-0.8\lib:
- crimson.jar -- Crimson1.1.2beta2. A JAXP compliant parser.
- jdom.jar -- JDOM beta7.
- PullParser2_0_2.jar -- Pull Parser 2.0.2.
The presence of these jar files allows "out of the box" execution of XPB4J for XStat Processing (except for
processing methods XSLT and COCOON). To try out XPB4J, go to directory xpb4j-0.8,
ensure that environment variable JAVA_HOME is set to the JDK installation directory and issue the command:
>run
This should report the performance and memory usage measurements on your machine.
To use other parsers, Cocoon2 and/or an XSLT transformer, or newer versions of the supplied parsers, download
them from their
respective sites and place the corresponding .jar files in the xpb4j-0.8\lib directory.
You can also try XPB4J with other JAXP compliant parsers.
As illustrated in the earlier section, running XPB4J with default arguments is very simple.
You can change the execution arguments by editing an XML file.
The execution arguments are specified in the args.xml file (located in the base directory).
You can specify the following in this file:
- Attribute loopCount of element Params -- Specifies the number of times
each processing is executed. The default value is 10.
- Element Targets -- A list of targets ( string values ) to be passed to the
processing code. The default list contains only one target: Data\rxgen.xml.
- Element ProcessingActivities -- A list of ProcessingActivity elements.
ProcessingActivity names are specified in configuration
file conf.xml ( located in the base directory ).
- Element ProcessingMethods -- A list of ProcessingMethod elements.
ProcessingMethod names and corresponding Java classes are specified in configuration
file conf.xml ( located in the base directory ).
- Attribute enabled of element ProcessingMethod -- A true value
indicates that the corresponding class is loaded and invoked.
- Attribute gc of element Flags -- a true value forces garbage collection before
initiating the processing but outside the measurement window.
- Attribute gcMeasured of element Flags -- a true value forces garbage
collection within the measurement window.
Here is a sample args.xml and conf.xml.
Note the relationship between args.xml and conf.xml file. File conf.xml
contains all the available processing activities and methods. File args.xml
contains information required for a specific execution.
You can add more processing
activities and processing methods by simply writing classes as per XPB4J Framework and adding appropriate
entries into the conf.xml file. Before execution, however, you should ensure that all the
classes are accessible as per the current CLASSPATH.
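As an illustration of the settings described above, here is a minimal sketch that reads them with a JAXP DOM parser. The real layout of args.xml is not reproduced in this guide, so the root element Args, the nesting, the Target child element and the name attribute of ProcessingMethod are assumptions made for this example; only the names Params, loopCount, Targets, ProcessingMethods, ProcessingMethod, enabled, Flags, gc and gcMeasured come from the list above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ArgsSketch {
    // Hypothetical args.xml content. The root element, the nesting and the
    // "name" attribute are assumptions, not the actual XPB4J file format.
    static final String SAMPLE =
        "<Args>"
      + "<Params loopCount='10'/>"
      + "<Flags gc='true' gcMeasured='false'/>"
      + "<Targets><Target>Data\\rxgen.xml</Target></Targets>"
      + "<ProcessingMethods>"
      + "<ProcessingMethod name='SAX' enabled='true'/>"
      + "<ProcessingMethod name='DOM' enabled='true'/>"
      + "<ProcessingMethod name='XSLT' enabled='false'/>"
      + "</ProcessingMethods>"
      + "</Args>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the loopCount attribute, falling back to the documented default of 10.
    static int loopCount(Document doc) {
        NodeList params = doc.getElementsByTagName("Params");
        if (params.getLength() == 0) return 10;
        String value = ((Element) params.item(0)).getAttribute("loopCount");
        return value.isEmpty() ? 10 : Integer.parseInt(value);
    }

    // Returns the names of all ProcessingMethod elements with enabled="true".
    static List<String> enabledMethods(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList methods = doc.getElementsByTagName("ProcessingMethod");
        for (int i = 0; i < methods.getLength(); i++) {
            Element method = (Element) methods.item(i);
            if (Boolean.parseBoolean(method.getAttribute("enabled"))) {
                names.add(method.getAttribute("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("loopCount = " + loopCount(doc));
        System.out.println("enabled   = " + enabledMethods(doc));
    }
}
```

Consult the sample args.xml shipped with the distribution for the actual element layout before adapting this sketch.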
A successful execution of XPB4J writes the measurements in file pdata.xml and processing results
in file results.xml, both located in the base directory.
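To make the loopCount, gc and gcMeasured settings more concrete, the following sketch shows one way a measurement loop could honour them. It is not XPB4J's measurement code, which this guide does not show; the class MeasureSketch, its Sample type and the use of System.nanoTime and Runtime heap figures are all assumptions chosen for illustration.

```java
class MeasureSketch {
    /** One measured run: elapsed wall-clock time and change in used heap. */
    static final class Sample {
        final long elapsedNanos;
        final long usedHeapDeltaBytes;

        Sample(long elapsedNanos, long usedHeapDeltaBytes) {
            this.elapsedNanos = elapsedNanos;
            this.usedHeapDeltaBytes = usedHeapDeltaBytes;
        }
    }

    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the processing loopCount times and returns one Sample per run.
    static Sample[] measure(Runnable processing, int loopCount, boolean gc, boolean gcMeasured) {
        Sample[] samples = new Sample[loopCount];
        for (int i = 0; i < loopCount; i++) {
            if (gc) System.gc();              // requested before the window opens (only a hint to the JVM)
            long heapBefore = usedHeap();
            long start = System.nanoTime();   // measurement window opens
            processing.run();
            if (gcMeasured) System.gc();      // requested inside the window, so its cost is included
            long elapsed = System.nanoTime() - start;  // measurement window closes
            samples[i] = new Sample(elapsed, usedHeap() - heapBefore);
        }
        return samples;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real run would invoke an XML processing method here.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) sb.append(i);
        };
        for (Sample s : measure(work, 10, true, false)) {
            System.out.println(s.elapsedNanos / 1000 + " us, heap delta " + s.usedHeapDeltaBytes + " bytes");
        }
    }
}
```

The heap delta can be negative when a collection happens to run inside the window, which is one reason to compare runs with gc and gcMeasured set differently.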
To enable and run XPB4J for Cocoon processing, you must have Cocoon installed and you must
build XPB4J with Cocoon.
Set environment variable C2_LIB to the directory containing cocoon*.jar and all other required jar files and
then run XPB4J's execution script run.bat from the base directory.
Generating Input Files
XPB4J includes a simple Java program, RandXMLGen.java, to generate arbitrary sized random XML
documents. This program generates XML elements and attributes by picking them from a given set randomly.
The size of the generated file is determined by a numeric argument to the program specifying the number
of children of the topmost element, RXGenTopElement. Look at the source file
org\xperf\xpb\RandXMLGen.java ( included in the
distribution ) to understand how the input file is generated.
To run this program with argument value 100, go to the base directory and issue the command:
>rxgen 100
This generates the file Data\rxgen.xml. Note that the Java program writes the output on standard
output but the rxgen.bat script redirects it to Data\rxgen.xml.
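The generation approach described above can be sketched as follows. This is not the RandXMLGen source; the element and attribute name pools, the nesting limit and the seed parameter are assumptions made for the example. Only the top element name RXGenTopElement and the meaning of the numeric argument come from the description above.

```java
import java.util.Random;

class RandXmlSketch {
    // Hypothetical name pools; the real RandXMLGen uses its own sets.
    static final String[] ELEMENTS = {"alpha", "beta", "gamma", "delta"};
    static final String[] ATTRIBUTES = {"id", "kind", "rank"};

    // Builds a document whose top element has exactly topLevelChildren child elements.
    static String generate(int topLevelChildren, long seed) {
        Random rnd = new Random(seed);
        StringBuilder out = new StringBuilder("<?xml version=\"1.0\"?>\n<RXGenTopElement>\n");
        for (int i = 0; i < topLevelChildren; i++) {
            appendElement(out, rnd, 1);
        }
        return out.append("</RXGenTopElement>\n").toString();
    }

    static void appendElement(StringBuilder out, Random rnd, int depth) {
        String name = ELEMENTS[rnd.nextInt(ELEMENTS.length)];
        out.append('<').append(name);
        // Walk the pool once so that no attribute name repeats on an element.
        for (String attr : ATTRIBUTES) {
            if (rnd.nextBoolean()) {
                out.append(' ').append(attr).append("=\"v").append(rnd.nextInt(100)).append('"');
            }
        }
        int children = depth < 3 ? rnd.nextInt(3) : 0;  // limit nesting to three levels
        if (children == 0 && rnd.nextBoolean()) {
            out.append("/>\n");                          // empty element
            return;
        }
        out.append('>');
        if (children == 0) {
            out.append("text").append(rnd.nextInt(1000)); // leaf with character data
        } else {
            out.append('\n');
            for (int i = 0; i < children; i++) {
                appendElement(out, rnd, depth + 1);
            }
        }
        out.append("</").append(name).append(">\n");
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        // Like the real program, write to standard output and let the caller redirect it.
        System.out.print(generate(n, System.nanoTime()));
    }
}
```

Passing a fixed seed makes the output reproducible, which helps when the same input must be used across several benchmark runs.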
XStat processing consists of scanning one or more XML files and collecting the following
statistical information for each element:
- the number of times the element occurred
- the number of times it had a particular element as a parent
- the number of times it had a particular element as a child
- the number of times it had a particular attribute
- the amount of character data that had at least some non-whitespace characters
- whether the element was always empty
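The statistics listed above map naturally onto a small record kept per element name. The sketch below is one possible shape for such a record, not the data structure used in the XPB4J sources; the class name XStatRecord and its methods are hypothetical. It also assumes that an element counts as non-empty once it has a child element or non-whitespace character data, and that the "amount" of character data is measured in characters.

```java
import java.util.Map;
import java.util.TreeMap;

/** Per-element statistics, one instance per distinct element name (hypothetical sketch). */
class XStatRecord {
    int occurrences;
    final Map<String, Integer> parentCounts = new TreeMap<>();
    final Map<String, Integer> childCounts = new TreeMap<>();
    final Map<String, Integer> attributeCounts = new TreeMap<>();
    long textChars;              // characters in text chunks that are not pure whitespace
    boolean alwaysEmpty = true;  // cleared on the first child element or non-blank text

    private static void bump(Map<String, Integer> counts, String key) {
        counts.merge(key, 1, Integer::sum);
    }

    void sawOccurrence(String parentOrNull) {
        occurrences++;
        if (parentOrNull != null) bump(parentCounts, parentOrNull);
    }

    void sawChild(String child) {
        bump(childCounts, child);
        alwaysEmpty = false;
    }

    void sawAttribute(String attribute) {
        bump(attributeCounts, attribute);
    }

    void sawText(String text) {
        if (text.trim().isEmpty()) return;  // ignore whitespace-only chunks
        textChars += text.length();
        alwaysEmpty = false;
    }

    public static void main(String[] args) {
        // Events recorded by hand for: <order id="7"><item/><item/>note</order>
        Map<String, XStatRecord> stats = new TreeMap<>();
        XStatRecord order = stats.computeIfAbsent("order", k -> new XStatRecord());
        order.sawOccurrence(null);  // root element, no parent
        order.sawAttribute("id");
        for (int i = 0; i < 2; i++) {
            order.sawChild("item");
            stats.computeIfAbsent("item", k -> new XStatRecord()).sawOccurrence("order");
        }
        order.sawText("note");
        for (Map.Entry<String, XStatRecord> e : stats.entrySet()) {
            XStatRecord r = e.getValue();
            System.out.println(e.getKey() + ": occurrences=" + r.occurrences
                + " parents=" + r.parentCounts + " children=" + r.childCounts
                + " attributes=" + r.attributeCounts + " textChars=" + r.textChars
                + " alwaysEmpty=" + r.alwaysEmpty);
        }
    }
}
```

Keeping the record independent of any parser API means the same structure can be filled from SAX callbacks, a DOM traversal or a pull loop.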
Acknowledgement: I have borrowed the idea behind this processing from the article
Using The Perl XML::Parser Module.
XPB4J includes code to perform XStat processing using the following Processing Methods:
- SAX -- The XML input is accessed using a SAX API and relevant information is stored in
a suitable data structure. The SAX parser is accessed using JAXP API.
Refer to sources under package org.xperf.xpb.xstat.sax for details.
- DOM -- The XML input is converted into a W3C DOM object and is traversed to gather the
relevant statistics. The DOM parser is accessed using JAXP API.
Refer to sources under package org.xperf.xpb.xstat.dom for details.
- PULL -- The XML input is accessed using Pull Parser API available at
Pull Parser site and relevant information
is stored in a suitable data structure, as with SAX processing method.
Refer to sources under package org.xperf.xpb.xstat.pull for details.
- JDOM -- The XML input is converted into JDOM document and is traversed to gather the
relevant statistics. This is very similar to DOM processing method.
Refer to sources under package org.xperf.xpb.xstat.jdom for details.
- XSLT -- A stylesheet with Java extension functions is applied to the XML input using an
XSL transformer. The stylesheet fires the appropriate Java function on encountering element nodes,
attributes and text nodes. The transformer is obtained and invoked using JAXP API.
Refer to sources under package org.xperf.xpb.xstat.xslt for details.
- COCOON -- A Cocoon file generator is used to generate SAX events corresponding to
the input file and a Cocoon transformer gathers the relevant statistics. This transformer consumes
the SAX events and doesn't pass them to the serializer. Cocoon is invoked in command-line mode
and not as a servlet.
Refer to sources under package org.xperf.xpb.xstat.cocoon for details.
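To show how two of the processing methods above differ in style while producing the same result, here is a minimal sketch that counts element occurrences once with a JAXP SAX parser and once by traversing a JAXP DOM tree. It covers only the occurrence count, not the full XStat statistics, and the class MethodsSketch is hypothetical rather than taken from the org.xperf.xpb.xstat packages. The other methods are left out because they need third-party libraries.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class MethodsSketch {
    // SAX: a linear scan; counts are updated as start-element events arrive.
    static Map<String, Integer> countWithSax(String xml) throws Exception {
        final Map<String, Integer> counts = new TreeMap<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    counts.merge(qName, 1, Integer::sum);
                }
            });
        return counts;
    }

    // DOM: the whole tree is built first and then traversed.
    static Map<String, Integer> countWithDom(String xml) throws Exception {
        Map<String, Integer> counts = new TreeMap<>();
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
        walk(root, counts);
        return counts;
    }

    private static void walk(Node node, Map<String, Integer> counts) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            counts.merge(node.getNodeName(), 1, Integer::sum);
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, counts);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><b><c/></b></a>";
        System.out.println("SAX: " + countWithSax(xml));
        System.out.println("DOM: " + countWithDom(xml));
    }
}
```

The SAX version keeps only the counts in memory, while the DOM version holds the entire document. That difference is the kind of trade-off the measurements are meant to expose.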
Here is a set of
Measurements and
Conclusions on XStat Processing.
To build XPB4J, carry out the following steps:
- If you do not have jakarta-ant, install it now. You can get it from
Apache Jakarta site.
- If you do not have xalan*.jar, install Xalan. You can get it from
Apache XML site.
- Set environment variable ANT_HOME to point to Ant's installation directory.
- Copy xalan*.jar to the xpb4j-0.8\lib directory.
Now run XPB4J's build script in the base directory:
>.\build
To build XPB4J with Cocoon, you must have Cocoon installed. The current version of XPB4J has been tested only with
Cocoon2 beta1. You can get Cocoon2 beta1 from Apache XML site.
Set environment variable C2_LIB to the directory containing cocoon*.jar and all other required jar files and
then run XPB4J's build script from the base directory. To be able to run the Cocoon processing method, you
must compile XPB4J with Cocoon.
TBD.
Here is a partial list of known issues/limitations:
- Build and execution scripts for Linux/UNIX are not present. It should be fairly simple to
write these scripts. If you do, please share them with me and I will include them.
- The XSL stylesheet for the XSLT processing method for XStat uses XALAN processor specific extensions
and may not be portable to other processors. It should be simple to write stylesheets for other
processors. If you do, please share them with me and I will include them.
- It is currently not possible to compare measurements of two different parsers that use the same API
in one execution run. You can get around this by running XPB4J multiple times, changing CLASSPATH for
each run so that the appropriate parser is used.
- It is not possible to mix Cocoon processing with processing using other methods due to CLASSPATH
conflicts.
- Javadocs for XPB4J Framework do not exist. However, the framework itself is quite simple and if
you really want to use it for your own processing code, you should be able to do so by looking at
the code and using the XStat code as a sample.
- The Cocoon processing code for XStat doesn't work with Cocoon2 beta2.
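Because the parser used in a run depends on what JAXP finds on the CLASSPATH, it can help to record which implementation was actually picked up before comparing results from separate runs. The sketch below prints the factory classes that JAXP resolves. The class WhichParser is hypothetical and not part of XPB4J; the system property javax.xml.parsers.SAXParserFactory is the standard JAXP override for the SAX factory.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

class WhichParser {
    // Name of the SAX factory implementation that JAXP resolved for this run.
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Name of the DOM factory implementation that JAXP resolved for this run.
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // If this property is set (for example with -D on the java command line),
        // JAXP uses the named class instead of searching the CLASSPATH.
        String override = System.getProperty("javax.xml.parsers.SAXParserFactory");
        System.out.println("SAX override property: " + (override == null ? "(not set)" : override));
        System.out.println("SAX factory in use:    " + saxFactoryClass());
        System.out.println("DOM factory in use:    " + domFactoryClass());
    }
}
```

Running this with the same CLASSPATH as a benchmark run shows which parser the measurements in that run belong to.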
I plan to evolve XPB4J in the following areas:
- More processing operations: XML schema based validation, data binding,
XML encryption/decryption, XML Digital Signature etc.
- More options to invoke Cocoon based processing.
- Generator of random XML files as per a specified schema and other desired characteristics.
- Collating performance data from multiple execution runs for presentation.
- GUI to specify input runs and other parameters.
- Ability to use multiple parsers with same interface ( say xerces and crimson ) in the same
execution cycle and compare the results.
- Stylesheets to transform the result and performance files for better presentation.
- Concurrent execution of operations to measure concurrency support on multi-cpu machines.
- Support parsers/transformers written in languages other than Java.