Processing JSON and XML with JavaScript in DocBridge Mill

[JSON = JavaScript Object Notation; E4X = ECMAScript for XML]

Since release 2010.09 of DocBridge Mill, our internal JavaScript engine, SpiderMonkey, has been upgraded to include TraceMonkey, a special optimization technique based on "trace trees" for "just-in-time" compilation. This is exactly the same engine that is used in the latest releases of the Firefox web browser.

This optimized engine can offer dramatic performance improvements in some cases. Another benefit of the update is that DocBridge Mill developers gain access to new built-in JavaScript objects that have been added to the ECMAScript standards in recent years.

In this article we present two such JavaScript native objects, namely JSON and XML.

The JSON Object

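JSON stands for JavaScript Object Notation. Its syntax is derived from JavaScript's object literal notation, shown in the following example. Strict JSON is narrower than a literal, though: property names and strings must be double-quoted, and functions cannot be represented.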

var obj = {
  propertyOne : "A",
  propertyTwo : "B",
  some_Array : [ 'A', 'B', 'C', 'D' ],
  getProperties : function() {
    return [ this.propertyOne, this.propertyTwo ];
  }
};

Since it is quite compact and legible compared to XML, JSON has become increasingly popular as a serialization and data interchange format.

The new built-in JavaScript object JSON is part of the ECMAScript 5 standard (described in ECMA-262). It comes with two methods: JSON.stringify and JSON.parse. Note that JSON is a single global object: its methods are called directly, without the need to instantiate it with "new".

JSON.stringify(obj) dumps an object structure as a string, which can be very useful for debugging complex applications or when the data structure of an object needs to be preserved for a future process (e.g. a second run of DocBridge Mill).
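
As a minimal sketch of such a dump, consider the following. The jobState object and all of its property names are hypothetical and are not part of any DocBridge Mill API; only JSON.stringify itself is standard ECMAScript 5.

```javascript
// Hypothetical state object; the names are illustrative only.
var jobState = {
  jobName : "invoice-run",
  pagesProcessed : 128,
  inputFiles : [ "a.afp", "b.afp" ],
  describe : function() {
    return this.jobName + ": " + this.pagesProcessed;
  }
};

// Compact form. Function-valued properties such as describe are omitted.
var compact = JSON.stringify(jobState);
// compact is '{"jobName":"invoice-run","pagesProcessed":128,"inputFiles":["a.afp","b.afp"]}'

// The optional third argument indents the output by the given number of
// spaces, which is easier to read in a log file.
var pretty = JSON.stringify(jobState, null, 2);
```

The compact form is the natural choice when the string is stored for a later run, while the indented form is more convenient for debugging output.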

Note however that JSON.stringify will not dump anything when applied to Docponent objects, except when additional attributes/methods were explicitly added to them from within JavaScript.

JSON.parse(string) does the opposite: it converts a JSON string back to an internal JavaScript object.

In this way an external configuration file, for instance, could be stored as JSON and read into a string with FileInputText(); JSON.parse() can then be applied to that string to create an internal JavaScript object.
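
The parsing step might look like the sketch below. The configText string stands in for the text that would be read from the file, and its content and property names are invented for illustration. Because JSON.parse throws a SyntaxError on malformed input, it is usually worth wrapping the call in try/catch.

```javascript
// Stand-in for the text read from a configuration file (hypothetical content).
var configText = '{ "outputFormat": "PDF", "resolution": 300, "fonts": ["Arial", "Courier"] }';

var config;
try {
  config = JSON.parse(configText);
} catch (e) {
  // Malformed JSON ends up here; fall back to "no configuration".
  config = null;
}

var resolution = config ? config.resolution : 0;  // 300
var firstFont = config ? config.fonts[0] : "";    // "Arial"

// JSON is stricter than an object literal: unquoted property names are rejected.
var rejected = false;
try {
  JSON.parse('{ outputFormat: "PDF" }');
} catch (e) {
  rejected = true;
}
```

Handling the error explicitly lets a job continue with default settings instead of stopping on a damaged configuration file.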

The XML Object (E4X)

Since version 1.6 of the JavaScript language, E4X has provided a native object for processing XML (E4X stands for "ECMAScript for XML").

XML is ubiquitous: it is sometimes embedded as metadata within datastreams and, more frequently, supplied as external files containing configuration information or job-related data. DocBridge Mill users can now process XML natively from within JavaScript, as demonstrated in the following simple example.

Suppose we have an external file "compart.xml":

<compart>
  <name>Compart AG</name>
  <address>
    <street>Otto-Lilienthal-Str. 38</street>
    <city>71034 Böblingen</city>
    <country>Germany</country>
  </address>
  <products>
    <product id="1">DocBridge Mill</product>
    <product id="2">DocBridge Delta</product>
    <product id="3">DocBridge View</product>
    <product id="4">DocBridge Pilot</product>
  </products>
</compart>

From within DocBridge Mill (cpmill), we can read its content into a string with

var xmlfile = new FileInputText("&ROOTDIR;/compart.xml", "UTF-8");
// or whatever encoding used internally
var xmlstring = '';
while (!xmlfile.eof()) {
  xmlstring += xmlfile.readln();
}
xmlfile.close();

Then an XML object can be created with

var compartxml = new XML(xmlstring);

and the XML content can be easily processed using the E4X API in a JavaScript-friendly way

var compart_address = compartxml.address.street + ", " + compartxml.address.city + ", " + compartxml.address.country + ".";
var pilot = compartxml.products.product[3]; // "DocBridge Pilot"
var deltaid = compartxml.products.product[1].@id; // "2"
var deltaname = compartxml.products.product.(@id == "2"); // "DocBridge Delta"
log("compart_address: " + compart_address);
log("pilot: " + pilot);
log("deltaname, deltaid: " + deltaname + ", " + deltaid);

The full definition of E4X is provided in the ECMA-357 specification.

External Links

Mozilla's JavaScript engine

JSON

E4X
