XPathScript - A Viable Alternative to XSLT?
Matt Sergeant
This guide gives an introduction to the features of XPathScript, a template processor that is part of AxKit [1]. XPathScript provides full programming facilities alongside XPath-based node resolution, and it separates code from template using the ASP <% %> paradigm.
Introduction
XPathScript is a stylesheet language for translating XML files into some other format. It has only a few features, but by combining those features with the power and flexibility of Perl, XPathScript is a very capable system. Like all XML stylesheet languages, including XSLT, an XPathScript stylesheet is always executed in the context of a source XML file. In many cases the source XML file will actually define what stylesheets to use via the <?xml-stylesheet?> processing instruction.
XPathScript was conceived as part of AxKit - an application server environment for Apache servers running mod_perl (XML.com ran my Introduction to AxKit [2] article in May). Its primary goal was to achieve the sorts of transformations that XSLT can do, without being restricted by XSLT's XML-based syntax, and to provide full programming facilities within that environment. I also wanted XPathScript to be completely agnostic about output formats, without having to program in special after-effect filters. The result is a language for server-side transformation that provides the power and flexibility of XSLT combined with the full capabilities of the Perl language, and the ability to produce stylesheets in any ASP-capable editor or ordinary text editor.
The Introduction to AxKit article mentioned above is recommended reading before this guide.
The Syntax
XPathScript follows the basic ASP syntax of introducing code with the
<% %> delimiters. Here's a brief example of a
fully compatible XPathScript stylesheet:
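A minimal sketch of the two delimiter forms, assuming HTML output. The page content is invented for illustration, and the fragment only runs inside AxKit as a stylesheet:

```perl
<html>
<body>
<%
# A code section: ordinary Perl. Nothing is output unless we print it.
my $sum = 5 + 5;
%>
<p>Five plus five is <%= $sum %>.</p>
</body>
</html>
```

Everything outside the delimiters is copied to the output unchanged, while <%= %> inserts the value of a Perl expression.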
The XPathScript API
Along with the code delimiters, XPathScript provides stylesheet developers with a full API for accessing and transforming the source XML file. This API can be used in conjunction with the delimiters above to provide a stylesheet language that is as powerful as XSLT, and yet provides all the features of a full programming language (in this case, Perl, but I'm certain that other implementations such as Python or Java would be possible).
Extracting Values
A simple example to get us started is to use the API to bring in the
title from a docbook article. A docbook article title looks like this:
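As a sketch, assuming the usual docbook nesting of article, artheader and title (the title text is simply this guide's own), the source markup is shown in the comment and the stylesheet line pulls the value out with findvalue():

```perl
<%
# Source XML being queried (docbook article header):
#   <article>
#     <artheader>
#       <title>XPathScript - A Viable Alternative to XSLT?</title>
#     </artheader>
#   </article>
%>
<title><%= findvalue("/article/artheader/title") %></title>
```

The argument to findvalue() is a path expression that walks from the document root down to the title node and returns its text.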
The expression syntax we used to find that "node" is called XPath [3], and it has many more features than we have used here. XPath is a W3C standard for finding and matching XML document nodes. The standard is fairly readable and is available at http://www.w3.org/TR/xpath [3]. Alternatively, I can recommend Norm Walsh's XPath introduction [4], which covers a slightly older version of the specification; I didn't notice anything in the article that is missing or different from the current recommendation.
Extracting Nodes
The above example showed us how to extract single values, but what if we
have a list of things we wish to extract values from? Here's how we
might get a table of contents from docbook article sections:
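A sketch of such a loop, assuming sect1 elements that contain sect2 elements, each with a title child, and plain HTML line breaks as the output format:

```perl
<%
# Loop over the top-level sections of the article.
for my $sect1 (findnodes("/article/sect1")) {
    # findvalue() as a node method: "title" is resolved relative to $sect1.
    print $sect1->findvalue("title"), "<br>\n";

    # findnodes() as a node method: only the sect2 children of this sect1.
    for my $sect2 ($sect1->findnodes("sect2")) {
        print " + ", $sect2->findvalue("title"), "<br>\n";
    }
}
%>
```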
Note that in the above we don't use the global function findnodes() after finding the sect1 nodes; instead we call the node method findnodes(), which does exactly the same thing but makes the node you are calling it on the context of the XPath expression.
Declarative Templates
The examples up to now have all covered a concept of a single global template with a search/replace type functionality from the source XML document. This is a powerful concept in itself, especially when combined with loops and the ability to change the context of searches. But that style of template is limited in utility to well-structured data, rather than processing large documents. In order to ease the processing of documents, XPathScript includes a declarative template processing model too, so that you can simply specify the format for a particular element and let XPathScript do the work for you.
In order to support this method, XPathScript introduces one more API function: apply_templates(). The name is intended to appeal to people already familiar with XSLT. The apply_templates() function takes either a list of start nodes, or an XPath [3] expression (that must result in a node set) and an optional context. Starting at the start nodes, it traverses the document tree, applying the templates defined by the $t hash reference.
First, a simple example to introduce this feature. Let's assume for a
moment that our source XML file is valid XHTML, and we want to change
all anchor links to italics. Here is the very simple XPathScript
template that will do that:
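A sketch of such a template, assuming the anchors are plain a elements without a namespace prefix:

```perl
<%
# Wrap every <a> element in <i>...</i>, and keep the <a> tag itself.
$t->{'a'}{pre}     = '<i>';
$t->{'a'}{post}    = '</i>';
$t->{'a'}{showtag} = 1;
%>
<%= apply_templates() %>
```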
The first thing this example does is set up a hash reference $t that XPathScript knows about (let's call it magical). The keys of $t are element names (including the namespace prefix if we are using namespaces). The hash can have the following sub-keys: pre, post, prechildren, prechild, postchildren, postchild, showtag and testcode.
Unlike XSLT's declarative transformation syntax, the keys of $t do not specify XPath [3] match expressions. Instead they are simple element names. This trades flexibility for speed of execution: Perl hash lookups are extremely quick compared to XPath matching. Luckily, because of the testcode option, more complex matches are still quite possible with XPathScript. The simple explanation for now is that pre specifies output to appear before the tag, post specifies output to appear after the tag, and showtag specifies that the tag itself should be output as well as the pre and post values.
A Complete Example
Now let's put all of these ideas together into an (almost) complete
example. This is part of the stylesheet I use to process my docbook articles
online:
The next section goes through what is happening in this example in detail.
Stepping Through the Example
Careful readers will note that the first thing we see is a $t specification for <ulink> tags, and you'll also note that the included docbook_tags.xps contains a specification for <ulink>. The reason is to override the default behaviour for ulink tags in the print version of my articles, so that each one carries a reference number that we can use later in a list of links. We can also see that this specification uses a testcode parameter that we haven't encountered before. We'll see how and why that's used later in The Template Hash.
Next we see the findvalue() function used exactly as we already saw in Extracting Values.
Then we have a section with a comment marked: "display Title/TOC page". This uses the apply_templates() function with an XPath [3] expression. Note that rather than use the <%= %> delimiters around the apply_templates() call, we simply use the print function. This has the same effect, and is used here to show the flexibility in this approach.
The main part of the code loops through all sect1 tags, and calls
apply_templates on those nodes. Note how this is another demonstration
of Perl's TMTOWTDI (There's More Than One Way To Do It) approach - the
same code could have been written:
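A sketch of the shorter form, assuming the loop body does nothing except print the result of apply_templates() for each sect1 node. The loop is repeated in the comment for comparison:

```perl
<%
# Procedural form:
#   for my $sect1 (findnodes("/article/sect1")) {
#       print apply_templates($sect1);
#   }
%>
<%= apply_templates("/article/sect1") %>
```

Here apply_templates() is given the XPath expression directly and finds the start nodes itself.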
Finally, because this is the print version of our article, we provide a list of links so that people viewing a printed copy can type those links in, and can also refer to each link by its reference number, as we saw earlier. We use the hash of links in the %links variable that we built in the testcode handler for our ulink template.
The other file, docbook_tags.xps, is included only in part here, to demonstrate a few of the transformations we're applying to various docbook article tags. We can see that we're turning <para> tags into <p> tags, and doing some more complex processing with testcode on <title> tags. We'll see in The Template Hash exactly what testcode allows us to achieve.
The Template Hash
The apply_templates() function iterates over the nodes of your XML file, applying the templates in the $t hash reference. This is the most important feature of XPathScript, because it allows you to define the appearance of individual tags without having to do it programmatically. This is the declarative part of XPathScript. There is an important point to make here: XSLT is a purely declarative syntax, and people are having to work procedural code into XSLT via workarounds. XPathScript takes a much more pragmatic approach (much like Perl itself) - it is both declarative and procedural, allowing you the flexibility to use real code for real problems.
It is important to note that apply_templates() returns a string, so you must either call print apply_templates() from a Perl section of code, or use <%= apply_templates() %>.
The keys of $t are the names of the elements, including namespace prefixes. When you call apply_templates(), every element visited is looked up in the $t hash, and the template items stored in that hash are applied to the node. It's worth noting at this point that, unlike XSLT, XPathScript does not perform tree transformations from one tree to another. It simply sends its output to the browser directly. This has advantages and disadvantages, but they are beyond the scope of this guide. The following sub-keys define the transformation: pre, post, prechildren, prechild, postchildren, postchild, showtag and testcode.
"testcode"
The testcode option is where we perform really powerful transformations. It's how we can do more complex tests on the node that are available in XPath, and locally modify the transformation based on what we find.
The value stored in testcode is simply a reference to
a subroutine. In Perl these are incredibly simple to create using the
anonymous sub keyword (note that these are often erroneously called
closures, but they only become closures if they reference a lexical
variable outside the scope of the subroutine itself). The sub is called
every time one of these elements is visited. The subroutine is passed two parameters:
The node itself, and an empty hash reference that you can populate using
the pre, post,
prechildren, prechild,
postchildren, postchild and
showtag values that we've discussed already. Unlike
the global $t hashref you don't have to first
specify the element name as a key. Here's the
<ulink> example from the global tags code above:
The return value from the testcode is also important. A return value of 1 means process this node and continue processing all the children of this node. A return value of -1 means process this node and stop, and a return value of 0 means do not process this node at all. This is useful in conditional tests, where you may not wish to process the nodes under certain conditions. You may also return a string that is an XPath expression. See An XPathScript Mini-Reference for more information.
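The <ulink> handler described above can be sketched as a plain Perl script that runs outside AxKit. The findvalue() here is a hypothetical stand-in that only understands '@attribute' paths on hash-based nodes, and the example.org URLs are placeholders. In a real stylesheet the sub would be assigned to $t->{'ulink'}{testcode}.

```perl
use strict;
use warnings;

# Hypothetical stand-in for XPathScript's findvalue(), so the handler can run
# outside AxKit. A "node" is a plain hash and only '@attr' paths are supported.
sub findvalue {
    my ($path, $node) = @_;
    my ($attr) = $path =~ /^\@(\w+)$/ or die "unsupported path: $path";
    return $node->{attributes}{$attr};
}

my %links;          # url => reference number, shared across calls
my $linkid = 0;     # last reference number handed out

# The anonymous sub refers to %links and $linkid, so it is a real closure.
my $ulink_testcode = sub {
    my ($node, $t) = @_;
    my $url = findvalue('@url', $node);
    $links{$url} = ++$linkid unless exists $links{$url};
    $t->{pre}  = '<i>';
    $t->{post} = "</i> [$links{$url}]";
    return 1;       # process this node and its children
};

# Simulate three visits; the first and third share a URL.
for my $url ('http://example.org/a', 'http://example.org/b', 'http://example.org/a') {
    my %local;      # the empty per-node hash that testcode is handed
    $ulink_testcode->({ attributes => { url => $url } }, \%local);
    print "$local{pre}link$local{post}\n";
}
```

A repeated URL keeps the reference number it was given on first sight, which is what lets the list of links at the end of the print version line up with the numbers in the text.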
It is important to note that we can do things here based on XPath
lookups just as we can in XSLT. While it is a little more verbose than a
simple XSLT pattern match, the trade-off is in performance. For example,
in XSLT you might match artheader/title and
elsewhere you might match title[name(..) != "artheader"].
In XPathScript we can only match "title" in the
template hash. But we can use the testcode section to extend the match:
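A sketch of such a handler, using the node method findnodes() to look at the parent. The choice of h1 and h2 for the output is an assumption made for illustration:

```perl
<%
$t->{'title'}{testcode} = sub {
    my ($node, $t) = @_;
    # In list context findnodes() returns the matching nodes, so an empty
    # list means the parent of this <title> is not an <artheader>.
    if (my @parent = $node->findnodes('parent::artheader')) {
        $t->{pre}  = '<h1>';     # the article title
        $t->{post} = '</h1>';
    }
    else {
        $t->{pre}  = '<h2>';     # any other title, e.g. a section title
        $t->{post} = '</h2>';
    }
    return 1;                    # process the node and its children
};
%>
```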
Copying styles
One really neat feature of XPathScript that is really hard to do with
XSLT is to be able to copy a style completely:
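Because each entry in $t is an ordinary hash reference, copying a style is a plain Perl assignment. A sketch that runs outside AxKit, with $t created by hand and the element names chosen for illustration:

```perl
use strict;
use warnings;

my $t = {};   # stand-in for the $t hash reference XPathScript provides
$t->{'para'} = { pre => '<p>', post => '</p>' };

# Share the style: both keys now point at the very same template hash.
$t->{'blockquote'} = $t->{'para'};

# Independent copy: a new hash with the same keys, safe to modify.
$t->{'note'} = { %{ $t->{'para'} } };
$t->{'note'}{pre} = '<p class="note">';

printf "%s: %s\n", $_, $t->{$_}{pre} for sort keys %$t;
```

With the shared form, a later change to the para template also changes blockquote. The copied form is the one to use when the new style should diverge.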
A "Catch All"?
Does XPathScript have a "catch all" option for elements that I don't have a $t entry for? Yes, of course! Simply set $t->{'*'} to the template you want to execute. You can even do some really clever things, such as using the testcode section to output a warning to the Apache error log about an unrecognised tag, rather than having to place some output in the resulting document and bother your users! This feature was introduced in AxKit 0.94.
Interpolation
Adding attributes or other data into the translated nodes is non-trivial
using this setup. It requires you to drop down into testcode. Here's an
example of turning <link url="..."> tags into
HTML <a> tags:
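A sketch of the testcode approach, assuming the link element carries its target in a url attribute as described:

```perl
<%
$t->{'link'}{testcode} = sub {
    my ($node, $t) = @_;
    # Build the opening tag by hand from the url attribute of this node.
    $t->{pre}  = '<a href="' . $node->findvalue('@url') . '">';
    $t->{post} = '</a>';
    return 1;    # process the node and its children
};
%>
```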
To make this a little simpler, in XPathScript as of AxKit 1.1, we have
introduced interpolation of the replacement strings, much the same as
you can do with XSLT attributes. Here is the appropriate
$t entry as of AxKit 1.1:
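A sketch of the interpolated form, assuming the curly-brace syntax of XSLT attribute value templates that the paragraph alludes to. Check the exact syntax against the documentation for your AxKit version:

```perl
<%
# {@url} is replaced by the value of the url attribute of the current node.
$t->{'link'}{pre}  = '<a href="{@url}">';
$t->{'link'}{post} = '</a>';
%>
```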
As a backwards-compatibility measure, and so that the more efficient
behaviour remains the default, interpolation only occurs when you have the following
somewhere in your Apache configuration defined for the current request:
Writing Dynamic Content
Because XPathScript has full access to all the Perl builtins, you can very easily create dynamic content with XPathScript. There is one caveat, though: the AxKit cache works on the basis of the timestamp of the original XML file. This means that your XPathScript code will only be executed when the XML resource that is being requested actually changes.
To work around this limitation you simply need to tell AxKit that this
stylesheet contains dynamic content, and therefore the output should not
be cached. The syntax for this duplicates the Apache API for telling
proxy servers not to cache the output:
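A sketch, assuming $r is the Apache request object available to the stylesheet and that no_cache() is the mod_perl request method the paragraph refers to:

```perl
<%
# Tell AxKit (and any proxies) that this output must not be cached.
$r->no_cache(1);
%>
<p>Generated at <%= scalar localtime %></p>
```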
An XPathScript Mini-Reference
Code is separated from output in XPathScript using the <% %> delimiters. Perl expression results can be sent to the browser either using print() inside a <% %> section, or via <%= code %>. The following XPath functions are imported for your use: findnodes(), findvalue(), findnodes_as_string(), apply_templates() and import_template($uri).
The first three functions are documented more completely in the XML::XPath manual pages. Apply templates examines the contents of the local $t hash reference for element names. For example, when encountering a <foo> element via apply_templates, XPathScript will try to find a transformation hash in the key $t->{'foo'}.
Import template can be used to pull in an external XPathScript template
file. $uri should be a path to the stylesheet to be
included. The function returns an anonymous subroutine that when
executed will run the stylesheet. The anonymous subroutine takes two
arguments, which makes it ideal to plug into a
testcode entry, for example:
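A sketch, where both the element name foo and the stylesheet path are hypothetical:

```perl
<%
# The returned sub receives the same two arguments as any testcode handler
# (the node and the local template hash), so it can be plugged in directly.
$t->{'foo'}{testcode} = import_template('/stylesheets/foo.xps');
%>
```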
If you want to include a stylesheet anyway (not as part of a testcode
setup), just write it as normal, and include a line like this in the
parent stylesheet:
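A sketch using the SSI include form, with docbook_tags.xps (the file from the complete example) as the included stylesheet:

```perl
<!--#include file="docbook_tags.xps"-->
<%
# Entries set after the include line override the defaults it defined.
$t->{'para'}{pre}  = '<p class="print">';
$t->{'para'}{post} = '</p>';
%>
<%= apply_templates() %>
```

The class name on the overriding para template is invented for illustration.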
The value in $t->{'foo'} above is a hash reference with the following optional keys: pre, post, prechildren, prechild, postchildren, postchild, showtag and testcode.
If a value is not found in $t for the current element, then the element is output verbatim and apply_templates is performed on all its children, unless a $t->{'*'} value exists, which is a "catch all" transformation specification. This might be a useful place to add some testcode to output a warning to the error log.
If a value is found in $t for the current element, then the tag itself is not displayed unless $t->{<element_name>}{showtag} is set to a true value.
testcode is a reference to a subroutine (often constructed as an anonymous subroutine). The subroutine is called with two parameters: the current node, and a localised hash reference in which to store new transformations for this node and this node only. The return value from this subroutine must be one of: 1 (process this node and continue with all of its children), -1 (process this node and stop), 0 (do not process this node at all), or a string containing an XPath expression.
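The three numeric return values can be modelled with a toy traversal in plain Perl. This is a sketch to illustrate the rules only, not AxKit's implementation: nodes are hand-built hashes, every element in the example has a $t entry, and the XPath string return value, showtag and the catch-all are not modelled.

```perl
use strict;
use warnings;

# Toy model: an element is { name => ..., children => [...] } and a text
# node is a plain string.
sub apply_templates_sketch {
    my ($t, $node) = @_;
    return $node unless ref $node;               # text node: output as-is

    my $tmpl = $t->{ $node->{name} } || {};
    my %over;                                    # empty hash handed to testcode
    my $rv = $tmpl->{testcode} ? $tmpl->{testcode}->($node, \%over) : 1;
    return '' if $rv == 0;                       # 0: skip the node entirely

    my %eff = (%$tmpl, %over);                   # local values win over $t
    my $out = defined $eff{pre} ? $eff{pre} : '';
    if ($rv == 1) {                              # 1: descend into the children
        $out .= apply_templates_sketch($t, $_) for @{ $node->{children} };
    }                                            # -1: pre and post only
    $out .= defined $eff{post} ? $eff{post} : '';
    return $out;
}

my $t = {
    para   => { pre => '<p>', post => '</p>' },
    note   => { testcode => sub { return 0 } },                      # dropped
    figure => { pre => '[figure]', testcode => sub { return -1 } },  # no children
};
my $doc = { name => 'para', children => [
    'Text ',
    { name => 'note',   children => ['hidden'] },
    { name => 'figure', children => ['caption'] },
] };

print apply_templates_sketch($t, $doc), "\n";    # prints "<p>Text [figure]</p>"
```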
XPathScript stylesheets can be modularised using SSI #include directives. The code in #included files is added verbatim into the current code at the position of the include. This allows you to override defaults (as we saw in the first example, where the template for ulink is overridden).
Using XPathScript to Write XSP TagLibs
XSP is an alternative server-side XML programming API. It is not a stylesheet system though - the XSP page is executed directly without a stylesheet. XSP was originally incorporated into the Cocoon [6] application framework, and AxKit included XSP capabilities because it's a very interesting and useful tool.
One of the interesting things about XSP is the ability to write taglibs
using some form of stylesheet transformation language. A taglib is a
separate sheet of tags that have special meaning to your code. They can
execute external functions or simply be used in a similar way to
external parsed entities. Here's the classic example of a usage of a
taglib from the Cocoon documentation (slightly modified from the
original):
Here the <example:time-of-day> tag gets converted at run time to the current time using the strftime format specified in the format attribute. A taglib implementation is a stylesheet that is evaluated against this file prior to passing it to the XSP processor. The stylesheet converts the tags that it recognises into pure XSP code (see http://xml.apache.org/cocoon/xsp.html [7] for more information on XSP). While this seems a rather redundant feature, it allows even further separation between code and design. Designers can just introduce these special tags, without worrying about the logic behind them.
The Cocoon recommendation is to write taglibs using XSLT. This works
well, but the code often looks confusing. My recommendation for AxKit is
to use XPathScript. Here's our implementation of the time-of-day tag
using XPathScript:
In order to enable this tag library, we simply make the taglib
stylesheet the first in our stylesheet cascade:
For comparison, here's the equivalent XSLT based taglib:
List of Links