Mon Sep 27 13:48:26 2004
XPathScript - A Viable Alternative to XSLT?
Copyright © 2000 AxKit.com Ltd
Abstract
Introduction
XPathScript is a stylesheet language for translating XML files into some
other format. It has only a few features, but by combining those
features with the power and flexibility of Perl, XPathScript is a very
capable system. Like all XML stylesheet languages, including XSLT, an
XPathScript stylesheet is always executed in the context of a source XML
file. In many cases the source XML file will actually define what
stylesheets to use via the <?xml-stylesheet?> processing instruction.
XPathScript was conceived as part of AxKit - an application server
environment for Apache servers running mod_perl (XML.com ran my
Introduction to AxKit article).
The Syntax
XPathScript follows the basic ASP syntax of introducing code with the
<% %> delimiters. Here is a brief example:
<html>
<body>
<%= 5+5 %>
</body>
</html>
This example uses the <%= %> delimiters, which are slightly
different in that they send the results of the expression to the browser
(or to the next processing stage in AxKit).
Of course this example does absolutely nothing with the source XML file,
which is completely separate from this stylesheet. Here's another
example:
<html>
<body>
<% $foo = 'World' %>
Hello <%= $foo %> !!!
</body>
</html>
This outputs the text "Hello World !!!". Again, we're
not actually doing anything here with our source document, so all XML
files using this stylesheet will look identical. This seems rather
uninteresting until we discover the library of functions that are
accessible to our XPathScript stylesheets for accessing the source
document contents.
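To make the two kinds of delimiter concrete, here is a minimal sketch of how an ASP-style template could be turned into Perl and run. It is not AxKit's implementation: compile_template() and render() are hypothetical names, and the sketch declares $foo with my so that it runs under strict.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an ASP-style template into Perl source.
# Literal text is appended as a quoted string, <%= expr %> appends the
# value of expr, and <% code %> is copied through unchanged.
sub compile_template {
    my ($template) = @_;
    my $perl = "my \$out = '';\n";
    # The capture group makes split return the delimited chunks as well.
    for my $chunk (split /(<%=?.*?%>)/s, $template) {
        next if $chunk eq '';
        if ($chunk =~ /^<%=(.*)%>$/s) {
            $perl .= "\$out .= do { $1 };\n";
        }
        elsif ($chunk =~ /^<%(.*)%>$/s) {
            $perl .= "$1;\n";
        }
        else {
            # Escape backslashes and single quotes in literal text.
            (my $text = $chunk) =~ s/([\\'])/\\$1/g;
            $perl .= "\$out .= '$text';\n";
        }
    }
    return $perl . "\$out;\n";
}

# Hypothetical helper: compile the template, run it, return the output.
sub render {
    my ($template) = @_;
    my $result = eval compile_template($template);
    die $@ if $@;
    return $result;
}

print render("<html> <body> <%= 5+5 %> </body> </html>"), "\n";
print render("<% my \$foo = 'World'; %>Hello <%= \$foo %> !!!"), "\n";
```

The second call prints "Hello World !!!", matching the example above.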
The XPathScript API
Along with the code delimiters, XPathScript provides stylesheet developers with a full API for accessing and transforming the source XML file. This API can be used in conjunction with the delimiters above to provide a stylesheet language that is as powerful as XSLT, and yet provides all the features of a full programming language (in this case, Perl, but I'm certain that other implementations such as Python or Java would be possible).
Extracting Values
A simple example to get us started is to use the API to bring in the title from a DocBook article. A DocBook article title looks like this:
<article>
<artheader>
<title>XPathScript - A Viable Alternative to XSLT?</title>
...
The XPath expression that selects the text of that title is /article/artheader/title/text(). We can use it in a stylesheet like this:
<html>
  <head>
    <title><%= findvalue("/article/artheader/title/text()") %></title>
  </head>
  <body>
    This was a DocBook Article. We're only extracting the title for now!
    <p>
    The title was: <%= findvalue("/article/artheader/title/text()") %>
  </body>
</html>
There are lots of features to the expression syntax we used to find that
"node", and this syntax is called XPath.
Extracting Nodes
The above example showed us how to extract single values, but what if we have a list of things we wish to extract values from? Here's how we might get a table of contents from DocBook article sections:
...
<%
for my $sect1 (findnodes("/article/sect1")) {
    print $sect1->findvalue("title/text()"), "<br>\n";
    for my $sect2 ($sect1->findnodes("sect2")) {
        print " + ", $sect2->findvalue("title/text()"), "<br>\n";
        for my $sect3 ($sect2->findnodes("sect3")) {
            print " + + ", $sect3->findvalue("title/text()"), "<br>\n";
        }
    }
}
%>
...
The first XPath expression here is absolute, but the
expressions following that are relative to the
current node. You can see that by the absence of the leading
/. Again, XPath is a very interesting query language,
and the best place to learn more is the XPath specification.
Note that in the above we don't use the global function findnodes() after finding the sect1 nodes. Instead we call the node method findnodes(), which does exactly the same thing, but makes the node you are calling from the context of the XPath expression.
Declarative Templates
The examples up to now have all covered a concept of a single global template with a search/replace type functionality from the source XML document. This is a powerful concept in itself, especially when combined with loops and the ability to change the context of searches. But that style of template is limited in utility to well-structured data, rather than processing large documents. In order to ease the processing of documents, XPathScript includes a declarative template processing model too, so that you can simply specify the format for a particular element and let XPathScript do the work for you.
In order to support this method, XPathScript introduces one more API
function: apply_templates().
First a simple example to introduce this feature. Let's assume for a moment that our source XML file is valid XHTML, and we want to change all anchor links to italics. Here is the very simple XPathScript template that will do that:
<%
$t->{'a'}{pre} = '<i>';
$t->{'a'}{post} = '</i>';
$t->{'a'}{showtag} = 1;
%>
<%= apply_templates() %>
Note that the result of apply_templates() has to be output using
<%= %>. That's because
apply_templates() returns a string
representation of the transformation; it doesn't send the output to the
browser for you.
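The first thing this example does is set up a hash reference, $t.
Here the pre and post values are output before and after each a element,
and showtag asks for the tag itself to be output as well. We cover
testcode in more depth later, but for now know that it is a placeholder
for code that allows for more complex templates.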
Unlike XSLT's declarative transformation syntax, the keys of
$t are plain element names rather than XPath match expressions.
A Complete Example
Now let's put all of these ideas together into an (almost) complete example. This is part of the stylesheet I use to process my DocBook articles online:
<!--#include file="docbook_tags.xps"-->
<%
my %links;
my $linkid = 0;
$t->{'ulink'}{testcode} = sub {
    my $node = shift;
    my $t = shift;
    my $url = findvalue('@url', $node);
    # Give each distinct URL a reference number the first time we see it
    if (!exists $links{$url}) {
        $linkid++;
        $links{$url} = $linkid;
    }
    my $link_number = $links{$url};
    $t->{pre} = "<i><a href=\"$url\">";
    # Close the anchor opened in pre before printing the reference number
    $t->{post} = "</a> [$link_number]</i>";
    return 1;
};
%>
<html>
  <head>
    <title><%= findvalue('/article/artheader/title/text()') %></title>
  </head>
  <body bgcolor="white">
<%
# display title/TOC page
print apply_templates('/article/artheader/*');
%>
    <hr>
<%
# display particular page
foreach my $section (findnodes("/article/sect1")) {
    print apply_templates($section);
}
%>
    <h1>List of Links</h1>
    <table border="1">
      <th>URL</th>
<%
for my $link (sort {$links{$a} <=> $links{$b}} keys %links) {
%>
      <tr>
        <td><%= "[$links{$link}] $link" %></td>
      </tr>
<% } %>
    </table>
  </body>
</html>
The first line of this stylesheet imports a library of tag definitions,
docbook_tags.xps. The
import system is based on Server Side Includes (SSI), although only SSI
file includes are supported at this time (SSI virtual includes can be
implemented using mod_include). Here is part of the docbook_tags.xps
file:
<%
$t->{'attribution'}{pre} = "<i>";
$t->{'attribution'}{post} = "</i><br>\n";
$t->{'para'}{pre} = '<p>';
$t->{'para'}{post} = '</p>';
$t->{'ulink'}{testcode} = sub {
    my $node = shift;
    my $t = shift;
    $t->{pre} = "<i><a href=\"" .
        findvalue('./@url', $node) . "\">";
    $t->{post} = '</a></i>';
    return 1;
};
$t->{'title'}{testcode} = sub {
    my $node = shift;
    my $t = shift;
    if (findvalue('parent::blockquote', $node)) {
        $t->{pre} = "<b>";
        $t->{post} = "</b><br>\n";
    }
    elsif (findvalue('parent::artheader', $node)) {
        $t->{pre} = "<h1>";
        $t->{post} = "</h1>";
    }
    else {
        my $parent = findvalue('name(..)', $node);
        if (my ($level) = $parent =~ m/sect(\d+)$/) {
            $t->{pre} = "<h$level>";
            $t->{post} = "</h$level>";
        }
    }
    return 1;
};
%>
We go into detail of what is happening in this example in the next section.
Stepping Through the Example
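Careful readers will note that the first thing we see is a
Server Side Include directive, which pulls in docbook_tags.xps.
Next we see the testcode for the ulink tag, which gives each distinct
URL a reference number in the %links hash.
Then we have a section with a comment marked: "display title/TOC page".
This uses the apply_templates() function with an XPath expression that selects the children of artheader.
The main part of the code loops through all sect1 tags, and calls apply_templates on those nodes. Note how this is another demonstration of Perl's TMTOWTDI (There's More Than One Way To Do It) approach - the same code could have been written: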
<%= apply_templates("/article/sect1") %>
Finally, because this is the print version of our article, we provide a
list of links so that people viewing a printed version of this article
can type in those links, and they can also refer to the link by
reference number, as we saw earlier. We use the hash of links in the
%links variable, filled in by the ulink testcode, to build that table.
The other file, docbook_tags.xps, contains the tag definitions shown earlier.
The Template Hash
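The $t hash reference holds the declarative templates.
The keys of $t are element names. The following sub-keys define the transformation; these are the ones used in this article:
"pre" and "post"
Text to output before and after the tag.
"prechildren"
Text to output before the children of the tag.
"showtag"
Set this to a true value to output the tag itself as well. This is similar to XSLT's
<xsl:copy> tag, only less verbose. The pre and post
options are useful because generally in transformations we want to
specify what comes before and after a tag. For example, to change an
HTML A tag to be in italics, but still have the link, we would use the
following: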
$t->{A}{pre} = "<i>";
$t->{A}{post} = "</i>";
$t->{A}{showtag} = 1;
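As a rough model of what those three keys do, the following sketch renders a single element from a template hash. render_element() is a hypothetical stand-in, not part of XPathScript, and passing an untemplated element through unchanged is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what a template processor does with one element.
# $templates maps element names to { pre, post, showtag } entries.
sub render_element {
    my ($templates, $name, $attrs, $content) = @_;
    my $tmpl = $templates->{$name};
    # Assumption: an element with no template is copied through unchanged.
    return "<$name$attrs>$content</$name>" unless $tmpl;
    my $out = defined $tmpl->{pre} ? $tmpl->{pre} : '';
    $out .= "<$name$attrs>" if $tmpl->{showtag};   # keep the original tag
    $out .= $content;
    $out .= "</$name>" if $tmpl->{showtag};
    $out .= defined $tmpl->{post} ? $tmpl->{post} : '';
    return $out;
}

my $templates = {};
$templates->{A}{pre}     = "<i>";
$templates->{A}{post}    = "</i>";
$templates->{A}{showtag} = 1;

# Prints <i><A HREF="http://example.com/">a link</A></i>
print render_element($templates, 'A', ' HREF="http://example.com/"', 'a link'), "\n";
```

Without showtag the same call would produce just <i>a link</i>, losing the link.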
"testcode"
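The testcode key allows for more complex templates.
The value stored in testcode is a reference to a subroutine, which is
passed the current node and a local template hash. For example:
$t->{'ulink'}{testcode} = sub {
    my ($node, $t) = @_;
    $t->{pre} = '<i><a href="' . findvalue('@url', $node) . '">';
    $t->{post} = '</a></i>';
    return 1;
};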
This is roughly equivalent to the following XSLT template:
<xsl:template match="ulink">
  <i><a>
    <xsl:attribute name="href">
      <xsl:value-of select="@url"/>
    </xsl:attribute>
    <xsl:apply-templates/>
  </a></i>
</xsl:template>
Note that inside the subroutine $t is
lexically scoped, so changes to it don't affect the outer
$t. To save some confusion we might have named that
variable $localtransforms, but some people like
myself hate typing... ;-)
The return value from the testcode is also important. A return value of 1 means process this node and continue processing all of its children. A return value of -1 means process this node and stop, and a return value of 0 means do not process this node at all. This is useful in conditional tests, where you may not wish to process the nodes under certain conditions. You may also return a string that is an XPath expression.
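The three numeric return values can be modelled with a small tree walker. This is only a sketch of the rules as described above: walk() and the nested-hash document are hypothetical, and the XPath-string return value is not handled.

```perl
use strict;
use warnings;

# A node here is a hash { name => ..., children => [ nodes or text ] }.
# Hypothetical model of the testcode return-value rules, not AxKit's code.
sub walk {
    my ($node, $templates) = @_;
    return $node unless ref $node;            # plain text is output as-is
    my $tmpl  = $templates->{ $node->{name} } || {};
    my %local = %$tmpl;                       # private copy handed to testcode
    my $rv    = 1;
    $rv = $tmpl->{testcode}->($node, \%local) if $tmpl->{testcode};
    return '' if $rv == 0;                    # 0: skip this node entirely
    my $out = defined $local{pre} ? $local{pre} : '';
    if ($rv == 1) {                           # 1: descend into the children
        $out .= walk($_, $templates) for @{ $node->{children} };
    }                                         # -1: this node only, no children
    $out .= defined $local{post} ? $local{post} : '';
    return $out;
}

my $templates = {
    para   => { pre => '<p>', post => '</p>' },
    secret => { testcode => sub { return 0 } },
    note   => { testcode => sub { my ($node, $t) = @_; $t->{pre} = '[note]'; return -1 } },
};
my $doc = {
    name     => 'para',
    children => [
        'Visible ',
        { name => 'secret', children => ['hidden'] },
        { name => 'note',   children => ['skipped children'] },
    ],
};

print walk($doc, $templates), "\n";   # <p>Visible [note]</p>
```

The secret element disappears completely, while note contributes its pre text but none of its children.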
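It is important to note that we can do things here based on XPath
lookups just as we can in XSLT. While it is a little more verbose than a
simple XSLT pattern match, the trade-off is in performance. For example,
in XSLT you might match title elements with a separate pattern for each
kind of parent; in XPathScript the same decision is made inside one testcode: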
$t->{'title'}{testcode} = sub {
    my $node = shift;
    my $t = shift;
    if (findvalue('parent::blockquote', $node)) {
        $t->{pre} = "<b>";
        $t->{post} = "</b><br>\n";
    }
    elsif (findvalue('parent::artheader', $node)) {
        $t->{pre} = "<h1>";
        $t->{post} = "</h1>";
    }
    else {
        my $parent = findvalue('name(..)', $node);
        if (my ($level) = $parent =~ m/sect(\d+)$/) {
            $t->{pre} = "<h$level>";
            $t->{post} = "</h$level>";
        }
    }
    return 1;
};
Here the code looks at the parent of the title node and sets pre and post
on the local $t hashref accordingly. Specifically note the
utility of being able to perform Perl regular expressions to extract
values.
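The regular expression in the final branch can be tried on its own. In this small sketch, heading_for() is a hypothetical helper that wraps the same match used in the title testcode:

```perl
use strict;
use warnings;

# Map a parent element name to opening and closing heading tags, using the
# same pattern as the title testcode. Returns empty strings when the parent
# is not a sectN element.
sub heading_for {
    my ($parent) = @_;
    if (my ($level) = $parent =~ m/sect(\d+)$/) {
        return ("<h$level>", "</h$level>");
    }
    return ('', '');
}

for my $parent (qw(sect1 sect2 sect3 article)) {
    my ($pre, $post) = heading_for($parent);
    print "$parent: ${pre}Title${post}\n";
}
```

A title under sect2 comes out wrapped in h2 tags, while a title under article is left bare by this branch.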
Copying styles
One really neat feature of XPathScript that is really hard to do with XSLT is to be able to copy a style completely:
<%
$t->{'foo'}{pre} = "<i>";
$t->{'foo'}{post} = "</i>";
$t->{'foo'}{showtag} = 1;
$t->{'bar'} = $t->{'foo'};
%>
<%
$t->{'foo'}{pre} = "<i>";
$t->{'foo'}{post} = "</i>";
$t->{'foo'}{showtag} = 1;
$t->{'bar'} = $t->{'foo'};
$t->{'bar'}{post} = "</i><br>";
%>
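One point of plain Perl semantics is worth keeping in mind here: assigning $t->{'foo'} to $t->{'bar'} copies a hash reference, so both keys share one hash, and changing the post value through bar changes it for foo too. The sketch below uses an ordinary hash in place of the real $t; the shallow copy with { %{ ... } } is a suggestion of mine, not something XPathScript prescribes.

```perl
use strict;
use warnings;

my $t = {};
$t->{foo} = { pre => "<i>", post => "</i>", showtag => 1 };

# Reference copy: bar and foo now point at the same hash.
$t->{bar} = $t->{foo};
$t->{bar}{post} = "</i><br>";
print "foo post after reference copy: $t->{foo}{post}\n";   # </i><br>

# Shallow copy: baz gets its own hash, so foo is left alone.
$t->{foo}{post} = "</i>";
$t->{baz} = { %{ $t->{foo} } };
$t->{baz}{post} = "</i><br>";
print "foo post after shallow copy:   $t->{foo}{post}\n";   # </i>
print "baz post after shallow copy:   $t->{baz}{post}\n";   # </i><br>
```

A shallow copy is enough for entries like these, whose values are plain strings and code references.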
A "Catch All"?
Does XPathScript have a "catch all" option for elements that I don't
have a template for? This feature was introduced in AxKit 0.94.
Interpolation
Adding attributes or other data into the translated nodes is non-trivial
using this setup. It requires you to drop down into testcode. Here's an
example of turning a link element with a url attribute into an HTML anchor:
<%
$t->{'link'}{testcode} = sub {
    my ($node, $t) = @_;
    $t->{pre} = '<a href="' . $node->findvalue('@url') . '">';
    $t->{post} = '</a>';
    return 1;
};
%>
To make this a little simpler, in XPathScript as of AxKit 1.1, we have
introduced interpolation of the replacement strings, much the same as
you can do with XSLT attributes. Here is the equivalent template using
interpolation:
<%
$t->{'link'}{pre} = '<a href="{@url}">';
$t->{'link'}{post} = '</a>';
%>
The curly braces {} delimit an XPath expression on
which findvalue is called using the current node as the context. Any
XPath expression should be valid within those delimiters.
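The substitution itself can be sketched in a few lines. This is a deliberately reduced model: interpolate() is a hypothetical function that understands only the {@attribute} form and looks values up in a plain hash, whereas XPathScript hands the whole expression to findvalue.

```perl
use strict;
use warnings;

# Hypothetical model of interpolation: replace each {@name} with the value
# of that attribute, or with an empty string if the attribute is missing.
sub interpolate {
    my ($string, $attrs) = @_;
    $string =~ s/\{\@(\w+)\}/defined $attrs->{$1} ? $attrs->{$1} : ''/ge;
    return $string;
}

# Prints <a href="http://example.com/">
print interpolate('<a href="{@url}">', { url => 'http://example.com/' }), "\n";
```

Strings without braces pass through untouched, which is part of why the feature can be switched off cheaply.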
As a backwards compatibility measure, and to ensure that the efficient behaviour is the default, interpolation only occurs when you have the following somewhere in your Apache configuration defined for the current request:
PerlSetVar AxXPSInterpolate 1
Interpolation can also be controlled from within a stylesheet through the variable $XPathScript::DoNotInterpolate. Set
that to a true value to turn off interpolation. Be careful to only do
that locally (using the Perl local keyword) to ensure
it doesn't remain set for the next invocation of the script.
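The reason for insisting on local is that it restores the old value automatically when the enclosing block exits. The sketch below declares its own stand-in for the package variable so that it runs outside AxKit; current_setting() and render_without_interpolation() are hypothetical names.

```perl
use strict;
use warnings;

# Stand-in for the package variable named above; in a real stylesheet the
# XPathScript module owns it.
package XPathScript;
our $DoNotInterpolate = 0;
package main;

sub current_setting {
    return $XPathScript::DoNotInterpolate ? 'off' : 'on';
}

sub render_without_interpolation {
    # local saves the old value and restores it when this sub returns.
    local $XPathScript::DoNotInterpolate = 1;
    return current_setting();
}

print "inside: interpolation ", render_without_interpolation(), "\n";   # off
print "after:  interpolation ", current_setting(), "\n";                # on
```

A plain assignment instead of local would leave interpolation switched off for whatever runs next in the same interpreter.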
Writing Dynamic Content
Because XPathScript has full access to all the Perl built-ins, you can very easily create dynamic content with XPathScript. There is one caveat though: the AxKit cache works on the basis of the timestamp of the original XML file. This means that your XPathScript code will only be executed when the XML resource that is being requested actually changes. To work around this limitation you simply need to tell AxKit that this stylesheet contains dynamic content, and therefore the output should not be cached. The syntax for this duplicates the Apache API for telling proxy servers not to cache the output:
<%
...
$r->no_cache(1);
...
%>
An XPathScript Mini-Reference
Code is separated from output in XPathScript using the <% %> delimiters.
Perl expression results can be sent to the browser either using
the <%= %> delimiters or using print().
The following functions, among others, are imported for your use:
findnodes(), findvalue(), apply_templates() and import_template().
The XPath lookup functions are documented more completely in the XML::XPath manual pages.
Apply templates examines the contents of the local
$t hash reference to decide how each node is transformed.
Import template can be used to pull in an external XPathScript template
file.
$t->{BODY}{testcode} = import_template("/xps/bodystyle.xps");
The imported template shares the same
$t as the parent stylesheet. You can get at the
usual testcode version of $t by
using $real_local_t.
If you want to include a stylesheet anyway (not as part of a testcode setup), just write it as normal, and include a line like this in the parent stylesheet:
import_template("/xps/bodystyle.xps")->();
The value in
If a value is not found in
If a value is found in
XPathScript stylesheets can be modularised using SSI #include directives. The code in #included files is added verbatim into the current code at the position of the include. You can use this fact to override defaults (as we saw in the first example, where the template for ulink is overridden).
Using XPathScript to Write XSP TagLibs
XSP is an alternative server side XML programming API. It is not a
stylesheet system though - the XSP page is executed directly without a
stylesheet. XSP was originally incorporated into the Cocoon project.
One of the interesting things about XSP is the ability to write taglibs using some form of stylesheet transformation language. A taglib is a separate sheet of tags that have special meaning to your code. They can execute external functions or simply be used in a similar way to external parsed entities. Here's the classic example of a usage of a taglib from the Cocoon documentation (slightly modified from the original):
<xsp:page
  language="Perl"
  xmlns:xsp="http://www.apache.org/1999/XSP/Core"
  xmlns:example="http://www.plenix.com/DTD/XSP/Example"
>
  <page title="Time of Day">
    <p>
      To the best of my knowledge, it's now
      <!-- Substitute time of day here -->
      <example:time-of-day format="%y/%m/%d %r"/>
    </p>
  </page>
</xsp:page>
Here the example:time-of-day tag is the part that belongs to the taglib.
A taglib implementation is a stylesheet that is evaluated against this
file prior to passing it to the XSP processor. The stylesheet converts
the tags that it recognises into pure XSP code (see http://xml.apache.org/cocoon/xsp.html).
The Cocoon recommendation is to write taglibs using XSLT. This works well, but the code often looks confusing. My recommendation for AxKit is to use XPathScript. Here's our implementation of the time-of-day tag using XPathScript:
<%
$t->{'xsp:page'}{prechildren} = <<EOXML;
<xsp:structure>
<xsp:include>POSIX</xsp:include>
</xsp:structure>
EOXML
$t->{'xsp:page'}{showtag} = 1;
$t->{'example:time-of-day'}{testcode} = sub {
    my ($node, $t) = @_;
    $t->{pre} =
'<xsp:expr>
POSIX::strftime("' . findvalue('@format', $node) . '", localtime)
</xsp:expr>';
    return 1;
};
%>
<%= apply_templates() %>
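The expression that this taglib places inside xsp:expr is ordinary Perl, so it can be tried outside XSP. The sketch below runs the same POSIX::strftime call with the format string from the example page; the second call uses a fixed timestamp (the Unix epoch, my choice) so that the result is predictable.

```perl
use strict;
use warnings;
use POSIX ();

# The format string from the example:time-of-day tag.
my $format = "%y/%m/%d %r";

# What the generated xsp:expr evaluates: the current local time.
print POSIX::strftime($format, localtime), "\n";

# The date part for a fixed timestamp, so the output is the same everywhere.
print POSIX::strftime("%y/%m/%d", gmtime(0)), "\n";   # 70/01/01
```

The %r part of the format is locale-dependent, so the first line will vary between systems.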
In order to enable this tag library, we simply make the taglib stylesheet the first in our stylesheet cascade:
<?xml version="1.0"?>
<?xml-stylesheet type="application/x-xpathscript" href="example.taglib"?>
<?xml-stylesheet type="application/x-xsp" href="."?>
<?xml-stylesheet type="text/xsl" href="example.xsl"?>
<xsp:page
  language="Perl"
  xmlns:xsp="http://www.apache.org/1999/XSP/Core"
  xmlns:example="http://www.plenix.com/DTD/XSP/Example"
>
  <page title="Time of Day">
    <p>
      To the best of my knowledge, it's now
      <!-- Substitute time of day here -->
      <example:time-of-day format="%y/%m/%d %r"/>
    </p>
  </page>
</xsp:page>
For comparison, here's the equivalent XSLT based taglib:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsp="http://www.apache.org/1999/XSP/Core"
  xmlns:example="http://www.plenix.com/DTD/XSP/Example"
>
  <xsl:template match="xsp:page">
    <xsp:page>
      <xsl:copy>
        <xsl:apply-templates select="@*"/>
      </xsl:copy>
      <xsp:structure>
        <xsp:include>POSIX</xsp:include>
      </xsp:structure>
      <xsl:apply-templates/>
    </xsp:page>
  </xsl:template>
  <xsl:template match="example:time-of-day">
    <xsp:expr>
      POSIX::strftime("<xsl:value-of select="@format"/>", localtime)
    </xsp:expr>
  </xsl:template>
  <xsl:template match="@*|node()" priority="-1">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
In summary, XPathScript provides
full programming facilities alongside XPath based node resolution.
It also features code / template separation using the
ASP-style <% %> delimiters.