Orchard: A New API for XML

----------------------------------------------------------------------
**** Background ************************

# this session is a technical overview of Orchard's API as compared primarily to DOM and SAX

----------------------------------------------------------------------
**** What Is.

# XML has trees, streams, queries, and transforms
* DOM is a way to model trees.
# Tree views of XML are used when one needs random access to all parts of an XML document.
# DOM is a W3C standard *interface*.
* SAX is a way to model streams.
# Stream views of XML are used when one can process XML sequentially or for very large files.
# Queries and transforms modify and direct the behavior of trees and streams. Queries and transforms work generally the same in Orchard as they do in DOM or SAX.

----------------------------------------------------------------------
**** What's wrong

* So-called "Language independent APIs" aren't natural
# IDLs tend to be skewed to how C++ and Java-like languages work.
# DOM and SAX, in particular, also have often redundant "helper" classes in their specifications because some languages don't have the necessary support classes.
* Leads to development of "natural" APIs: JDOM, XML::Simple
# JDOM is very similar to DOM, but using native collection classes and many convenience methods.
# XML::Simple reduces XML data by pulling the information out of the elements and attributes and placing it in collection classes, using field names derived from the element and attribute names
* Fixed intersection with XML Information Set
# DOM and SAX select one fixed portion of the infoset and have little formal support for reducing or extending the information available to users
* Editors need more, common XML needs far less
# good editing applications need to be able to preserve the original syntax of XML instances
# the vast majority of XML that most users work with consists only of elements, attributes, and character data
* PEEK and POKE
# DOM and SAX act as a layer between an application and the information it wants to process, often reducing applications to walking the XML structure to access their data -- reminiscent of the PEEK and POKE functions of early BASIC.

----------------------------------------------------------------------
**** What's different

* Data models more language independent
  * DOM and SAX say, "come to me"
  * Orchard says, "I'll come to you"
# Data models can be mapped into language syntax to the best of the abilities of the language, not restricted to the "interface" model of IDLs.
# *Note* Interfaces are often equated with encapsulation, on the assumption that one can't have encapsulation without "methods". Many languages support encapsulation while still using attribute accessors, and Orchard relies on that.
* Progressive disclosure, scalable intersection
# Orchard starts small and then uses "progressive disclosure" to expand on the information sets it supports.
# Orchard has a scalable intersection with both the XML InfoSet and a larger set that includes information needed by editors (and even SGML).
# For example, for XML in this session we'll just be talking about documents, elements, attributes, and character data, and that's all one needs to work with in Orchard (if that much!).
Processing instructions, comments, entities, declarations, and more are still there for the asking.
* Orchard merges features of SAX and DOM
# SAX events pass nodes, SAX filters can build trees, SAX can be used to walk a DOM tree, DOM nodes are passed as-is
* Emphasis on application-specific data models
# Orchard emphasizes mapping XML into application-specific data models (info sets) rather than working with raw XML structures, as one does with PEEK and POKE.

----------------------------------------------------------------------
**** Common XML in Orchard ************************

# The following slides describe several Orchard features in terms of just the most commonly used XML information items.

----------------------------------------------------------------------
**** XML nodes

* Document
  * root, contents
* Element
  * name, namespace-uri, prefix, local-name, attributes, contents
* Attribute
  * name, namespace-uri, prefix, local-name, value
* Characters
  * data
* Properties are case and underscore mapped
# ...to conform to the language's most common style.
  * NamespaceURI <-> namespace-uri <-> namespace_uri

----------------------------------------------------------------------
**** XML nodes example -- Source XML

Catching Roadrunners
Wile E. Coyote

----------------------------------------------------------------------
**** XML nodes reading example -- Perl

use Orchard qw{namespace};
use Orchard::XML;

my $DC = namespace("http://purl.org/dc/elements/1.1/");
my $MY = namespace("http://bookseller.biz/");

my $book = Orchard::XML->load('book.xml');
my $title = $book->get_elements_by_tag_name($DC->title);
my $author = $book->get_elements_by_tag_name($DC->author);
my $info = $book->get_elements_by_tag_name($MY->info);

print "Title: $title\n";
print "Author: $author\n";
print "Price: " . $info->{Attributes}{$MY->price} . "\n";

----------------------------------------------------------------------
**** XML nodes writing example -- Python

from Orchard import namespace
import Orchard.XML

DC = namespace("http://purl.org/dc/elements/1.1/")
MY = namespace("http://bookseller.biz/")

doc = Orchard.XML.Document()
root = doc.create_element('book')

title = doc.create_element(DC.title)
title.contents = "Catching Roadrunners"

author = doc.create_element(DC.author)
author.contents = "Wile E. Coyote"

info = doc.create_element(MY.info)
info.attributes[MY.price] = "$34.95"
info.attributes[MY.in_stock] = "true"

root.contents.extend([title, author, info])

print(root.to_string())

----------------------------------------------------------------------
**** SAX

* Passes nodes as parameter
# instead of positional parameters
# parsers, filters, and handlers can add properties
# very upward compatible -- Perl SAX1 -> Perl SAX2 (if we can get stringification of nodes worked out)
# SAX event nodes are the same nodes used in trees -- filters can fill in trees and sub-trees for downstream handlers
* Features govern node properties
# Parsers can offer features for what properties are parsed and passed, thus,
* Reduced need for callbacks and interfaces
* Scalable, dynamic interface
# Most languages do not need to implement default base classes or formal interfaces
# Parsers dynamically check to see which methods handlers support
* Pull
  * next_event(), peek_event(), skip_events()
# A pull-style interface is being developed.
# Some users, especially beginners, find pull interfaces much simpler.
# The pull interface is consistent with the push interface (SAX) through the use of nodes.
# It may be possible to mix push and pull interfaces via a stream manager.

----------------------------------------------------------------------
**** Beyond XML ************************

# Orchard is far more than XML. Ideally, raw XML should only be a fraction of what Orchard is used for.
# The manner in which you access nodes stays the same.
# XML-serialized information is mapped into its own info sets.

----------------------------------------------------------------------
**** Types of data modeling

# Earlier, I said data models are far more portable than APIs, and they are, but particular types of data modeling fit some languages more naturally than others.
* lists -- scheme, lisp
* rows and columns -- RDBs, spreadsheets
* tuples, arcs -- RDF/Topic Maps, logic
* nodes/entities/objects -- common languages, Groves, and Orchard
# The Orchard implementations are designed to handle any type of node (or object); XML is just one node set (or infoset) among many.
* XML is just one set of nodes in Orchard
  * RSS, SVG, SlideShow, CDDB, MPEG, RDF, ...
* RDF and Groves (Orchard) are complementary
# the same sort of modeling produces similar results
* RDF can be very loose and much more flexible
# ...allowing multiple property values for the same resource, information gathered from multiple sources, and smushing of data
# "smushing" is an RDF technical term that refers to lumping a lot of overlapping data from multiple sources, requiring extra care to extract useful information
* Groves are simpler in that respect
# ...by only presenting one set of properties for a node
* Groves and RDF are interoperable
# slices of RDF data can be viewed through a Grove API
# RDF data can supplement data visible through the Grove
# Grove schemas are inherently subsets of RDF schemas

----------------------------------------------------------------------
**** Common node behaviors

* Introspection
# information about what properties are defined or available for nodes.
# possible use of schema languages to provide property information.
* Iteration, navigation, DOM level 2 and 3
# Orchard uses native mapping and sequence iteration and navigation wherever possible.
# DOM level 2 and 3 add Views, Events, Style, Traversal and Range, XPath
* Namespaced property names
# *Unique feature of Orchard*
# attribute and element names mapped from XML instances that use namespaces retain their fully qualified names
# user or module-added properties can reside in their own namespace to avoid conflict with other sources
* Underlying storage, load/save
# nodes can use storage drivers for in-memory, on-disk, or remote storage
# serialization is automatic, using introspection
* XPath, XSLT
# Queries and transforms can be applied to any node set, not just XML
# Similar to projects going on in RDF

----------------------------------------------------------------------
**** RSS Example -- Python

from Orchard import namespace
import Orchard.RSS

channel = \
    Orchard.RSS.load('http://MonkeyFist.com/rss1.php3')
DC = namespace("http://purl.org/dc/elements/1.1/")

print("Site: " + channel.title)
print("URL: " + channel.link)
print("Description: " + channel.description)
print("Copyright: " + channel[DC.rights])
print("Language: " + channel[DC.language])
print("Publisher: " + channel[DC.publisher])
print("")
print("Items:")
for item in channel.items:
    print(" Title: " + item.title)
    print(" Link: " + item.link)
    print(" Description: " + item.description)
    print(" Link creator: " + item[DC.creator])
    print(" Link date: " + item[DC.date])
    print("")

----------------------------------------------------------------------
**** RSS Example -- Perl

use Orchard qw{namespace};
use Orchard::RSS;

my $channel =
    Orchard::RSS->load('http://MonkeyFist.com/rss1.php3');
my $DC = namespace('http://purl.org/dc/elements/1.1/');

print "Site: " . $channel->{Title} . "\n";
print "URL: " . $channel->{Link} . "\n";
print "Description: " . $channel->{Description} . "\n";
print "Copyright: " . $channel->{$DC->rights} . "\n";
print "Language: " . $channel->{$DC->language} . "\n";
print "Publisher: " . $channel->{$DC->publisher} . "\n";
print "\n";
print "Items:\n";
foreach my $item (@{ $channel->{Items} }) {
    print " Title: " . $item->{Title} . "\n";
    print " Link: " . $item->{Link} . "\n";
    print " Description: " . $item->{Description} . "\n";
    print " Link creator: " . $item->{$DC->creator} . "\n";
    print " Link date: " . $item->{$DC->date} . "\n";
    print "\n";
}

----------------------------------------------------------------------
**** What's next

* XPath and XSLT on any node tree
* DOM level 2 and 3 on any node tree
* Vertical development
  * More node sets, transforms
* Horizontal development
  * language bindings, storage drivers, common behaviors
* Cross-compatibility with RDF
* Node set schema languages

----------------------------------------------------------------------
**** Resources and contact

* http://Orchard.SourceForge.net/
* http://www.w3.org/DOM/
* http://www.Megginson.com/SAX/
* Ken MacLeod
* Matt Sergeant