Ted Leung on the air
Fri, 31 Oct 2003
What is integrated XML support anyway?
Kimbro Staken is wishing for XML support baked directly into a programming language. His criteria are:
  • Seamless XML support. Never having to explicitly parse an XML document
  • XPath as a native language construct
  • Dynamic conversion between text and parsed representations of the XML.
  • XPath manipulation for XML modifications
While I like the goal, I wonder about the implementation. I'd rather have XML elements and attributes be fully expressible -- that is, I'd like XML element literals, not just XML element variables (especially since variables have no type in Python; it's the values that do). I also wonder about the efficiency of using XPath for updates. The syntax presented for XPath modification implies that you have to execute the query before you can update the document -- there's nothing like a cursor to help you keep your place.
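To make the efficiency concern concrete, here is a minimal sketch of the query-then-update pattern. It uses Python's standard xml.etree.ElementTree rather than Kimbro's proposed syntax, and the document, element names, and the set_status helper are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
doc = ET.fromstring(
    "<orders>"
    "<order id='1'><status>open</status></order>"
    "<order id='2'><status>open</status></order>"
    "</orders>"
)

def set_status(root, order_id, new_status):
    """Run a path query, then update every match; return the match count."""
    # ElementTree supports a limited XPath subset, including attribute predicates.
    matches = root.findall(f"./order[@id='{order_id}']/status")
    for node in matches:
        node.text = new_status
    return len(matches)

# Each call searches again from the root; nothing remembers the last position.
updated_first = set_status(doc, "1", "shipped")
updated_second = set_status(doc, "2", "cancelled")
statuses = [s.text for s in doc.iter("status")]
```

Every update pays for a fresh query, which is the cost a cursor-like construct would avoid.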

This topic is fresh on my mind, as I've just finished reading a series of papers on this very topic:

Martin Kempa and Volker Linnemann's PlanX '02 paper, On XML Objects, describes XOBE, a system implemented as a Java preprocessor, which aims to

eliminate the distinction between the string representation and the object representation of XML documents.
Their approach generates classes for each of the elements in an XML grammar (DTD or Schema) and allows for object literals that look syntactically like XML. XOBE also allows XPath expressions for querying the resulting object hierarchies.

Erik Meijer and Wolfram Schulte's OOPSLA 2003 submission: Unifying Tables, Objects, and Documents takes a different approach. Meijer and Schulte show how to extend C# (it could just as easily be Java) to deal with relational and XML data. They set forth a number of design principles for their experimental language, but two of the most important are:

  1. Denotable values should be (easily) expressible
  2. Expressible values should be denotable
This is at the heart of not having to parse and/or re-serialize.

C# is extended to support streams of various lengths (this can be done easily in languages which have generators, like Python). It is also extended to allow tuples (heterogeneous structures of optionally labelled variables of fixed length). Python has unlabelled tuples built in. The combination of streams and tuples is used to model relational data. The last supporting extension is union types.
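As a rough Python analogue of streams plus tuples (not the syntax from the paper), a generator can play the role of a stream and a named tuple the role of a labelled tuple. The Employee row type and its data are made up for this sketch, and typing.NamedTuple is a later addition to Python than the unlabelled tuples mentioned above.

```python
from typing import Iterator, NamedTuple

class Employee(NamedTuple):
    """A labelled, fixed-length, heterogeneous tuple (hypothetical row type)."""
    name: str
    dept: str
    salary: int

def employees() -> Iterator[Employee]:
    """A stream of rows; the generator yields them one at a time."""
    yield Employee("Ann", "eng", 120)
    yield Employee("Bob", "ops", 90)
    yield Employee("Cy", "eng", 100)

# Consuming the stream: rows are produced lazily, on demand.
eng_names = [e.name for e in employees() if e.dept == "eng"]

# Labelled tuples still behave like ordinary positional tuples.
first_field = Employee("Ann", "eng", 120)[0]
```

A stream of such tuples is essentially a relational table read one row at a time.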

Streams, tuples, and unions are then used to model a large part of the XML Schema (XSD) type system. This means that the programmer can declare classes that correspond to XSD types. C# is also extended with XML literals, which can be stored in instances of the appropriate class.

In the area of querying, Meijer and Schulte have departed from XPath syntax. Instead, the mechanisms for looping / querying are taken from functional programming: lifting, apply-to-all, and folds. This provides a nice mechanism that works for streams (relational data) and XML data. These mechanisms support accessing fields/members of objects to provide path expression / XPath-like navigation. The authors also provide wildcard, transitive, and type-based member access. Wildcard access returns all the members of a type (in declaration order); transitive access allows you to find a member that is transitively reachable from some other member. Type-based access allows you to restrict the type of the member being (transitively) searched for (the restriction is reminiscent of XPath axis notation).
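Here is one way lifted and transitive member access might look in plain Python. This is only a sketch of the idea, not the mechanism from the paper: the Book and Author classes and the lift and descendants helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import List

@dataclass
class Author:
    name: str

@dataclass
class Book:
    title: str
    authors: List[Author] = field(default_factory=list)

def lift(items, member):
    """Apply member access to every item, flattening list-valued members."""
    for item in items:
        value = getattr(item, member)
        if isinstance(value, list):
            yield from value
        else:
            yield value

def descendants(obj, member):
    """Transitive access: yield every `member` reachable from obj."""
    if isinstance(obj, list):
        for child in obj:
            yield from descendants(child, member)
    elif is_dataclass(obj):
        # fields() returns members in declaration order.
        for f in fields(obj):
            value = getattr(obj, f.name)
            if f.name == member:
                yield value
            yield from descendants(value, member)

books = [Book("A", [Author("X"), Author("Y")]), Book("B", [Author("Z")])]
titles = list(lift(books, "title"))
author_names = list(lift(lift(books, "authors"), "name"))
all_names = list(descendants(books, "name"))
```

Chaining lift calls reads like a path expression (books, then authors, then name), while descendants finds the same names without spelling out the intermediate step.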

In addition to functional-style querying, this extension of C# can also support a SQL-like select-from-where clause. The cool thing that they point out is that it doesn't matter where the queried data is: it can be in memory or on disk. This is the beauty of the stream abstraction.
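The location independence can be illustrated with a small Python sketch, assuming a hypothetical select helper over any iterable of rows. An in-memory file-like object stands in for data on disk here, and the row data is invented.

```python
import csv
import io

def select(rows, where, project):
    """SQL-like select-from-where over any iterable of rows."""
    return (project(row) for row in rows if where(row))

def is_eng(row):
    return row["dept"] == "eng"

def name_of(row):
    return row["name"]

# Source 1: rows already in memory.
in_memory = [{"name": "Ann", "dept": "eng"}, {"name": "Bob", "dept": "ops"}]

# Source 2: a file-like object standing in for a file on disk;
# csv.DictReader reads it row by row.
on_disk = csv.DictReader(io.StringIO("name,dept\nCy,eng\nDee,ops\n"))

# The same query runs unchanged over both sources.
from_memory = list(select(in_memory, is_eng, name_of))
from_disk = list(select(on_disk, is_eng, name_of))
```

The query only requires that its source can be iterated, so swapping the list for a reader over a real file would not change the query itself.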

The same authors (plus one), Erik Meijer, Wolfram Schulte, and Gavin Bierman, have a paper in XML 2003, Programming with Circles, Triangles and Rectangles, which focuses on the XML aspects of the language described in the previous paper. The experimental language is now being called Xen. This paper goes into a lot more detail on the mismatch between the XML data model and object data models. There's less on streams and tuples (and the type rules), a little more on lifting, and examples of handling the XQuery use cases in Xen.

Ned Batchelder points out that the version linked above is encoded in MSIE-only HTML. Bierman has made a friendlier version available (at least it renders in Mozilla).

It seems clear that everyone agrees that denotable values should be easily expressible and that expressible values should be denotable. What's not as clear is what the query model should be for a language that has XML support baked in. XPath is an obvious choice, but then you only get the functionality for XML data, and I'd like to have the same kind of query capability for any hierarchical data. That's what I like about the Xen approach. Of course, it is possible to make XPath work over objects, à la the ObjectXPathNavigator or JXPath...

I'm glad to see this work being published / discussed. It seems to me that Xen is most likely to make an appearance in commercial products, since Erik Meijer works in the WebData group at Microsoft, which is a product, not a research group. If you want to stay in the statically typed curly brace language world, it looks like Microsoft is kicking the tires in all the right places. If Xen becomes C#2006, then Java will be sucking wind (at least in my book).

[15:06] | [computers/programming/xml] | 4 Comments
Phillip Eby wrote about a bunch of his ideas for querying across different data backends (for use in PEAK). Honestly, the notes were so extensive that I haven't followed much of it. The thread starts here and continues in other posts to the mailing list. There seem to be some similar ideas or desires to what you're writing about here.

To me it seems too ambitious -- I'd rather see some more robust querying mechanisms in Python that were domain-specific before trying to unify it all across disparate domains.
Posted by Ian Bicking at Fri Oct 31 16:52:11 2003

FWIW the Groovy language for the JVM follows some similar patterns to Xen in allowing markup to be used in the language and supporting XPath-like navigation expressions on java/groovy objects
Posted by James Strachan at Mon Nov 3 02:56:30 2003



About


I work at the Open Source Applications Foundation (OSAF).
The opinions expressed here are entirely my own, not those of my employer.

Creative Commons License
This work is licensed under a Creative Commons License.
