Ted Leung on the air
Ted Leung on the air: Open Source, Java, Python, and ...
Fri, 31 Oct 2003
What is integrated XML support anyway?
Kimbro Staken is wishing for XML support baked directly into a programming language. His criteria are:
  • Seamless XML support. Never having to explicitly parse an XML document
  • XPath as a native language construct
  • Dynamic conversion between text and parsed representations of the XML.
  • XPath manipulation for XML modifications
While I like the goal, I wonder about the implementation. I'd rather be able to deal with XML elements and attributes being fully expressible -- That is I'd like to have XML element literals not just XML element variables (especially since variables have no type in Python, it's the values that do). I also wonder about the efficiency of using XPath for updates. The syntax that is presented for XPath modification implies that you have to execute the query before you can update the document -- there's nothing like a cursor to help you keep your place.

This topic is fresh on my mind, as I've just finished reading a series of papers on this very topic:

Martin Kempa and Volker Linnemann's PlanX '02 paper, On XML Objects describe XOBE a system implemented as a Java preprocessor, which aims to

eliminate the distinction between the string representation and the object representation of XML documents.
Their approach generates classes for each of the elements in an XML Grammer (DTD or Schema) and allows for object literals that look syntactically like XML. XOBE also allows XPath expressions for querying the resulting object hierarchies.

Erik Meijer and Wolfram Schulte's OOPSLA 2003 submission: Unifying Tables, Objects, and Documents takes a different approach. Meijer and Schulte show how to extend C# (it could just as easily be Java) to deal with relational and XML data. They set forth a number of design principles for their experimental language, but two of the most important are:

  1. Denotable values should be (easily) expressible
  2. Expressible values should be denotable
This is at the heart of not having to parse and/or re-serialize.

C# is extended to support streams of various lengths (this can be done easily in languages which have generators, like Python). It is also extended to allow tuples (heterogeneous structures of optionally labelled variables of fixed length). Python has unlabelled tuples built in. The combination of streams and tuples is used to model relational data. The last supporting extension is union types.

Streams, tuples, and unions are the unused to model a large part of the XML Schema (XSD) type system. This means that the programmer can declare classes that correspond to XSD types. C# is also extended with XML literals which can be stored in instances of the appropriate class

In the area of querying, Meijer and Schulte have departed from XPath syntax. Instead, the mechanisms for looping / querying are taken from functional programming: lifting, apply-to-all, and folds. This provides a nice mechanism that works for streams (relational data) and XML data. These mechanisms support accessing fields/members of objects to provide path expression / XPath-like navigation. The authors also provide wildcard, transitive, and type-based member access. Wildcard access returns all the members of a type (in declaration order), transitive access allows you to find a member that is transitively reachable from some other member. Type-based access allows you to restrict the type of the member being (transitively) searched for (the restriction is reminiscent of XPath axis notation).

In addition to functional style querying, this extension of C# can also support a SQL like select-from-where clause. The cool thing that they point out is that it doesn't matter where the queried data is: it can be in memory or on disk. This is the beauty of the stream abstraction.

The same authors (plus one), Erik Meijer, Wolfram Schulte, and Gavin Bierman have a paper in XML 2003 Programming with Circles, Triangles and Rectangles, which focuses on the XML aspects of the language described in the previous paper. The experimental language is now being called Xen. This paper goes into a lot more detail on the mismatch between the XML data model and object data models. There's less on streams and tuples (and the type rules), a little more on lifting, and examples of handling the XQuery use cases in Xen.

Ned Batchelder points out that the version linked above is encoded in MSIE only html. Bierman has made a friendlier version available (at least it renders in Mozilla).

It seems clear that everyone agrees that denotable values should be easily expressible and that expressible values should be denotable. What's not as clear is what the query model should be for a language that has XML support baked in. XPath is an obvious choice, but then you only get the functionality for XML data, and I'd like to have the same kind of query capability for any hierarchical data. That's what I like about the Xen approach. Of course, it is possible to make XPath work over objects, ala the ObjectXPathNavigator or JXPath...

I'm glad to see this work being published / discussed. It seems to me that Xen is most likely to make an appearance in commercial products, since Erik Meijer works in the WebData group at Microsoft, which is a product, not a research group. If you want to stay in the statically typed curly brace language world, it looks like Microsoft is kicking the tires in all the right places. If Xen becomes C#2006, then Java will be sucking wind (at least in my book).

[15:06] | [computers/programming/xml] | # | TB | F | G | 4 Comments | Other blogs commenting on this post
Eclipse Omnibus
Dave Johnson saw these screenshots of Whidbey and despaired of having such an easy to use environment in a Java based toolset. I don't know if Dave has seen the IBM WSAD version of the HTML/JSP editor, but it can do a lot of what the Whidbey HTML editor can do. If you select rendered HTML and switch to source view, it highlights the tags responsible. It works backwards that way too. Of course, it costs $$$, but so will Whidbey.

In other Eclipse news, codesugar is a new plugin that generates equals(), clone(), toString(), and hashCode() methods.

[11:51] | [computers/programming/java/eclipse] | # | TB | F | G | 1 Comments | Other blogs commenting on this post

twl JPG


Ted Leung FOAF Explorer

I work at the Open Source Applications Foundation (OSAF).
The opinions expressed here are entirely my own, not those of my employer.

Creative Commons License
This work is licensed under a Creative Commons License.

Now available!
Professional XML Development with Apache Tools : Xerces, Xalan, FOP, Cocoon, Axis, Xindice
Technorati Profile
PGP Key Fingerprint
My del.icio.us Bookmarks
My Flickr Photos

RSS 2.0 xml GIF
Comments (RSS 2.0) xml GIF
Atom 0.3 feed
Feedburner'ed RSS feed

< October 2003 >
    1 2 3 4
5 6 7 8 91011


Macintosh Tips and Tricks

Blogs nearby
geourl PNG

/ (1567)
  books/ (33)
  computers/ (62)
    hardware/ (15)
    internet/ (58)
      mail/ (11)
      microcontent/ (58)
      weblogs/ (174)
        pyblosxom/ (36)
      www/ (25)
    open_source/ (145)
      asf/ (53)
      osaf/ (32)
        chandler/ (35)
        cosmo/ (1)
    operating_systems/ (16)
      linux/ (9)
        debian/ (15)
        ubuntu/ (2)
      macosx/ (101)
        tips/ (25)
      windows_xp/ (4)
    programming/ (156)
      clr/ (1)
      dotnet/ (13)
      java/ (71)
        eclipse/ (22)
      lisp/ (34)
      python/ (86)
      smalltalk/ (4)
      xml/ (18)
    research/ (1)
    security/ (4)
    wireless/ (1)
  culture/ (10)
    film/ (8)
    music/ (6)
  education/ (13)
  family/ (17)
  gadgets/ (24)
  misc/ (47)
  people/ (18)
  photography/ (25)
    pictures/ (12)
  places/ (3)
    us/ (0)
      wa/ (2)
        bainbridge_island/ (17)
        seattle/ (13)
  skating/ (6)
  society/ (20)

[Valid RSS]

del.icio.us linkblog



Listed on BlogShares

Locations of visitors to this page
Where are visitors to this page?

pyblosxom GIF