Ted Leung on the air: Open Source, Java, Python, and ...
Fri, 31 Oct 2003
What is integrated XML support anyway?
Kimbro Staken is wishing for XML support baked directly into a programming language. His criteria are:
  • Seamless XML support. Never having to explicitly parse an XML document
  • XPath as a native language construct
  • Dynamic conversion between text and parsed representations of the XML.
  • XPath manipulation for XML modifications
While I like the goal, I wonder about the implementation. I'd rather have XML elements and attributes be fully expressible -- that is, I'd like to have XML element literals, not just XML element variables (especially since variables have no type in Python; it's the values that do). I also wonder about the efficiency of using XPath for updates. The syntax that is presented for XPath modification implies that you have to execute the query before you can update the document -- there's nothing like a cursor to help you keep your place.
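
For contrast, here is roughly what the non-integrated status quo looks like in Python, using xml.etree.ElementTree from today's standard library (which supports only a subset of XPath). The document and its element names are invented for this sketch. Note the explicit parse, the query that has to run before the update can happen, and the explicit conversion back to text:

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this sketch.
text = '<order><item id="1" qty="2"/><item id="2" qty="5"/></order>'

# Step 1: explicit parse from text to a tree.
root = ET.fromstring(text)

# Step 2: run the path query to locate the node to change.
item = root.find("./item[@id='2']")

# Step 3: modify the node that the query found.
item.set("qty", "6")

# Step 4: explicit conversion back to text.
updated = ET.tostring(root, encoding="unicode")
print(updated)
```

Every one of those four steps is something the wish list above would like the language to take care of.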

This topic is fresh on my mind, as I've just finished reading a series of papers on this very topic:

Martin Kempa and Volker Linnemann's PlanX '02 paper, On XML Objects, describes XOBE, a system implemented as a Java preprocessor, which aims to

eliminate the distinction between the string representation and the object representation of XML documents.
Their approach generates classes for each of the elements in an XML grammar (DTD or Schema) and allows for object literals that look syntactically like XML. XOBE also allows XPath expressions for querying the resulting object hierarchies.

Erik Meijer and Wolfram Schulte's OOPSLA 2003 submission: Unifying Tables, Objects, and Documents takes a different approach. Meijer and Schulte show how to extend C# (it could just as easily be Java) to deal with relational and XML data. They set forth a number of design principles for their experimental language, but two of the most important are:

  1. Denotable values should be (easily) expressible
  2. Expressible values should be denotable
This is at the heart of not having to parse and/or re-serialize.

C# is extended to support streams of various lengths (this can be done easily in languages which have generators, like Python). It is also extended to allow tuples (heterogeneous structures of optionally labelled variables of fixed length). Python has unlabelled tuples built in. The combination of streams and tuples is used to model relational data. The last supporting extension is union types.
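
To make the Python comparison concrete, here is a minimal sketch of a stream of tuples standing in for a relational table. This is my own illustration, not code from the paper: the Employee record and its rows are invented, and collections.namedtuple is used to get the optional labels that plain Python tuples lack.

```python
from collections import namedtuple

# A labelled tuple: fixed length, heterogeneous fields.
# The record type and the rows below are invented for illustration.
Employee = namedtuple("Employee", ["name", "dept", "salary"])

def employees():
    """A stream of Employee tuples, produced lazily by a generator."""
    yield Employee("alice", "tools", 120)
    yield Employee("bob", "docs", 90)
    yield Employee("carol", "tools", 110)

# Each tuple is a row, and the stream is the table.
tools_names = [e.name for e in employees() if e.dept == "tools"]
print(tools_names)
```

The generator produces rows only as they are consumed, so the same loop works whether the stream has three rows or three million.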

Streams, tuples, and unions are then used to model a large part of the XML Schema (XSD) type system. This means that the programmer can declare classes that correspond to XSD types. C# is also extended with XML literals, which can be stored in instances of the appropriate class.

In the area of querying, Meijer and Schulte have departed from XPath syntax. Instead, the mechanisms for looping / querying are taken from functional programming: lifting, apply-to-all, and folds. This provides a nice mechanism that works for streams (relational data) and XML data. These mechanisms support accessing fields/members of objects to provide path expression / XPath-like navigation. The authors also provide wildcard, transitive, and type-based member access. Wildcard access returns all the members of a type (in declaration order); transitive access allows you to find a member that is transitively reachable from some other member. Type-based access allows you to restrict the type of the member being (transitively) searched for (the restriction is reminiscent of XPath axis notation).
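
A rough Python rendering of these ideas might look like the sketch below. None of the names come from the paper: Node, lift, descendants, and named are hypothetical helpers, and filtering by element name is only a stand-in for the type-based restriction the authors describe.

```python
from functools import reduce

class Node:
    """Hypothetical tree node standing in for an XML element or object."""
    def __init__(self, name, value=None, children=()):
        self.name = name
        self.value = value
        self.children = list(children)

def lift(attr, stream):
    """Lifted member access: read `.attr` from every item of a stream."""
    return (getattr(item, attr) for item in stream)

def descendants(node):
    """Transitive access: every node reachable from `node`, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)

def named(name, stream):
    """Restrict a stream of nodes to those with the given name."""
    return (n for n in stream if n.name == name)

# Invented sample data.
doc = Node("order", children=[
    Node("item", children=[Node("price", 10)]),
    Node("item", children=[Node("price", 32)]),
])

# Roughly what //price would select, followed by a fold that sums the values.
prices = lift("value", named("price", descendants(doc)))
total = reduce(lambda acc, p: acc + p, prices, 0)
print(total)
```

Nothing here is specific to XML: the same three helpers work over any object graph that exposes children, which is the appeal of this style of querying.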

In addition to functional style querying, this extension of C# can also support a SQL like select-from-where clause. The cool thing that they point out is that it doesn't matter where the queried data is: it can be in memory or on disk. This is the beauty of the stream abstraction.
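
The location independence is easy to imitate with Python iterables. In this sketch (my own, with invented data and hypothetical helper names), the same select-from-where style query runs over a list in memory and over a file-like object, with io.StringIO standing in for a file on disk:

```python
import io

def rows(lines):
    """Turn any iterable of 'name,qty' lines into a stream of (name, qty) tuples."""
    for line in lines:
        name, qty = line.strip().split(",")
        yield name, int(qty)

def select_large(stream, minimum):
    """Roughly: select name from stream where qty >= minimum."""
    return [name for name, qty in stream if qty >= minimum]

# Invented sample data, once in memory and once behind a file-like object.
in_memory = ["apple,3", "pear,12", "fig,7"]
on_disk = io.StringIO("apple,3\npear,12\nfig,7\n")

from_memory = select_large(rows(in_memory), 5)
from_disk = select_large(rows(on_disk), 5)
print(from_memory, from_disk)
```

The query function never learns where its rows came from; it only sees the stream.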

The same authors (plus one), Erik Meijer, Wolfram Schulte, and Gavin Bierman, have a paper in XML 2003, Programming with Circles, Triangles and Rectangles, which focuses on the XML aspects of the language described in the previous paper. The experimental language is now being called Xen. This paper goes into a lot more detail on the mismatch between the XML data model and object data models. There's less on streams and tuples (and the type rules), a little more on lifting, and examples of handling the XQuery use cases in Xen.

Ned Batchelder points out that the version linked above is encoded in MSIE-only HTML. Bierman has made a friendlier version available (at least it renders in Mozilla).

It seems clear that everyone agrees that denotable values should be easily expressible and that expressible values should be denotable. What's not as clear is what the query model should be for a language that has XML support baked in. XPath is an obvious choice, but then you only get the functionality for XML data, and I'd like to have the same kind of query capability for any hierarchical data. That's what I like about the Xen approach. Of course, it is possible to make XPath work over objects, ala the ObjectXPathNavigator or JXPath...

I'm glad to see this work being published / discussed. It seems to me that Xen is most likely to make an appearance in commercial products, since Erik Meijer works in the WebData group at Microsoft, which is a product, not a research group. If you want to stay in the statically typed curly brace language world, it looks like Microsoft is kicking the tires in all the right places. If Xen becomes C#2006, then Java will be sucking wind (at least in my book).

[15:06] | [computers/programming/xml] | # | TB | F | G | 4 Comments | Other blogs commenting on this post
Eclipse Omnibus
Dave Johnson saw these screenshots of Whidbey and despaired of having such an easy-to-use environment in a Java-based toolset. I don't know if Dave has seen the IBM WSAD version of the HTML/JSP editor, but it can do a lot of what the Whidbey HTML editor can do. If you select rendered HTML and switch to source view, it highlights the tags responsible. It works backwards that way too. Of course, it costs $$$, but so will Whidbey.

In other Eclipse news, codesugar is a new plugin that generates equals(), clone(), toString(), and hashCode() methods.

[11:51] | [computers/programming/java/eclipse] | # | TB | F | G | 1 Comments | Other blogs commenting on this post
Thu, 30 Oct 2003
Walter Smith has a blog
Walter Smith has a blog. Walter was responsible for lots of good ideas in Apple's Newton, and is now part of the Windows Client User Experience Team...
[23:37] | [computers/internet/weblogs] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
Wed, 29 Oct 2003
More Fun Python projects
Jeremy Hylton posted some cool ideas for master's thesis level Python projects. Here's my suggestion:
  • Write a type annotator for Python programs that would take a Python program as input and produce a version of the program annotated with as detailed type information for as many variables and functions as possible. The goal is to provide a tool that could be used on Python files to give some confidence that type unsafe operations were not going to happen.
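
As a very rough idea of where such a tool could start, here is a sketch that uses the standard ast module (ast.Constant needs Python 3.8 or later) to record the types of literals assigned to simple names. It is nowhere near a real annotator: the function name is hypothetical, and it ignores function calls, expressions, scopes, and control flow entirely.

```python
import ast

def infer_literal_types(source):
    """Map variable names to the type names of literals assigned to them."""
    inferred = {}
    for node in ast.walk(ast.parse(source)):
        # Only the simplest case is handled: `name = <literal constant>`.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    type_name = type(node.value.value).__name__
                    inferred.setdefault(target.id, set()).add(type_name)
    return inferred

# Invented sample program.
program = "x = 1\ny = 'hello'\nx = 2.5\n"
report = infer_literal_types(program)

# A name bound to more than one type is a candidate for a closer look.
suspicious = sorted(name for name, types in report.items() if len(types) > 1)
print(report, suspicious)
```

Even this toy version shows the shape of the output: a per-variable set of possible types, with the multi-type entries being the ones a programmer would want flagged.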
I also liked the suggestion for adding XML/SQL support ala Xen.
[23:46] | [computers/programming/python] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
The Essence of XML
Today I read Simeon and Wadler's POPL 2003 paper, The Essence of XML, which should really be titled "The Essence of XML Schema". The paper points out that XML is not a good data representation because it isn't self-describing or round-trippable:
It is not always self-describing, since the internal format corresponding to an external XML description depends crucially on the XML Schema that is used for validation (for instance, to tell whether data is an integer or a string). And it is not always round-tripping, since some pathological Schemas lack this property (for instance, if there is a type union of integers and strings)
They are assuming a strongly typed world, of course. The key observation in their work is that using a named typing model instead of the more commonly used structural typing model leads to a model where proving theorems about validation and erasure is very easy. The model in the paper only covers a subset of XML Schema, but the remaining features are accounted for in the formal semantics for XQuery and XPath 2.0.

For practitioners, the bottom line is that the subset of Schema described by the paper is roundtrippable (convert internal value to external and back -- output the value as XML and then parse/validate) except for simple types that are lists or unions. For the case of reverse-roundtripping (parse/validate then output), the only problems arise when a base type has multiple representations for the same value, like leading zeros or number bases.
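
Both failure modes are easy to reproduce in a few lines of Python. This is only an analogy, not the paper's formal model: the parse and serialize functions are hypothetical stand-ins for validation and output, and the union is assumed to try the integer interpretation first.

```python
def parse_int(text):
    """External text to internal value, as validation against an integer type would do."""
    return int(text)

def serialize(value):
    """Internal value to external text."""
    return str(value)

def parse_union(text):
    """Validate against a hypothetical union of integer and string, integer first."""
    try:
        return int(text)
    except ValueError:
        return text

# Reverse round trip (parse, then output): the leading zeros are lost.
reverse_ok = serialize(parse_int("007")) == "007"

# Round trip (output, then parse): the string "42" comes back as the integer 42.
roundtrip_ok = parse_union(serialize("42")) == "42"

print(reverse_ok, roundtrip_ok)
```

In both cases the text looks harmless on its own; the trouble only shows up once a schema decides how to interpret it.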

[23:34] | [computers/programming/xml] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
Tue, 28 Oct 2003
Big Ball of Mud
Foote and Yoder have a paper in PLoP 4 called Big Ball of Mud where they try to understand the forces that lead to the Big Ball of Mud (also known as spaghetti code) architecture, identify some patterns that lead to a Big Ball of Mud, and find ways to improve code that has become a Big Ball of Mud.

There are some interesting observations:

When it comes to software architecture, form follows function. Here we mean "follows" not in the traditional sense of dictating function. Instead, we mean that the distinct identities of the system's architectural elements often don't start to emerge until after the code is working.
From Brooks' 25th Anniversary Edition of The Mythical Man Month:
One always has, at every stage, in the process, a working system. I find that teams can grow much more complex entities in four months than they can build.
On layers:
Most interactions in a system tend to be within layers, or between adjacent layers. Individual layers tend to be about things that change at similar rates. Things that change at different rates diverge. Differential rates of change encourage layers to emerge.
Slowly evolving objects are bulwarks against change. They embody the wisdom that the system has accrued in its prior interactions with its environment. Like tenure, tradition, big corporations, and conservative politics, they maintain what has worked. They worked once, so they are kept around. They had a good idea once, so maybe they are a better than even bet to have another one.
[16:35] | [computers/programming] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
The Selfish Class
I've been reading Foote and Yoder's PLoP3 paper, The Selfish Class. The title is taken from Dawkins' The Selfish Gene, and the authors are trying to articulate the genes that allow software artifacts to be reused.

The genes / patterns that they identify are good indicators for successful open source projects. There's nothing surprising, but the paper puts a number of things together in one place.

[15:46] | [computers/open_source] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
Mon, 27 Oct 2003
I spent some time today clearing my backlog of Wired issues. In the August 2003 issue, there's an article on random number generation. The current technique uses a webcam (with a lens cap on) as a source of chaotic information -- the original used lava lamps. The inventors have developed an open source library for those who need really good random numbers.
[00:06] | [computers/programming] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
Linkers and Loaders
I just finished reading John Levine's Linkers and Loaders. It's one of the few books out there on the linking and loading phases. It covers the principles of how linkers and loaders work, using ELF, MS PE/COFF, and OS/360 as examples. There's a fair amount of detail concerning the specifics of these systems, which helps if you are working in one of those environments.

This book has been sitting on my shelf (along with a bunch of others), so I decided to move it from one pile to another. The last chapter is a summary of research in linking and loading up until 2000, when the book was written. There's still a bunch of stuff that hasn't made it into the linkers that you and I use every day (perhaps with the exception of the JVM and CLR, but even then it seems like there's room for improvement).

[00:05] | [computers/operating_systems] | # | TB | F | G | 1 Comments | Other blogs commenting on this post
Sun, 26 Oct 2003
One apt to rule them all, One apt to bind them...
Ars Technica is reporting that Ian (the ian in Debian) Murdock's Progeny has ported Red Hat's Anaconda installer to Debian. I was more interested in the news that Progeny is modifying apt to work with RPM packages.

If that weren't enough, the DebToo project is working to create tools to build Debian source packages using custom compile flags, just like Gentoo. In addition, Eric Wong has written APT-Fu, which can build packages using custom gcc flags.

Convergence is good.

[00:01] | [computers/operating_systems/linux/debian] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
Sat, 25 Oct 2003
The death of computer hobbyists?
John Dvorak laments the decline of computer hobbyists. Like John, I got involved with computers as a hobby. I do think that his definition of hobbyist is a bit narrow. According to him, if you aren't collecting oddball hardware, then you aren't a hobbyist. I've always been a hobbyist on the software side -- I learned to program by typing in programs from Byte, Creative Computing, and Dr. Dobb's Journal, learning how they worked as I typed. Today, I believe that the opportunity for software hobbyists is larger than ever. Hobbyists who want to program computers have more avenues that allow them to contribute to software that will be used by real people. The same avenues provide lots of source code that hobbyists can learn from. I am referring, of course, to open source software in its various forms.

It may be true that there's not much excitement for the hardware side of computers as a hobby, unless you like lighting up the insides of your case. But on the software side, there are plenty of opportunities.

[23:44] | [computers] | # | TB | F | G | 4 Comments | Other blogs commenting on this post
We need some inspiration
A few days ago, Jon Udell posted about Apple's Knowledge Navigator video. I won't repeat his analysis of how close we are to realizing the vision set forth in the video. But I find it interesting that 16 years later (it was first shown in 1987), it is still setting forth a compelling vision for what the computing experience should be like. The Linux people aren't driving towards something like this, and Longhorn is focusing on graphics, task orientation, and WS-*, but for me, that pales in comparison to the Knowledge Navigator. Scoble should be holding this up as the future of the Tablet PC. But saddest of all, even Apple is no longer ostensibly working towards this vision.

I don't think that the Knowledge Navigator video is perfect or definitive, but at this moment in the history of computing, it can provide renewed inspiration.

[23:32] | [computers] | # | TB | F | G | 1 Comments | Other blogs commenting on this post
Fri, 24 Oct 2003
ApacheCon Wiki
I was hoping that ApacheCon was going to have a wiki, and the planners have not disappointed...
[16:12] | [computers/open_source/asf] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
Thu, 23 Oct 2003
Geek tricks
Danny O'Brien is giving an ETCon talk on Tech Secrets of Overprolific Alpha Geeks. I love talks like this. Danny is starting his research via this QuickTopic thread.

There are already some interesting ideas there, such as coating the back of your laptop with velcro. I can see this being useful in some situations, but not others. My laptop backpack has a nice pocket for the laptop, and a nice carry bag for all the cables. But lots of times I want to just carry the thinkpad and one or two items. This is when the velcro would be useful. But then you'd have to peel everything off to get it back into the pack/bag.

[12:19] | [misc] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
Object-Oriented Style (revised)
[via Lambda the Ultimate] Apparently Dan Friedman has revised his paper on Object-Oriented style. So if you want to see an example of Scheme macros in action, you should definitely look at this.
[11:43] | [computers/programming/lisp] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
Wed, 22 Oct 2003
OSBC 2004 and r0ml
[via mod_pubsub] Here's yet another Open Source conference, this time focused on business. There's a nice list of speakers, but no topics...

One tidbit of interesting information from the site. Some of you will remember my OSCON notes and my admiration for Robert Lefkowitz. It appears that the uncertainty around his employment situation has settled. He's now "Director, Open Source, AT&T Wireless". Which makes me kind of wonder, what exactly are they doing with open source at AT&T Wireless? After all, they're my current cellular carrier...

[23:47] | [computers/open_source] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
Eclipse is having its own convention. When you look at how many of the presentations are by IBM'ers, you have to wonder how successful Eclipse has been at attracting outside developers to the Eclipse core. Eclipse is definitely open source from a licensing point of view, but from a community point of view, the core still seems pretty closed. It takes time to build a community, and getting involved with such a big project is hard, so maybe I shouldn't be too hard on Eclipse just yet...
[22:46] | [computers/programming/java/eclipse] | # | TB | F | G | 1 Comments | Other blogs commenting on this post
UI Quickies
  • Jon Udell points out the work of Ben Bederson at UMD - there are many unconventional UI ideas here.
  • On the hardware side, PC Magazine tells us about the RoundPad
Seems like there might be some nice interactions.
[22:41] | [computers/programming] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
Redhat 9? Just say no.
I just read Don Park's evaluation of Red Hat 9. This jibes with what I heard at last night's SeaJUG meeting, too. Mark Ashworth, the speaker last night, was running Suse, but he mentioned that more and more of his friends are running Debian. I run Debian for the package quality, not the politics, but it is true that there's no company behind Debian that will suddenly change the nature of the distribution.

The uptime on the machine that is {www,mail}.sauria.com is 271 days. It would have been longer, but I had to replace the UPS. This machine is regularly updated to Debian unstable. If you are less adventurous, you could set your apt sources list to use testing.

[00:35] | [computers/operating_systems/linux/debian] | # | TB | F | G | 2 Comments | Other blogs commenting on this post
Mon, 20 Oct 2003
Mail aggregators versus blog aggregators versus human aggregators
It seems that Dave Winer had a chat with the human aggregator (Scoble) and came away understanding why you'd like to integrate an aggregator with e-mail. Dave's two takeaways were:
  1. Since it's integrated with email he can easily forward an item to people he works with via email.
  2. He has a folder where he drags items he wants to write about later.
Dave prefers Radio's blog-like aggregator, which I've never used, but I understand that it puts all the aggregated stuff into a single HTML page, so that you can read it fast. I like that. That's why I use FeedDemon. It gives me that big HTML page so I can do the fast scan, and it takes care of takeaway #2 because I can create a news bin (or bins) to use as holding pens for items. Takeaway #1 would be a small matter of programming to implement, though I suppose that getting it really integrated with the mailer would be difficult.

I'm of the opinion that aggregated RSS data is a little different from mail or Usenet news, and so I want a different user interface for it. Certain kinds of e-mail integration are nice, but I also want integration with the rest of my PIM-space, which is possible with something like Newsgator. I'm pretty sure that Greg is hard at work on this -- too bad I hate Outlook.

Actually, I think it would be interesting to see a blog-style aggregator with a UI like this one. There's room for all kinds of stuff. The old PARC Perspective Wall or Cone Trees are some examples that easily come to mind as food for aggregator UIs. UIs are so stuck in the mud.

[22:37] | [computers/internet/microcontent] | # | TB | F | G | 2 Comments | Other blogs commenting on this post

I work at the Open Source Applications Foundation (OSAF).
The opinions expressed here are entirely my own, not those of my employer.

Creative Commons License
This work is licensed under a Creative Commons License.
