Ted Leung on the air
Ted Leung on the air: Open Source, Java, Python, and ...
Thu, 30 Jun 2005
Fri, 28 May 2004
Tue, 23 Dec 2003
Julie just popped in to tell me that my sales rank on Amazon is 89,461. That's a significant improvement from 700,000+. Of course, the distribution of sales on Amazon is likely to be a power-law curve, so the sledding only gets tougher uphill.
It's an odd experience shipping physical products. I finished writing the book in September, and there have been little things to do here and there, but it's been mostly out of my mind since then because I've got so many other things going on. I've gotten so used to finishing something and having it go into use that it just feels weird to have this delay before the books go out. Even now it doesn't quite seem real, because I haven't seen a copy of the physical book yet. I hope that my copies are on the way.
I had a similar feeling when I was working on the Newton at Apple. Talk about lag time. We had to finish our software much earlier than I was used to, because that software was going into ROM, and there was lead time for that, coupled with the vacation schedule of our production partner in Asia. Working on something physical, like a piece of electronics or a book, definitely feels different from just tagging a bunch of files in CVS, jarring them up, and pronouncing it done. Of course, not all software is done that way, but more and more of it is.
Thu, 18 Dec 2003
Fri, 31 Oct 2003
What is integrated XML support anyway?
Kimbro Staken is wishing for XML support baked directly into a programming language. His criteria are:
- Seamless XML support. Never having to explicitly parse an XML document
- XPath as a native language construct
- Dynamic conversion between text and parsed representations of the XML.
- XPath manipulation for XML modifications
The XOBE project aims to eliminate the distinction between the string representation and the object representation of XML documents. Their approach generates classes for each of the elements in an XML grammar (DTD or Schema) and allows for object literals that look syntactically like XML. XOBE also allows XPath expressions for querying the resulting object hierarchies.
Erik Meijer and Wolfram Schulte's OOPSLA 2003 submission, "Unifying Tables, Objects, and Documents", takes a different approach. Meijer and Schulte show how to extend C# (it could just as easily be Java) to deal with relational and XML data. They set forth a number of design principles for their experimental language, but two of the most important are:
- Denotable values should be (easily) expressible
- Expressible values should be denotable
Wed, 29 Oct 2003
The Essence of XML
Today I read Simeon and Wadler's POPL 2003 paper, The Essence of XML, which should really be titled "The Essence of XML Schema". The paper points out that XML is not a good data representation because it isn't always self-describing or round-trippable:
"It is not always self-describing, since the internal format corresponding to an external XML description depends crucially on the XML Schema that is used for validation (for instance, to tell whether data is an integer or a string). And it is not always round-tripping, since some pathological Schemas lack this property (for instance, if there is a type union of integers and strings)."
They are assuming a strongly typed world, of course. The key observation in their work is that using a named typing model instead of the more commonly used structural typing model leads to a model where proving theorems about validation and erasure is very easy. The paper only models a subset of XML Schema, but the remaining features are accounted for in the formal semantics for XQuery and XPath 2.0.
For practitioners, the bottom line is that the subset of Schema described by the paper is round-trippable (convert internal value to external and back -- output the value as XML and then parse/validate) except for simple types that are lists or unions. For the case of reverse round-tripping (parse/validate then output), the only problems arise when a base type has multiple representations for the same value, like leading zeros or number bases.
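The two failure cases are easy to see in a toy model. The following is only a sketch in Python, not XML Schema validation: `validate`, `serialize`, and the "try integer first" rule for the union are assumptions made up for illustration.

```python
from typing import Union

Value = Union[int, str]


def validate(text: str) -> Value:
    """Toy validator for a union of integer and string: try integer first."""
    try:
        return int(text)
    except ValueError:
        return text


def serialize(value: Value) -> str:
    """Erase the type and write the value out as text."""
    return str(value)


def round_trips(value: Value) -> bool:
    """Internal value -> text -> internal value."""
    back = validate(serialize(value))
    return type(back) is type(value) and back == value


def reverse_round_trips(text: str) -> bool:
    """Text -> internal value -> text."""
    return serialize(validate(text)) == text


print(round_trips(42))             # True: an integer survives
print(round_trips("42"))           # False: the string "42" comes back as an integer
print(reverse_round_trips("007"))  # False: the leading zeros are lost
```

The string "42" is the union problem in miniature: once the type is erased, nothing in the text says which branch of the union it came from.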
Fri, 03 Oct 2003
Fri, 19 Sep 2003
Push me pull you
Diego's done a little benchmarking of Crimson (the parser in the JDK) and the StAX RI. StAX comes out way ahead, but there are still some things to wonder about. Can StAX validate? Crimson can, and is likely taking a hit for it. If he had compared it to Xerces-J, I'd have to ask whether StAX can validate against a schema. One interesting comparison would be against NekoPull. I'm excited about StAX, but having worked on Xerces-J, I also know that there are lots of factors that could be influencing the numbers. The one thing that is clear is that the code is a lot simpler. The performance stuff is just repeated application of solid engineering.
Thu, 18 Sep 2003
StAX + XMLBeans == Java XML APIs
Elliotte Rusty Harold has an XML.com article on StAX. It's a good introduction to this style of API. He didn't like the use of integer type codes, preferring to have objects. I view StAX as a replacement for SAX, which is basically the bottom of the stack for XML processing APIs. You can use SAX or StAX to build trees or beans or whatever you like. So any performance penalty that you impose on StAX is something you've imposed on everybody above you in the food chain, and there's no way to get that performance back if you need it. I think that they did the right thing here. The performance of reflection is never going to match integer comparison. Also, Andy Clark, the author of NekoPull, is a member of the expert group.
As noted, the state management is much easier than with SAX -- it allows you to write a more recursive-descent style of object construction. One of the problems with SAX is that it's very hard to use it to create objects in a way that allows you to reuse the objects and the SAX handler for building those objects. StAX makes that go away.
In many cases what you want is an easy-to-manipulate representation of the XML data. This is very frequently a Java Bean. So what you really want is an API that takes an XML stream and gives you back the right Java Bean. And that's what XMLBeans is all about. So in my view of the Java XML API world, there's StAX at the bottom, and on top of that you have XMLBeans. If you need to do weird document editing tree stuff, then there's StAX at the bottom, and on top of that there's the DOM or XOM or JDOM or whatever. But for most applications that I've worked on, the combination of StAX and XMLBeans would cover it.
If you look at the XMLBeans Roadmap, you'll see that one of the goals is to support StAX (JSR 173), which would mean one stop shopping, er, downloading. My hope is that as XMLBeans evolves we'll be able to have a simple to use library.
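To make the recursive-descent point concrete, here is a rough sketch of pull-style object construction. It uses Python's `xml.dom.pulldom` as a stand-in for StAX (its event types are strings rather than StAX's integer codes), and the `Book` class and the `library`/`book`/`title`/`author` element names are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List
from xml.dom import pulldom


@dataclass
class Book:
    """Hypothetical bean-like target object."""
    title: str = ""
    authors: List[str] = field(default_factory=list)


def read_text(events, end_name):
    """Pull events until the named element closes, collecting character data."""
    parts = []
    for event, node in events:
        if event == pulldom.CHARACTERS:
            parts.append(node.data)
        elif event == pulldom.END_ELEMENT and node.tagName == end_name:
            break
    return "".join(parts)


def read_book(events):
    """Build one Book; called when the stream is just past <book>."""
    book = Book()
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "title":
            book.title = read_text(events, "title")
        elif event == pulldom.START_ELEMENT and node.tagName == "author":
            book.authors.append(read_text(events, "author"))
        elif event == pulldom.END_ELEMENT and node.tagName == "book":
            break
    return book


def read_books(xml_text):
    """Top-level loop: hand the shared event stream to read_book per <book>."""
    events = pulldom.parseString(xml_text)
    return [
        read_book(events)
        for event, node in events
        if event == pulldom.START_ELEMENT and node.tagName == "book"
    ]
```

Each function pulls from the same event stream and returns when its element closes, so the parsing state lives in the call stack. `read_book` knows nothing about where `<book>` sits in the document, which is the kind of reuse that is awkward to get out of a SAX handler.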
Thu, 11 Sep 2003
XML Tools from all over
- Uche Ogbuji's XML.com article on The State of the Python-XML Art, 2003 is a good guide for people unfamiliar with Python's XML support, like me. I found some good pointers in there.
- Micah Dubinko also has an article on XML.com, discussing Ten XForms Engines. I'd really like to see a Firebird plugin for this, and one that could generate an SVG or Flash .swf file from an XForms document.
- From XMLhack comes news of nXML, James Clark's GNU Emacs mode for XML. nXML can validate against Relax NG schemas.
Tue, 09 Sep 2003
This CNET article is ostensibly about Cisco's color VoIP phone, but tucked in at the end is a bit of interesting information:
"Like most VoIP phones, the 7970G will also perform Internet functions such as searching a database for work orders. It does so by running programs written in an emerging software language called XML (Extensible Markup Language) that's used to share data over networks. Also Tuesday, Cisco said it will extend XML capabilities to its 7905G and 7912G VoIP phones by the end of the year. The phones are available now, selling for $135 and $165 respectively."
Does this have possibilities?
Mon, 01 Sep 2003
Tue, 19 Aug 2003
XML omnibus: demystification, RelaxNG, Postel's law
- You know we're in trouble when we need an XML Acronym Demystifier. Things are really getting out of hand.
- Tim Bray is extolling RelaxNG after his experiences writing a schema for Pie/Echo/Atom/Whatever. I agree that RelaxNG is technically superior to XML Schema, but I've all but given up on RelaxNG supplanting XML Schema as the majority schema language for XML. I suppose that's mostly because I'm looking at it through the lens of Xerces and the lack of interest we've seen from RelaxNG-knowledgeable folks in the past. Something needs to happen for RelaxNG to pick up momentum. Right now, it's just another one of those acronyms that we need the demystifier for. Until the RelaxNG community starts showing up to projects and helping to build RelaxNG support, RelaxNG will remain obscure. A few years ago I tried to use the leverage that we had with Xerces-J to advance RelaxNG, but no one was interested enough to help us. If the RelaxNG folks don't care enough to get RelaxNG support into the most popular processors in all the major languages, then they can't expect the rest of us to care.
- I partially disagree with Aaron Swartz and Mark Pilgrim (and probably a host of others) on Postel's law. All the examples that Aaron gives involve documents that users might process (this includes RSS feeds, where I've been bitten by his reason #1). I agree that in those situations it's desirable for the parser to be Postel-compliant (i.e. lenient). But in situations where the XML is generated by machines, for machines, I think that the XML spec writers were correct. Let them eat fatalErrors, because there's a bug in a program somewhere that needs to get fixed. One of the motivations for the Xerces Native Interface was to allow us to build families of parsers by replacing pieces of the framework. So you could implement a strict XML parser (which is what XMLDocumentScannerImpl is) or you could implement a Postel XML parser. The only problem is that nobody cares enough to implement a Postel XML parser.
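For a feel of the difference between the two kinds of parser, here is a toy sketch in Python. It is not Xerces or XNI, and the "repair" is an assumption for illustration: it fixes exactly one kind of damage, a bare ampersand, which is just one hypothetical example of a malformed feed. `parse_strict` and `parse_lenient` are made-up names.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does not start an entity or character reference.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_strict(text):
    """Strict: any well-formedness error is fatal (raises ET.ParseError)."""
    return ET.fromstring(text)


def parse_lenient(text):
    """Postel-style: on failure, escape bare ampersands and try once more."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        repaired = _BARE_AMP.sub("&amp;", text)
        return ET.fromstring(repaired)  # still raises if the repair was not enough


feed = "<item><title>Q&A</title></item>"
print(parse_lenient(feed).findtext("title"))  # Q&A
```

`parse_strict(feed)` raises `ET.ParseError`, which is the right outcome when a program produced the XML and needs fixing. The lenient version hides that bug, and that's the trade being argued about.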
Sat, 16 Aug 2003
My current project unveiled...
I've been hinting at my current project here and there. Today my agent told me that I am clear to talk about it, so since a picture is worth a thousand words:
The goal of this book is to help people use the unique features of various XML related projects at the ASF, mostly those that were a part of xml.apache.org at the beginning of the year. I'm pretty well along, but if there's something that you always wanted to know about some Apache XML tool, drop me a note, comment, or trackback, and I'll see what I can do.
Writing a book has turned out to be a bigger undertaking than I expected, but so far I'm enjoying it. I was really able to identify with Erik Hatcher's post today.
Fri, 04 Jul 2003
XMLBeans going open source
Steven's already posted his opinion on XMLBeans moving to Apache. He's made himself an ASF sponsor of the proposal to bring XMLBeans to the ASF. Steven's post focused on the political end of the proposal. Most of his reasoning would be subsumed by my repeated postings here, so I won't repeat myself. I'm also interested in seeing XMLBeans go open source, even if it ultimately doesn't end up at the ASF. The reason is simple. Most of the APIs that we have for dealing with XML suck. I think that we can do a lot better, and XMLBeans looks pretty good. The open source community needs to start thinking about targeted areas for improvement. The whole XML API area is ripe for this. Having XMLBeans go open source is much better than having it go to the JCP. I'm planning to try and help shepherd this through the Apache incubator.
Mon, 30 Jun 2003
Sun, 22 Jun 2003
Sat, 07 Jun 2003
What does equivalence mean for XML?
Dare is back and writing like a madman. There's one aspect of comparing XML that I'm not sure he quite picked up, though:
"If I get arbitrary XML as input to my method and want to compare whether the XML fragment is equivalent to another XML fragment there isn't an easy way to do this without converting both in-memory representations to their string representations (if they aren't strings already) and doing a string comparison. Hmmmm..."
String comparison is not nearly enough to determine equivalence. Aside from the namespace prefix mapping problem mentioned by Harry, there are problems related to order of attributes, whitespace between elements, and empty elements, to name a few. The Canonical XML recommendation has a good list of the issues. You have to decide what equivalence means when you are talking about XML derived data. If you want equivalence to mean "the objects that have been serialized as these two XML documents are equal in C#/Java", then you have more work to do at the "string comparison" level.
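Here is a small sketch of the difference, in Python rather than C# or Java. It assumes Python 3.8 or later, whose `xml.etree.ElementTree.canonicalize` implements C14N 2.0 (a later version of Canonical XML than the recommendation mentioned above). `xml_equivalent` and the sample documents are made up for the example.

```python
from xml.etree.ElementTree import canonicalize


def xml_equivalent(a: str, b: str) -> bool:
    """Compare canonical forms instead of raw strings.

    strip_text=True drops whitespace around text content. Canonical XML
    itself keeps that whitespace, so stripping it is a decision about
    what "equivalent" means here, not part of canonicalization.
    """
    return canonicalize(a, strip_text=True) == canonicalize(b, strip_text=True)


doc_a = '<order id="1" status="new"><note/></order>'
doc_b = '<order status="new"   id="1">\n  <note></note>\n</order>'

print(doc_a == doc_b)                # False: the raw strings differ
print(xml_equivalent(doc_a, doc_b))  # True: same attributes, same elements
```

This covers attribute order, whitespace between elements, and empty elements. It leaves namespace prefixes alone, so two documents that bind different prefixes to the same namespace URI still compare as different. And it only says the XML is equivalent, which is still short of saying that the objects serialized into it are equal.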