Ted Leung on the air
Ted Leung on the air: Open Source, Java, Python, and ...
Wed, 29 Oct 2003
More Fun Python projects
Jeremy Hylton posted some cool ideas for master's thesis level Python projects. Here's my suggestion:
  • Write a type annotator for Python programs that would take a Python program as input and produce a version of the program annotated with as detailed type information for as many variables and functions as possible. The goal is to provide a tool that could be used on Python files to give some confidence that type unsafe operations were not going to happen.
I also liked the suggestion for adding XML/SQL support ala Xen.
[23:46] | [computers/programming/python] | # | TB | F | G | 0 Comments | Other blogs commenting on this post
The Essence of XML
Today I read Simeon and Wadler's POPL 2003 paper, The Essence of XML, which should really be titled "The Essence of XML Schema". The paper points out that XML is not a good data representation because it isn't self describing or round-trippable:
It is not always self-describing, since the internal format corresponding to an external XML description depends crucially on the XML Schema that is used for validation (for instance, to tell whether data is an integer or a string). And it is not always round-tripping, since some pathological Schemas lack this property (for instance, if there is a type union of integers and strings)
They are assuming a strongly typed world, of course. The key observation in their work is that using a named typing model instead of (more commonly used) a structural typing model leads to a model where proving theorems about validation and erasure is very easy. The model in the paper only models a subset of XML Schema, but the remaining features are accounted for in the formal semantics for XQuery and XPath 2.0.

For practitioners, the bottom line is that the subset of Schema described by the paper is roundtrippable (convert internal value to external and back -- output the value as XML and then parse/validate) except for simple types that are lists or unions. For the case of reverse-roundtripping (parse/validate then output), the only problems arise when a base type has multiple representations for the same value, like leading zeros or number bases.

[23:34] | [computers/programming/xml] | # | TB | F | G | 0 Comments | Other blogs commenting on this post

twl JPG


Ted Leung FOAF Explorer

I work at the Open Source Applications Foundation (OSAF).
The opinions expressed here are entirely my own, not those of my employer.

Creative Commons License
This work is licensed under a Creative Commons License.

Now available!
Professional XML Development with Apache Tools : Xerces, Xalan, FOP, Cocoon, Axis, Xindice
Technorati Profile
PGP Key Fingerprint
My del.icio.us Bookmarks
My Flickr Photos

RSS 2.0 xml GIF
Comments (RSS 2.0) xml GIF
Atom 0.3 feed
Feedburner'ed RSS feed

< October 2003 >
    1 2 3 4
5 6 7 8 91011


Macintosh Tips and Tricks

Blogs nearby
geourl PNG

/ (1567)
  books/ (33)
  computers/ (62)
    hardware/ (15)
    internet/ (58)
      mail/ (11)
      microcontent/ (58)
      weblogs/ (174)
        pyblosxom/ (36)
      www/ (25)
    open_source/ (145)
      asf/ (53)
      osaf/ (32)
        chandler/ (35)
        cosmo/ (1)
    operating_systems/ (16)
      linux/ (9)
        debian/ (15)
        ubuntu/ (2)
      macosx/ (101)
        tips/ (25)
      windows_xp/ (4)
    programming/ (156)
      clr/ (1)
      dotnet/ (13)
      java/ (71)
        eclipse/ (22)
      lisp/ (34)
      python/ (86)
      smalltalk/ (4)
      xml/ (18)
    research/ (1)
    security/ (4)
    wireless/ (1)
  culture/ (10)
    film/ (8)
    music/ (6)
  education/ (13)
  family/ (17)
  gadgets/ (24)
  misc/ (47)
  people/ (18)
  photography/ (25)
    pictures/ (12)
  places/ (3)
    us/ (0)
      wa/ (2)
        bainbridge_island/ (17)
        seattle/ (13)
  skating/ (6)
  society/ (20)

[Valid RSS]

del.icio.us linkblog



Listed on BlogShares

Locations of visitors to this page
Where are visitors to this page?

pyblosxom GIF