Ted Leung on the air
Ted Leung on the air: Open Source, Java, Python, and ...
Tue, 13 Jan 2004
Atom, nuts, and bolts.
The debate over liberal parsing of Atom feeds is interesting. Of course, both sides have their points. The folks advocating liberal parsing believe that they are advocating for the users, in order to spare them the pain of bad feeds. The problem with this approach is that once you start accepting bad data, the data remains bad. This is particularly a problem when RSS/Atom data gets reformatted or repurposed. A great example of this is the RSS feed for Planet Lisp, which, as Mark pointed out in the comments, doesn't validate. One of the problems with that feed is that it has a bunch of non-conforming dates in it. Poor Xach is just aggregating that feed data -- he got those dates from someone upstream. If he's using the Planet code, he's using Mark's liberal parser to do part of the dirty work.

So in this case, the liberal parsing camp says: look, the feed is invalid, but the user doesn't care, so just read it. OK, we can do that. But look at why the feed is bad. It's a cascade of people who said exactly that. The users don't care, but I bet Xach does, now that I passed Mark's feedback on to him. The users don't care about the pointy brackets, true. But they do care about the visible benefits of the pointy brackets. As more and more RSS data gets extracted, sliced, and diced, there are going to be more of these downstream accumulations of error. When you start reusing and extracting from this data, the benefits of valid markup become visible.

This bit me in Chandler in the performance tests, because we use DBXML as part of the implementation of the item repository. So we had a bunch of bad feed data that was blowing up the store. So before I used Mark's feedparser to get all the data out of the feeds, I first ran the feed data through a real XML parser and then threw out all the invalid feeds.
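The pre-filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in Chandler: the post does not say which XML parser was involved, so Python's standard `xml.etree.ElementTree` stands in for "a real XML parser", and the function names and example URLs are hypothetical. Note that this only checks well-formedness; it does not validate a feed against the RSS or Atom specs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(feed_text):
    """Return True if feed_text parses as well-formed XML."""
    try:
        ET.fromstring(feed_text)
    except ET.ParseError:
        # A strict XML parser refuses the whole document on the first error.
        return False
    return True


def keep_well_formed(feeds):
    """Filter a {url: raw_feed_text} mapping down to the well-formed entries."""
    return {url: text for url, text in feeds.items() if is_well_formed(text)}


# Hypothetical sample data: the second feed has a mismatched <title> tag.
feeds = {
    "http://example.org/good.xml":
        "<rss version='2.0'><channel><title>ok</title></channel></rss>",
    "http://example.org/bad.xml":
        "<rss version='2.0'><channel><title>oops</channel></rss>",
}
clean = keep_well_formed(feeds)
```

Only the feeds left in `clean` would then be handed to the liberal parser for data extraction, so nothing malformed reaches the XML store.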

One way to look at XML compliance in specs is as a guarantee that you can pop things together. That is a benefit, and it is a user benefit, although the user that benefits may not be the user of standalone aggregators. Interoperability is important. If I go to the store for a 1/2 inch screw and they sell me a 5/8 inch screw because nuts "parse liberally", well, that's just not going to work.

Lots of folks are talking about this issue as if aggregators are the only users of RSS data, and I don't think that's going to be the case forever.

In the end, there's only one way to find out. Let's see what happens to the people who decide to parse strictly. If the liberal folks are right, then folks like Brent Simmons and Nick Bradbury will be out of business, because people will badmouth their strict aggregators and stop using/buying them. If not, a whole bunch of feeds will get cleaned up, and after a period of whining and tool building, the whole problem will be solved. If the authors of the major aggregators (and on my blog that's (roughly) NNW, SharpReader, FeedDemon, Radio Userland, feedreader, BottomFeeder, Straw, and RSSBandit) all decide to go strict, then I think that the feed producers can be brought into line pretty fast. If the feeds actually include good data about the feed producers, then strict aggregators could even auto-report bad feeds to the feed producer via email, along with a URL to the feedvalidator like the one Mark posted for Planet Lisp. That's aggregator side tooling.
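As a rough idea of what that auto-reporting might look like, here is a small sketch that builds (but does not send) a report email. Everything in it is an assumption made for illustration: the function name, the sender address, the message wording, and the validator link format, which assumes the feedvalidator takes the feed address as a `url` query parameter.

```python
from email.message import EmailMessage
from urllib.parse import quote

# Assumption: the validator accepts the feed address as a `url` query parameter.
VALIDATOR = "http://feedvalidator.org/check.cgi?url="


def build_bad_feed_report(feed_url, producer_email, error_text,
                          sender="aggregator@example.org"):
    """Build an email telling a feed producer that their feed was rejected."""
    # Percent-encode the whole feed URL so it survives as a query value.
    validator_link = VALIDATOR + quote(feed_url, safe="")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = producer_email
    msg["Subject"] = "Your feed could not be parsed: " + feed_url
    msg.set_content(
        "A strict parser rejected your feed.\n\n"
        f"Feed: {feed_url}\n"
        f"Error: {error_text}\n"
        f"Validator report: {validator_link}\n"
    )
    return msg


# Hypothetical usage with made-up addresses and error text.
report = build_bad_feed_report(
    "http://example.org/rss.xml",
    "producer@example.org",
    "mismatched tag: line 12, column 4",
)
```

A real aggregator would also need to rate-limit these reports and take the producer's address from the feed itself, which only works if the feed carries good contact data in the first place.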

We also need author side tooling. My feed breaks periodically because I write entries by typing HTML into a very dumb CGI script. That's one reason I want to get something like Ecto or the NNW weblog editor working. And if those folks do their job properly, I'll never generate an invalid feed again.

So I say let's do an experiment. Brent and Nick are willing to bet their livelihood on it -- the rest of us are just talking.

[21:47] | [computers/internet/microcontent] | 4 Comments
Back on the air
I always dread ISP transitions, because of the disruption and the lingering side effects. Today at midnight we went off the air. So much for Qwest's guidance that we'd probably be down for 2-3 hours while the circuit got switched. I stayed up to see if it would come back, but it didn't, so I finally went to sleep. When I woke up at 8:30, the connection was still down. Time for a friendly call to the folks at Drizzle to find out what the status was. Apparently Qwest had switched the circuit, but my IP addresses had not yet been assigned and wouldn't be assigned until the right person came in around 11AM. Okay, not great, but encouraging. I was promised a call when the IPs were assigned. Good thing too, because the DSL modem was not in bridging mode, as I had thought. So a quick run around to find the management cable, pull the router from the wiring closet (I have no portables that have an RS-232 port), and figure out how to talk to the management shell for the router. A 5 minute phone conversation with Drizzle took care of switching to bridging mode, and we were in business. Mail started pouring in, I updated the DNS at Network Solutions, and 2 hours later www came back on line. Now the rest is little cleanups as I find places where there were dependencies on the old IP addresses.

So to inaugurate the switch, we used a combination of iSight and telephone (iSight sound was kind of wonky, and we didn't want to experiment during the meeting) for me to attend a meeting at OSAF this afternoon. The little bit of video made a big difference.

[21:06] | [computers/internet/weblogs] | 1 Comment
About

I work at the Open Source Applications Foundation (OSAF).
The opinions expressed here are entirely my own, not those of my employer.

This work is licensed under a Creative Commons License.