Ted Leung on the air: Open Source, Java, Python, and ...
Some of my pictures from Chinese New Year wound up on the NowPublic site. This happened because I tagged the photos with 'chinesenewyear2006'. This is the first time (that I am aware of) that someone has picked up any of my photos because I tagged them. A good incentive to keep on tagging my pictures even though I personally don't use the tags that much.
Flickr is commonly held up as one of the poster children for tagging. Now I like tagging quite a lot (see my del.icio.us if you don't believe me), but for me, tagging is not really the reason that I use Flickr. I use Flickr as a way to "force" myself to think about photography every day. The way that I do this is by subscribing to Flickr content via RSS, and you can subscribe to just about anything on Flickr using RSS, except for, say, the interesting photos (cough, cough). I'm not really interested in topical slices (regardless of who did the slicing) of photostreams, which is what tagging gives you (although I do appreciate those who have tagged their pictures with the particular lens that they used to make the picture). About the only time I use tags is to search for conference pictures. I'm taken in by particular people -- their style, their subjects, their settings. It's all about the people in Flickr.
Which brings me to an annoying thing about Flickr. You can add people to your contacts list and then subscribe to a single feed of pictures from the people in your contacts list. The problem is that the RSS feed only gives you the single most recent picture from each person. That just doesn't make it for me, so the other day, I went through and subscribed to the individual photostream feeds from people in my contacts list. There ought to be a setting somewhere...
I have an affinity for long (and sometimes long-winded) science-fiction and fantasy books. A few days ago, I accidentally discovered that the 11th book in Robert Jordan's Wheel of Time series, Knife of Dreams, was released. I would love it if there were a way to get an RSS feed that contained announcements of new Robert Jordan titles. Bonus points for a 1-click kind of interface that takes me to my favorite library or bookseller when a new entry hits the feed. And while we're at it, do the same thing for musical artists and movies. If content producers want to sell it, they need to tell me about it, and RSS would be a lot more direct than TV.
The other end of this is that I use the local public library a lot. Jon Udell's library lookup bookmarklets are useful for checking to see if books are in the library, but I would love to have support for the entire "discover a book, get it from the library, return it" lifecycle. A web service that let me access my library account information would be great.
On the first night of Foo Camp this past summer, I was wandering. Julie had decided to go to sleep, since she was talking the next morning. Night owl that I am, I was in no way prepared to go to sleep. So I headed back down to the common areas, looking for, well, I wasn't exactly sure. I ran into a number of people along the way and by the time I got back to the ground floor, it was fairly late.
As I came out of the building I saw a fellow with a big digital SLR taking flash photographs of the boards containing the Foo Camp schedule. Being fascinated, or rather, intimidated by flash photography, I walked a bit closer and asked "Did they come out?". The fellow and I started talking, and it wasn't long before I discovered that I was talking to Stewart Butterfield, one of the founders of Flickr. After I realized this, I waited for an appropriate pause in our conversation. I held up my (still pretty new) Canon Digital Rebel XT and said something like "I don't know whether to thank you or to blame you". Which then took us off onto a different vector of conversation.
One thing I do know is that Flickr truly was instrumental in reigniting my interest in photography to the point where I went from a non-pro account to a pro account, from sharing an economical point and shoot to wanting my own digital SLR, and from taking an occasional picture to hauling that camera just about everywhere. I've become a passionate user. You start spending money. Cameras, lenses, tripods, books, prints, Aperture, Photoshop, etc. Somebody is going to be making a ton of money off the spark that Flickr helped light. Except that Flickr isn't going to see any of that money. Most of it's going to go to Canon, Bogen, Apple, whoever. Maybe Flickr should open a photo equipment store or some kind of affinity program.
Anil Dash and Caterina Fake are having a discussion about whether or not companies like Flickr should be paying the users that put their content up there. It's an interesting way to look at it. But it does seem a little strange. I pay (paid) Flickr for a pro account so I could put my content up there, so it seems a little odd to me to expect that I would then get paid if I generated a certain amount of traffic (unfortunately, I'm in no danger of generating enough traffic to get paid for it). But maybe I'm just not thinking straight about all of that. At least for now, I feel that I've gotten quite a bit more than my $29 worth of value out of Flickr, whether I get a reward for traffic or not.
A few people have asked if I could post the Python scripts that I am using to generate RSS feeds for system statistics.
Here they are. In addition to Python, you need eGenix's mxTidy package, as well as the Python wrappers for libxml2. There are also a few hardcoded paths in the scripts which you will need to change to match your installations of awstats, mrtg, and so forth. After you have done this, you can subscribe to the scripts from NetNewsWire (the awstats script allows you to pass the awstats configuration as a parameter).
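For readers who just want the general shape of such a script, here is a minimal sketch. It is not one of the scripts mentioned above: it uses only the Python standard library rather than mxTidy and libxml2, and the function name `build_stats_feed`, the example URL, and the statistic names are all made up for illustration. It simply turns a dictionary of statistics into an RSS 2.0 document with one item per statistic.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def build_stats_feed(title, link, stats, timestamp=None):
    """Return an RSS 2.0 document (as a string) with one item per statistic.

    `stats` maps a statistic name to its current value. `timestamp` is
    seconds since the epoch; None means "now".
    """
    # RSS 2.0 expects RFC 822 style dates, which formatdate produces.
    pub_date = formatdate(timestamp, usegmt=True)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "System statistics"

    for name, value in stats.items():
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{name}: {value}"
        ET.SubElement(item, "description").text = f"{name} was {value}"
        ET.SubElement(item, "pubDate").text = pub_date

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values; a real script would read them from awstats, mrtg, etc.
    print(build_stats_feed("Server stats", "http://example.com/stats",
                           {"load average": 0.42, "visits today": 1200}))
```

Served from a CGI script or written to a file on a schedule, output like this is something an aggregator can subscribe to.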
These scripts fall into the quick and dirty hack category. They work well enough for me. If you use them and improve them, I'd appreciate a note with your changes.
Various people have been discussing the "attention problem" and attention.xml. The basic idea is that the advent of RSS means that we have too much information competing for our attention, and that we need a way to record "attention" data which could then be used to perform triage of information to be presented to a human user.
The notion of automated triage of information has been around for a while. To an old Usenet junkie like me, the information overload problem and the need for tools to help triage the flow seem like a no brainer. I remember when the s(coring)trn newsreader came out (as a set of patches to trn), and when Gnus, the mother of all news readers, made its debut. Gnus was one of the first platforms for the GroupLens collaborative filtering system, which then went on to become NetPerceptions (which now appears to be defunct). A lot of what is being discussed feels familiar, in concept if not in actual implementation. So what's new here?
Well, strn and Gnus were not instrumented to record attention data. Although I have some doubts about the accuracy of some of the data (like how long did a user read a post), Steve Gillmor seems quite excited about instrumenting clients (like Firefox) in order to obtain this data. Assuming that you could gather meaningful (or mostly meaningful) data, this seems like a good source of data for triage.
Next, you have the notion of distributing / combining / syndicating / bartering / selling the attention data. If you want to do this, having a standardized format for encoding that data seems good, and since adding XML to data always makes things better, you get attention.xml. This part is easy, and most people agree on that. There's a question about where attention data lives and who gets access to it. This is an important question, at least to me. If people are unscrupulous about the (comparatively) small amount of information that I give out about myself, what will happen when they could get their hands on a detailed model of my attention? The thought of storing my attention data in somebody's VC backed server farm doesn't give me the warm fuzzies.
After that, we get to actually using the attention data to perform triage, which is where there is room for experimentation, variation, and market based competition (at least if you believe in exposing your attention stream). Here's where you get into scoring, bayesian filtering, collaborative filtering, reputational filtering and so forth. It's also where you have to deal with issues of granularity, i.e. single posts versus conversations. It's also where you get into potentially innovative presentations as well.
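To make the Bayesian filtering option a little more concrete, here is a toy sketch of my own, not anything taken from attention.xml or an existing aggregator. It assumes the attention data has already been boiled down to "items I read" and "items I skipped", and it trains a tiny naive Bayes classifier on the item text. The class name `AttentionFilter` and the two labels are invented for the example.

```python
import math
import re
from collections import Counter


class AttentionFilter:
    """Tiny naive Bayes classifier: 'read' versus 'skipped' feed items."""

    def __init__(self):
        self.word_counts = {"read": Counter(), "skipped": Counter()}
        self.item_counts = {"read": 0, "skipped": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase and split into simple word tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record one item whose attention outcome ('read' or 'skipped') is known."""
        self.word_counts[label].update(self.tokenize(text))
        self.item_counts[label] += 1

    def score(self, text):
        """Return the log-odds that the item would be read; above 0 favors 'read'."""
        vocab = set(self.word_counts["read"]) | set(self.word_counts["skipped"])
        total_items = sum(self.item_counts.values())
        log_odds = 0.0
        for label, sign in (("read", 1), ("skipped", -1)):
            # Add-one smoothing on the prior so an unseen label never gives log(0).
            log_prob = math.log((self.item_counts[label] + 1) / (total_items + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing on words, with one extra slot for unknown words.
                log_prob += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_odds += sign * log_prob
        return log_odds


if __name__ == "__main__":
    f = AttentionFilter()
    f.train("python rss aggregator release", "read")
    f.train("celebrity gossip roundup", "skipped")
    # Sort incoming items so the most promising ones come first.
    incoming = ["celebrity gossip today", "new python rss library"]
    print(sorted(incoming, key=f.score, reverse=True))
```

Scoring single posts like this sidesteps the granularity question. Working at the level of conversations would need some way of grouping posts first.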
From where I sit, my attention data, my reputation assignments, my triage preferences and so forth are all part of my personal information, and would be something that I would like a personal information manager to manage. I think we have one of those lying around here somewhere...
Scoble asked me to write this post, so here goes. I don't mean that RSS aggregators are the kind of killer app that sells a billion computers and creates new markets (there is that possibility, though). I mean the app that does so much that it consumes all available CPU, memory, network, and disk. Perhaps I really mean that they're the "killing my computer" app.
If I take out software development activities, the application that is pushing the limits of my hardware is my RSS aggregator. This is not in any way a slam on NetNewsWire, which is a very, very fine application. It's a reflection of the way that my relationship to the web has changed. I hardly use a standalone browser anymore -- mostly for searching or printing. I don't have time to go and visit all the web sites that have information that is useful to me. Fortunately, the aggregator takes care of that. Once the aggregator has the information, I want it to fold, spindle, and mutilate it. I'm at over 1000 feeds, and on an average day, it's not uncommon to have 4000 new items flow through the aggregator. It takes 25 minutes (spread out over two sessions) just to pull the data down and process it -- and I have a very fast connection. NetNewsWire uses WebKit to render HTML in line -- a feature that makes it easy to cut through piles of entries, but one which is demanding of CPU and memory.
But that's just the basics. What happens when we start doing Bayesian stuff on 4000 items a day? Latent Semantic Indexing? Clustering? Reinforcement Learning? Oh, and I want to do all of those things on all the stuff that I ever pulled down, not just the new stuff. What happens if I want to build a "real-time" trend analyzer using RSS feed data as the input? The processor vendors should be licking their chops...
As Julie has already mentioned, tomorrow we'll be heading over to the Eastside blogger meetup. Perhaps Scoble and I can sit down and talk about Information Overload. I've been pondering this topic since he posted on his BloggerCon session of the same name. Philip Greenspun posted what is essentially the reverse opinion, that reading the news is of little benefit. It's an interesting question. I definitely tend towards keeping up and having lots of information pass through my brain. RSS Aggregators provide the machinery that allows lots of content to pass into my brain. Weblogs provide a source of good quality information. The combination is a treat for infovores.
There are many interesting discussions that one can have about using technology to increase the signal to noise ratio of one's personal RSS cloud. All of those conversations are predicated on the notion that higher quantities of higher quality information are good. But is that really true?
I've been watching the whole iPodder/podcasting thing with interest. On the one hand, the jump from text to rich media is obvious. On the other hand, the impact on information overload is tremendous. I can't watch a Channel9 video and be cutting through my RSS feeds at the same time. Even the audio turns out to be distracting for me, which means that podcasting isn't going to help much either. Audio and video depend on the timing of what you hear and see. You can only accelerate that a small amount before the communication is garbled. Try to compute the number of hours during a week that are potentially available for information grazing. Then add up the number of audio/video posts that you might want to consume and face reality. Podcasting (and its obvious and imminent follow-on, video podcasting) is a way of providing on demand delivery of audio/video (just not controlled by big media). In some ways that's going to turn out to be pretty powerful. (I can imagine my ice skating channel coming to life if venue attendees are allowed to do DV recording and rebroadcast of events.)
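Here is that back-of-the-envelope computation as a few lines of Python. Every number in it is hypothetical (10 hours a week for grazing, 30 audio/video posts of 40 minutes each, played 25% faster than normal); plug in your own. The function name `weekly_listening_backlog` is just something I made up for the sketch.

```python
def weekly_listening_backlog(grazing_hours_per_week, items_per_week,
                             minutes_per_item, speedup=1.0):
    """Return (hours_needed, shortfall_hours) for a week of audio/video posts.

    `speedup` is the playback rate: 1.0 is normal speed, 1.25 is 25% faster.
    """
    hours_needed = items_per_week * minutes_per_item / 60 / speedup
    # The shortfall is never negative: spare time is simply spare time.
    shortfall = max(0.0, hours_needed - grazing_hours_per_week)
    return hours_needed, shortfall


if __name__ == "__main__":
    needed, shortfall = weekly_listening_backlog(10, 30, 40, speedup=1.25)
    print(f"need {needed:.1f} hours, short by {shortfall:.1f} hours")
    # prints "need 16.0 hours, short by 6.0 hours"
```

Even with the playback sped up, these made-up numbers leave a six hour hole every week, and unlike text there is no skimming to make up the difference.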
Hmm, that personal RSS cloud just became a personal media cloud. But it still won't be as efficient as text. And there's still the question of whether this is a good thing.
Early Tuesday, Mark Fletcher posted the news on Bloglines' new REST based web services API. I'm glad to see that NetNewsWire and FeedDemon, the RSS aggregators of choice in our house, are among the initial supporters. Marc Hedlund has an article showing a Groovy implementation of an RSS aggregator that leverages the Bloglines services. The initial support in the API is for notification, syncing and getting a blogroll.
A rich client like NetNewsWire/FeedDemon integrated with a web based service like Bloglines has the potential to offer users the best of both worlds, and I believe this is a glimpse of the world yet to come. While I've looked at Bloglines and seen some features that I like (most of which relate to social aspects like recommending feeds that I'd like), I've never considered using it seriously because it won't work in offline mode, and because I believe that RSS aggregators are going to talk to my mail, address book, calendar, and other applications running on my machine. If Bloglines services exposed that social information, then I could have my rich client that integrated with local services, and still get the benefits of the Bloglines service -- that means being able to leverage the social information harvested by Bloglines within NetNewsWire. The API doesn't support it today, but there's no reason that it couldn't.
This general architecture applies to situations other than RSS aggregation, of course. This sort of thing is what I envisioned when I read Tim O'Reilly's essay All Software Should be Network Aware.
The wired world is slowly absorbing pieces of me. There's the weblog, where I write prose, del.icio.us has my bookmarks, and now Flickr has got my pictures. Never mind the social networking sites. And the feeds, the RSS and Atom feeds. The blog feed, the category feeds, the comment feed. All the del.icio.us feeds. Flickr feeds from friends. Feeds, feeds, feeds. Oh, and don't forget to feedburner your feeds into one mega feed. I have microcontent personality disorder. I won't even start on multiple e-mail, IM, and IRC personality disorder -- I need a whole display just for communications!