Monthly Archives: May 2009

On Twitter Data

I’ve been getting various kinds of private communication about this, so it’s probably worth some commentary…

For some time now, I’ve been wondering when someone would start to use systems like Twitter as a way to deliver information between programs. A few weeks ago, Todd Fast, a colleague at Sun, gave me a preview of what is now the Twitter Data proposal. Todd and Jiri Kopsa have done all the heavy lifting on this, so if you have substantive comments or requests, they are really the people you should be dealing with. They were kind enough to recognize me as a reviewer of their work, but the initial idea is theirs.

Twitter Data is a bit different from what I was envisioning. I was thinking more along the lines of jamming JSON or XML data into a Twitter message as a starting point for program-level data exchange. That would allow us to leverage existing tools and libraries and make the entire thing straightforward. The interesting part, then, would be the distribution network that arose from programs following other programs. This could also be embedded into a person’s Twitter feed by having clients ignore tweets whose payloads were structured data.
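
To make that concrete, here is a made-up example (the field names are mine, not from any proposal) of a tweet whose entire payload is a JSON object that any existing JSON library could parse:

    {"type": "checkin", "lat": 47.62, "lon": -122.52, "note": "Bainbridge Island"}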

Twitter Data proposes a way to annotate the data-oriented parts of a regular tweet in order to make it easier for machines to extract the data. Some people think this is a good idea, and some people think it’s a terrible one. It’s easy to see the arguments on both sides. The pro is that you could turn your tweet stream into a way to deliver information about you to programs, and Twitter Data would make that much easier to do. The cons (that I’ve seen so far) are that people don’t want this kind of data exchange mixed into their Twitter stream, or that parsing the natural language that appears in the 140 characters of a tweet shouldn’t be that hard in the first place.
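
If I remember the draft correctly, Twitter Data marks up the data-oriented parts with $-prefixed name/value pairs inline in an otherwise ordinary tweet, along these lines (treat the exact syntax as my recollection, not a spec):

    I love the #twitterdata proposal! $vote +1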

So we have two dimensions (at least) to the problem that Twitter Data is trying to address:

  1. Is it a useful thing to have structured or semi-structured information about a person included in their Twitter feed?
  2. If so, should that data be out of band, mixed in, or extracted (natural language processing)?

Independent of the merits of the specific Twitter Data proposal (and I definitely think that there are merits), I think that these two questions are worth some discussion and pondering.

Mac Pro time

For the past three or four years, I’ve been promising myself that I was going to buy myself a Mac Pro. This is mostly a result of digital photography, which makes rapacious demands on computer systems. In the last 9 months or so, it’s also been because I am doing more work using virtualized machine images. In any case, every time Apple had an event, I was telling myself that I was going to buy the machine, but there was always some reason why it never happened. The announcement of the Nehalem-based Mac Pro earlier this year finally pushed me over the edge. And pushing was required: there’s been a lot of benchmarking which casts the performance of these machines in a questionable light when compared with the machines that they replaced. Until a bunch of applications are rewritten to take advantage of the large number of cores in Nehalem-based systems, these boxes are only slightly better than the ones they replaced, and a bit more expensive.

I ended up getting an 8-core machine, because those are the machines that can be expanded to an outrageous amount of memory, which is a necessity for a system doing a lot of Photoshop work. Due to the benchmarking controversy, I got the 2.66GHz processors, so that single-threaded programs wouldn’t suffer as much. Here’s a quick rundown on my experience after having the machine for a few weeks.

Hardware

All of my hardware moved over without a hiccup, except for my Logitech Z-5500 speakers: I needed a TOSLINK to TOSLINK cable, a problem that was rectified by a trip to Radio Shack (yes, we have one on Bainbridge Island; it’s not Fry’s, but once a year or so they save my bacon). The machine is much quieter than I expected. The last desktop machine that I owned was a homebuilt Windows box, and that thing was really loud. The Mac Pro is quieter than some of the external FireWire drives that are plugged into it. Heat is a different story. My office is already several degrees warmer than the rest of the house, and now it’s probably another several degrees warmer still. I’m having to be careful about leaving my office doors open so that things can cool down. Figuring out how this works in the summer is going to be interesting.

Performance-wise, I am pretty happy. Things are definitely snappier than on my Sun-supplied 2.6GHz MacBook Pro. I moved some external disks off of FireWire and into the Mac Pro’s internal SATA drive bays, and I am sure that the change in interface made a big contribution to the improved speed. The machine has 12GB of Other World Computing RAM in it, so it basically doesn’t page unless I am doing something big in Photoshop or have several VirtualBox VMs open at the same time.

There are some things that I miss:

We don’t have TV, so we watch a lot of Netflix and other DVDs. This happened mostly on the MacBook Pro via Front Row and the Apple Remote. The Mac Pro doesn’t talk to the Apple Remote, and I miss that. If people have suggestions for controlling Front Row on a Mac Pro, please leave them in the comments.

I got used to having the laptop hooked up to the LCD display and using the laptop’s own screen as my “communications display” for IM, IRC, Twitter, and so forth. Now I’m back down to a single display and missing the second one. I’m also missing it in Lightroom.

The Mac Pro came with an Apple keyboard. The keyboard I had been using was a Microsoft Natural Keyboard from 2000, and some of its keys were starting to get hard to push, so I figured that I would try the Apple keyboard. So far I don’t mind it, but the keys are in different places, and the new keyboard has 9 years of muscle memory working against it. But that would be true of just about any keyboard.

Software

Any time I get a new machine I update my Macintosh Tips and Tricks page. I definitely have some updates that I could make, and I might make some of them after JavaOne. The rumor mill is suggesting that Mac OS X 10.6 Snow Leopard is going to ship this summer, so I might just wait until that happens, since I expect a lot of things will need updating, rearranging, and so on.

I did have a problem when I tried to update the machine to 10.5.7. Things were behaving very oddly, so I restored the machine to 10.5.6 with Time Machine. (Time Machine backups on an internal SATA drive take less time, and make less noise, than on an external FireWire drive.) I’m going to give this another try after JavaOne. And for prospective commenters: yes, I repaired permissions and used the Combo Updater.

Photoshop occasionally makes use of the additional cores, but it’s the large amount of RAM that is really making the difference at the moment. The same is true for Lightroom. Perhaps the next editions of these programs, coupled with 10.6, will do a better job of keeping multiple cores busy. In the meantime, my Lightroom to Photoshop batch jobs are definitely running quite a bit faster than before.

On the whole

On the whole, I am happy with the machine, and I expect to be a lot happier when 10.6 ships this summer.

Erlang Factory 2009

I spent Thursday and Friday of last week at the Erlang Factory in San Francisco (although the event was actually in Palo Alto).

Why did I go?

I’ve written about Erlang in this space before. Erlang is having a major influence on other languages, such as Scala on the JVM side and Axum on the CLR side. In addition, every language seems to have several implementations of Erlang-style “actors” (despite the fact that crediting actors to Erlang is historically backwards; the actor model predates the language). Erlang has been around for a long time, and has seen industrial usage in demanding telecom applications. As a dynamically typed functional language with good support for concurrency and distribution, it is (if nothing else) a source of interesting ideas. Earlier this year, my boss asked me to start doing some thinking about cloud computing in addition to the work I was already doing around dynamic languages — another good match for Erlang. This was the first large-scale gathering of Erlang people in the US (at least that I am aware of), so I wanted to drop in and see what was going on, what the community is like, and so on.
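
For readers who haven’t run into it, here is a minimal sketch of the model these languages are borrowing: an Erlang process keeps its state in the argument of a tail-recursive loop and interacts with the world only through asynchronous messages. The module and message names below are mine, purely for illustration.

    -module(counter).
    -export([start/0, loop/1]).

    %% spawn a new process whose initial state is 0
    start() -> spawn(counter, loop, [0]).

    %% the state lives in N; each message produces the next state
    loop(N) ->
        receive
            incr        -> loop(N + 1);
            {get, From} -> From ! {count, N}, loop(N);
            stop        -> ok
        end.

A client would call Pid = counter:start(), fire off Pid ! incr, and then retrieve the value with Pid ! {get, self()} followed by a receive on {count, N}. No locks and no shared memory, which is exactly the property the actor-flavored libraries are chasing.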

Talks

The program at the Erlang Factory was very strong. In many of the session slots, there were three excellent talks to choose from, and every single talk that I went to was of very high quality. The only downside was that I wasn’t able to explore all the areas that I wanted to. Fortunately, the sessions were videotaped and are supposed to be made available on the web. Also, there was a decent amount of twittering going on, so a Twitter search for #erlangfactory will turn up some useful information.

I attended a number of “experience” talks by companies and individuals. There were experience talks from Facebook, SAP, Orbitz, and Kreditor (the fastest-growing company in Sweden); I made it to the Facebook talk and the Kreditor talk. Facebook’s deployment is on the order of 100 machines, which provide the chat facility for Facebook: Erlang is doing all the heavy lifting, and PHP is doing the web UI part. There was a lot of this kind of architecture floating around the conference. It seemed like the most popular combination was Ruby/Erlang, but Python and PHP were definitely in the mix as well. The Kreditor talk was interesting because their site has been running for three years with very little downtime. Their entire deployment is probably fewer than 10 machines, which blunts the impressiveness of what they have done, but it was still interesting to hear how they accomplished it using features of Erlang. In addition to the talks, I spoke with many attendees who are using Erlang in their companies. One such person was eBay founder Pierre Omidyar, who is running Ginx, a web-based Twitter client. Pierre is doing the coding and deployment of the site, and was well versed in the Erlang way of doing things. An interesting data point.

The Erlang community (like all communities) has its old guard: folks who worked with Erlang for years before its recent burst of interest. There were a pair of keynotes by Erlang long-timers Robert Virding (The Erlang Rationale) and Ulf Wiger (Multicore Programming in Erlang). Both of these talks shared a common trait — the speakers were pretty honest about what is good about Erlang, and where there are problems. Given how prone the computing business is to fashion, I found this to be refreshing. Virding talked about the reasons why Erlang is designed the way it is. He accepted the blame for inconsistencies in the libraries, talked about the need to avoid the process dictionary, and agreed that “a char type is probably not wrong”. Wiger’s talk was about why parallelizing code is hard (even with Erlang). He used the example of parallelizing map to demonstrate this, and showed the use of the QuickCheck testing tool to aid in finding parallelism bugs. The Erlang version of QuickCheck was inspired by the Haskell original, and it’s a very, very useful tool; the adaptations for parallelism look very nice. It’s a shame that the Erlang version is commercial software. I don’t begrudge the authors the right to charge money for their software, but I do think that this will hold back adoption of an important tool.
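
To see why even the textbook case is subtle, here is the kind of naive parallel map that usually opens this discussion (the sketch is mine, not Wiger’s code): it spawns one process per list element, then collects the results in order.

    %% naive parallel map: one worker process per element
    pmap(F, List) ->
        Parent = self(),
        Pids = [spawn(fun() -> Parent ! {self(), F(X)} end) || X <- List],
        %% Pid is bound by the generator, so each receive selectively
        %% matches its own worker and results come back in list order
        [receive {Pid, Result} -> Result end || Pid <- Pids].

It looks innocent, but there is no error handling (a crashed worker leaves the collector blocked forever), no limit on the number of processes, and F’s side effects now run in an unspecified order. Those are exactly the kinds of bugs that parallelism-aware property testing is designed to flush out.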

There were many talks on what I would describe as “cloud problems”. For example, Ezra Zygmuntowicz’s “You got your Erlang in my Ruby” was really about how he built a self-assembling cluster of Ruby daemons (Nanite); there were also Dave Fayram and Abhay Kumar’s “Building Reliable Distributed Heterogeneous Services with Katamari/Fuzed” and Lennart Ohman’s “A service fail-over and take-over system for Erlang/OTP”. Like PyCon, there was a lot of interest in eventually consistent databases/key-value stores/non-relational databases. Cliff Moon’s talk on dynomite (a clone of Amazon’s Dynamo system) was particularly encouraging, because he was reaching out to other people in the audience (and there were a decent number of them) to try and consolidate all their efforts into a single project. From what I could tell, people seemed receptive to that idea.

CouchDB also fits into that last category of non-relational databases, but it gets its own paragraph. One reason is that I helped mentor the project through the Apache Incubator (and chauffeured those CouchDB committers who were present). Another is that CouchDB creator Damien Katz got a keynote. A third is that there was basically a CouchDB track on the second day of the conference. There was a lot of interest in CouchDB, and a lot of activity as well; I was told that some of the people who took the CouchDB training during the training days had already submitted patches to the project. Damien’s talk was not about the technical details of CouchDB, but about his personal journey to CouchDB, which included selling his house and living off his savings in order to see CouchDB come to life.

Activity has really picked up in the Erlang web framework space. In addition to Erlang Web and Yariv Sadan’s Erlyweb, there is also Rusty Klophaus’ Nitrogen. Nitrogen focuses more on the UI side of the web framework, omitting any kind of data storage. It’s very easy to create an AJAX-based user interface using Nitrogen, and there is nice support for Comet. Fittingly, Rusty presented his slides via a Nitrogen-based webcast reflector. You specify the UI using Erlang terms, from which the HTML/JavaScript/etc. is generated; this caused a stir in part of the Twitter peanut gallery. I was mostly happy to see people focusing on solving the current generation of problems. My favorite web space talk was probably Justin Sheehy’s talk on Webmachine, though I prefer the description of Webmachine as a REST or HTTP toolkit. Webmachine gives you what you need to implement any HTTP method correctly, and then provides a set of callback functions that can be implemented to customize that processing to do actual work. One of the coolest things about Webmachine is its ability to visually show you the path taken in processing a particular HTTP request, and to let you inspect or dump data at various points in the diagram. It makes for a very nice demo.
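
To make the “toolkit” description concrete, here is roughly what a minimal Webmachine resource looks like (a sketch from memory, so treat the details loosely): you implement only the callbacks you care about, and the defaults drive the rest of the HTTP decision graph.

    -module(hello_resource).
    -export([init/1, to_html/2]).
    -include_lib("webmachine/include/webmachine.hrl").

    init([]) -> {ok, undefined}.

    %% the only callback we customize; everything else (405s, 406s,
    %% conditional requests, ...) falls out of Webmachine's defaults
    to_html(ReqData, State) ->
        {"<html><body>Hello from Webmachine</body></html>", ReqData, State}.

Adding support for, say, ETags or additional content types is then a matter of defining generate_etag/2 or content_types_provided/2 rather than writing dispatch logic by hand.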

There were not that many “language geek” talks, in contrast with the early years of PyCon (at least for as long as I have attended), where there were quite a number. I missed Robert Virding’s talk on Lisp Flavored Erlang (though I saw some example usage in a CouchDB talk) because it overlapped the dynomite talk. I was able to attend Tony Arcieri’s talk on “Building Languages on Erlang (and an introduction to Reia)”. During the first part of his talk, Tony showed how to construct an Erlang module on the fly in the Erlang shell (there’s a sketch of the technique after the list below). He then discussed some tools that are useful to people trying to build languages on top of BEAM, the Erlang virtual machine:

  • Robert Virding has written leex, a lexical analyzer generator
  • yecc, a Yacc-style parser generator, is included in the Erlang distribution
  • the erl_syntax_lib library aids in constructing Erlang abstract syntax trees, which can then be compiled to Erlang bytecode
  • Erlyweb contains the smerl (simple metaprogramming) library for creating and manipulating Erlang modules at runtime
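
As promised, here is a sketch of the kind of on-the-fly module construction Tony demonstrated (the specifics are my reconstruction, not a transcript of his demo): scan and parse one form at a time, compile the forms to a binary, and load it into the running VM.

    1> Forms = [begin
                    {ok, Tokens, _} = erl_scan:string(S),
                    {ok, Form} = erl_parse:parse_form(Tokens),
                    Form
                end || S <- ["-module(hi).",
                             "-export([greet/0]).",
                             "greet() -> hello."]].
    2> {ok, hi, Bin} = compile:forms(Forms).
    3> {module, hi} = code:load_binary(hi, "hi.erl", Bin).
    4> hi:greet().
    hello

Tools like smerl wrap this scan/parse/compile/load cycle in a friendlier API.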

After that, he launched into a description of Reia. I’m not sure that I agree with some of the choices that he has made, but I am happy to see people experimenting with languages that run on BEAM and that keep to Erlang’s process model and the OTP infrastructure. One of the things that Tony mentioned was abandoning indentation-based syntax; he wrote an entire postmortem on that experience in his blog. Python’s indentation-based syntax has won me over and made me a fan, and I am sad to see that indentation syntax, blocks/closures, and expression orientation continue to be at odds.

Coda

It looks like Erlang is starting to find a home. Companies are using it in production, and books are starting to be written about it. Many (not all) of the things which make Erlang seem odd to “mainstream” programmers also appear in languages like Scala, Haskell, and F#. At the same time, Erlang has a long history of industrial deployment, albeit in a single (large) market segment. Many of the problems we now face in large web systems (and the cloud), such as concurrency, distribution, high availability, and scalability, are strengths for Erlang. Indeed, many of the people that I heard from or talked to basically said that they couldn’t solve their problem with any other technology, or that their solutions were dramatically simpler than what they could have built with the technologies they already knew. Will that be enough to propel Erlang into the mainstream? I don’t know. I also don’t know whether our current notion of “mainstream” will persist: more and more, I’m seeing an attitude of “use the best tool for the job”, not only in languages, but in all parts of (web) applications.

There’s also the issue of the Erlang community itself. Around 120 people showed up for the conference. As I mentioned previously, there are the folks who have been doing Erlang for years. Then there are the relative newcomers, who are web-oriented and web-savvy, and who are solving problems in very different domains than the original problem domain of Erlang and its inventors. Thus far, the two segments seem to be getting along fine. I hope that will continue — success, or the potential for success, has a tendency to bend relationships.