Archive for the 'programming' Category

JSConf 2011

Last year when I attended JSConf I had some ideas about the importance of Javascript. I was concerned in a generic way about building “richer” applications in the browser and Javascript’s role in building those applications. Additionally, I was interested in the possibility of using Javascript on the server, and was starting to learn about Node.js.

A year later, I have some more refined ideas. The fragmentation of mobile platforms means that open web technologies are the only way to deliver applications across the spectrum of telephones, tablets, televisions and what have you, without incurring the pain of multi-platform development. The types of applications that are most interesting to me are highly interactive with low latency user interfaces – note that I am intentionally avoiding the use of the word “native”. Demand for these applications is going to raise the bar on the skill sets of web developers. I think that we will see more applications where the bulk of the interface and logic are in the browser, and where the server becomes a REST API endpoint. The architecture of “New Twitter” is in this vein. API endpoints have far less of a need for HTML templating and server side MVC frameworks. But those low latency applications are going to mean that servers are doing more asynchronous delivery of data, whether that is via existing Comet-like techniques or via Websockets (once they finally stabilize). Backend systems are going to partition into parts that do asynchronous delivery of data, and other parts which run highly computationally intensive jobs.

I’ll save the discussion of the server parts for my Nodeconf writeup, but now I’m ready to report on JSConf.

Talks

Here are some of the talks that I found interesting or entertaining.

Former OSAF colleague Adam Christian talked about Jellyfish, which is a tool for executing Javascript in a variety of environments from Node to desktop browsers to mobile browsers. One great application for Jellyfish is testing, and Jellyfish sprang out of the work that Adam and others did on Windmill.

It’s been a while since I looked at Bespin/Skywriter/Ace, and I was pleased to see that it seems to be progressing quite nicely. I particularly liked the Github support.

I enjoyed Mary Rose Cook’s account of how writing a 2D platform game using Javascript caused her to have a falling-in-love-like experience with programming. It’s nice to be reminded of the sheer fun and art of making something using code.

Unfortunately I missed Andrew Dupont’s talk on extending built-ins. The talk was widely acclaimed on Twitter, and fortunately the slides are available. More on this (perhaps) once I get some time to read the slide deck.

Mark Headd showed some cool telephony apps built using Node.js including simple control of a web browser via cell phone voice commands or text messages. The code that he used is available, and uses Asterisk, Tropos, Couchbase, and a few other pieces of technology.

Dethe Elze showed off Waterbear, which is a Scratch-like environment running in the browser. It’s not solely targeted at Javascript, which I have mixed feelings about. My girls have done a bunch of Scratch programming, so I am glad to see that environment coming to languages that are more widely used.

The big topics

There were four talks in the areas that I am really concerned about, and I missed one of them: Rebecca Murphey’s talk on Modern Javascript, which appeared to be derived from some blog posts that she has written on the topic. I think that the problems she is pointing out – ability to modularize, dependency management, and intentional interoperability – are going to be major impediments to building large applications in the browser, never mind on the server.

Dave Herman from Mozilla did a presentation on a module system for the next version of Javascript (which people refer to as JS.next). The design looks reasonable to me, and you can actually play with it in Narcissus, Mozilla’s meta circular Javascript interpreter, which is a testbed for JS.next ideas. One thing that’s possible with the design is to run different module environments in the same page, which Dave demonstrated by running Javascript, Coffeescript, and Scheme syntaxed code in different parts of a page.

The last two talks of the conference were also focused on the topic of JS.next.

Jeremy Ashkenas was scheduled to talk about Coffeescript, but he asked Brendan Eich to join him and talk about some of the new features that have been approved or proposed for JS.next. Many of these ideas look similar to ideas that are in Coffeescript. Jeremy then went on to try to explain what he’s trying to do in Coffeescript, and encouraged people to experiment with their own language extensions. He and Brendan are calling programs like the Coffeescript compiler “transpilers” – compilers which compile into Javascript. I’ve written some Coffeescript code just to get a feel for it, and parts of the experience reminded me of the days when C++ programs went through CFront, which translated them into C, which was then compiled. I didn’t care for that experience then, and I didn’t care for it this time, although the fact that most of what Coffeescript does is pure syntax means that the generated code is easy to associate back to the original Coffeescript. There appears to be considerable angst around Coffeescript, at least in the Javascript community. Summarizing that angst and my own experience with Coffeescript is enough for a separate post. Instead I’ll just say that I like many of the language ideas in Coffeescript, but I’d prefer not to see Coffeescript code in libraries used by the general Javascript community. If individuals or organizations choose to adopt Coffeescript, that’s fine by me, but having Coffeescript go into the wild in library code means that pressure will build to adapt Javascript libraries to be Coffeescript friendly, which will be detrimental to efforts to move to JS.next.

The last talk was given by Alex Russell, and included a triple head fake where Alex was ostensibly to talk about feature detection, although only after a too long comedic delay involving Dojo project lead Pete Higgins. A few minutes into the content on feature detection, Alex “threw up his hands”, and pulled out the real topic of his talk, which is the work that he’s been doing on Traceur, which is Google’s transpiler for experimenting with JS.next features. Alex then left the stage and a member of the Traceur team gave the rest of the talk. I am all in favor of cleverness to make a talk interesting, but I would have to say that the triple head fake didn’t add anything to the presentation. Instead, it dissipated the energy from the Brendan / Jeremy talk, and used up time that could have been used to better motivate the technical details that were shown. The Traceur talk ended up being less energetic and less focused than the talk before it, which is a shame because the content was important. While improving the syntax of JS.next is important, it’s even more important to fix the problems that prevent large scale code reuse and interoperability. The examples being given in the Traceur talk were those kinds of examples, but they were buried by a lack of energy, and the display of the inner workings of the transpiler.

I am glad to see that the people working on JS.next are trying to implement their ideas to the point where they could be used in large Javascript programs. I would much rather that the ECMAScript committee had actual implementation reports to base their decisions on, rather than designing features on paper in a committee (update: I do not mean to imply that TC39 is designing by committee; see the comment thread for more on that). It is going to be several more years before any of these features get standardized, so in the meantime we’ll be working with the Javascript that we have, or in some lucky cases, with the recently approved ECMAScript 5.

Final Thoughts

If your interests are different than mine, here is a list of pointers to all the slides (I hope someone will help these links make it onto the Lanyrd coverage page for JSConf 2011).

JSConf is very well organized: there are lots of social events, and there are lots of nice touches. I did feel that last year’s program was stronger than this year’s. There are lots of reasons why this might be the case, including what happened in Javascript in 2010/11, who was able to submit a talk, and a change in my focus and interests. Chris Williams has a very well reasoned description of how he selects speakers for JSConf. In general I really agree with what he’s trying to do. One thing that might help is to keep all the sessions to 30 minutes, which would allow more speakers, and also reduce the loss if a talk doesn’t live up to expectations.

On the whole, I definitely got a lot out of the conference, and as far as I can tell, if you want to know what is happening or about to happen in the Javascript world, JSConf is the place to be.

Strange Loop 2010

Last week I was in Saint Louis for Strange Loop 2010. This was the second year of Strange Loop, which is a by-hackers, for-hackers conference. I’m used to this sort of conference when it’s organized by a single open source community – I’d put ApacheCon, PyCon, and CouchCamp in this category. At the same time, Strange Loop’s content was very diverse, and had some very high quality speakers. It’s sort of like a cross between ApacheCon and OSCON. One difference is that there isn’t a community that’s putting on Strange Loop, and the fun community feel of ApacheCon or PyCon is missing.

One of the reasons that I was interested in attending Strange Loop was Hilary Mason’s talk on data science / machine learning. This is an area that I am starting to delve into, and I did study a little machine learning right around the time that it was starting to shift away from traditional AI and more towards the statistical approach that characterizes it now. Hilary is the chief scientist at bit.ly, and as it turns out, a Brown alumna as well. Her talk was a good introduction to the current state of machine learning for people who didn’t have any background. She talked about some of the kinds of questions that they’ve been able to answer at bit.ly using machine learning techniques. Justin Bozonier used Twitter to ask Hilary if she would be willing to sit down with interested people and do some data hacking, so I skipped the session block (which was painful because I missed Nathan Marz’s session on Cascalog, which was getting rave reviews). We ended up doing some simple stuff around the tweets about #strangeloop. Justin has a good summary of what happened, complete with code, and Hilary posted the resulting visualization on her blog. It was definitely useful to sit and work together as a group and get snippets of insight into how Hilary was approaching the problem.

Another area that I am looking at is changes in web application architecture due to the changing role of Javascript on both the client and the server. I went to Kyle Simpson’s talk on Strange UI architecture, as well as Ryan Dahl’s talk on node.js. Kyle has built BikechainJS, another wrapper around V8, like Node.js. There’s a lot of interest around server side javascript – the next step is to think about how to repartition the responsibilities of web applications in a world where clients are much more capable, and where some code could run on either the client or the server.

Guy Steele gave a great talk, and the number of people who can give such a talk is decreasing by the day. As a prelude to talking about abstractions for parallel programming, Guy walked us through an IBM 1130 program that was written on a single punch card. He had to reverse engineer the source code from the card, which was complicated by the fact that he used self modifying code as well as some clever value punning in order to get the most out of the machine. The thrust of his comments on parallel programming was that the accumulator style of programming which pervades imperative programs is bad when it comes to exploiting parallelism. Instead, he emphasized being able to find algebraic properties such as associativity or commutativity which would allow parallelism to be exploited via the map/reduce style of programming pioneered decades ago in the functional programming community, and popularized by systems like Hadoop. Guy was proposing that mapreduce be the paradigm for regular programming problems, not just “big data” problems. For me, the most interesting comment that Guy made was about Haskell. He said that if he had known what he knows now when he started on Fortress, he would have started with Haskell and pushed it 1/10 of the way to FORTRAN instead of starting with FORTRAN and pushing it 9/10 of the way to Haskell.
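To make the accumulator versus map/reduce contrast concrete, here is a minimal sketch in Python. It is my own illustration, not code from Guy’s talk; the names `accumulate_sum`, `chunked`, and `parallel_reduce` are made up for this example. The only property it relies on is that the operator is associative, which is what lets each chunk be reduced independently and the partial results combined afterwards. Threads are used just to show the shape of the computation; on an interpreter with a global lock they will not necessarily run the chunks in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def accumulate_sum(xs):
    """Accumulator style: each step depends on the previous total."""
    total = 0
    for x in xs:
        total += x
    return total


def chunked(xs, n_chunks):
    """Split xs into at most n_chunks contiguous slices, preserving order."""
    size = max(1, -(-len(xs) // n_chunks))  # ceiling division
    return [xs[i:i + size] for i in range(0, len(xs), size)]


def parallel_reduce(op, xs, identity, n_chunks=4):
    """Reduce each chunk independently, then combine the partial results.

    Correct only when op is associative and identity is its identity
    element. Commutativity is not needed, because pool.map returns the
    partial results in chunk order.
    """
    chunks = chunked(xs, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(lambda c: reduce(op, c, identity), chunks))
    return reduce(op, partials, identity)


if __name__ == "__main__":
    numbers = list(range(1, 101))
    print(accumulate_sum(numbers))
    print(parallel_reduce(operator.add, numbers, 0))
    # String concatenation is associative but not commutative.
    print(parallel_reduce(operator.add, list("abcdefg"), ""))
```

The accumulator loop forces a left-to-right order on the additions, while the chunked version only fixes how the partial results are combined, which is the freedom a runtime would need in order to spread the work across cores.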

I’m not generally a fan of panel sessions, because the vast majority of them don’t really live up to their promise. Ted Neward did a really good job of moderating the panel on “Future of Programming Languages”. At the end of the panel, Ted asked the panelists which languages they thought people should be learning in order to get new ideas. The list included Io (Bruce Tate), Rebol (Douglas Crockford), Forth and Factor (Alex Payne), Scheme and Assembler (Josh Bloch), and Clojure (Guy Steele). Guy’s comments on Clojure rippled across Twitter, mutating in the process, and causing some griping amongst Scala adherents. The panel appears to have done its job in encouraging controversy.

Also in the Clojure vein, I attended Brian Marick’s talk “Outside in TDD in Clojure”. Marick has written midje, a testing framework that is more amenable to the bottom-up style of programming that is facilitated by REPLs. It’s an interesting approach relying on functions that provide a simple way to specify placeholders for functions that haven’t been completed yet. This also serves as a leverage point for the Emacs support that he has developed.
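The placeholder idea can be sketched outside of Clojure. The following Python is only an analogy under my own assumptions, not midje’s API: the `provided` helper, `lookup_price`, and `total_price` are hypothetical names invented for this example. The function under test depends on a collaborator that has not been written yet, and the test temporarily supplies a stand-in for it.

```python
from contextlib import contextmanager


def lookup_price(item):
    """Collaborator that has not been implemented yet."""
    raise NotImplementedError("lookup_price is still a placeholder")


def total_price(items):
    """Function under test; written before lookup_price exists."""
    return sum(lookup_price(item) for item in items)


@contextmanager
def provided(namespace, name, fake):
    """Temporarily replace namespace[name] with fake (hypothetical helper)."""
    original = namespace[name]
    namespace[name] = fake
    try:
        yield
    finally:
        namespace[name] = original  # always restore the real placeholder


def check_total_price():
    """State what lookup_price should return, then exercise total_price."""
    fake_prices = {"apple": 3, "pear": 5}
    with provided(globals(), "lookup_price", fake_prices.__getitem__):
        return total_price(["apple", "pear", "apple"])


if __name__ == "__main__":
    print(check_total_price())
```

The point of the design is that the test for `total_price` can be written and run at the REPL before anyone decides how `lookup_price` will really work.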

Doug Crockford delivered the closing keynote. I’ve heard him speak before, mostly on Javascript. His talk wasn’t about Javascript at all, but it was very engaging and entertaining. If you have the chance to see him speak in that kind of setting, you should definitely do it.

A few words on logistics. The conference was spread out across three locations. I feared the worst when I heard this, but it turned out to be fine – OSCON in San Jose was much more inconvenient. The bigger logistical issue was WiFi. None of the three venues was really prepared for the internet requirements of the Strange Loop attendees. WiFi problems are not a surprise at a conference, but the higher quality conferences do distinguish themselves on their WiFi support.

All in all, I think that Strange Loop was definitely worthwhile. The computing world is becoming “multicultural”, and it’s good to see a conference that recognizes that fact.

Haskell Workshop and CUFP 2010

It has been many years since I attended an ACM conference, and even more years since I attended the Lisp and Functional Programming Conference, which has evolved into the International Conference on Functional Programming (ICFP). ICFP was in the United States this year, and I’ve wanted to drop in for quite some time. There are many ideas pioneered by the functional programming community, and as much as possible I like to go to the original sources of ideas. ICFP is a long conference with many attached events, and it turns out that the best use of my time was to drop in for the Haskell workshop at the tail end of the conference, and the Commercial Users of Functional Programming (CUFP) conference.

Haskell Workshop

I’ve been around long enough to remember when Haskell first came out, and despite my stint as a database programming languages grad student, I’ve never had the chance to really give Haskell the attention that I feel it deserves. 20 years since its appearance, Haskell is still barely on the radar. At the same time, I heard some very interesting talks at the workshop, on things like the Hoopl library for implementing dataflow optimizations in compilers, and the Orc DSL for concurrent scripting. The Haskell systems hackers have made great progress and are doing some great work. Bryan O’Sullivan described his work on improving GHC’s ability to handle lots of long lived open network connections. Given the recent burst of interest in event based programming models, such as Node.js, this is an interesting result. Simon Marlow presented a redesign of the Evaluation Strategies mechanism that GHC uses to control parallelism. Many of the talks that I heard have ideas that are applicable to problems that exist in modern systems. I just wish that I could see a path that involved using Haskell itself to solve those problems instead of the ideas migrating into another language/system.

Surgecon

Unbeknownst to me, my friend Theo Schlossnagle ran Surge, a conference on scalability, in Baltimore, and it overlapped the parts of ICFP that I attended. Surge seems to have flown pretty low under the radar. Google doesn’t return many relevant results for it, and the best information (other than talking to Surge attendees) I’ve been able to find on Surge is on Lanyrd. Theo told me that he was counting on this year’s attendees to be his PR for next year. I didn’t attend, but based on the tweets and dinner conversations, it sounds like it was great. I had dinner/beers with some Apache folks who were in town for Surge, as well as some Surge attendees like Bryan Cantrill. The “systems guys” gave me a good ribbing about being at a conference for “irrelevant languages”, and I had a really good conversation with Bryan about Node.js, cloud computing, and the Oracle acquisition (ok, that part wasn’t so good). Node.js is on a lot of people’s minds at the moment, and it was good to hear Bryan’s perspective on it. It was an interesting sidebar to the immersion in functional programming. I do think that in the medium term there are some interesting connections between Node and FP, but that’s probably an entire post of its own.

CUFP

There was a lot of F# related content at CUFP, and I think that Microsoft deserves kudos for the work that they are doing. I think it’s pretty clear that shipping F# in the box with Visual Studio 2010 is not a huge money maker for Microsoft at this point, and I’m impressed with their willingness to take a long term view of the future of programming. Unfortunately I’m not a Windows ecosystem person, so as attractive as F# and Visual Studio are, I doubt that I’ll be playing with this anytime soon.

Marius Eriksen’s talk on Scala at Twitter was interesting because of the way that he described the conceptualization of Rockdove operations as folds, taking clear advantage of the benefits offered by a functional style. He also had some thought provoking comments about giving applications access to the behavior of the garbage collector. There are some interesting possibilities if you start to give developers control of the behavior of various parts of the runtime system.

Michael Fogus talked about his company’s experience using Scala. His talk was pretty entertaining, and there were some interesting comparisons between Scala features that they thought would be useful and Scala features that actually turned out to be useful. My only issue with his talk was the size of the sample, which isn’t something that he could do anything about. This was also true of the talk by the Intel compiler folks.

I’ve seen a number of talks on the Microsoft Reactive Extensions, mostly with respect to JavaScript. I continue to believe that RxJS could be a great help to Javascript programmers, particularly as things like Node.js take hold. Matt Podwysocki’s Node.js file server example shows how.

Warren Harris from Metaweb talked about his use of monads, arrows, and OCaml to build a more efficient query processor for Freebase’s MQL query language. This was a really interesting talk, because query optimization was the topic of my graduate school research, and at the time the connections between query languages and functional programming were a relatively new topic.

Final thoughts

It doesn’t take much to fan the flames of functional love in me. There are lots of smart people working on beautiful and interesting solutions. I wish that I could see a better path for those ideas to make it into mainstream practice.

It’s all about the workflow

In the last few months I’ve been running into the same issue over and over again. At OSCON I was out to dinner with some Apache/Subversion friends. In recent years, conversations with these friends turn to the subject of Subversion versus one of the distributed version control systems, usually, but not always, git. And as often happens, the conversation was focused on particular features of the systems: distribution, obscurity of command sets, the workings of various individual features. For me, the important thing about the DVCS’s is not the various features; it is about supporting a particular kind of development model, a workflow of using the tools. Vincent Driessen’s excellent post on the git branching model outlines the kind of scenario that I want to be able to support. That workflow is important to me, not the particulars of git. I’d be happy if more than one tool could provide good support for such a workflow. To my relief, that’s what at least some of the Subversion committers want to be able to do, and I’m looking forward to seeing their work. On the git side of the house, Vincent has written git flow, some extensions to git that make the workflow easier to manage when using git. Github’s recently enhanced pull request mechanism is another example of great git related workflow management.

Software that focuses on workflows is much more valuable to me (assuming it supports a workflow that I use – not a foregone conclusion). Each piece of software that I use on a regular basis has been selected because it supports a workflow that works for me, or because I can mould it into supporting a workflow that is comfortable for me. So today, that means NetNewsWire for Mac’s combined view and OmniFocus on Mac/iPad/iPhone for review mode on the desktop and iPad, and forecast view on the iPad. I also use Python to make some Macintosh desktop apps provide a workflow that’s more suitable for me. For mail, that means Mail.app plus Mail-Act-On plus Python scripts plus Keyboard Maestro. For meeting notes, that means Evernote plus Entourage plus Python scripts on the desktop and iPad.

One domain where I still haven’t found a great fit yet is the activity/life stream space. Right now I’m using Echofon on the Mac, Twitter for iPad, and Twitter for iPhone. I also have Flipboard on the iPad. Each of them works relatively well, but none of them really solve the problems that I have as a high volume Twitter reader. I really haven’t seen anything that works in a way that will really help me deal with the firehose of information from various online sources of various kinds. Here lies an opportunity.

App developers of all kinds, giving me neat features is good. Streamlining my workflow is better.

CouchCamp 2010

I spent a few days last week at CouchCamp, the first mass in-person gathering of the community around CouchDB. There were around 80 people from all over the world, which is pretty good turnout. The conference was largely in unconference format although there were some invited speakers, including myself.

I think it says a lot about the CouchDB community that they invited both Josh Berkus and Selena Deckelmann from Postgres to be speakers. The “NoSQL” space has become quite combative recently, so it is great to see that the CouchDB community has connections to the Postgres community, and respect for the history and lessons that the Postgres folks have learned over the years. Josh’s talk on not reinventing the wheel was well received, and his discussion of Joins vs Mapreduce took me back to my days as a graduate student in databases. His talk made a great lead in for Selena’s talk on the nitty gritty details of MultiVersion Concurrency Control.

There were lots of good discussions on issues related to security and CouchApps, but the discussion that got my attention the most was Max Ogden’s discussion on the work that he is doing to open up access to government data, particularly around the use of location information. He’s been using GeoCouch as the platform for this work. In the past I’ve written about the importance of a good platform for location apps, particularly in the context of GeoDjango. GeoCouch looks to be a very nice platform for location based applications. This is a very nice plus for the CouchDB community.

These days, it’s impossible to be at a conference that involves Javascript and not hear some buzz about Node.js. As expected, there was quite a bit of it, but it was interesting to talk to people about what they are doing with Node. Everything that I heard reinforces my gut feel that Node.js is going to be important.

I was one of the mentors for the CouchDB project when it came to the Apache Software Foundation, and I was asked to speak about community. The CouchDB community has accomplished a lot in the last few years, and is doing really well. I prepared a slide deck, but didn’t project it because my talk was the last talk of the conference, and we wanted to do it in the outside amphitheater. I also wanted to tune some of the sections of the talk to include things that I observed or was asked about during the conference. The biggest reason that I prepared slides was to show excerpts of Noah Slater’s CouchDB 1.0 retrospective e-mail. A lot of what I think about community is summarized well in Noah’s message, and the note summarizes the state of the community better than I could have done it myself. I hope that we’ll be hearing more testimonials like Noah’s in the years to come.

Concurrency => Parallelism

I wanted to clarify a point from my post The Cambrian Period of Concurrency.

I made the statement

From where I sit, this is all about exploiting multicore hardware

because I’ve seen a pile of actor and other concurrency libraries which have not taken parallel execution of the concurrent program seriously. If I am going to go to the trouble of writing a concurrent program, then I want that execution to be parallel, especially in a multicore world.

Simon Marlow from the GHC team said that if programming multicore machines is the only goal, we ought to be looking at parallelism first and concurrency only as a last resort. Haskell has some nice features for taking advantage of parallelism. However, I explicitly stated that I was not as interested in highly regular or data parallel computations, which is what Haskell’s parallelism tools are aimed at. These are fine ways to get parallelism, but I am interested in problems which are genuinely concurrent, not just parallel. In Van Roy’s hierarchy, these are the problems with observable nondeterminism. I also specifically called out reduction of latency as one of my goals, something which Marlow says is a possible benefit of concurrency. The GHC team is interested in a different mix of problems than I am.
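A small Python sketch of the distinction, using my own made-up function names rather than anything from Marlow or Van Roy: the first function is a data parallel map whose result is the same no matter how the work is scheduled, while the second has several producers writing into one shared inbox, so the interleaving a consumer observes can differ from run to run.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_squares(xs):
    """Deterministic parallelism: scheduling cannot change the answer."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda x: x * x, xs))


def concurrent_arrivals(producers, per_producer):
    """Observable nondeterminism: arrival order depends on the scheduler."""
    inbox = queue.Queue()

    def produce(name):
        for i in range(per_producer):
            inbox.put((name, i))

    threads = [threading.Thread(target=produce, args=(name,))
               for name in producers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All producers have finished, so draining the queue is safe here.
    arrivals = []
    while not inbox.empty():
        arrivals.append(inbox.get())
    return arrivals


if __name__ == "__main__":
    print(parallel_squares([1, 2, 3, 4]))
    # The interleaving of "a" and "b" messages may vary between runs.
    print(concurrent_arrivals(["a", "b"], 3))
```

Each producer’s own messages stay in order, but how the two streams interleave is up to the scheduler, and a program that reacts to arrival order has to be written with that in mind.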

Van Roy in short

I also forgot to mention Peter Van Roy’s paper Programming Paradigms for Dummies: What Every Programmer Should Know, which includes an overview of his stratification of concurrency and parallelism (and other stuff). If you don’t have time to read his book, the paper is shorter and more digestible.

The Cambrian Period of Concurrency

Back in July, I gave an OSCON talk that was a survey of language constructs for concurrency. That talk has been making the rounds lately. Jacob Kaplan-Moss referred to it in a major section of his excellent keynote Snakes on the Web, and Tim Bray has cited it as a reference in his Concur.next series. It seems like a good time for me to explain some of the talk content in writing and add my perspective on the current conversations.

The Cambrian

The Cambrian period was marked by a rapid diversification of lifeforms. I think that we are in a similar situation with concurrency today. Although many of the ideas that are being tossed around for concurrency have been around for some time, I don’t think that we really have a broad body of experience with any of them. So I’m less optimistic than Tim and Bruce Tate, at least on time frame. I think that we have a lot of interesting languages, embodying a number of interesting ideas for working with concurrency. I think that some of those languages have gained enough interest/adoption that we are now in a position to get a credible amount of experience so that we can start evaluating these ideas on their merits. But I think that the window for doing that is pretty large, on the order of 5 to 10 years.   

What kinds of problems

The kinds of problems I am interested in are general purpose programming problems. I’m specifically not interested in scientific, numeric, highly regular kinds of computations or data parallel computations. Unlike Tim, I do think that web systems are a valid problem domain. I see this being driven by the need to drive down latency to provide good user response time, not to provide additional scalability (although it probably will).

It’s not like Java

Erik Engbrecht, one of Tim’s commenters said:

To get Java, you basically take Smalltalk and remove all of the powerful concepts from it while leaving in the benign ones that everyday developers use.

I think there’s something to be learned from that.

This presupposes that you know what all the good concepts are and what the benign ones are. It doesn’t seem like we are at that point. When Java was created, both Lisp and Smalltalk had existed for quite some time and it was possible to do this kind of surgery. I don’t have a clear sense of what actually works well, much less what is powerful or benign.

The hardware made me do it

From where I sit, this is all about exploiting multicore hardware, and when I say this I mean machines with more than 4 or 8 hardware threads (I say threads, not cores – actual parallelism is what is important). The Sun T5440 is a 256 thread box. Intel’s Nehalem EX will let you build a 128 thread box later this year. Those are multicore boxes. If you look at experiments, you see that systems that seem to work well at 2 or 4 threads don’t work well at 16 or 64 threads. Since there’s not a huge amount of that kind of hardware around yet, it’s hard for people to run experiments at larger sizes. Experiments being run on 2 thread MacBook Pros are probably not good indicators of what happens at even 8 threads. This is partially because dealing with more hardware threads requires more administrative overhead, and as the functional programming people found out, that overhead is very non-trivial. The point is, you have to run on actual hardware to have believable numbers. This makes it hard for me to take certain kinds of systems seriously, like concurrency solutions running on language implementations with Global Interpreter Locks. See David Beazley’s presentation on Python’s Global Interpreter Lock for an example.
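The kind of experiment I have in mind can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not a benchmark from Beazley’s presentation: `busy_work` and `time_with_threads` are names I made up, the workload is arbitrary, and the numbers will depend entirely on the machine and interpreter. It runs the same CPU-bound tasks with different thread counts and reports the wall clock time for each.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy_work(n):
    """CPU-bound loop with no I/O, so threads have nothing to wait on."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_with_threads(n_workers, n_tasks=8, work=200_000):
    """Run n_tasks copies of busy_work on n_workers threads.

    Returns (elapsed_seconds, results).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(busy_work, [work] * n_tasks))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for workers in (1, 2, 4, 8):
        elapsed, _ = time_with_threads(workers)
        print(f"{workers} threads: {elapsed:.3f}s")
```

On an implementation with a Global Interpreter Lock I would not expect the elapsed time to drop much as threads are added, since only one thread can execute Python bytecode at a time. Whatever the result, it only says something about the hardware it was actually run on.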

Comments on specific languages

At this point I am more interested in paradigms and constructs as opposed to particular languages. However, the only way to get real data on those is for them to be realized in language designs and implementations.

  • Haskell – Functional Laziness aside, the big concurrency thing in Haskell is Software Transactional Memory (STM). There are other features in Haskell, but STM is the big one. STM is an active research field in computer science, and I’ve read a decent number of papers trying to make heads or tails of it. Among the stack that I have read, it seems to be running about even between the papers touting the benefits of STM and the papers saying that STM cannot scale and will not work in practice. The jury is very much out on this one, at least in my mind.
  • Erlang – I like Erlang. It’s been in production use for a long time, and real systems have been built using it. In addition to writing some small programs and reviewing some papers by Erlang’s designers, I spent a few days at the Erlang Factory earlier this year trying to get a better sense of what was really happening in the Erlang community. While there’s lots of cool stuff happening in Erlang, I observed two things. First, the biggest Erlang systems I heard described (outside of Facebook’s) are pretty small compared to a big system today. Second, and more importantly, SMP support in Erlang is still relatively new. Ulf Wiger’s DAMP09 presentation has a lot of useful information in it. On the other hand, BEAM, the Erlang VM, is architected specifically for the Erlang process/actor model. This feels important to me, but we need some experimental evidence.
  • Clojure – Clojure has a ton of interesting ideas in it. Rich Hickey has really done his homework, and I have a lot of respect for the work that he is doing. Still it’s the early days for Clojure, and I want to see more data. I know Rich has run some stuff on one of those multiple hundred core Azul boxes, but as far as I know, there’s not a lot of other data.
  • Scala – The big thing in Scala for concurrency is Actors, but if you compare to Erlang, Actors are the equivalent of Erlang processes. A lot of the leverage that you get in Erlang comes from OTP, and to get that in Scala, you need to look at Jonas Boner’s highly interesting Akka Actor Kernel project. Akka also includes an implementation of dataflow variables, so Akka would give you a system with Actors, supervision, STM, and Dataflow (when it’s done).   
  • libdispatch/Grand Central Dispatch – Several of Tim’s commenters brought up Apple’s Grand Central Dispatch, now open sourced as libdispatch. This is a key technology for taking advantage of multicore in Snow Leopard. GCD relies on programmers to create dispatch queues which are then managed by the operating system. Programmers can send computations to these queues via blocks (closures), which are a new extension to Objective-C. When I look at Apple’s guide to migrating to GCD from threads, I do see a model that I prefer to threads, but it is not as high level as some of the others. Also, the design seems oriented towards very loosely coupled computations. It will be several years before we can really know how well GCD is working. I am typing this post on a 16 thread Nehalem Mac Pro, and I rarely see even half of the CPU meters really light up, even when I am running multiple compute intensive tasks. Clearly more software needs to take advantage of this technology before we have a verdict on its effectiveness in production.
  • .Net stuff like F#/Axum, etc – There is some concurrency work happening over on the CLR, most notably in F# and Axum. I spent some time at Lang.NET earlier this year, and got a chance to learn a bit about these two technologies. If you look at paradigms, the concurrency stuff looks very much like Erlang or Scala, with the notable exception of join patterns, which are on Martin Odersky’s list for Scala. I will admit to not being very up to speed on these, mostly for lack of Windows and the appropriate tools.
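Several of the entries above (Erlang processes, Scala and Akka actors, the Axum work) share one basic shape: private state, a mailbox, and asynchronous message sends. As a rough illustration only, here is that shape sketched in Python with the standard library. The `Actor` class and `counter` handler are hypothetical names of mine, and this toy has none of the supervision, distribution, or scheduling that make the real systems interesting.

```python
import queue
import threading


class Actor:
    """A tiny actor: private state, a mailbox, and one thread draining it."""

    _STOP = object()  # sentinel message used to shut the actor down

    def __init__(self, handler, state):
        self._handler = handler
        self._state = state
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Process everything already queued, then return the final state."""
        self._mailbox.put(self._STOP)
        self._thread.join()
        return self._state

    def _run(self):
        # Only this thread ever touches self._state, so no locks are needed.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self._state = self._handler(self._state, message)


def counter(state, message):
    """Handler: compute the next state from the current state and a message."""
    kind, amount = message
    if kind == "add":
        return state + amount
    return state  # unknown messages are ignored


if __name__ == "__main__":
    actor = Actor(counter, 0)
    for _ in range(1000):
        actor.send(("add", 1))
    print(actor.stop())
```

The design choice worth noticing is that the handler is a pure function from (state, message) to new state, which is close in spirit to how an Erlang process loops over its own state. One operating system thread per actor is also exactly the kind of overhead that will not survive contact with thousands of actors, which is why a VM built for the model matters.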

Other thoughts

Jacob’s take away from my talk at OSCON was “we’re screwed”. That’s not what I wanted to convey. I don’t see a clear winner at the moment, and we have a lot of careful experimentation and measuring to do. We are quite firmly in the Cambrian, and I’m not in a hurry to get out – these things need to bake a bit longer, as well as having some more experimentation.

In addition to my talk, and Tim’s wiki page, if you are really interested in this space, I think that you should read Concepts, Techniques, and Models of Computer Programming by Peter van Roy and Seif Haridi. No book can be up to date with the absolute latest developments, but this book has the best treatment that I’ve seen in terms of trying to stratify the expressiveness of sequential and concurrent programming models.

Erlang Factory 2009

I spent Thursday and Friday of last week at the Erlang Factory in San Francisco (although the event was actually in Palo Alto).

Why did I go?

I’ve written about Erlang in this space before. Erlang is having a major influence on other languages, such as Scala on the JVM side and Axum on the CLR side. In addition every language seems to have several implementations of Erlang style “actors” (despite the fact that this is historically incorrect). Erlang has been around for a long time, and has seen industrial usage in demanding telecom applications. As a dynamically typed functional language with good support for concurrency and distribution, it is (if nothing else) a source of interesting ideas. Earlier this year, my boss asked me to start doing some thinking about cloud computing in addition to the stuff that I was already doing around dynamic languages, which is another good match for Erlang. This was the first large scale gathering of Erlang people in the US (at least that I am aware of), so I wanted to drop in and see what was going on, what the community is like, and so on.

Talks

The program at the Erlang Factory was very strong. In many of the session slots, there were 3 excellent talks to choose from. Every single talk that I went to was of very high quality. The scheduling conflicts were bad enough that I wasn’t able to explore all the areas that I wanted to. Fortunately, the sessions were videotaped and are supposed to be made available on the web. Also, there was a decent amount of twittering going on, so a Twitter search for #erlangfactory will turn up some useful information.

I attended a number of “experience” talks by companies / individuals. There were experience talks from Facebook, SAP, Orbitz, and Kreditor (the fastest growing company in Sweden). I made it to the Facebook talk and the Kreditor talk. Facebook’s usage/deployment is on the order of 100 machines, which provide the chat facility for Facebook. Erlang is doing all the heavy lifting, and PHP is doing the web UI part. There was a lot of this kind of architecture floating around the conference. It seemed like the most popular combination was Ruby/Erlang, but there was definitely Python and PHP as well. The Kreditor talk was interesting because their site has been running for 3 years with very small amounts of downtime. Unfortunately, their entire deployment is probably less than 10 machines, so that blunts the impressiveness of what they have done. Still it was interesting to hear how they accomplished this using features of Erlang. In addition to the talks, I spoke with many attendees who are using Erlang in their companies. One such person was eBay founder Pierre Omidyar, who is running Ginx, a web based Twitter client. Pierre is doing the coding and deployment of the site, and was well versed in the Erlang way of doing things. An interesting data point.

The Erlang community (like all communities) has its old guard. These are folks who have worked with Erlang for years, before its recent burst of interest. There were a pair of keynotes by Erlang long-timers Robert Virding (The Erlang Rationale) and Ulf Wiger (Multicore Programming in Erlang). Both of these talks shared a common trait: the speakers were pretty honest about what was good about Erlang, and where there were problems. Given how prone the computing business is to fashion, I found this to be refreshing. Virding talked about the reasons why Erlang is designed the way it is. He accepted the blame for inconsistencies in the libraries, talked about the need to avoid the process dictionary, and agreed that “a char type is probably not wrong”. Wiger’s talk was about why parallelizing code is hard (even with Erlang). He used the example of parallelizing map to demonstrate this, and showed the use of the QuickCheck testing tool to aid in finding parallelism bugs. The Erlang version of QuickCheck was inspired by the Haskell version of QuickCheck, and it’s a very useful tool. The adaptations for parallelism look very nice. It’s a shame that the Erlang version is commercial software. I don’t grudge the authors the right to charge money for their software, but I do think that this will hold back adoption of this important tool.
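The flavor of Wiger’s example (a parallel map checked against the sequential one on generated inputs) can be sketched in Python. This is not QuickCheck and not his code: `pmap` and `check_pmap_matches_map` are names I made up, the generator is just the standard `random` module, and there is no shrinking of counterexamples, which is a large part of what makes the real tool valuable.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def pmap(fn, items, workers=4):
    """Parallel map that preserves the order of the inputs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


def check_pmap_matches_map(fn, trials=100, seed=0):
    """Property: for random integer lists, pmap(fn, xs) == list(map(fn, xs)).

    Returns None if the property held on every trial, otherwise the first
    input list for which the two results differed.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        workers = rng.randint(1, 8)
        if pmap(fn, xs, workers) != list(map(fn, xs)):
            return xs
    return None


if __name__ == "__main__":
    print(check_pmap_matches_map(lambda x: x * x))
```

The sequential `map` serves as the specification and the parallel version is the thing under test. A check like this only catches ordering and lost-result bugs for pure functions. Finding real race conditions requires controlling the interleavings, which is where the parallelism adaptations he showed come in.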

There were many talks on what I would describe as “cloud problems”. Examples include Ezra Zygmuntowicz’s “You got your Erlang in my Ruby”, which was really about how he built a self assembling cluster of Ruby daemons (Nanite), Dave Fayram and Abhay Kumar’s “Building Reliable Distributed Heterogenous Services with Katamari/Fuzed”, and Lennart Ohman’s “A service fail over and take-over system for Erlang/OTP”. Like PyCon, there was a lot of interest in eventually consistent databases/key-value stores/non-relational databases. Cliff Moon’s talk on dynomite (a clone of Amazon’s Dynamo system) was particularly encouraging because he was reaching out to other people in the audience (and there were a decent number of them) to try to consolidate all their efforts into a single project. From what I could tell, people seemed receptive to that idea.

CouchDB also fits into that last category of non-relational databases, but it gets its own paragraph. One reason is that I helped mentor the project through the Apache Incubator (and chauffeured those CouchDB committers who were present). Another is that CouchDB creator Damien Katz got a keynote. Third is that there was basically a CouchDB track on the second day of the conference. There was a lot of interest in CouchDB, and a lot of activity as well. I was told that some of the people who took the CouchDB training during the training days had actually submitted patches on the project already. Damien’s talk was not about the technical details of CouchDB, but about his personal journey to CouchDB, which included selling his house and living off his savings in order to see CouchDB come to life.

Activity has really picked up in the Erlang web framework space. In addition to Erlang Web, and Yariv Sadan’s Erlyweb, there is also Rusty Klophaus’ Nitrogen. Nitrogen focuses more on the UI side of the web framework, omitting any kind of data storage. It’s very easy to create an AJAX based user interface using Nitrogen, and there is nice support for Comet. As part of his presentation, Rusty showed his slides on a Nitrogen based webcast reflector. You specify the UI using Erlang terms, which then causes HTML/Javascript/etc to be generated, which caused a stir in part of the Twitter peanut gallery. I was mostly happy to see people focusing on solving the current generation of problems. My favorite web space talk was probably Justin Sheehy’s talk on Webmachine. I think that I prefer the description of Webmachine as a REST or HTTP toolkit. Webmachine gives you what you need to implement any HTTP method correctly, and then provides a set of callback functions that can be implemented to customize that processing to do actual work. One of the coolest things about Webmachine is its ability to visually show you the path taken in processing a particular HTTP request, and to let you inspect/dump data at various points in the diagram. It makes for a very nice demo.
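To give a feel for the "callbacks plus a fixed decision flow" idea, here is a small Python sketch. It is emphatically not Webmachine’s API: the `Resource` class, the `handle` function, the handful of decisions, and their order are all assumptions of mine, loosely modeled on the idea, and the real decision flow is far larger. The `trace` list stands in for the visual path display.

```python
class Resource:
    """Default answer for every decision; subclasses override what they need."""

    def service_available(self, request):
        return True

    def allowed_methods(self, request):
        return ["GET", "HEAD"]

    def is_authorized(self, request):
        return True

    def resource_exists(self, request):
        return True

    def to_body(self, request):
        return ""


def handle(resource, request):
    """Walk a fixed sequence of decisions; return (status, body, trace)."""
    trace = []  # records each decision taken, in order, with its outcome
    checks = [
        ("service_available", lambda: resource.service_available(request), 503),
        ("allowed_methods",
         lambda: request["method"] in resource.allowed_methods(request), 405),
        ("is_authorized", lambda: resource.is_authorized(request), 401),
        ("resource_exists", lambda: resource.resource_exists(request), 404),
    ]
    for name, check, failure_status in checks:
        passed = check()
        trace.append((name, passed))
        if not passed:
            return failure_status, "", trace
    return 200, resource.to_body(request), trace


class Greeting(Resource):
    """Example resource: only /hello exists, everything else uses defaults."""

    def resource_exists(self, request):
        return request["path"] == "/hello"

    def to_body(self, request):
        return "hello, world"


if __name__ == "__main__":
    print(handle(Greeting(), {"method": "GET", "path": "/hello"}))
    print(handle(Greeting(), {"method": "POST", "path": "/hello"}))
```

The application author writes only the two overrides in `Greeting`, and the toolkit owns the status code logic. Because the flow is data rather than scattered `if` statements, recording the path taken through it is nearly free.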

There were not that many “language geek” talks. This contrasts with the early years of PyCon (at least for as long as I have attended) where there were quite a number. I missed Robert Virding’s talk on Lisp Flavored Erlang (but I saw some example usage in a CouchDB talk), because it overlapped the dynomite talk. I was able to attend Tony Arcieri’s talk on “Building Languages on Erlang (and an introduction to Reia)”. During the first part of his talk, Tony showed how to construct an Erlang module on the fly in the Erlang shell. He then discussed some tools which are useful to people trying to build languages on top of BEAM, the Erlang virtual machine:

  • Robert Virding has written leex, a lexical analyzer generator
  • yecc, a Yacc style parser generator is included in the Erlang distribution
  • the erl_syntax_lib library aids in constructing Erlang abstract syntax trees, which can then be compiled to Erlang bytecode.
  • Erlyweb contains the smerl (simple metaprogramming) library for creating and manipulating Erlang modules at runtime.
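For readers who have not seen this style of metaprogramming, here is a rough Python analogue of building a module from an abstract syntax tree at runtime. It uses Python’s standard `ast` and `types` modules rather than any of the Erlang tools listed above, and `build_module` and the generated `scale` function are hypothetical names chosen for the example.

```python
import ast
import types


def build_module(name, factor):
    """Build a module at runtime whose `scale(x)` returns x * factor."""
    # AST for the single statement:  scale = lambda x: x * <factor>
    scale_lambda = ast.Lambda(
        args=ast.arguments(
            posonlyargs=[],
            args=[ast.arg(arg="x")],
            kwonlyargs=[],
            kw_defaults=[],
            defaults=[],
        ),
        body=ast.BinOp(
            left=ast.Name(id="x", ctx=ast.Load()),
            op=ast.Mult(),
            right=ast.Constant(value=factor),
        ),
    )
    tree = ast.Module(
        body=[
            ast.Assign(
                targets=[ast.Name(id="scale", ctx=ast.Store())],
                value=scale_lambda,
            )
        ],
        type_ignores=[],
    )
    # Hand-built nodes carry no line numbers; the compiler requires them.
    ast.fix_missing_locations(tree)
    code = compile(tree, filename=f"<generated {name}>", mode="exec")

    module = types.ModuleType(name)
    exec(code, module.__dict__)  # run the compiled code in the module namespace
    return module


if __name__ == "__main__":
    tripler = build_module("tripler", 3)
    print(tripler.scale(7))
```

A language implementer targeting a VM does the same thing at larger scale: the front end (lexer and parser) produces the host’s syntax tree, and the host’s own compiler takes it from there.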

After that, he launched into a description of Reia. I’m not sure that I agree with some of the choices that he has made, but I am happy to see people experimenting with languages on top of BEAM, and in keeping with Erlang’s process model and the OTP infrastructure. One of the things that Tony mentioned was abandoning indentation based syntax. He wrote an entire postmortem on that experience in his blog. Python’s indentation based syntax has won me over and made me a fan, and I am sad to see that indentation syntax, blocks/closures, and expression orientation continue to be at odds.

Coda

It looks like Erlang is starting to find a home. Companies are using it in production. There are books starting to be written about it. Many (not all) of the things which make Erlang seem odd to “mainstream” programmers also appear in languages like Scala, Haskell, and F#. At the same time, Erlang has a long history of industrial deployment, albeit in a single (large) market segment. Many of the problems which we now face in large web systems (and the cloud): concurrency, distribution, high availability, and scalability are strengths for Erlang. Indeed, many of the people that I heard from or talked to basically said that they couldn’t solve their problem with any other technology, or that their solutions were dramatically simpler than the technologies that they already knew. Will that be enough to propel Erlang into the mainstream? I don’t know. I also don’t know if our current state of mainstreamness is going to remain. More and more I’m seeing an attitude of “let’s use the best tool for the job”, not only in languages, but in all parts of (web) applications.

There’s also the issue of the Erlang community itself. Around 120 people showed up for the conference. As I mentioned previously, there are the folks who have been doing Erlang for years. Then there are the relative newcomers, who are web oriented/web savvy, and solving problems in very different domains than the original problem domain of Erlang and its inventors. Thus far, the two segments seem to be getting along fine. I hope that will continue; success or the potential for success has a tendency to bend relationships.

Why I finally believe in hashtags

I’ve been using Twitter for a while now, but I’ve never really used hashtags much. I’ve never been much for doing the stuff it takes to get a highly promoted blog or twitter stream. I figure that if my content is worthwhile, that should be enough. At PyCon I found the compelling hashtag use case for me.

There were a lot of people using hashtags in their PyCon tweets, and Jacob Kaplan-Moss showed me Twitterfall, which made it easy to keep track of uses of the tag. That made it *much* easier to find the virtual twitter stream for PyCon. This was also true at Lang.NET, the DSL DevCon, and the MySQL conference. This week(end) I’ll be using hashtags to track the progress of JSConf.   From now on I’ll always use hashtags when I’m at a conference or event.

One reason that it’s taken me so long to get the hashtag thing is that I use Twitter primarily via rich desktop (or iPhone) clients. Until recently I wasn’t using clients that could do searching. I had tried TweetDeck, and it never stayed with me. When Nambu came along, I was pretty enthusiastic because it was a native TweetDeck. Unfortunately, I had crashing problems with it at Lang.NET (since fixed, I think), and I put it aside when I realized that Syrinx 2.0 had searches. While Syrinx doesn’t save searches across restarts, its memory use is tolerable enough to leave it running all the time, so it’s not a big problem, and I am hopeful that MRR will include saved searches in a future version. Commenters: yes, I tried Tweetie for Mac, and I didn’t like it. I love Tweetie for iPhone, though. Go figure.

DSLDevCon 2009

I’ve been having trouble coming up with a good summary of the (Domain Specific Language) DSL DevCon. That’s partly because there was a lot of information to absorb between Lang.NET and the DevCon. Even more so, I’m finding it hard to distill what I saw, what I didn’t see, what I wanted to see, and what I think we need to see next. That’s odd because I’ve accepted the notion that DSL’s should be a part of the programmer’s toolbox ever since I sat through the “metalinguistic abstraction” section of Sussman and Abelson’s MIT class in 1984.

Reporting

I’m going to call out four talks that really stood out for me. There were more than just these four, but it was either these four or all of them, and all of them is too much work.

  • Guillaume LaForge’s talk on Groovy DSL’s was important because he not only showed how to build DSL’s using Groovy, but he’s actually working with real customers, like Mutual of Omaha, who are using those DSL’s in production.   

  • I was happy to hear Markus Voelter’s talk Textual DSL’s and Code Generation with Eclipse Tools because a lot of the noise that I’ve heard on the DSL front has been coming from the Ruby and .NET side of the world. One thing that got my attention at the DevCon was the importance of tooling, so it was good to see that there are some tooling efforts in the Java space. It’s too bad that no one from JetBrains was there to present on MPS.

  • Brad Cross and Ted Neward did a talk entitled “Functional vs. Dynamic DSLs: The Smackdown”. I came away from this talk wanting more, and not in a good way. Brad and Ted really needed about 2 hours in order to give all the relevant background a chance to settle in. During the talk they presented a set of things which differentiated the functional programming and dynamic language styles of creating “Internal” (I really dislike the Internal/External terms) DSLs. Unfortunately, there wasn’t enough time to really dig in and explore the meat of what they said. I think that a deep addressing of the points that they made would be a very important contribution to the DSL topic. Maybe we’ll get to see a series of blog posts, developerWorks articles, or even an academic paper of some kind.

  • I view Intentional Software as one of those grand computer science projects. Having worked on Chandler, I have an appreciation for the perils of large, grand efforts. This is the first time that I had a chance to see a presentation by anyone from Intentional Software, and it is just as well that it was a demo of their just shipped product. I took note when the Intentional Software project was started back in 2002, but I’ve not heard a lot about their progress since then. What we saw was a demonstration of a production version of their “Domain Workbench” which is a system for allowing domain experts and programmers to work together to build a system which domain experts can then use to write software. Instead of writing programs, the programmers write the generator which takes the domain language (which can be visual) and then generates code. The system represents the domain information in a way that allows multiple, editable, “projections” (views). The demonstrations that we saw included an actuarial workbench, complete with mathematical notation, and an electronics workbench, expressed as circuit diagrams. If you are interested, your best bet is to watch the video when the videos are posted.

    I am pretty impressed with what I saw, but there are lots of questions. How many domains can this actually work for? How hard is it to write generators? What’s the business model for domain workbenches? It seems pretty clear to me that for the domains and organizations where this can work, this approach is going to have a pretty sizable impact. Perhaps not this year, but within the next 5 years. I have to hand it to the Intentional Software guys. Their presentation was pretty low key, and they are going out of their way to not hype their stuff. They plan to work with a small number of customers to gradually prove out their approach. In an area which is highly susceptible to hype, it was refreshing to see people trying to keep expectations to a reasonable level.

The DevCon (and Lang.NET) were also my chance to meet two people who I’ve followed for some time from afar: Ted Neward, and Larry O’Brien. Ted is well known and I’ve been following his blog for some time. He’s local to the Puget Sound area, and it’s probably just bad timing that we never met before this week. Larry O’Brien has been a commenter on my blog, as well as a responder on Twitter. I’ve appreciated his blog as well as the columns that he’s written over the years. It was great fun to run to the back of the room after each talk and see what the Twitter cabal (which included Larry) had to say about the material we had just seen.

Analysis

I think that DSL’s are inevitable. It’s remarkable to me how prescient Abelson and Sussman were when they defined three categories of abstraction: control abstraction, data abstraction, and metalinguistic abstraction. If you look at some of the recent frenzies in languages, you’ll see that we are mostly improving the ability of various languages to perform various kinds of abstraction. These concepts are not new, but they are appearing in languages which are approachable by today’s practitioners. Object oriented programming? Data abstraction. Closures? Control Abstraction. Pattern Matching/Algebraic datatypes? Data and control abstraction. DSLs and the capabilities needed to enable them? Metalinguistic abstraction.

Language as an abstraction is very powerful, and requires support from the underlying language as well as the tools. These two topics (as well as specific examples of domain specific languages) were the focus of the DevCon. The audience makeup appeared to be mostly language and compiler geeks. There were a few people (mostly consultants as far as I could tell) who write business applications, but this group was pretty small. This is important because most of the DSL’s presented were aimed at very computer science kinds of domains. If DSL’s are to have a broader impact, then it would be great to see more business people at events like this.

One thing which was not addressed at all was the process end of this. In order to build DSL’s for non computer domains, there has to be a collaboration between developers and domain experts. The Intentional Software guys recognize this via some “groupware” to facilitate this process. However, tooling alone is not enough to bridge this gap. I hope that we’ll be hearing reports on the process of collaboration between developers and domain experts as more and more people build DSL’s.

This is an interesting space, from a technical point of view. There is lots of cool language design and compiler stuff, some of my favorite topics.   On the business end, it seems like there are some decent sized opportunities here, and that tooling is going to play a very large role — language support for DSL’s will be important, but may be overshadowed by the importance of good tools.

Update: the videos are now available