Tag Archives: programming

Best PyCon Evar

I probably should have chosen a different title for this post, because at the rate things are going for PyCon, I’ll just have to use the same title again for the next few years. This year, PyCon happened during the same week as ApacheCon EU (the 10th anniversary of the ASF) and EclipseCon. I have a slight bit of regret that I wasn’t at ApacheCon for the 10 year anniversary, but I’m planning to be at the 10th anniversary celebration at ApacheCon US in Oakland in November. That roughly corresponds to the time when I first got involved with Apache and open source, so it will be pretty meaningful. Beyond that, the choice was hands down PyCon, my favorite conference. Even if the PyCon organizers hadn’t invited me to speak on a topic of my choosing, there are just so many things to love about PyCon.

The Talks

PyCon 2009

Despite a very active and fun hallway track, I did go to a number of talks.   

I went to Adam Christian and Mikeal Rogers’ talk on Windmill mostly for moral support. We worked together at OSAF, I like Windmill, and it’s really good to see Windmill picking up steam in the Python and other communities. If you are looking for a web testing framework, particularly one that is strong at AJAX applications, you owe it to yourself to look at Windmill.

There were a few tools talks that I attended. I use IPython, so I was curious to see how Reinteract: a better way to interact with Python, would improve on IPython. I like the Visicalc/TkSolver-like worksheet that allows you to change values in a Python interpreter history and have values propagate forward. I’d love to see all these REPL tools come together in an integrated way. We might finally get back to the functionality of the Lisp Machine REPLs someday. I also attended How AlterWay releases web applications using “zc.buildout” since Jacob Kaplan-Moss warned me that the zc.buildout documentation was sorely lacking. Even that talk wasn’t enough to get me going, but the sprints produced some great new documentation for buildout. I’m looking forward to digging into that.

Some talks dealt directly with topics that are relevant to work, particularly now that the dynamic languages folks at Sun are a part of the Cloud Computing division. These talks included:

  • Twisted AMQP and Thrift: Bridging Messaging and RPC for building scalable distributed applications – Twisted bridges to AMQP and Thrift.

  • Concurrency and Distributed Computing with Python Today – Jesse Noller did a great job surveying the various offerings available in Python today. There’s a lot of stuff there, but I think that there’s still quite some way to go yet. That’s not picking on Python, that’s just my general view of this space.

  • Drop ACID and think about data – Bob Ippolito did a really nice survey of the various non-relational/non-transactional data storage options out there. Bob actually tried many of these, so the survey is useful for weeding out systems that aren’t really ready for prime time. A must view if you haven’t been paying attention to this space.

  • Pinax: a platform for rapidly developing websites – I’ve been following Pinax via Twitter for some time now, and James Tauber and I were involved at the beginning of the Apache XML project almost 10 years ago. Despite all that, we had never actually met in person until this week. James had a tough job with his talk. Pinax is very new, so he could either talk to the people who didn’t know what Pinax is, or he could talk to people who wanted to know where things were. James knew this was going to be a problem and said so in his talk. And it was, at least for me. Fortunately, I managed to sit down with James at the sprints and get my questions answered. Zed Shaw recently wrote a (very positive) review of Django. That’s interesting since Zed was a hard core Rails guy. It’s also interesting because he called out Django’s emphasis on modularity and Pinax as an example of that modularity. My questions about Pinax were mostly about what (if anything) Pinax has done to build on the modularity provided by Django. At the moment, the various Pinax components cooperate mostly via conventions. Things are still early in Pinax, and I wasn’t surprised to hear this. James did say that some conventions were close to getting codified/documented/supported by the framework, which is what I am really interested in. In some ways, the data representation and modularity problems are similar to the kinds of problems that we were trying to solve for Chandler. Pinax is in the social application domain and Chandler is in the PIM domain, so while there are some similarities there are also differences. I’ll definitely be sticking my nose a bit deeper into the Pinax checkout that’s been sitting on my hard disk.

The most entertaining talk that I attended was Ian Bicking’s Topics of Interest. Ian took the invitation to speak on something of interest quite literally which created an air of mystery. In the end, Ian prepared some slides (some of which were quite thoughtful and introspective), used an instance of the new Google Moderator to queue up some audience questions, and created an IRC backchannel which he kept on the screen during his talk. The result has to be watched (and the video is already up) to be understood. It was quite hilarious, with the exception of some unpleasant commentary after someone in IRC asked “why aren’t there more women at PyCon”. The resulting IRC conversation only serves as an explanation for why. Many people felt this way, and discussion of this spilled out into Twitter, and I hope that perhaps we can change things for the better.

I gave my talk, Challenges and Opportunities for Python, and got a pretty good reception. I had a number of hallway and other conversations with people based on the content. I think that I was successful in giving people a perspective on the dynamic language world as a whole, on Python’s place in it, and some things that we might be able to do in order to grow. You can watch the video and make your own assessment, and decide if there are actions worth taking.

This year the conference is benefitting from a great new website (built in Django), and you’ll find the slides and video for each talk on the links. The video team is doing a great job of cranking out the video, so all of them should be up soon, or you can go to pycon.blip.tv to see them all together. Here are some talks that I am going to be checking out:

The Lightning Talks

I put the lightning talks in a separate category from the talks because they are a phenomenon at PyCon. This year there were two lightning talk sessions each day, one at the beginning of the day and one at the end. That’s 6 sessions of lightning talks! Jacob Kaplan-Moss only allowed signups for the next session, and it was truly first come, first served (without last year’s arrangement with the sponsors). There were a number of really good lightning talks. There really isn’t a good record of what got presented except perhaps on Twitter. A search for #pycon should get most of it.

Update: the lightning talks were also recorded on video and will be posted on pycon.blip.tv.

The Sprints

The PyCon sprints remain a phenomenon. While I don’t think quite as many people stayed this year as last year, there were still a lot of people: enough to fill the basement conference rooms at the Crowne Plaza hotel, and enough to need one of the ballrooms to serve lunch and dinner in. Once again, I hung out at the Jython sprint, and wandered in and out of the Django and Pinax sprints. During the two days of sprints that I stayed for, I observed the folks working on ctypes for Jython actually crashing the JVM. SQLAlchemy started to really run on Jython and so did Twisted. Four days of hacking with the core developers of a project generally tends to produce results. So does spending time to bring new people from the community into your project.

I reported a bug in Django as I tried to get buildout set up to do Django on MySQL. I’m talking about Python and MySQL at the MySQL conference in a few weeks, so I was working on my example code. Turns out that MySQLdb doesn’t build cleanly on the Mac. The trunk version almost builds cleanly, so I used that, but that version chokes on something in Django. Before I discovered that, I had done some gymnastics involving a git-svn clone of MySQLdb, a push of that to github, and a git recipe for buildout. I never quite got the git/buildout part working, and I decided that it was overkill. That’s when I finally discovered that the trunk didn’t work with Django.

Of course, the sprints are also a time to catch up with/meet people in the community. It’s a time when there are friendly rivalries, joking, and alcohol. One of the momentous occasions during 2008 was that Django got a pony.

The exuberant Django people decided to bring the pony to PyCon…

PyCon 2009

Guido decided that he wanted the pony…

PyCon 2009

This all made for great fun and entertainment, which then spilled over into the sprints as a three-way Python Core/Django/Pinax feud, which led to things like this and this. This is hard core fun, people.

Overall Conference Commentary

The organizers estimated the attendance for this year’s PyCon at around 900 people. That’s a slight decline from last year, but the economic situation is much much worse than it was last year. I think that a 10% decline is a huge success, and a testament to the growth of interest in Python and its surrounding ecosystem.

From an organizational point of view, PyCon is continuing its tradition of being a mostly volunteer organized conference. In this respect it is a tour de force, at least in the space of open source conferences. PyCon is using a production company to assist, just as ApacheCon is, but the on site footprint of that company is much smaller than the on site footprint of the company for ApacheCon. Moreover, the number of volunteers helping with things is just enormous. Session chairs, runners to escort speakers from the green room to their sessions, a web site builder, lightning talk coordinator, open spaces coordinator, greeters at the conference desk, photographers, and I’m sure there are a bunch more people whose roles I didn’t even get to hear about. Absent a fancy lighted stage display for keynotes, production value wise, I feel that PyCon is operating at the same level of quality as any of the O’Reilly conferences. The program was excellent – tutorials, keynotes, invited talks, regular talks, open spaces, and lightning talks.

With PyCon, the Python community is getting way more mileage out of its face to face time than any other open source community. The combination of lightning talks, open space, and sprints creates a powerful feedback loop within the conference proper, which then extends into the sprint days. This dynamic has evolved over the years as PyCon attendees have come to understand the role of these vehicles. Here’s how it works:

The lightning talks allow anyone, regardless of stature, influence, or reputation to get in front of the entire conference. People now recognize that some of the most interesting, surprising, and entertaining moments of PyCon take place during the lightning talks. It’s a measure of the influence of the lightning talks that even the 8AM morning lightning talk sessions were well attended. At other conferences the morning sessions are reserved for keynote presentations by paying sponsors. I usually skip these because the content value is low. But I definitely got up to make sure that I hit those 8AM lightning talks. If you’ve gotten in front of the community with a lightning talk, you can extend your reach by scheduling an open space session.

PyCon 2009

Above is a shot of the open space board for Saturday. Note that the time slots go from 10AM to 10PM. There were a few prank type sessions, but for the most part, that board really is full all day long with 10 rooms available during each one hour time slot. Consider that there were 4 ballrooms for the talks, and that the talks went from 10:20AM till 5PM. There was way more air time in the open space sessions, and people certainly made use of it. This is why PyCon is a working conference – it’s not only about transfer of information, real work gets done there.

The only tricky thing with open space is that it would be great to have electronic access to the contents of the open space board during the conference. That would help make the open spaces a first class citizen in the minds of attendees. This is an interesting problem, because part of the value of the open space is the physical board, so turning it all electronic wouldn’t be a good idea. I wonder if Kaliya Hamlin has any experience with this sort of thing.

Used well, the open space sessions are great for organizing your little (or big) slice of the world wide Python community. They are also great as a prelude to a sprint once the conference has finished. And as I’ve already mentioned, the sprints are a great time to reinforce a project’s community as well as move it forward.

PyCon 2009

All of this notwithstanding, the PyCon organizers are not resting on their laurels. They keep on looking for ways to improve the conference. The buckets you see above are an example of this. Instead of paper or electronic surveys, attendees were asked to vote for talks by taking a colored chip and tossing it into a bucket on their way out the door: green for good, yellow for ‘meh’, and red for bad. This is way less effort than the surveys, and I observed a decent number of people putting in their chips. Doug Napoleone has more on the origins of this system, as well as a pointer to the raw data on the results.

Twitter is now in the mainstream at PyCon. Guido mentioned Twitter during his keynote, and used it to ask questions during the conference. One of James Tauber’s first slides told people which hashtag to use when covering his talk. I’d guess that I got at least 20 new followers each day of PyCon, and I think that I might even be trained to use hashtags now. #pycon was in the top 10 on Twitter during the days of the conference. The takeaway is that if you are going to a conference and you are not on Twitter, you are missing out. The corollary is that if you are a conference, and you aren’t making use of Twitter, you need to pay attention. Ian Skerrett has an interesting post on how they used Twitter during EclipseCon. One thing that was missing was a video display of the search for #pycon. I know from talking with Doug Napoleone that he has some wonderful ideas for taking all the social networking stuff to the next level. I’m really looking forward to seeing that next year.

Photography

I’ve been to a lot of conferences over the last few years, always with a camera in hand. At each conference I shoot less and less. There are now lots of people swarming around with cameras, and I feel a bit burned out on shots of people speaking from the front of a room, rows of white male attendees listening to a talk, and the rest of the usual conference shots. The same thing happened with me and liveblogging conferences. Also, it’s hard to do the hallway track and do decent photography. Last year, the PyCon organizers asked me to take some official pictures, which I was happy to do. This year they didn’t (which was fine by me), but I had planned to bring the camera anyway, because PyCon is PyCon, and photographing there is one way that I try to give back to the community.

It turns out that the organizers were way more organized about photography this year. They actually had someone to coordinate the photography for the conference. Steven Wilcox had a last minute emergency and couldn’t make it. I found out about all of this just a half an hour before I left for the airport. Steven had planned to do headshots of Pythonistas, and was planning to get studio lighting equipment and so on. All of that was now up in the air. Since I had done a bunch of headshots of ASF people at ApacheCon, I tossed some Strobist lighting gear into my suitcase, just in case. By the time I landed in O’Hare, Erich Heine had stepped up to replace Steven, and I joined the “Python Paparazzi” or “pyparazzi”, along with Erich, Jason Samsa, Dan Ryder, and Stéphane Jolicoeur-Fidelia.

Since PyCon was in Chicago last year, I was familiar with the Crowne Plaza Hotel, which is a decent hotel, but nothing to write home about. This year the conference proper moved to the Hyatt Regency down the street. PyCon has a tradition of trying to keep costs low in order to keep the conference accessible to the community, so I was expecting something like the Crowne Plaza. I couldn’t have been more wrong. The Hyatt is a photographer’s paradise. There are lots of interesting colors, textures, and some areas with beautiful overhead natural light. If you were going to photograph a wedding, you would die for settings like these for the bridal portraits.

PyCon 2009

This tiled inset in the wall turned into the backdrop for James Tauber’s headshot.

James Tauber

It doesn’t have to be strobe(ist) to be a good headshot!

PyCon 2009

This orange lit panel behind a bench seat turned into the backdrop for Jim Baker’s headshot.

Jim Baker

In addition to the pyparazzi, there were plenty of other cameras floating around the conference. Andy Smith decided to do a photographic project called the “Beards of Python“. When this set was announced on Flickr, it caused some Twitter buzzing amongst some of the female attendees of the conference. One thing about photographers is that we (or at least I) are always willing to take some interesting photos. So when the Twitter buzzing reached me, I offered to photograph any interested Geek Girls. James Duncan Davidson and I have discussed the value of trying to photograph female attendees at technology conferences. Since our photographs are often used for advertising, this can be a way of helping women feel more comfortable about attending — knowing that there will be other women there can be a help. So not only did I get to shoot more pictures of interesting people, I hope that in some small way this will contribute to making PyCon friendlier to women.

Catherine Devlin

This is Catherine Devlin, a contributor to sqlpython. Go read her post “Five minutes at PyCon change everything” for an actual example of the lightning talk/open space/sprint scenario that I described above.

The entire set of Pythonista headshots, as well as the rest of my conference coverage, is up on Flickr. Who knows what we’ll come up with for next year in Atlanta…

Travel

Regular readers will know that a trip to PyCon traditionally involves some kind of travel mishap. This year was pretty minor compared to previous years. United lost my luggage on the flight from Seattle to O’Hare, despite the fact that I arrived 2.5 hours early and checked in at the “Premier” checkin line. I got my bag the next day, so it wasn’t really that bad. Maybe next year will be the PyCon with no travel glitches.


Refactoring in the Functional Programming world

I’m an Emacs guy. I was first exposed to Emacs back in 1984 on a VAX running BSD. This was prior to GNU Emacs, so the Emacs that I saw was James Gosling’s Emacs. At the time, I was working on a compiler for a functional programming language called SUPER, which was evaluated using combinator graph reduction.

For many years, and across many languages, including Scheme, C, C++, Perl, TCL, and Java, Emacs was my tool of choice. My hands had the muscle memory for the keystrokes, and over those years I accumulated a file full of Emacs-Lisp customizations for Emacs (by this time, mostly GNU Emacs). When Eclipse started to support refactoring I started using Eclipse as my primary tool for editing Java programs. Refactoring is an example of the kind of high leverage features that I want in my programming tool set.

A few days ago I found some gems buried in a thread on the Scala mailing list. Dave Griffith has been accumulating a list of refactorings for Scala. Here’s his complete list:

Curry Method (split a parameter list, and the arg lists of all callers).

Uncurry Method (merge split parameter list, including merging the arg lists of callers. If method is called with partial args, either complain or automatically create a helper method which represents the partial application, and replace partial calls with it.)

Extract Trait (including searching for other classes which can have the same trait extracted. Tricky with super calls, but not impossible)

Split Trait (splits trait into two traits (putting in self-types if needed), change all extending classes to extend both traits)

Extract Extractor (select a pattern, automatically create an extractor)

Extract Closure (similar to extract method, but creating a function object)

Introduce by-name parameter

Extract type definition (obvious)

Merge nested for-comprehensions into single for-comprehension (and converse)

Split guard from for-comprehension into nested if (and converse)

Convert for-comprehension into map/filter/flatmap chain (and converse)

Wrap parameter as Option (converting null checks, etc.)

Convert isInstanceOf/asInstanceOf pair to match

Replace case clause with if body to guarded case clause(s)

I was particularly interested in those refactorings related to functional/higher-order programming and pattern matching. Given the surge of interest in Scala, F# and Haskell, it looks like there’s room for some more work in refactoring.
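To make a couple of the entries above concrete, here is a rough sketch of what “Convert for-comprehension into map/filter/flatmap chain” and “Curry Method” do to a piece of code. It is written in Python rather than Scala, so it is only an analogy: the `orders` data, the `flat_map` helper, and the `scale` functions are made up for illustration and do not come from Dave’s list.

```python
from itertools import chain

# Hypothetical sample data: (customer, list of order amounts).
orders = [("ann", [3, 12]), ("bob", [25]), ("cy", [])]

# Comprehension form: two nested generators plus a guard.
big_comprehension = [(name, amount)
                     for name, amounts in orders
                     for amount in amounts
                     if amount > 10]

def flat_map(f, xs):
    """Apply f to each element and flatten the resulting iterables."""
    return chain.from_iterable(map(f, xs))

# The same computation after the refactoring: an explicit
# flat_map followed by a filter.
big_chain = list(
    filter(lambda pair: pair[1] > 10,
           flat_map(lambda order: ((order[0], a) for a in order[1]),
                    orders)))

# "Curry Method": one parameter list ...
def scale(factor, value):
    return factor * value

# ... split into two, so the first argument can be supplied early.
def scale_curried(factor):
    def apply(value):
        return factor * value
    return apply

double = scale_curried(2)
```

Both forms produce the same list, which is the point of a refactoring: the shape of the code changes and the behavior does not. A tool would also have to rewrite every caller of `scale`, which is the part that is tedious to do by hand.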

The First Annual JVM Language Summit

You know that a conference is good when you go home with a list of stuff that you never heard of but now need to go follow up on. The JVM Language Summit was exceptional in this regard. Sun provided a location and a few of the speakers, but most of the speakers at the Summit were not Sun employees, although there were a few Sun alumni amongst the speaker ranks. The topics that were discussed went all the way from type theory (including the usual greekified type proofs), typical language design stuff, VM design, all the way down to discussions of how high allocation rates can cause hot data to get flushed out of caches on the bare metal. Slides for all the talks are available on the wiki for the conference, and some of the talks will have video at either InfoQ or YouTube. Here are some of my favorites from the three days.

Clojure

I’d been aware of Clojure prior to the summit and had looked at the page on Clojure’s use of persistent data structures, so I thought that I had some idea of what was interesting about Clojure. I was wrong. Rich Hickey’s 30 minute presentation on Clojure had a really large amount of information per unit time. By the end of the time I was really interested in Clojure, and I was able to find out a bit more about it by going to an open space session and by being at the same table as Rich during dinner one night. As an old Lisp guy, my usual reaction to Lisps on the JVM or CLR is why? They don’t typically fit in that well with the host VM, and there are great implementations of Common Lisp that can compile to very efficient machine code. I was looking forward to Arc, but that has turned out to be very disappointing. Clojure has taken a very practical approach to the Lisp parts of the language. It fits in very nicely with the JVM, is able to call Java code easily and has the potential to achieve very good performance on the JVM. Also, Rich has made a number of design decisions which improve the syntax of Clojure (he showed a short program in both Python and Clojure, and they occupy the same amount of vertical space and have roughly the same visual density). He’s also generalized many operations that would have worked on lists to work on sequences, which really means any Java sequence type. Like many Lisps, you can supply data type hints, and the compiler will use those to make the program more efficient. There is a nice library of collection operations, which looks comparable to (or better than) Python’s or Ruby’s facilities for collection types. There are some other really interesting data structures in the libraries, like bit partitioned hash tries and zippers.

Beyond the Lispish stuff in Clojure, there are several interesting features in Clojure related to the problem of concurrency. In Clojure things are immutable by default, which is a huge benefit – a benefit shared by functional languages, and quasi functional languages like Erlang. Beyond that, Clojure supports persistent data structures as a way of managing state in a concurrency friendly manner. The idea is that “updating” a persistent data structure yields a new version while leaving the version from before the update intact. This means that updates don’t impede readers of the old version, and are not blocked by readers of the old version.
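The persistence idea is easier to see in code. Below is a minimal sketch in Python, not Clojure, and it uses a plain unbalanced binary search tree with path copying rather than the hash tries that Clojure’s collections are built on. The names `Node`, `assoc`, and `lookup` are my own for this illustration.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    """An immutable tree node; 'updates' build new nodes instead."""
    key: str
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def assoc(node, key, value):
    """Return a new tree with key bound to value; the old tree is untouched.

    Only the nodes on the path from the root to the key are copied.
    Every other subtree is shared between the old and new versions.
    """
    if node is None:
        return Node(key, value)
    if key < node.key:
        return node._replace(left=assoc(node.left, key, value))
    if key > node.key:
        return node._replace(right=assoc(node.right, key, value))
    return node._replace(value=value)

def lookup(node, key):
    """Return the value bound to key, or None if it is absent."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None

v1 = assoc(assoc(assoc(None, "m", 1), "c", 2), "x", 3)
v2 = assoc(v1, "c", 20)   # "update": v1 is still fully usable
```

After the update, a reader holding `v1` still sees the old value for `"c"`, a reader holding `v2` sees the new one, and the untouched `"x"` subtree is literally the same object in both versions. Because nothing is mutated, readers never need a lock.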

Lastly, Clojure provides an interesting mechanism for utilizing Software Transactional Memory. Normally STM systems make all accesses to memory transactional. This makes the STM a bottleneck, and makes it much more likely that the performance of the STM system will be the limiting factor in a concurrent system. Clojure requires you to make uses of the STM explicit via its Ref structure. This yields a potentially much more controlled usage of the STM, which could help prevent the STM from being a bottleneck.
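Here is a toy sketch of what “explicit” means here, again in Python. It is not how Clojure implements its STM. It is a simple optimistic scheme with a single commit lock, and the `Ref`, `Transaction`, and `atomically` names are assumptions made for this example. The point it illustrates is that only values wrapped in a `Ref` participate in transactions, and everything else is ordinary memory.

```python
import threading

_commit_lock = threading.Lock()   # serializes the commit step only

class Ref:
    """A mutable cell that should only be changed inside a transaction."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Transaction:
    def __init__(self):
        self.reads = {}    # Ref -> version observed by this transaction
        self.writes = {}   # Ref -> value to install on commit

    def get(self, ref):
        if ref in self.writes:
            return self.writes[ref]
        self.reads.setdefault(ref, ref.version)
        return ref.value

    def set(self, ref, value):
        self.reads.setdefault(ref, ref.version)
        self.writes[ref] = value

def atomically(body):
    """Run body(tx); retry if any Ref it touched changed before commit."""
    while True:
        tx = Transaction()
        result = body(tx)
        with _commit_lock:
            unchanged = all(ref.version == seen
                            for ref, seen in tx.reads.items())
            if unchanged:
                for ref, value in tx.writes.items():
                    ref.value = value
                    ref.version += 1
                return result
        # Another transaction committed first: loop and run body again.

checking, savings = Ref(100), Ref(0)

def transfer(tx):
    tx.set(checking, tx.get(checking) - 30)
    tx.set(savings, tx.get(savings) + 30)

atomically(transfer)
```

Because `body` may run more than once, it should not have side effects outside of the `Ref`s it touches, which is the usual rule for STM transactions.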

My original impression of Clojure was that it was still in the very early stages, but it seems to be a bit further along than that. I was surprised by the size of the community, and by other parts of the ecosystem, like the tool support. There are several Emacs modes, integration with SLIME, and even a Netbeans plugin for Clojure.

I will definitely be giving Clojure a closer look, and I am not alone. There was a lot of energy in the room during and after Rich’s talk, and there was a burst of Twitter traffic during the talk. It’s pretty interesting to see the number of language geeks on Twitter.

DaVinci Machine

If you’ve been following John Rose’s blog and Charlie Nutter’s recent writings on invokedynamic, you wouldn’t be very surprised by the content of John’s presentation on the DaVinci Machine Project. This is a highly important piece of work for non-Java languages on the JVM, so it was good to hear John tell a more complete version of what he’s been up to. It was also my chance to meet John in person. We somehow missed each other at JavaOne, so it was good to put a face to the name, and have some in person contact. John and Brian Goetz did a great job of organizing the summit, and John was always trying to find out what kind of features would be useful to JVM implementors. JSR-292 can’t happen soon enough.

Fortress

David Chase talked about the work that folks at Sun Labs have been doing on Fortress. I never really paid much attention to Fortress, since they are really aiming at the scientific, high performance computing space, and that’s kind of outside of my interests. The Fortress guys are doing some interesting explorations as far as concurrency is concerned. In fact, David referred to Fortress as “infested with parallelism”. My todo item from the Fortress talk has to do with the work-stealing model that they have for concurrency. Apparently this work is based on a data structure known as an ABP queue, so I’ll be tracking down the paper on that one.
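For anyone who hasn’t run into work stealing before, the general pattern can be sketched in a few lines of Python. This is a single-threaded simulation of the scheduling discipline only: each worker takes its newest task from one end of its own queue, and an idle worker steals the oldest task from the other end of someone else’s. It says nothing about how Fortress implements it, and it skips the non-blocking synchronization that makes the real data structure interesting. The `Worker` and `run` names are made up for this sketch.

```python
from collections import deque

class Worker:
    def __init__(self, name):
        self.name = name
        self.tasks = deque()

    def push(self, task):
        """Owner adds new work at the bottom of its own deque."""
        self.tasks.append(task)

    def pop(self):
        """Owner takes its most recently pushed task, if any."""
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):
        """A thief takes the victim's oldest task from the opposite end."""
        return victim.tasks.popleft() if victim.tasks else None

def run(workers):
    """Round-robin simulation; returns (worker name, task) pairs in order."""
    log = []
    progress = True
    while progress:
        progress = False
        for worker in workers:
            task = worker.pop()
            if task is None:
                # Out of local work: try to steal from the others.
                for victim in workers:
                    if victim is not worker:
                        task = worker.steal_from(victim)
                        if task is not None:
                            break
            if task is not None:
                log.append((worker.name, task))
                progress = True
    return log

busy, idle = Worker("busy"), Worker("idle")
for t in ["t1", "t2", "t3", "t4"]:
    busy.push(t)
log = run([busy, idle])
```

The owner and the thief work at opposite ends of the deque, so they only contend when the deque is nearly empty. That is the property the concurrent versions exploit.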

JRuby

I’ve heard Charlie Nutter talk about JRuby several times, and have talked with him a little about JRuby. Even so, his talk on JRuby was really interesting, because he was able to go full out for an hour to a very sophisticated audience. I know from talking to some of the Jython guys, that there were a few aha moments for them, even though they’ve been to the same talks that I’ve been to.

Dynalang MOP

Attila Szegedi described his proposal for a MOP (metaobject protocol) for dynamic languages. Once you start hosting a bunch of languages on the JVM (or any VM), then people start to ask if they can call code written in language A (say Clojure) from language B (say Python). The tough part is that the code in language A may have compiled to Java bytecodes in a way that doesn’t really resemble Java code, and you can end up in a situation where B can call A but does it by grabbing things which are really artifacts of A’s implementation. Of course, A’s implementors will continue to improve A, and in the process of doing so, might change the details. You can see what the problem is going to be. Attila’s MOP would go a long way towards helping here. I hope that people will give it a serious look.

Gradual Typing

Jython committer Jim Baker has been after me about the work that Jeremy Siek (CU Boulder) has been doing on adding types to Python. His system is called gradual typing and allows a programmer to selectively add type annotations to a program. It’s a cool idea in principle, and I hope that it will end up being cool when it finally gets implemented all the way. I have to admit that the first time that I saw an annotated program, I had a violent reaction. There were a ton of angle brackets due to the type annotations. Jeremy and his students are working on ways to reduce the amount of notation that is needed. I hope that they’ll be successful. In Python at least, it’s going to be key to whether people will adopt it or not.
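To give a feel for “selectively add type annotations”, here is a small sketch using Python 3 function annotations. This is not Jeremy’s notation or his system. The `checked` decorator is a made-up helper that enforces annotations with a runtime `isinstance` check, which only loosely stands in for the checks a gradual type system places at the boundary between typed and untyped code. Parameters without an annotation stay fully dynamic.

```python
import inspect
from functools import wraps

def checked(func):
    """Check annotated parameters at call time; skip unannotated ones."""
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = sig.parameters[name].annotation
            if expected is inspect.Parameter.empty:
                continue  # no annotation: dynamically typed, as before
            if not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}")
        return func(*args, **kwargs)
    return wrapper

@checked
def area(width: int, height: int):
    return width * height

@checked
def describe(shape, sides: int):   # 'shape' is deliberately left untyped
    return f"{shape} with {sides} sides"
```

Calling `area("3", 4)` raises a `TypeError`, while `describe` accepts anything at all for `shape`. The sketch only handles plain classes as annotations, which is enough to show the idea of mixing checked and unchecked parameters in one program.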

Fundamentalist FP

I’ve been an admirer of Erik Meijer’s work for some time, so I was glad to be able to hear him speak in person. There was another talk on LINQ, so Erik didn’t talk about that. Instead he talked about what he called “Fundamentalist Functional Programming”, which is really just the functional programming that the old school functional programming people have always talked about. I think that Erik is concerned about the amount of lifting of functional programming ideas and idioms, without a full understanding of the essence of functional programming. His presentation style is very entertaining. The major thrust of his argument was that for the past 50 years of computing, we have been abstracting, but abstracting over the wrong things. He asserted that the thing that we really need to do is to abstract over evaluation order. Given the coming many/multicore world, this is understandable, but I don’t think that I agree that all the other lessons that we’ve learned about abstraction are invalid. He provided the simplest explanation of monads that I have ever heard or read, as well as showing how to handle things like object creation and process creation monadically. In the end, though, his talk reduced to the essentials of lazy pure functional programming.

Bytecodes for fast JVM’s

Cliff Click asked that JVM language implementors send him an implementation of a particular program written in their language. Cliff then ran those programs (in their respective languages) on HotSpot and on the Azul JVM. His talk was a report of his findings as to what was keeping various languages from getting the best results on the JVM. He said that he wasn’t trying to compare the merits of one language versus another, but more to give the implementors insight into what was up with their code. I found this talk to be tremendously interesting because Cliff really knows the guts of HotSpot and because he was able to be very specific about what was causing problems for the various languages.

Parrot

I’ve known Allison Randal for several years now, mostly via her organizing of the FLOSSFoundations meetings that happen every year at OSCON. In all that time, we’ve never really sat down and talked about her work on Parrot, and it’s been several years since I heard a talk on Parrot. I give John and Brian a lot of credit for inviting Allison to come and talk about Parrot. The architecture of Parrot is very unlike either the JVM or the CLR. They started with very different assumptions and goals, which unsurprisingly led to a different design. As far as I can tell, Parrot is looking reasonable on the performance front, will be able to use the C libraries of Python, Ruby, PHP, etc, without much hassle, and will have a good story for interoperability between hosted languages. Control flow is modeled using continuations, which means that continuations are really cheap to create. Allison also talked about a different method of doing call site caching – Parrot does the caching in the Parrot class object, not in little caches strewn all over the call sites. This makes it easy to invalidate the cache when the class hierarchy changes, for example. I’m still trying to digest all of what I heard, as well as the conversation that several of us had with Allison after her talk.
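The caching idea, as I understood it, can be sketched with a toy object model in Python. This is not Parrot’s code or API. `ToyClass` and its methods are invented for the illustration, and the sketch only shows why keeping the method cache on the class makes invalidation simple: when a class changes, you clear its cache and the caches of its subclasses, and there are no per-call-site caches to hunt down.

```python
class ToyClass:
    """A minimal single-inheritance class object with its own method cache."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.methods = {}      # methods defined directly on this class
        self.cache = {}        # method name -> resolved function
        self.subclasses = []
        if parent is not None:
            parent.subclasses.append(self)

    def define(self, name, func):
        """Add or replace a method, then drop any stale cached lookups."""
        self.methods[name] = func
        self._invalidate()

    def _invalidate(self):
        # A change here can affect lookups in every descendant class.
        self.cache.clear()
        for sub in self.subclasses:
            sub._invalidate()

    def lookup(self, name):
        """Resolve a method, walking up the hierarchy on a cache miss."""
        if name in self.cache:
            return self.cache[name]
        cls = self
        while cls is not None:
            if name in cls.methods:
                self.cache[name] = cls.methods[name]
                return cls.methods[name]
            cls = cls.parent
        raise AttributeError(name)

base = ToyClass("Base")
derived = ToyClass("Derived", base)
base.define("greet", lambda: "hello")
first = derived.lookup("greet")()      # miss: walks up to Base, then caches
base.define("greet", lambda: "hey")    # clears Base's and Derived's caches
second = derived.lookup("greet")()     # re-resolved after invalidation
```

With caches scattered across call sites, the redefinition of `greet` would require finding and flushing every site that had cached the old target. Here it is one walk down the subclass tree.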

The Parrot team has been lying low and working away on Parrot, and they are definitely making progress. Allison showed some very preliminary benchmarks of the incomplete Python, Ruby, and PHP implementations on top of Parrot versus the C based versions. She told me afterwards that the project has reached the point where they are working to time based milestones, and that they are hoping to do a 1.0 release early in 2009. Chalk up another to-do.

Random Thoughts

There aren’t any pictures to go with this report because I was not motivated to take any. There were several people snapping away quite frequently during the conference, and I didn’t want to add the slaps from the D3 to the cacophony and the flash light show.

It seems clear to me that many folks share some of the same problems, and I hope that one result of the summit will be that people will start to work together when it makes sense. I know that the Jython and JRuby folks are working in that direction, and it seems likely to me that there will be some collaboration around the dynalang MOP as well. There was a lot of good energy in the room: people were very respectful and curious about other people’s work.

I think that the only regret that I had was that this was the first annual JVM Language Summit. Imagine where we’d be if this had been the fifth…

Update: finished the sentences about persistent data structures in Clojure

News sweep

I’ve managed to go the entire month of August without a post, due to a combination of travel, family activities, and vacations. So here’s a sweep of some of the things that I would have covered during that time.

1.0’s

The Chandler Project – Chandler has gone 1.0, so if you were put off by the version number, you can take it out for a spin. There are some good posts on the Chandler blog that describe how people are using it.

Django – Just today, the Django project had its 1.0 release. This is pretty important because there were a lot of changes in the subversion trunk that weren’t in the packaged builds. That’s all been done away with now. I expect that this will lead to even more Python webapps.

Tools

DTrace – DTrace is 5 years old today, and Bryan Cantrill has a good war story from that time. It’s amazing to me that something as good as DTrace can be around for 5 years, and still be relatively unknown. If you are on Solaris, OpenSolaris, or Mac OS X, go check it out.

Ubiquity – Ubiquity is like Quicksilver integrated into Firefox. It’s emphasizing the natural language aspects of that kind of interface. There’s also pretty good documentation on how to build additional commands, which is really important. There are extensions for Quicksilver, but there aren’t a lot of them. There are already a lot of third party Ubiquity commands. I really wish that Ubiquity could talk to other applications besides Firefox, but there are pretty nasty security problems down that path. Some of the commands are very Google oriented, like the mail and calendar, which makes it less useful for people like me who are still using desktop applications. In any event, I think that this is worth watching carefully. One unintended side effect might be additional pressure for page/application authors to embed machine-readable content (yes, that’s you, microformats, at least in part) into more pages. We’ll see.

Chrome – There’s a lot of buzz about Google’s Chrome browser. Since it doesn’t run on the Mac, I don’t have much to say. I’m not about to install Windows or fire up VMWare just to run a browser. One day the Mac port will be done, and then I’ll have a look. I am encouraged that the development team is doing a real Mac native experience.

Dynamic Language Runtimes

It’s been exciting to watch the progress in JavaScript runtime engines over the last few weeks. First there was Mozilla’s TraceMonkey, a tracing based JIT that delivered some very impressive speedups, despite the fact that it still cannot deal with recursion. As part of Google Chrome, a team led by StrongTalk/HotSpot lead Lars Bak has done a JavaScript JIT called V8, which is also turning out some very impressive numbers. And of course, the SquirrelFish engine for WebKit was turning in pretty good numbers a few months back. This is great progress for JavaScript; it’s less so for the web because of the variety of deployed browsers. It’s exciting to watch the various JavaScript runtimes leapfrogging each other. It gives me the sense that JavaScript is really making some serious moves on the performance front. Of course, none of these folks are comparing their execution times to C or C++. I’d like to see those comparisons as well. It’s also great that all three of these engines are open-source, so that implementors of other languages can evaluate the internals of these VM’s. I’d love to see this kind of leapfrogging in the Python and Ruby communities.

Cameras

I’m not as interested in the camera body arms race as I once was. The Canon 50D is an upgrade of the 40D, but I’m not really sure that more pixels is better. The telltale feature on the camera is the autofocus system, which hasn’t been given much of an upgrade. That signals to me that the 5DMkII will not be the all out upgrade that many are hoping for, but what do I know? The Nikon D90 sounds cool if you want to shoot video. I have enough problems with still pictures.

I am interested in the Nikon P6000 point and shoot, but I am seriously annoyed by the NRW proprietary RAW file format for the camera. Everything about the camera seems awesome, especially the ability to do off camera flash, both iTTL and manual. The RAW thing is going to be the determiner for me. I won’t buy one unless there is Lightroom/Adobe Camera Raw support for the camera. OS X native support wouldn’t be bad either. As a new Nikon owner, I am unimpressed by the NRW decision.

Travel

September is a heavy travel month for me. I will be in Birmingham, UK for PyCon UK, from Sept 12-14, and I’ll be at the JVM Language Summit from Sept 24-26. As always, stop by and say hello if you will be at one of these events.

DTrace on Linux?

I’ve been meaning to write a post about DTrace, and Tim Bray’s tweet finally got me moving. It looks like some people are trying to make DTrace a topic for this year’s Linux Kernel Summit. I hope they succeed. I also hope that those folks pushing for user level tracing have their voices heard. I was amused to read one of the messages, which claimed:

DTrace is more a piece of sun marketing coolaid which they use to beat us up at every opportunity.

My experience at Sun thus far is that people generally don’t really appreciate the benefits of DTrace. It stems from a view that I also saw in the LKS threads, which is that DTrace (and tools like SystemTap) is a tool for system administrators, because it reports on activity in the kernel. That’s not how I look at it. DTrace is a tool for dealing with full system stack problems, which initially manifest themselves as operating system level problems. The fact that DTrace can trace user land code as well as kernel code is what makes it so important, especially to people building and running web applications. Because of all the moving parts in a complicated web application (think relational database, memcached or other caching layers, programming language runtime, etc), it can be hard to debug a web application that has gone awry in production. Worse, sometimes the problems only appear in production. Tools which cut across several layers of the system are very important, and DTrace provides this capability, if all the layers have probes installed. When a web application goes wrong in production, you see it at the operating system level – high usage of various system resources. That’s where you start looking, but you will probably end up somewhere else (unless you are ace at exercising kernel bugs): perhaps a bad SQL query, or perhaps a bad piece of code in part of the application. A tool that can help connect the dots between operating system level resource problems and application level code is a vital tool. That’s where the value is.

One of the cooler features of DTrace is that you can register a user level stack helper (a ustack helper), which can translate the stack in a provider specific manner. One cool example of this is the ustack helper that John Levon wrote for Python, which annotates the stack with source level information about the Python file(s) being traced. On an appropriately probed system, this would mean that you could trace the Python code of a Django application, memcached, and your relational database (PostgreSQL and soon MySQL). That would be very handy.

I’d love to see DTrace on Linux, because I have it on OS X and it’s in OpenSolaris and FreeBSD, but I’d also be happy to see SystemTap get to the point where it could do the same job.

Notes on A History of Erlang

Joe Armstrong wrote a paper for last year’s HOPL-III conference on the history of Erlang. For some reason, I didn’t get a paper copy of those proceedings, and was too busy to notice their absence. Fortunately Lambda the Ultimate picked it up and supplied links to the paper and the accompanying presentation. Digging into the history of something like Erlang is always fascinating, and Armstrong has done a good job of explaining how Erlang came to be.

Here are a bunch of quotes on topics that I found interesting. I’ve grouped them into categories, but searching the PDF of the paper shouldn’t be hard if you want to know where they originated.

Sources of inspiration

Those familiar with Prolog will not find it at all surprising that Erlang has its roots in Prolog (mostly due to implementation reasons). What I did find interesting was the origin/history/viewpoint on the concurrency model:

The explanations of what Erlang was have changed with time:

1. 1986 – Erlang is a declarative language with added concurrency.

2. 1995 – Erlang is a functional language with added concurrency.

3. 2005 – Erlang is a concurrent language consisting of communicating components where the components are written in a functional language.

Today we emphasize the concurrency.

Note that the word actor never appears in those descriptions. Indeed, the word actor does not appear in the paper at all. So for all the discussion about Erlang’s usage of the actor model, it appears that the Erlang folks independently duplicated many of the ideas of Hewitt’s Actors. I think that is kind of interesting.

Lisp and Smalltalk are cited as inspirations, but more for the implementation of the runtime than for any features in the language. I came away from the paper with the impression that Armstrong and his colleagues are not paradigm ideologues. They are trying to get the job done.

Reliability

There is a huge emphasis on reliability throughout the paper, supporting Steve Vinoski’s remarks about Erlang. I’ll just include a series of quotes, which you can interpret as you see fit:

Erlang was designed for writing concurrent programs that “run forever”

At an early stage we rejected any ideas of sharing resources between processes because of the difficulty of error handling. In many circumstances, error recovery is impossible if part of the data needed to perform the error recovery is located on a remote machine and if that remote machine has crashed.

In order to make systems reliable, we have to accept the extra cost of copying data structures between processes and always make sure that processes have enough data to continue by themselves if other processes crash

The key observation here is to note that the error-handling mechanisms were designed for building fault-tolerant systems, and not merely for protecting from program exceptions. You cannot build a fault-tolerant system if you only have one computer. The minimal configuration for a fault-tolerant system has two computers. These must be configured so that both observe each other. If one of the computers crashes, then the other computer must take over whatever the first computer was doing.

This means that the model for error handling is based on the idea of two computers that observe each other. Error detection and recovery is performed on the remote computer and not on the local computer.

Links in Erlang are provided to control error propagation paths for errors between processes.

It was about this time that we realized very clearly that shared data structures in a distributed system have terrible properties in the presence of failures. If a data structure is shared by two physical nodes and if one node fails, then failure recovery is often impossible. The reason why Erlang shares no data structures and uses pure copying message passing is to sidestep all the nasty problems of figuring out what to replicate and how to cope with failures in a distributed system.

In our world, we were worried by software failures where replication does not help.
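The "pure copying message passing" idea in these quotes can be sketched in a few lines of Python. This is only an illustration, not how the Erlang runtime works; the `Mailbox` class is a hypothetical name, and `copy.deepcopy` stands in for Erlang’s copying of terms between process heaps.

```python
import copy
import queue


class Mailbox:
    """Hypothetical mailbox that copies every message on send."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, message):
        # Deep copy so that sender and receiver never share mutable state.
        self._queue.put(copy.deepcopy(message))

    def receive(self):
        return self._queue.get()


mailbox = Mailbox()
state = {"calls": [1, 2, 3]}
mailbox.send(state)

# The sender keeps mutating its own data (or crashes halfway through an update).
state["calls"].append(99)

# The receiver still sees exactly what was sent.
received = mailbox.receive()
```

The copy costs time and memory, which is the "extra cost" the paper says has to be accepted. In exchange, nothing the sender does after the send can corrupt what the receiver holds.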

Design criteria

Here are some quotes related to the design criteria for Erlang.

Changing code on the fly was an initial key requirement

the notion that three properties of a programming language were central to the efficient operation of a concurrent language or operating system. These were:

1) the time to create a process

2) the time to perform a context switch between two different processes

3) the time to copy a message between two processes

The performance of any highly-concurrent system is dominated by these three times.
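As a rough way to see what those three times look like on a given machine, here is a small Python micro-benchmark sketch. It rests on loud assumptions: OS threads stand in for Erlang processes (they are far heavier), a queue ping-pong stands in for a context switch, and `copy.deepcopy` stands in for message copying. The function names are made up for this example, and the absolute numbers say nothing about Erlang itself.

```python
import copy
import queue
import threading
import time


def time_spawn(n=200):
    """Average seconds to create, start, and join a do-nothing thread."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n


def time_round_trip(n=200):
    """Average seconds for one ping-pong between two threads via queues.

    Each round trip forces at least two hand-offs between threads, so it is
    a rough stand-in for context-switch cost.
    """
    ping, pong = queue.Queue(), queue.Queue()

    def echo():
        for _ in range(n):
            pong.put(ping.get())

    worker = threading.Thread(target=echo)
    worker.start()
    start = time.perf_counter()
    for i in range(n):
        ping.put(i)
        pong.get()
    elapsed = time.perf_counter() - start
    worker.join()
    return elapsed / n


def time_copy(message, n=200):
    """Average seconds to deep-copy one message."""
    start = time.perf_counter()
    for _ in range(n):
        copy.deepcopy(message)
    return (time.perf_counter() - start) / n


timings = {
    "spawn": time_spawn(),
    "switch": time_round_trip(),
    "copy": time_copy({"payload": list(range(100))}),
}
```

Printing `timings` gives per-operation averages in seconds. The interesting exercise is comparing their relative sizes as the message grows or the number of threads goes up.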

One of the earliest design decisions in Erlang was to use a form of buffering selective receive
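A buffering selective receive can be sketched like this in Python. It is a simplified, single-threaded illustration with hypothetical names (`SelectiveMailbox`, `matches`); a real Erlang receive pattern-matches on the message and blocks until something matches, whereas this version just returns `None`.

```python
from collections import deque


class SelectiveMailbox:
    """Hypothetical mailbox with a buffering, selective receive."""

    def __init__(self):
        self._buffer = deque()

    def send(self, message):
        self._buffer.append(message)

    def receive(self, matches):
        """Return the oldest message for which matches(message) is true.

        Non-matching messages stay buffered in their original order.
        Returns None if nothing matches; a real receive would block instead.
        """
        for index, message in enumerate(self._buffer):
            if matches(message):
                del self._buffer[index]
                return message
        return None


box = SelectiveMailbox()
box.send(("log", "started"))
box.send(("call", 42))
box.send(("log", "still running"))

# Pick out the call even though a log message arrived first.
picked = box.receive(lambda m: m[0] == "call")
remaining = list(box._buffer)
```

The receiver gets to handle messages in the order that suits its current state, and anything it is not ready for simply waits in the buffer.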

Pipes were rejected in favor of messages

In the concurrent logic programming languages, concurrency is implicit and extremely fine-grained. By comparison Erlang has explicit concurrency (via processes) and the processes are coarse-grained.

The final strategy we adopted after experimenting with many different strategies was to use per-process stop-and-copy GC. The idea was that if we have many thousands of small processes then the time taken to garbage collect any individual process will be small.

Current systems run with tens to hundreds of thousands of processes and it seems that when you have such large numbers of processes, the effects of GC in an individual process are insignificant.

The BEAM compiler compiled Erlang programs to BEAM instructions.

On functionalness

This next series of quotes will probably make the pure functional language people shake their heads, but I think that it’s important to understand Erlang in contrast with pure functional languages.

Erlang is not a strict side-effect-free functional language but a concurrent language where what happens inside a process is described by a simple functional language.

Behaviors in Erlang can be thought of as parameterizable higher-order parallel processes.

… the status of Erlang as a fully fledged member of the functional family is dubious. Erlang programs are not referentially transparent and there is no system for static type analysis of Erlang programs. Nor is it relational language. Sequential Erlang has a pure functional subset, but nobody can force the programmer to use this subset; indeed, there are often good reasons for not using it.

An Erlang system can be thought of as a communicating network of black boxes.

In the Erlang case, the language inside the black box just happens to be a small and rather easy to use functional language, which is more or less a historical accident caused by the implementation techniques used.

History and Usage

One thing that I was looking for in the paper was more details on how long Erlang had been around (besides before Java), how big the largest programs/systems were, and so forth. Here is what I found.

This history spans a twenty-year period…

(The history starts in 1986)

The largest ever system built in Erlang was the AXD301. At the time of writing, this system has 2.6 millions lines of Erlang code.

The AXD301 is written using distributed Erlang. It runs on a cluster using pairs of processors and is scalable up to 16 pairs of processors.

In the analysis of the AXD reported in [7], the AXD used 20 supervision trees, 122 client-server models, 36 event loggers and 10 finite-state machines. All of this was programmed by a team of 60 programmers.

As regards reliability, the AXD301 has an observed nine-nines reliability [7]—and a four-fold increase in productivity was observed for the development process [31].

The AXD 301 is circa 1998.

Perhaps the most exciting modern development is Erlang for multicore CPUs. In August 2006 the OTP group released Erlang for an SMP.

This corroborates something that David Pollak told me at the RedMonk unconference during CommunityOne, namely that SMP support in Erlang had not been there very long. Of course, Erlang was running on systems with 16 pairs of physical processors, no less, in a distributed environment. So while the runtime might not be that mature on SMP, the overall runtime for concurrency is probably a bit more mature than that. Nonetheless, it is worthwhile to know the precise facts.

All in all, I found the paper to be a very worthwhile read (and a nice change from my usual intake of blog posts and tweets). One of my pet peeves about the computer business is the lack of awareness of the history of the field. At least I’ve removed a bit of my own ignorance as it relates to Erlang.

The Scala vs Erlang whirlwind

Over the last week or two there’s been a bit of commotion with various parties in the blogosphere making the case for Scala against Erlang or for Erlang against Scala. Here’s a see spot run summary of the main writers and their positions / content:

Ted Neward (1, 2) – Ted (how confusing) is in the Scala camp, and thinks that the library approach of Scala’s actor library is preferable to Erlang’s VM (BEAM). He cites manageability as a major concern. He also thinks that adapting a process style model into the JVM would be easier than adding SNMP monitoring to BEAM. The length of the Barcelona project bibliography suggests otherwise, but we’ll never know unless some brave soul goes and tries to do this. Fortunately, the JDK is open source now. One has to wonder whether such a change could make its way through the JCP, though. Unfortunately for Ted, I found that many of his arguments were weakened by his lack of knowledge about Erlang.

Steve Vinoski (1, 2, 3) – Steve’s articles are more about the reliability aspects of Erlang, and he’s mostly trying to correct Ted’s facts on Erlang. He thinks that Erlang has proven its reliability chops by running for years non-stop. Given the frequency with which Java app servers need to be (or are) bounced, this doesn’t seem that incredible to me.

Patrick Logan (1, 2, 3) – Patrick piled on after Steve and has spent most of his writing trying to correct/challenge Ted’s assertions about Erlang. Patrick thinks that the conventional (i.e. JVM and CLR) runtimes will have problems implementing an Erlang style shared-nothing model, since the pre-existing libraries for those runtimes are not engineered in a shared-nothing manner.

Barry Kelly was an observer of the Neward-Vinoski-Logan discussion, and added some commentary on the impact of VM primitives on things like lift. This is a point which resonates with me, because it seems to me that both languages and language runtimes will need some work to meet the challenges of large scale multicore computing.

Yariv Sadan has done a pile of stuff in Erlang, and supplied his own summary of the differences between Scala and Erlang. There is a very informative exchange between Yariv and lift author David Pollak in the comments of this one.

That’s the short rundown. This is a very interesting problem space — before I turned into database programming language guy in graduate school, I was angling to be a concurrent programming language guy. Along the way to that I got pretty good doses of functional and logic programming, as well as actor programming. That work was in the context of people planning to build (for the day) highly concurrent computers, on the order of 1000’s of processors. Today, multicore hardware is not quite up to that level, but it is approaching it pretty quickly. If there is any force in computing that is likely to precipitate the need for a new programming ecosystem (language, runtime, libraries), I think concurrent programming is it. I also think there is just not enough experience with this problem to have a real sense of what is really going to work. Cliff Click and Brian Goetz were right when they said that we just don’t have a good programming model for this stuff. Absent a model, I don’t know how we can think that we really understand what the runtime needs to deliver.

Scala liftoff

I stayed around in San Francisco for one more day after JavaOne, in order to attend the Scala liftoff. The liftoff was an open space style conference (which has a more specific meaning than “unconference”, at least to me). My friend Kaliya Hamlin did a great job of facilitating the day.

Scala liftoff 2008

Scala has steadily been gaining attention, and hasn’t yet hit (at least in my eyes) the hype part of the classic Gartner hype cycle. I’ve been poking about with Scala, mostly because of the type inferencing, the Actor library, and lift. I have great respect for the work that Martin Odersky has done over the years, which also has me interested. Couple that with what I learned about closures in Java at JavaOne, and the list of reasons to look more deeply at Scala is getting long, especially if you are determined to have a statically typed language.

I wasn’t able to make it to any of the sessions on lift. It just worked out that other sessions overlapped them in a pathological way. While this is unfortunate, I am sure that I’ll be able to pick up anything that I need from the mailing lists and other documentation. I was able to attend two sessions on actors. One of the sessions had people with questions about actors, but no Scala actor experts were in that group. There was some discussion of Pi-calculus and the join calculus, but no discussion of the actual actor theory.

Steve Yen’s session on actor-d was pretty useful. Steve set out to build a version of memcached using Scala’s actors. He spent most of his slot talking about Scala/Java isms that he ran into – this was important since he was comparing to the C memcached. By the time he got to the actor related stuff, he was almost out of time. Steve found that he had to remove actors from the main loop of his server in order to get sufficient performance. He wanted to get statistics from the server in the background and discovered that the main loop actor was always processing messages and was never idle long enough to report statistics. He ended up replacing the actor with plain old Java Threads (POJT?). This was in addition to the fact that he ran into many of the standard Java problems as well. I’m not sure what to conclude from this. I don’t recall what kind of hardware he was on, and I am not convinced that he had the right architecture for an actor based system. Some of his experience also seemed contrary to what the lift folks have been claiming. I think that we are in for a decent amount of investigation here. One of Martin’s statements about Scala is that it is possible (and better) to extend the language via libraries than via actual language constructs. For the most part, I agree with this, but there are certain extensions which have interactions with the runtime – like concurrency. In those cases, I don’t see how the library approach allows taking advantage of runtime features. The current version of Scala actors is implemented as a library.

One of the things that I am currently working on is support for Python in NetBeans, so I dropped into the session on IDE support for Scala. With the exception of IntelliJ, none of the IDE plugin principals were present, so it was hard to have a really productive discussion. Martin did attend the session and we talked about the possibility of getting hooks into the existing Scala compiler, particularly the parser and the type inferencer. That could yield some big dividends for people working on IDE support. One IDE feature that I would like to see is the ability to hit a key, and have the IDE “light up” all the inferred types, overlaid on the existing program code. This would allow developers to see if their intuition about the types actually matched that of the type inferencer. I’d like a feature like this for Python/Ruby/Groovy/Javascript code as well. Further discussion was deferred to the scala-tools mailing list.

The other session that I participated in was the session on Scala community and governance. Several people wondered about this during Kaliya’s “What questions do you have about Scala” portion of the schedule building. When nobody else put up a session in this area, I grabbed a slot, hoping to spur some conversation – if for no other reason than my own education. Fortunately, Martin had already been thinking about the problem. He is going to adopt a Python style governance, with him (and EPFL) having the final say on language design matters. There will be Scala Enhancement Proposals (SEPs), like the Python PEPs. I’m very happy with this. I think that Python has done very well at maintaining the balance between (lots of) community input on the language design, while still retaining that “quality without a name”. One of the things that I said during the CommunityOne general session panel was that particular individuals in the right place, at the right time, matter a great deal. After watching Martin for the day, and seeing his interactions on the mailing list over the last few months, I think that the design of Scala is in very good hands.

We also talked about the evolution of the Scala libraries. The Scalax project is working to build a set of utility libraries for Scala. Martin views scalax as a place where anyone can submit a library, have it tested, vetted, reworked, etc. Eventually some code in scalax would be candidates for addition to the Scala standard libraries. This also seems like a sane approach to me. I like the idea of having a place for libraries to shakeout before going into the standard libraries. Martin also mentioned a LINQ in Scala project. I need to track that one down too.

It is good to be in a multi-language world again. There’s room for Scala, Python, Ruby, and others. Another language that I am keeping my eye on is Newspeak.

The Open Screen project

Around this time last year, Adobe open sourced its Flex framework for rich internet applications. Today Adobe announced the Open Screen project, which encompasses a number of things, probably most importantly, the removal of the license restrictions on the SWF file format used by Flash. The other aspects of the announcement relate to Adobe’s Flash Player, and while they are steps towards openness, Adobe’s player will remain closed. The importance of opening Adobe’s player has decreased because dropping the file format licensing should make things easier for the Gnash folks. The worry then is that we’ll end up with incompatible versions of Flash, which is in almost nobody’s interest. That’s probably the next problem that needs addressing.

Python at CommunityOne

CommunityOne is a free and open developer conference that is run by Sun on the day before JavaOne. This year, there will be a space at CommunityOne dedicated to the Python community, complete with whiteboards and wifi. If you are in the Bay Area for JavaOne, live in the Bay Area, or are just plain interested in Python, please register for CommunityOne; space is limited.

Registering for CommunityOne gets you a bag of swag, a free lunch the day of CommunityOne, access to all the CommunityOne events and sessions, and a free pass for Day 1 of JavaOne. When you register, put “Python/Jython” in for the referral code.

I will be on a panel on community models during the general session from 9:30AM – 10:45AM, and Frank Wierzbicki and I will be doing a Python/Jython panel. In addition to the usual developer stuff, there will also be a two day Startup Camp, and the folks from RedMonk will be back to do their day long unconference thing.