Archive for the 'programming' Category

Python at CommunityOne

CommunityOne is a free and open developer conference that is run by Sun on the day before JavaOne. This year, there will be a space at CommunityOne dedicated to the Python community, complete with whiteboards and wifi. If you are in the Bay Area for JavaOne, or in the Bay Area, or just plain interested in Python, please register for CommunityOne; space is limited.

Registering for CommunityOne gets you a bag of swag, a free lunch the day of CommunityOne, access to all the CommunityOne events and sessions, and a free pass for Day 1 of JavaOne. When you register, put “Python/Jython” in for the referral code.

I will be on a panel on community models during the general session from 9:30AM - 10:45AM, and Frank Wierzbicki and I will be doing a Python/Jython panel. In addition to the usual developer stuff, there will also be a two day Startup Camp, and the folks from RedMonk will be back to do their day long unconference thing.

Dynamic language jobs?!

Wondering who’s getting jobs working in a dynamic language? Wondering which language? Here are two takes on that question, one from SimplyHired, and one from Prescient. Clear as mud.

PyCon 2008

It’s been 2 years since I’ve been to PyCon, and things have definitely changed. The last time I went to PyCon (2006 in Dallas), it was still a relatively small conference (300-400 people, if I remember what I was told), with a familiar feel, especially if you had attended in previous years, or were a part of the Python community. This year, there were over 1000 people (double the 500 people that came in 2007, apparently). I spent a sizable portion of the conference days feeling like “I miss a year, and you guys go and get 1000 people”. It’s a great thing that so many people are interested in Python.

PyCon 2008: Day 1

The talks

I went to a reasonable number of talks. Talk quality at PyCon has historically been pretty good, and I was a little out of date on the latest on things like Django and TurboGears. The best talk that I went to was Raymond Hettinger’s talk “Core Python Containers — Under the Hood”. This was a great talk for several reasons. Raymond was a good and entertaining speaker. There was significant technical meat: explanations of the implementation choices for all the core containers in Python (lists, sets, and dicts). We heard about doubling factors and amortized big-oh time. Most importantly, there were significant practical applications for Python programmers. Raymond’s talk gives a cost model for the core containers, and having an understanding of that model is important for folks who are writing Python programs. It’s also useful for developers of alternate Python implementations because it allows them to follow suit or to diverge and (hopefully) document the places where the cost model is different. My next favorite talk was Jim Baker’s “More Iterators in Action”. I missed the talk given last year, but I liked this one. Jim hit two of my favorite topics, language integrated query (LINQ) (albeit without the DSL), and concurrency.
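
To make the cost model idea a little more concrete, here is a minimal sketch of my own (it is not taken from Raymond's slides). It counts how often a list's underlying allocation changes while items are appended. It assumes CPython, where `sys.getsizeof` reflects the currently allocated capacity of a list; the helper name `count_resizes` is made up for this example, and the exact growth factor is an implementation detail that varies between versions.

```python
import sys


def count_resizes(n):
    """Append n items to a list and count how often its allocation grows."""
    items = []
    resizes = 0
    last_size = sys.getsizeof(items)
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # the list moved to a new, larger allocation
            resizes += 1
            last_size = size
    return resizes


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, "appends ->", count_resizes(n), "reallocations")
```

The number of reallocations grows far more slowly than the number of appends, which is the reason an individual `append` can be treated as constant time on average (amortized), even though an occasional append pays for a copy.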

Concurrency

There was a lot of interest in concurrency this year, which warms my heart, because I see high-level/dynamic languages and concurrency as the chocolate and peanut butter in the old Reese’s peanut butter cup commercials. There were 2 open space sessions and 1 lightning talk, and the topic entered many of the conversations that I had.

Sun, Jython, and JRuby

People were generally positive to learn about Sun’s interest in Python and Jython. A number of people stopped me to congratulate me on the new job, and we had a nice turnout at the open space session, where people were free with ideas, comments, and a few not so easy to answer questions. I hope that Sun can live up to the goodwill that people extended towards me and Frank.

If I was surprised about the jump in size of PyCon, I was even more surprised by the amount of energy around Jython. At most of the previous PyCons that I attended, people would mention Jython, and either be sorry that it was too out of date to consider, or be just plain dismissive of it. This year there was none of that. People were very interested in Jython. I was really surprised by how much interest there was, and by some of the people who were interested. It was certainly a nice feeling to sit in the sprint room and occasionally have people pop in to ask if such and such was running in Jython yet, or did Jython support X because package Y needed it.

This was the first time that I had met Frank Wierzbicki in person; I think he’s the happiest person at Sun right now. I was also able to spend some time hanging out with various folks from the Jython community. It seemed to me that the community was doing quite nicely. If you looked at some of the community metrics that we would use at the ASF to allow a project to graduate from incubation, almost all of those criteria have already been fulfilled. One of my goals for Sun’s Python efforts is for as many of them as possible to be highly community oriented, so it was nice to see that Jython is well on its way in that regard. The folks working on Jython are very sharp (including the aforementioned Jim Baker, who it turns out was a classmate of mine at the Brown CS dept, although neither of us can remember meeting the other), and have one of those (in my mind) essential community ingredients, a community sense of humor.

PyCon 2008: Day 3

Jim snuck this bit of commentary on Jython’s lack of a global interpreter lock into his talk.

There were several Ruby related surprises at PyCon this year. David Heinemeier Hansson, creator of Ruby on Rails, made an appearance for one day, and a number of the JRuby committers made a road trip down from Minnesota, to hang out, meet the Jython folks, and generally display their hacker prowess. Which they totally did. Charlie and Tom powered their way to JRuby 1.1RC3 during the conference. Meanwhile Nick Sieger demonstrated what happens when you stick a bunch of hackers, an EVDO card, and an EVDO hub into a car. The Jython guys (if any of them lived in the same state) need to get some of that. It's the best thing since the Adobe AIR Bus Tour, and at a fraction of the cost. The JRuby folks and Jython folks are already starting to talk and share experiences, and I am sure that this will only result in even better dynamic language stuff for the JVM.

Other Cool stuff

On one of the sprint days, I did a bit of wandering and stopped to talk to my friend Brian Dorsey, who is doing some cool stuff here in Seattle. Brian was working with Richard Jones on pyglet and Bruce. Pyglet is a set of Python libraries for writing games and doing other kinds of multimedia. There’s pygame, which I am aware of because of Armin Rigo’s infamous use of pygame to deliver talks about PyPy. Richard has created Bruce, a presentation tool based on pyglet. In addition to being able to do cool multimedia presentation effects, there are some really cool things that you can do. Perhaps the coolest is that you can have a slide which is essentially an embedded Python interpreter, so no more switching out of your presentation to demo your Python code at work. Really slick.

On a different note, on several evenings, conference goers who stuck around to hang out in the hotel’s common area were treated to musical performances by a dynamic (as in constantly changing set of members) band of Pythonistas:

PyCon 2008: Day 2


The Sprints

Perhaps the most amazing way in which the conference has changed is this picture.

PyCon 2008: Sprints

That is a picture of a part of the lunch crowd on the first day of the sprints following the conference. When I talked to David Goodger about it, he said that he had taken a count and there were over 250 people at the sprints. Visually, that sprint lunch room looked to be about the size of the room for the first PyCon that I attended (PyCon 2004). Simply amazing. The ASF has a hackathon before every ApacheCon, but I can’t remember one ever reaching this kind of size or scale. Another thing about the PyCon sprints is that they are aimed at growing the community — you don’t have to be a committer on any project in order to attend, and experienced project members will take time to sit and help new people get started. There were several people like that in the Jython sprint room. I was more impressed by what happened with the sprints than any other part of the conference. The only central organization here was that the conference planners obtained sprint space, and in a few cases got some sponsors to cough up money for lunch. Everything else was organized by the projects themselves (I heard that the Django folks closed 100 bugs in a single day). If you want to get a sense of what kinds of things got accomplished at the sprints, you can look at this page on the wiki — it’s not exhaustive, but it’s a start.

Travel

In the past, I’ve had some travel nightmares getting home from PyCon. This year I am happy to report that I didn’t have any problems at all, except for a fight that I had with the Sun internal travel system (and lost).

Conclusion

It was great to be back at PyCon. Interest in Python is growing (as measured by attendance), as is interest in Jython, and interested people are also rolling up their sleeves and pitching in (as measured by sprint attendance growth).

If you think my job is all about Jython, you are confused.

Apparently people are confused about what I am working on at Sun, and with PyCon starting tomorrow, this is not a good thing. I am not going to be working on Jython directly, although I will certainly be poking my nose in to see what’s going on. The Python related part of my job (which will be the majority in the short term) is to figure out what Sun should be doing in the Python space, across all of the relevant platforms at Sun, including but not limited to: the JVM and JEE, Netbeans, and Solaris.

Sun isn’t done in the dynamic language space, and I will also be looking for opportunities with other dynamic languages and related technologies.

Lazyweb: Virtualization software

I am looking at building a bunch of virtualized machines, and I have no idea what software I should be using.

  1. I want to create and run images on Linux and OS X
  2. I want those images to be runnable on Linux, OS X, and Solaris (optional)
  3. I want to make images that run Linux, Windows, and Solaris
  4. I want to run one set of images on a box connected to the internet, and have those images appear as separate machines.
  5. Bonus round: I want there to be some ISP/hosting provider that I can send images to and have them hosted.

I am assuming that my choices are: VMWare, Parallels, and VirtualBox.

Useful pointers and advice appreciated.

Moving from Movable Type 3.2 to WordPress 2.3

Julie has decided that she wants to blog a little more frequently. One obstacle to this is that her blog is a spam magnet, or it was until I deleted the Movable Type comment CGIs as a last ditch defense. I’ve had very good results with Akismet on WordPress, so inevitably this meant an upgrade from Movable Type to WordPress. It was only a matter of time before this happened, especially now that I am on WordPress myself. Importing is pretty straightforward. Making sure that the permalinks don’t break is the problem. In order to do that you need to make sure that WordPress uses the same numerical IDs that Movable Type used. Which means that you have to hack Movable Type to add an ID field to the exported data, and hack WordPress to use that ID when it creates the new blog entries. Then you do a little mod_rewrite action, and the permalinks should keep on ticking. I’ll now proceed with describing the hacks.

I started with the basic WordPress docs to migrate from Movable Type to WordPress.

Go into the Movable Type 3.2 install and edit lib/MT/ImportExport.pm to add an ID field:

--- ImportExport.pm~    2006-01-04 00:21:09.000000000 -0800
+++ ImportExport.pm     2008-02-17 17:55:19.000000000 -0800
@@ -439,6 +439,7 @@
     my $tmpl = MT::Template->new;
     $tmpl->name('Export Template');
     $tmpl->text(<<'TEXT');
+ID: <$MTEntryID$>
 AUTHOR: <$MTEntryAuthor strip_linefeeds="1"$>
 TITLE: <$MTEntryTitle strip_linefeeds="1"$>
 STATUS: <$MTEntryStatus strip_linefeeds="1"$>
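
Before running the import, it can be worth checking that every exported entry actually picked up an ID line. The following is a rough sketch of such a check, not part of either Movable Type or WordPress. It assumes that entries are separated by a line of eight dashes (the separator the importer's `preg_replace` looks for) and that the header block of each entry ends at a line of five dashes. The function names and the sample data are made up for illustration.

```python
ENTRY_SEPARATOR = "\n--------\n"  # separator between entries, as matched by the importer
SECTION_SEPARATOR = "\n-----\n"   # assumed separator between the header block and the body


def entry_headers(export_text):
    """Yield a dict of header fields (ID, AUTHOR, TITLE, ...) for each exported entry."""
    normalized = export_text.replace("\r\n", "\n").replace("\r", "\n")
    for chunk in normalized.split(ENTRY_SEPARATOR):
        if not chunk.strip():
            continue  # skip the empty piece after the final separator
        header_block = chunk.split(SECTION_SEPARATOR, 1)[0]
        headers = {}
        for line in header_block.strip().splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                headers[key] = value
        yield headers


def entries_missing_id(export_text):
    """Return the titles of entries that have no numeric ID header."""
    return [
        headers.get("TITLE", "<untitled>")
        for headers in entry_headers(export_text)
        if not headers.get("ID", "").isdigit()
    ]


# Made-up sample data in the shape of the patched export template.
SAMPLE = (
    "ID: 42\nAUTHOR: julie\nTITLE: First post\nSTATUS: Publish\n"
    "-----\nBODY:\nHello\n-----\n--------\n"
    "AUTHOR: julie\nTITLE: No id here\nSTATUS: Publish\n"
    "-----\nBODY:\nOops\n-----\n--------\n"
)

if __name__ == "__main__":
    print(entries_missing_id(SAMPLE))
```

An empty result on the real export file suggests the template patch took effect for every entry.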

I used Joshua Zader’s modified mt.php and modified it some more so that it would work off of a local file since the MT 3.2 export was over 7MB.

--- /tmp/mt.php 2006-03-27 17:15:00.000000000 -0800
+++ mt.php      2008-02-17 20:51:53.000000000 -0800
@@ -1,7 +1,9 @@
-f<?
+<?

 // HERE I'M DEFINING A CUSTOM FUNCTION BASED ON THE ORIGINAL WP_INSERT_POST FUNCTION LOCATED IN IN FUNCTIONS-POST.PHP

+set_time_limit(0);
+
 function wp_insert_post_with_id($postarr = array()) {
        global $wpdb, $wp_rewrite, $allowedtags, $user_ID;

@@ -287,7 +289,7 @@

        function get_entries() {
                set_magic_quotes_runtime(0);
-               $importdata = file($this->file); // Read the file into an array
+               $importdata = file('/usr/share/wordpress/wp-content/mt-export.txt'); // Read the file into an array
                $importdata = implode('', $importdata); // squish it
                $importdata = preg_replace("/(\r\n|\n|\r)/", "\n", $importdata);
                $importdata = preg_replace("/\n--------\n/", "--MT-ENTRY--\n", $importdata);
@@ -371,7 +373,8 @@
        }

        function select_authors() {
-               $file = wp_import_handle_upload();
+/*             $file = wp_import_handle_upload(); */
+                $file['file'] = '/usr/share/wordpress/wp-content/mt-export.txt';
                if ( isset($file['error']) ) {
                        echo $file['error'];
                        return;

I hope this will save some poor person the time I wasted piecing all this together. Most of the documentation that I googled up was for earlier versions of either Movable Type or WordPress or both. Really, some right thinking, PHP handy person ought to just go in and fix the Movable Type importer and document how to use the Movable Type template in order to make this all work.

Oh, and if you’re going to run multiple WordPress blogs with different url prefixes on a Debian supplied WordPress, then you are going to need this.

At long last, a glimpse of Arc

[via Viva La Chipperfish - via Planet Python]

Paul Graham has released an in-progress version of his Arc dialect of Lisp. In tarball form no less. Where’s the Mercurial repository?

The Erlang community

Matt Croydon didn’t agree with my commentary on the Erlang community, and he’s partially right. I shouldn’t have said “we need a community” because there is an Erlang community, and I knew that. I have never been a fan of Java, and I don’t want to be stuck using the moral equivalent of Java when the multicore/concurrency thing shakes out. So if I want to be able to use Erlang (and I’ve not totally made that decision), then it needs to have a bigger, more diverse, and easier to find community.

Scalability != concurrency

Sam Ruby is writing about Russell Beattie writing about Java and Erlang.

Russell thinks Java needs an overhaul. I think that Java has reached the point where technical, community, and business forces will exert pressure on the language to evolve in a uniformly bad manner.

Russell wrote:

The reason people are looking at Erlang is not because its beautiful syntax, great documentation, or up-to-date libraries. Trust me. It’s because the Erlang VM can run for long periods of time, scaling linearly across cores or processors filling the same niche that Java does right now on the server.

Actually, I am looking at Erlang as a solution for anywhere (including the client) where concurrency will be an issue. By the way, it is not VMs that scale linearly, but computational problems. And there are some problems which just can’t scale linearly, no matter what VM we put them on.
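
One standard way to put numbers on that last point is Amdahl's law: if only a fraction of the work can be done in parallel, the serial remainder caps the speedup. The sketch below is just that formula written out, with a made-up function name; it ignores communication and scheduling overhead, so real programs do worse than this best case.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Best-case speedup when only parallel_fraction of the work spreads over workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for cores in (2, 8, 64, 1024):
        print(cores, "cores:", round(amdahl_speedup(0.9, cores), 2), "x at 90% parallel")
```

With 90% of the work parallelizable, the speedup stays below 10x however many cores are added. Only a problem that is entirely parallel scales linearly.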

Sam goes on to make the point which is the title of this post.

Next, to dispel a few myths. Slashdot is written in Perl, seems to handle the load, and also seems to stay up. While there are a number of BitTorrent implementations, the original and (to the best of my knowledge) the most pervasive version is written in Python. Yahoo is a mix, but a good portion of it is written in PHP, with critical functions written in C. Twitter is written in Ruby, had early scalability issues, but seems to be past them. These are all examples of massively scalable applications.

Scalability is not the same thing as concurrency. It is certainly possible to scale a program written in any language - that’s a given. Especially when scaling = throwing more hardware at it. But there’s got to be a better way of doing it. Question is whether the better way is worth the price of admission.

But as far as Erlang vs Java, the real kicker is here:

Unlike the CLR which was designed to be multi-language, and unlike the JVM which is in the process of being repurposed to be multi-language also, Erlang’s VM is designed from the ground up assuming that objects typically are immutable and serializable.

Which is what makes the situation with Java so bad. Not only is the language bad, the VM is fatally flawed when it comes to actor style concurrency (which is why for all its niceties, Scala will suffer the same problems as Java). There’s a real problem here — ask yourself why there is a market for these things, if all that is needed is to throw even more boxes at the problem.

In the comments, Sam wrote:

The biggest problem I have with Erlang is clearly an addressable one: the documentation of the libraries, and the lack of good samples that can be quickly found by Google/MSN-Live/Yahoo!/Ask searches. And many of the libraries appear to be abandoned at 0.n versions.

This is actually 2 problems. There’s the issue with the libraries, and there’s the issue with the community that did/didn’t produce the libraries. We don’t just need a technology, we need a community. Hmm, Erlang lab, anyone?

Some simple thoughts on Erlang

Our reading group on Bainbridge Island has been working its way through Programming Erlang. Actually we’re technically not done yet, but since I spent a fair amount of time on the ferry recently, I went ahead and finished it off. There’s been quite a bit of writing about Erlang recently, and I wanted to at least have finished the book before jumping in. Looking at Joe Armstrong’s PhD thesis is probably soonish on my list too.

Basics
Erlang is a functional language which incorporates a concurrency model based on very lightweight processes communicating via messages. I’ll cover the concurrency model a bit more below. Since many people have not really been exposed to functional programming, there are things in Erlang which seem odd when compared to more mainstream languages. In addition, Erlang relies heavily on pattern matching as a flow of control construct, and it takes some time to get used to it. Some people liken the pattern matching aspects of Erlang to Prolog, but this is not entirely accurate because Prolog uses unification, which works in “both directions” and not pattern matching, which only works in “one direction”. I can’t say that I care for the syntax of Erlang, but after using Python, there are very few syntaxes that I really like. Erlang supports higher order functions, so closure based control flow structures are included. There is a fairly usual set of basic data types which are provided. Probably the biggest problem with the basics of Erlang is the way that strings are handled. In reality there are no strings in Erlang, and strings are just lists of integers. More on that below.
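
To illustrate the string issue for readers coming from Python, here is a small sketch of what "strings are just lists of integers" means in practice. It is Python, not Erlang, and both helper names are invented for this example. The second function is only a guess at intent, which is exactly the problem: nothing in the data says whether a list of small integers is text or numbers.

```python
def as_code_list(text):
    """Represent text as a plain list of integer code points."""
    return [ord(ch) for ch in text]


def looks_like_string(value):
    """Guess whether a list of integers was meant to be text (printable ASCII only)."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(v, int) and 32 <= v <= 126 for v in value)
    )


if __name__ == "__main__":
    print(as_code_list("hello"))         # [104, 101, 108, 108, 111]
    print(looks_like_string([72, 105]))  # True: the text "Hi", or two readings?
    print(looks_like_string([1, 2, 3]))  # False: not printable characters
```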

Concurrency
Much of the current interest in Erlang is due to its concurrent programming capabilities. The foundations of these capabilities are the availability of processes at the language level. Erlang allows a programmer to create and destroy processes quickly and cheaply (in terms of resources). Processes can only communicate with each other by sending each other unidirectional messages. Every process has a mailbox, which is where messages for it are delivered. The messages are queued there until the process explicitly “receives” them.
The code that implements a process typically consists of a tail recursive loop which explicitly “receives” messages and uses pattern matching to examine the messages and dispatch to the correct behavior. Replying to the sender of a message must be handled by the programmer, but it is easy to code up simple rpc style message passing. Two (or more) processes can be linked to each other so that when one process dies, the other is sent a signal. The preferred mode of handling errors in processes is to kill them and restart them. This signaling forms the basis of the supervision tree concept in OTP. The basic concurrency model of Erlang is a version of the Actor model developed by Carl Hewitt at MIT. I took Hewitt’s class while I was an undergraduate, so the concepts were familiar to me. Erlang is relatively blind to where a process might be running - in the same VM, in a different hardware thread on the same VM, or on a VM on a different computer altogether. This makes it easy to write programs that can grow easily when you want to add hardware, whether that is processors or computers.
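
For readers who think in Python, the mailbox and receive loop can be approximated with a thread and a queue. This is only a rough analogue under stated assumptions: Python threads are far heavier than Erlang processes, there is no real pattern matching here, and the names `Process`, `counter_loop`, and `rpc` are invented for this sketch.

```python
import queue
import threading


class Process:
    """A thread with a mailbox, standing in for an Erlang process."""

    def __init__(self, loop):
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=loop, args=(self,), daemon=True)
        self._thread.start()

    def send(self, message):
        """One-way, asynchronous send: drop the message in the mailbox."""
        self.mailbox.put(message)

    def join(self, timeout=None):
        self._thread.join(timeout)


def counter_loop(self_process, total=0):
    """Receive loop: take one message, dispatch on its tag, carry state forward."""
    while True:  # Erlang would express this as a tail-recursive function
        message = self_process.mailbox.get()  # blocks until a message arrives
        if message[0] == "add":
            _, amount = message
            total += amount
        elif message[0] == "get":
            _, reply_to = message
            reply_to.put(total)  # replying is the programmer's job
        elif message[0] == "stop":
            return


def rpc(process, request_tag, timeout=1.0):
    """RPC-style call: send a request that carries a reply queue, then wait on it."""
    reply_to = queue.Queue()
    process.send((request_tag, reply_to))
    return reply_to.get(timeout=timeout)


if __name__ == "__main__":
    counter = Process(counter_loop)
    counter.send(("add", 2))
    counter.send(("add", 3))
    print(rpc(counter, "get"))  # 5, since the mailbox is processed in order
    counter.send(("stop",))
```

The state (`total`) lives only inside the loop, and the only way to read or change it is to send a message, which is the property that makes this style easier to reason about than shared mutable data.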

OTP/Mnesia/ETS/DETS
The folks at Ericsson have also provided a bunch of libraries to raise the level of abstraction for concurrent programming in Erlang. There are 3 major libraries. OTP (Open Telecom Platform) helps a programmer to write scalable, fault-tolerant code. It takes advantage of Erlang’s hot code update facilities to allow processes to be upgraded in place. The basic abstractions to do this are very simple to work with. OTP includes the notion of supervision trees, which is an abstraction for managing networks of processes.
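
The "kill it and restart it" idea behind supervision can be sketched in a few lines of Python. This is a toy under loud assumptions: it supervises a plain function call rather than a separate process, it has a single restart policy, and `supervise` and `make_flaky_worker` are hypothetical names, not anything from OTP.

```python
def supervise(worker, max_restarts=3):
    """Run worker(); if it raises, start it again, up to max_restarts times.

    Returns (result, restarts). Re-raises the last error once the budget is spent.
    """
    restarts = 0
    while True:
        try:
            return worker(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and let whoever supervises us decide
            restarts += 1  # let it crash, then start again from a clean state


def make_flaky_worker(failures):
    """Hypothetical worker that crashes a given number of times before succeeding."""
    remaining = [failures]

    def worker():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("worker crashed")
        return "done"

    return worker


if __name__ == "__main__":
    print(supervise(make_flaky_worker(2)))  # ('done', 2)
```

The point of the design is that the worker contains no recovery code at all; the policy for what to do after a failure lives in one place, outside the thing that failed.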

Mnesia is a (potentially) distributed database written in Erlang. It provides an easy mechanism for storing Erlang terms. While it is not an RDBMS, it does provide a query mechanism based on list comprehensions. It also supports transactional behavior and has the ability to duplicate Erlang tables on other machines.
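
If "queries as list comprehensions" sounds odd, the Python equivalent gives the flavor. This sketch is plain Python over an in-memory list, not Mnesia's API; the `Employee` record, the table contents, and `names_in_dept` are all made up for illustration.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["name", "dept", "salary"])

# Made-up rows standing in for a table of stored terms.
TABLE = [
    Employee("alice", "switching", 120),
    Employee("bob", "billing", 95),
    Employee("carol", "switching", 105),
]


def names_in_dept(table, dept, min_salary=0):
    """Select and project in one comprehension, roughly a SELECT ... WHERE."""
    return [
        row.name
        for row in table
        if row.dept == dept and row.salary >= min_salary
    ]


if __name__ == "__main__":
    print(names_in_dept(TABLE, "switching"))       # ['alice', 'carol']
    print(names_in_dept(TABLE, "switching", 110))  # ['alice']
```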

Runtime
One thing that isn’t discussed enough is the Erlang runtime/VM and its features. The runtime is very efficient at managing processes, much more so than languages like Python, Ruby, or Java. Erlang programs have been deployed in telephone switching products for years, with extremely long uptimes - due in part to Erlang’s hot code swapping capabilities. Java’s hot code replacement or Python’s reload are substantially weaker than Erlang’s hot code swapping. So while libraries that provide an actor like model can help people learn a good programming model for concurrency, it’s less clear to me that the languages (and the implementations of those languages) hosting the libraries will be as good as Erlang when it comes to highly concurrent applications. Of course, if an application isn’t that concurrent it might not matter.
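
Here is one concrete way in which Python's reload is weaker, as a small self-contained sketch. It uses `importlib.reload`, the current spelling of the old `reload` builtin, and writes a throwaway module (the name `hotmod_demo` is made up) to a temporary directory. After the reload, objects created before it keep running the old code.

```python
import importlib
import pathlib
import sys
import tempfile

# Avoid writing cached bytecode, so the second version is always read from source.
sys.dont_write_bytecode = True

OLD_SOURCE = "class Greeter:\n    def greet(self):\n        return 'old code'\n"
NEW_SOURCE = "class Greeter:\n    def greet(self):\n        return 'new code, longer source'\n"


def demo_reload():
    """Return what a pre-reload object and a post-reload object each say."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "hotmod_demo.py"
        path.write_text(OLD_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("hotmod_demo")
            survivor = module.Greeter()  # created before the code changes

            path.write_text(NEW_SOURCE)
            importlib.invalidate_caches()
            module = importlib.reload(module)
            fresh = module.Greeter()  # created from the reloaded class

            return survivor.greet(), fresh.greet()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("hotmod_demo", None)


if __name__ == "__main__":
    print(demo_reload())  # ('old code', 'new code, longer source')
```

The reload rebinds the name `Greeter` in the module to a new class, but the existing instance still points at the old class object, so long-lived objects have to be migrated by hand.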

Conclusion
Semantically, there is a lot to like about Erlang - the actor based concurrency model, hot code swapping, higher order functions, and (once one gets used to it) pattern matching. The OTP libraries have been refined by many years of production usage in demanding, commercial applications.

At the same time, there are a number of issues which I think are real barriers to Erlang adoption. The syntax will prove difficult for many people, which is a big issue. I’ve already mentioned the problems with string handling, and really that generalizes to a lack of libraries for performing 21st century / web computing tasks. The nice thing about telephone switches is that you don’t really have to talk to the world. But if Erlang is to be viable as a solution for mainstream programming as it moves to more concurrency, Erlang programs must be able to talk to the environments around them.

I am aware of several projects where Erlang is being used to do the heavy server lifting and then data is being passed off to programs written in more familiar languages like Python, Ruby, or Java. Certainly this is one way that people could begin to exploit the benefits of Erlang without converting wholesale. It would also give the Erlang community some time to improve Erlang to the point where it could be adopted by a larger audience.