Tag Archives: open source

OSCON 2011

I’m sitting on an Amtrak train at the end of July, which can only mean that OSCON has just finished up.

This year OSCON had something new: a pair of extra conferences, one on Data and one on Java, that overlapped the usual OSCON tutorial days. Given that, I’m going to break the talk coverage down by conference.

OSCON Data

For purely technical content, OSCON Data was the winner for me. Two talks stuck out.

The first was Tom Wilkie’s talk on Castle. I first became aware of Castle at South by Southwest, when a persistent Manu Marchal found me on Twitter and arranged a meeting to explain the technology that Acunu was building to accelerate Cassandra and similar types of storage engines. Earlier this year the Acunu team published some papers describing their work. Those papers are still on the desk in my office, so I figured I could get the overview of the paper content by attending the talk. It’s a pleasure to see a talk focused on fundamental technology work on data structures and algorithms, and Tom’s talk delivered that. Castle is the open source version of their write-optimized, in-kernel storage system. I’m looking forward to hearing field reports.

The other standout talk was Josh Patterson’s talk Lumberyard: Time series indexing at scale. Time series data is growing in importance, and it’s great to see people working on this problem. Lumberyard uses the iSAX work that was done at UC Riverside. In addition to the time series functionality, Josh demonstrated how a number of seemingly unrelated problems, like image recognition, could be converted into time series problems, which could then be solved with something like Lumberyard. I definitely learned something new.
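For readers who haven’t run into iSAX before, here is a minimal sketch of the underlying SAX idea: any numeric sequence (an image scanline, a sensor trace) gets z-normalized, averaged into a few segments, and discretized into a short symbol string that an index can treat like a word. This is my own illustration, not Lumberyard’s code, and the breakpoint values assume an alphabet of four symbols.

```typescript
// Breakpoints that split the standard normal distribution into four
// equiprobable regions (alphabet size 4).
const BREAKPOINTS = [-0.67, 0, 0.67];

function sax(series: number[], segments: number): string {
  // 1. z-normalize so amplitude and offset don't matter
  const mean = series.reduce((a, b) => a + b, 0) / series.length;
  const std =
    Math.sqrt(series.reduce((a, b) => a + (b - mean) ** 2, 0) / series.length) || 1;
  const norm = series.map((x) => (x - mean) / std);

  // 2. Piecewise Aggregate Approximation: average each chunk
  const size = Math.floor(norm.length / segments);
  let word = "";
  for (let i = 0; i < segments; i++) {
    const chunk = norm.slice(i * size, (i + 1) * size);
    const avg = chunk.reduce((a, b) => a + b, 0) / chunk.length;

    // 3. Discretize the segment mean into a symbol: 'a'..'d'
    let idx = BREAKPOINTS.findIndex((bp) => avg < bp);
    if (idx === -1) idx = BREAKPOINTS.length;
    word += String.fromCharCode(97 + idx);
  }
  return word;
}

// e.g. a row of pixel intensities becomes a short "word" like "aadd"
console.log(sax([1, 1, 2, 2, 9, 9, 8, 8], 4));
```

As I understand it, iSAX extends this with variable-cardinality symbols so the resulting words can be organized into an index tree, which is where something like Lumberyard comes in.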

OSCON Java

I only attended one talk at OSCON Java, and then only briefly. The talk was a keynote on JavaFX. JavaFX was originally positioned as a competitor to Flash/Flex, and then expanded to act as a new GUI framework for desktop Java. In the meantime, the world has moved on. The iPad is casting doubt on the value proposition of Flash, an established and broadly adopted technology. Microsoft is backing away from Silverlight and focusing on HTML5. Even if Oracle wins its lawsuit against Google, it’s hard to see how JavaFX would be relevant for the device world. There just doesn’t seem to be any room left for JavaFX.

OSCON

Ariel Waldman was given a keynote slot to talk about Hacking Space Exploration. Space exploration is a topic that resonates with many people that come to OSCON, and given the recent end of the U.S. Space Shuttle program, it was encouraging to hear about all the avenues which are available to those who are interested in space exploration. My middle daughter wants to go to Mars someday, and I’m definitely going to be showing her the video of this talk.

I ended up in Gabe Zicherman’s talk on Gamification by accident. Late last year we sent someone from our group to a gamification workshop where Zicherman was among the speakers, so I figured that the summary we got back had already told me everything I needed to know. Apparently not. The material that was presented was thought provoking, and Zicherman is an effective and entertaining speaker.

Personal information is in my blood, from my time working on Chandler and well before that.   When I was at Strata last year, I briefly heard about the Locker project, but I didn’t really get a good sense of what was going on there. Jeremie Miller partially talked about and partially demo’ed the Locker project, and there was also a hackathon during one of the BOF slots. It is very early days still, but if you are interested in getting involved, bootstrapping instructions are here.

Sometimes you find emerging technology in the strangest places. I went to Awakening The Maker Ethic in K-12 Students because it was the only talk in that slot that seemed interesting, because I have kids, and because we’ve pursued an unusual path for their education. In addition to amassing a long reading list on technology and education, I learned something interesting about open hardware, Arduino in particular. Arduino has been around for a while, but I’ve really only heard it mentioned in the context of communities that would be the logical extensions of the model rocketeers, the hams, and the Heathkit crowd. During this talk I learned that Arduino has crossed over into communities that are focused more on craft and art. Arduino makes it possible for these people to make smart craft items. Some examples of such items are this “reverse geocache” wedding gift box, and a bicycling jacket with turn signal lights embedded in it.

Hallway

I try not to comment on the hallway track, but there’s one conversation that I had which I think merits a mention. Earlier in the year, some mutual friends introduced me to David Eaves, who has been consulting with Mozilla to develop some metrics to help Mozilla improve the way that it manages its community. In April, David previewed some of his work in a blog post. At the ASF various individuals have done small experiments around community metrics, but as far as I can tell David’s work is defining the state of the art in this area. I would love to see the work that he has been doing duplicated in JIRA and GitHub. This is the kind of work that should be a talk at OSCON, and it’s a shame that we’ll have to wait until next year to hear that talk. In the meantime, read the post, get on the mailing list, and help bring open source community tools into the 21st century.

Meta

I’ve been coming to OSCON since 2003 (I did miss one year), and I always look forward to it. This year OSCON was tough for me. I had a very difficult time finding sessions to go to, especially in OSCON proper. Of the four sessions I found blogworthy, two were about open source, and I only ended up in two of them by accident – I almost went to nothing during those slots. Ordinarily, that wouldn’t be a problem, because there’s always the hallway track. This year a lot of people that I normally expect to see at OSCON were not there. I did have a good hallway track, but not nearly as rich as normal. As an example, some of us went to celebrate Duncan Davidson’s birthday on Thursday night. In the past, there have been enough people to take over the Vault in Portland. This year, we only needed a single table.

For me, that’s a serious impact because of the way that I approach the technology field. When I was a graduate student, my advisor lamented the passing of a bygone era of DARPA funding in which you would go to DARPA and say, “I’m going to do X”. Sometimes you came back with X, and sometimes you came back with Y, but as long as Y was interesting there wouldn’t be a problem. During my era, DARPA apparently got much more serious about you doing what you said you would do. In that bygone era, DARPA funded people. In my era, they funded topics. Topics are important to me, but given the choice between topics and people, I will pick the people every time. I always tell people that the value of OSCON is that it’s the one place where you can get any substantial number of the open source communities together under one roof. I hope that keeps happening.

NodeConf 2011

Although I was definitely interested in JSConf (writeup), NodeConf was the part of the week that I was really looking forward to. I’ve written a few small prototypes using Node, along with some networking / web Swiss Army knife code, so I was really curious to see what people are doing with Node, whether they were running into the same issues that I was, and to get an overall sense of the community.

Talks

Ryan Dahl’s keynote covered the plans for the next version of Node. The next release is focused on Windows, and the majority of the time was spent on the details of how one might implement Node on Windows. Since I’m not a Windows user, that means an entire release with nothing for me (besides bug fixes). At the same time, Ryan acknowledged the need for some kind of facility for running multiple Node processes on a single machine, which would appear in a subsequent release. I can see the wisdom of making sure that the Windows implementation works well before tackling clustering, or whatever it ends up being called. This is the third time I’ve heard Ryan speak, and this week is the first time I’ve spent any time talking with him directly. Despite all the hype swirling around Node, Ryan is quiet, humble, and focused on making a really good piece of software.

Guillermo Rauch talked about Socket.io, giving an overview of features and talking about what is coming next. Realtime apps and devices are a big part of my interest in Node, and Socket.io is providing an important piece of functionality towards that goal.

Henrik Joreteg’s talk was about Building Realtime Single Page Applications, again in the sweet spot of my interest in Node. Henrik has built a framework called Capsule which combines Socket.io and Backbone.js to do real-time synchronization of model state between the client and server. I’m not sure I believe the scalability story as far as the single root model goes, but there’s definitely some interesting stuff in there.
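I haven’t read the Capsule source, so what follows is just a hypothetical sketch of the general pattern such a framework sits on: a single root model lives on the server, and Socket.io pushes every change out so all connected clients converge on the same state. The event names and the bare object standing in for a Backbone model are my own inventions.

```typescript
// Hypothetical server-side half of a Capsule-style sync loop (not Capsule
// itself). Assumes `npm install socket.io`.
import { Server } from "socket.io";

type Attributes = Record<string, unknown>;

// The single root model whose state every client should share.
const rootModel: Attributes = { title: "untitled", done: false };

const io = new Server(3000);

io.on("connection", (socket) => {
  // A newly connected client gets the full current state once.
  socket.emit("model:reset", rootModel);

  // Edits arrive as partial attribute sets from any client...
  socket.on("model:set", (changes: Attributes) => {
    Object.assign(rootModel, changes);
    // ...and are re-broadcast so everyone else converges on the same state.
    socket.broadcast.emit("model:change", changes);
  });
});
```

On the client side of this sketch, a Backbone model’s change events would feed the `model:set` emits, and incoming `model:change` events would feed `model.set()`, closing the loop.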

Brendan Eich talked about Mozilla’s SpiderNode project, where they’ve taken Mozilla’s SpiderMonkey JavaScript engine, implemented V8’s API around it as a veneer (V8Monkey), and then plugged that into Node. There are lots of reasons why this might be interesting. Brendan listed some of them in his post. For me, it means a chance to see how some proposed JS.Next features might ease some of the pain of writing large programs in a completely callback-oriented style. The generator examples Brendan showed are interesting, and I’d like to see some larger ones. Pythonistas will rightly claim that the combination of generators and callbacks is a been there / done that idea, but I am happy to see some recognition that callbacks cause pain. There are some other benefits of SpiderMonkey in Node, such as access to a new debugging API that is in the works, and (at the moment) the ability to switch runtimes between V8 and SpiderMonkey via a command line switch. I would be fine if Mozilla decided to really take a run at making a “production quality” SpiderNode. Things are still early in this cycle of server side JavaScript, and I think we should be encouraging experimentation rather than consolidation.
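To make the callback-pain point concrete, here is a small sketch of the kind of trick generators enable: a driver resumes a generator whenever an async step finishes, so the code reads top to bottom instead of nesting callbacks. This is my own toy example (the `readConfig` operation is made up), not one of the examples Brendan showed.

```typescript
// A toy generator-based "driver" for callback-style async code.
type Callback<T> = (err: Error | null, value?: T) => void;
type Thunk<T> = (cb: Callback<T>) => void;

// A stand-in async operation in classic Node callback style (hypothetical).
function readConfig(name: string): Thunk<string> {
  return (cb) => setTimeout(() => cb(null, `contents of ${name}`), 10);
}

// Resume the generator each time a yielded thunk completes.
function run(gen: () => Generator<Thunk<string>, void, string>): void {
  const it = gen();
  const step: Callback<string> = (err, value) => {
    if (err) {
      it.throw(err); // surfaces the error inside the generator
      return;
    }
    const { value: thunk, done } = it.next(value as string);
    if (!done && thunk) thunk(step);
  };
  step(null);
}

// Sequential-looking code, no nested callbacks:
run(function* () {
  const a = yield readConfig("a.json");
  const b = yield readConfig("b.json");
  console.log(a, b);
});
```

As the Pythonistas will note, this is essentially the same shape as Twisted’s generator-driven coroutine helpers.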

One of the things that I’ve enjoyed the most during my brief time with Node is npm, the package management system. npm hit 1.0 shortly before NodeConf, so Isaac Schlueter, the primary author of npm, described the changes. When I started using Node I knew that big changes were in the works for npm, so I was using a mix of npm-managed packages and linking stuff into the Node search path directly. Now I’m just using npm. When I work in Python I’m always using a virtualenv and pip, but I don’t like the fact that those two systems are loosely coupled. I find that npm does exactly what I want, and I’m both happy and impressed.

I’ve been using Matt Ranney’s node_redis in several of my projects, and it has been a good piece of code, so I was interested to hear what he had to say about debugging large Node clusters. Most of what he described was pretty standard stuff for working in clustered environments. He did present a trick for using the REPL on a remote system to aid in debugging, but this is a trick that other dynamic language communities have been using for some time.
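For those who haven’t seen the remote-REPL pattern, here is a minimal sketch of the common way to do it in Node: hang a REPL off a loopback-only TCP socket and expose a bit of application state in its context. This is the generic pattern, not necessarily the specific trick Matt demonstrated, and the `stats` object is a made-up stand-in for real application state.

```typescript
import * as net from "net";
import * as repl from "repl";

// Hypothetical in-process state we want to poke at from outside.
const stats = { requestsServed: 0, lastError: null as Error | null };

// Expose a REPL on a loopback-only TCP port; connect with `nc localhost 8123`.
net
  .createServer((socket) => {
    const r = repl.start({
      prompt: "debug> ",
      input: socket,
      output: socket,
      terminal: false,
    });
    // Anything placed on the context is reachable from the REPL session.
    r.context.stats = stats;
    r.on("exit", () => socket.end());
  })
  .listen(8123, "127.0.0.1");
```

Obviously you would only ever bind this to localhost (or tunnel it over SSH); it is a debugging backdoor by design.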

Felix Geisendorfer’s talk was titled “How to test Asynchronous Code”. Unfortunately, his main points were 1) no I/O (which takes the asynchrony out), 2) TDD, and 3) discipline. He admitted in his talk that he was really advocating unit testing and mocking. While this is good and useful, it’s not really serious testing of the asynchronous aspects of the code, and I don’t know of any good way to test the non-determinism introduced by asynchrony. Felix released several pieces of code, including a test framework, a test runner, and some faking/mocking code.
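To illustrate what the “no I/O” advice amounts to in practice, here is a hedged sketch (my own, not Felix’s code): the function under test takes its I/O dependency as a parameter, so a test can substitute a deterministic fake and never touch the network or disk.

```typescript
// The function under test receives its I/O as an injected dependency.
type Fetch = (url: string) => Promise<string>;

async function countWords(url: string, fetch: Fetch): Promise<number> {
  const body = await fetch(url);
  return body.split(/\s+/).filter(Boolean).length;
}

// The "test": a fake fetch makes the code path fully deterministic.
async function testCountWords(): Promise<void> {
  const fakeFetch: Fetch = async () => "three little words";
  const result = await countWords("http://example.com/doc", fakeFetch);
  console.assert(result === 3, `expected 3, got ${result}`);
}

testCountWords();
```

It tests the logic, but as noted above it says nothing about what happens when real I/O completes in a surprising order, which is exactly the gap Felix acknowledged.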

Charlie Robbins from Nodejitsu talked about Node.js in production, and described some of the techniques that Nodejitsu uses to manage its hosted Node environment. Many of these techniques are embodied in Haibu, the system that Nodejitsu uses to manage its installation. Charlie pushed the button to publish the GitHub repository for Haibu at the end of his talk.

Issues with Node

The last talk of the day was a panel of various Node committers, plus relevant folks from the broader Node community depending on the question. There were two audience questions that I want to cover.

The first was about what kinds of applications Node.js is not good for. The consensus of the panel was that you wouldn’t want to use Node for applications involving lots of numeric computation, especially decimal or floating point, and that longer running computations were a bad fit as well. Several people also said that databases (as in implementing a database) were a problem space that Node would be bad at. Despite the hype surrounding Node on Twitter and in the blogosphere, I think that the core members of the Node community are pretty realistic about what Node is good for and where it could be usefully applied.
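The reasoning behind the long-running-computation answer is the single-threaded event loop; here is a small illustrative sketch (mine, not the panel’s) of how one CPU-bound handler starves every other request.

```typescript
import * as http from "http";

// Naive recursive Fibonacci: deliberately CPU-bound.
function fib(n: number): number {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

http
  .createServer((req, res) => {
    if (req.url === "/slow") {
      // While this runs (several seconds), the single event loop is busy
      // and no other connection is accepted or answered.
      res.end(String(fib(42)) + "\n");
    } else {
      res.end("fast\n");
    }
  })
  .listen(8080);
```

Hit `/slow` in one terminal and any other URL in a second one, and the second request waits until the first finishes.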

The second issue had to do with Joyent’s publication of a trademark policy for Node. One of the big Node events of the last year was Joyent’s hiring of Ryan Dahl, and subsequently a few other Node contributors. Joyent is basing its Platform as a Service offering on Node, and is mixing its Node committers with some top-notch systems people who used to be at Sun, including some of the founding members of the DTrace team. Joyent has also taken over “ownership” of the Node.js codebase from Ryan Dahl, and that, in combination with the trademark policy, is causing concern in the broader Node community.

All things being equal, I would prefer to see Node.js in the hands of a foundation. At the same time, I understand Joyent’s desire to try to make money from Node. I know a number of people at Joyent personally, and I have no reason to suspect their motives. However, with the backdrop of Oracle’s acquisition of Sun, and the way that Oracle is handling Sun’s open source projects, I think that it’s perfectly reasonable to have questions about Joyent or any other company “owning” an open source project. Let’s look at the ways that an open source project is controlled: 1) licensing, 2) intellectual property/patents, 3) trademarks, and 4) governance. Now, taking them one at a time:

  1. Licensing – Node.js is licensed under the MIT license. There are no viral/reciprocal terms to prevent forking (or taking a fork private). Unfortunately, there are no patent provisions in the MIT license, which applies to #2 below. The MIT license is one of the most liberal licenses around – it’s hard to see anything nefarious in its selection, and forking, as a nuclear option in the case of bad behavior by Joyent or an acquirer, is unimpeded. This is the same whether Node is at a foundation or at Joyent.
  2. Intellectual Property – Code which is contributed to Node is governed by the Node Contributor License Agreement, which appears to be partially derived from the Apache Individual and Corporate Contributor License Agreements (Joyent’s provision of an online form is something that I wish the ASF would adopt – we are living in the 21st century, after all). Contributed IP is licensed to Node, but the copyright is not assigned, as it is in the case of the FSF. Since all contributors retain the rights to their contributions, the IP should be clean. The only hitch would be if Joyent’s contributions were not licensed back on these terms as well, but given the use of the MIT license for the entire codebase, I don’t think that’s the case. As far as I can tell, there isn’t much difference between having Node at a foundation and having it at Joyent.
  3. Trademark – Trademark law is misunderstood by lots of people, and the decision to obtain a trademark can be a controversial one for an open source project. Whether or not Node.js should have been trademarked is a separate discussion. Given that there will be a trademark for Node.js, what is the difference between having Node at a foundation or at Joyent? Trademark law says that you have to defend your trademark or risk losing it. That applies to foundations as well as to for-profit companies. The ASF has sent cease and desist letters to companies which are misusing Apache trademarks. The requirement to defend the mark does not change between a non-profit and a for-profit, and Joyent’s policy is actually more liberal than the ASF trademark policy. The only real difference between a foundation and a company is in the decision of whether to license a given use of the trademark or disallow it altogether. If a company or other organization is misusing the Node.js trademark, it will have to either obtain a license or stop using the mark, regardless of who owns the mark. In the event of acquisition by a company unfriendly to the community, the community would lose the trademarks – see the Hudson/Jenkins situation for what that scenario looks like.
  4. Governance – Node.js is run on a “benevolent dictator for life” model of governance. Python and Perl are examples of community/foundation-based open source projects which have this model of governance. The risk here is that Ryan Dahl is an employee of Joyent and could be instructed to do things a certain way, which I consider unlikely. I suppose that at a foundation you could try to write additional policy about removal of the dictator in catastrophic scenarios, but I’m not aware of any projects that have such a policy. The threat of forking is the other check on a dictator gone rogue, and aside from the loss of the trademark, there are no substantial roadblocks to a fork if one became necessary.

To riff on the 2010 Web 2.0 Summit, these are the four “points of control” for open source projects. As I said, my first choice would have been a foundation, and for now I can live with the situation as it is, but I am also not a startup trying to use the Node name to help gain visibility.

Final thoughts

On the whole, I was really pleased with NodeConf. I did pick up some useful information, but more importantly I got some sense of the community / ecosystem. While the core engine of Node.js matters, it’s the growth and flourishing of the community and ecosystem that matter the most. As with most things Node, we are still in the early days, but things seem promising.

The best collections of JSConf/NodeConf slides seem to be in gists rather than Lanyrd, so here’s a link to the most up to date one that I could find.

Update: corrected misspelling of Henrik Joreteg’s name. And incorrectly calling Matt Ranney Mark.

CouchCamp 2010

I spent a few days last week at CouchCamp, the first mass in-person gathering of the community around CouchDB. There were around 80 people from all over the world, which is pretty good turnout. The conference was largely in unconference format although there were some invited speakers, including myself.

I think it says a lot about the CouchDB community that they invited both Josh Berkus and Selena Deckelmann from Postgres to be speakers. The “NoSQL” space has become quite combative recently, so it is great to see that the CouchDB community has connections to the Postgres community, and respect for the history and lessons that the Postgres folks have learned over the years. Josh’s talk on not reinventing the wheel was well received, and his discussion of joins vs. MapReduce took me back to my days as a graduate student in databases. His talk made a great lead-in for Selena’s talk on the nitty-gritty details of multiversion concurrency control.

There were lots of good discussions on issues related to security and CouchApps, but the discussion that got my attention the most was Max Ogden’s discussion on the work that he is doing to open up access to government data, particularly around the use of location information. He’s been using GeoCouch as the platform for this work. In the past I’ve written about the importance of a good platform for location apps, particularly in the context of GeoDjango. GeoCouch looks to be a very nice platform for location based applications. This is a very nice plus for the CouchDB community.

These days, it’s impossible to be at a conference that involves Javascript and not hear some buzz about Node.js. As expected, there was quite a bit of it, but it was interesting to talk to people about what they are doing with Node. Everything that I heard reinforces my gut feel that Node.js is going to be important.

I was one of the mentors for the CouchDB project when it came to the Apache Software Foundation, and I was asked to speak about community. The CouchDB community has accomplished a lot in the last few years, and is doing really well. I prepared a slide deck, but didn’t project it, because my talk was the last talk of the conference and we wanted to do it in the outdoor amphitheater. I also wanted to tune some sections of the talk to include things that I observed or was asked about during the conference. The biggest reason that I prepared slides was to show excerpts of Noah Slater’s CouchDB 1.0 retrospective e-mail. A lot of what I think about community is captured well in Noah’s message, and it describes the state of the community better than I could have myself. I hope that we’ll be hearing more testimonials like Noah’s in the years to come.

OSCON 2010

It’s nearing the end of July, which means that OSCON has come and gone. Here are my observations on this year’s event.

Talks

As always, there are a huge number of talks at OSCON, and this year I found it particularly hard to choose between sessions, though in several cases hallway track conversations ended up making those choices for me. There wasn’t a theme to my talk attendance this year, because a lot of topics are relevant to the work that I am doing now.

I attended a talk about Face Recognition on the iPhone. Sadly this turned out to be very focused on setting up for and calling the OpenCV library and less about face recognition, UI, or integrating face recognition into applications. As I’ve written previously, I think that new interface modalities may be arriving along with new devices, so I was hoping for a bit more than I got.

Big Data is also a topic of interest for me, so I went to Hadoop, Pig, and Twitter, and Mahout: Mammoth Scale Machine Learning. Hadoop, Pig, and Mahout are all projects at Apache, and each of them has an important part to play in the emerging Big Data story. The sort of analytics that people will be using these technologies for are part of the reason that data is now the big area of concern when discussing lock-in.

The open source guy in me likes the idea of WebM, but it looks to me like there’s quite a way to go before it replaces H.264. I was surprised that the speaker didn’t have a better answer on the patent question than “our lawyers must have checked this when we acquired On2”. More than anything else, getting clarity on the patent provenance of VP8 is what would make me feel good about WebM.

Robert Lefkowitz (the r0ml) is always an entertaining and thought-provoking speaker; his OSCON presentations are not to be missed. This year he gave two talks, and you can read some of my commentary in my Twitter stream. Unfortunately, r0ml picked licensing as the topic of his second presentation, and his talk was interrupted by an ill-tempered and miffed free software enthusiast, thus proving r0ml’s earlier assertion that open source conferences are really legal conferences.

I’ve been following / predicting the server side JavaScript space for several years now. One of the issues with that space is the event-based programming model, which caused mortal Python programmers headaches when dealing with the Twisted Python framework. Erik Meijer’s group at Microsoft has been grabbing techniques from functional programming to try to make the programming model a bit more sane. I had heard most of the content in his Reactive Extensions for JavaScript talk before, and I’m generally enthusiastic about the technology. The biggest problem that I have is that RxJS is not licensed under an open source license. At JSConf I was told that this is being worked on, so I dropped in for the second half of Erik’s talk hoping to hear an announcement about the licensing. It was OSCON after all, and the perfect place to make such an announcement, but no announcement was made. I hope that Microsoft won’t wait until next year’s OSCON to get this done.
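The core idea of the Reactive Extensions is to treat event streams as first-class values that you transform with ordinary functional operators instead of hand-wiring callbacks. Here is a rough sketch of the flavor, using today’s open-source RxJS rather than the Rx build Erik was showing, so the exact API details differ.

```typescript
// Assumes `npm install rxjs`.
import { interval } from "rxjs";
import { filter, map, take } from "rxjs/operators";

// An event source: a tick every 100 ms, treated as a stream value.
interval(100)
  .pipe(
    map((n) => n * n),          // transform each event
    filter((n) => n % 2 === 0), // keep only the ones we care about
    take(5)                     // complete after five values
  )
  .subscribe({
    next: (n) => console.log("got", n),
    complete: () => console.log("done"),
  });
```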

This year, two of the keynote presentations made enough of an impression to write about. The first was Rob Pike’s keynote on Go, where he eloquently noted some of the problems with mainstream programming languages. There was no new information here, but I liked the approach that he took in his analysis. The second was Simon Wardley’s Situation Normal, Everything Must Change. Simon is an excellent presenter and full of insight. While his talk was ostensibly about cloud computing, I think it was a little deeper than that. His story about the cloud is a story about the commoditization of technologies, and since one of the roles of open source is the commoditization of technology, I felt that there was some nice insight there for those of us working in open source. Simon also discussed the mismatch between innovation and mature organizations, another issue that people in open source often run into. The video is already up on YouTube, so you can make your own assessment of the talk.

The size and breadth of OSCON gives it one of the richest hallway tracks of any conference. This year was no exception – the hallway track started on the train from Seattle to Portland, and extended all the way through the return train trip as well. I always look forward to these discussions and to connecting with old friends. One friend that I caught up with was Cliff Schmidt, now executive director of Literacy Bridge, which is working to make knowledge accessible to people in poor rural communities throughout the world via their Talking Book device. Cliff had one with him, and this was the first time I had seen one. Just in case you haven’t, here’s what they look like:

Dailyshoot 249

Languages

OSCON began as a language conference, and this year, there were two special events in that space, the Scala Summit and the Emerging Languages Camp.

I have followed the Scala community pretty closely, because they are quickly accumulating real-world experience with functional programming. There are lots of cool tricks that remind me of stuff that I played with when I was in graduate school, and there are lots of bright people. But there are some things that I find worrying. For example, one of the speakers was touting the fact that Scala’s type system is now Turing complete. If I’m using Scala, one reason is that I want my programs to type check at compile time. Having the type checker go off and fail to halt is not what I had in mind. I recognize that you’d have to write some gnarly type declarations for this to happen, but still.

This was the first year for the Emerging Languages Camp, and from what I can tell it was a roaring success. I didn’t attend as many sessions as I would have liked, due to a combination of factors – other talks that I wanted to see being the biggest. The other factor was that the first talk I attended was Rob Pike’s talk on Go, and the room was very full, which made it hard for me to concentrate (that probably had more to do with me than the room). When I saw that all the talks were being recorded and that the video folks promised to have them up in 2-4 weeks, it seemed less urgent to pop in and out and fight the crowd. Still, this is a sign of success, and I hope that, at a minimum, the Emerging Languages Camp will be given a larger room next year. Part of me would like to see it become a completely separate event from OSCON, but that’s probably not realistic.

Of the talks that I was able to attend, I found the Caja and BitC talks to be the most relevant. Fixing JavaScript to have a real security story is important for both client applications and the burgeoning server-side applications of JavaScript. I wish that I had seen the talk on Stratified JavaScript, since concurrency is ever the hot topic these days. As far as BitC goes, we are well beyond the time when we should have had a safe systems programming language. As much as C has contributed to the world, we really need to move on.

What is OSCON for?

I had a few discussions along the lines of “What is OSCON for?”, and Tim Bray shared some thoughts in his OSCON wrap-up. As I have written before, I think that open source has “won”, in the sense that we no longer need to prove that open source software is useful, or that the open source development process is viable. There are still questions about open source business models, but that’s a topic that I’m not as interested in these days. Open source having “won” doesn’t mean that our ideas have permeated the entire world of computing yet, so there is still a need for a venue to discuss these kinds of topics. OSCON is more than that, though. It’s also a place where hackers (in the good sense) have a chance to showcase their work and to exchange ideas. In that sense, part of OSCON is like a computing-focused eTech. Apparently O’Reilly is no longer running eTech, which is fine – the one time that I attended, I was underwhelmed. I think that what is happening in the Emerging Languages Camp might be an example of how things move in the future.

Of course, there’s a larger question: why do we have conferences at all anymore? Many conferences now produce video of their sessions, and I don’t think there’s a lot of value in having an event just to do product launches or announcements. The big thing is the hallway track, which allows for realtime interchange of ideas and opinions, and in the case of open source, provides a dose of high-bandwidth, high-touch interaction that helps keep the communities running smoothly. We’re in the 21st century now. Is there something better that we can do?


Thoughts on Open Source and Platform as a Service

The question

Last week there were some articles, blog posts and tweets about the relationship between Platform as a Service (PaaS) offerings and open source. The initial framing of the conversation was around PaaS and the LAMP (Linux/Apache/MySQL/{PHP/Perl/Python/Ruby}) stack. An article on InfoQ gives the jumping-off points to posts by Geva Perry and James Urquhart. There’s a lot of discussion which I’m not going to recapitulate, but Urquhart’s post ends with the question

I’d love to hear your thoughts on the subject. Has cloud computing reduced the relevance of the LAMP stack, and is this indicative of what cloud computing will do to open-source platform projects in general?

Many PaaS offerings are based on open source software. Heroku is based on Ruby and is now doing a beta of Node.js. Google’s App Engine was originally based on Python, and later on Java (the open-sourceness of Java can be debated). Joyent’s Smart Platform is based on JavaScript and is itself open source. Of the major PaaS offerings, only Force.com and Azure are based on proprietary software. I don’t have hard statistics on market share or number of applications, but from where I sit, open source software still looks pretty relevant.

It’s also instructive to look at how cloud computing providers are investing in open source software. Rackspace is a big sponsor of the Drizzle project, and of Cassandra, both directly and indirectly through its investment in Riptano. EngineYard hired key JRuby committers away from Sun. Joyent has hired the lead developer of Node.js, and VMware bought SpringSource and incorporated it into VMForce. That doesn’t sound to me like open source software is becoming less relevant.

Cloud computing is destined to become a commodity

The end game for cloud computing is to attain commodity status. I expect to see markets in the spirit of CloudExchange, but instead of trading in EC2 spot instances, you will trade in the ability to run an application with specific resource requirements. In order for this to happen, there needs to be interoperability. In the limit, that is going to make it hard for PaaS vendors to build substantial platform lock-in, because businesses will want the ability to bid out their application execution needs. Besides, as Tim O’Reilly has been pointing out for years, there’s a much more substantial lock-in to be had by holding a business’s data than by holding a platform lock. This is all business model stuff, and the vendors need to work it out prior to large scale adoption of PaaS.

Next Generation Infrastructure Software

The more interesting question for developers has to do with infrastructure software. In my mind LAMP is really a proxy for “infrastructure software.” If you’ve been paying any attention at all to the development of web application software, you know that there is a lot happening with various kinds of infrastructure software. Kiril Shenynkman, one of the commenters on Geva Perry’s post, wrote:

Yes, yes, yes. PHP is huge. Yes, yes, yes. MySQL has millions of users. But, the “MP” part of LAMP came into being when we were hosting, not cloud computing. There are alternative application service platforms to PHP and alternatives to MySQL (and SQL in general) that are exciting, vibrant, and seem to have the new developer community’s ear. Whether it’s Ruby, Groovy, Scala, or Python as a development language or Mongo, Couch, Cassandra as a persistence layer, there are alternatives. MySQL’s ownership by Oracle is a minus, not a plus. I feel times are changing and companies looking to put their applications in the cloud have MANY attractive alternatives, both as stacks or as turnkey services s.a. Azure and App Engine.

How many of the technologies that Shenynkman lists are open source? All of them.

Look at Twitter and Facebook, companies whose application architecture is very different from that of traditional web applications. They’ve developed a variety of new pieces of infrastructure, and interestingly enough, many of these pieces are now open source (Twitter, Facebook). Open source is being used in two ways in these situations. It is being used as a distribution mechanism, to propagate these new infrastructure pieces throughout the industry. But more importantly (and, for those observing more closely, quite imperfectly), open source is being used as a development methodology. The use of open source as a development methodology (also known as commons-based peer production) is definitely contributing to these innovative technologies. Open source projects are driving innovation (this also happened in the Java space: witness the disasters of EJB 1.0 and 2.0, which led to the development of EJB 3.0 using open source technologies like Hibernate, and which provided the impetus for the development of Spring). Infrastructure software is a commons, and should be developed as a commons. The cloud platform vendors can (and do) harvest these innovations into their platforms, and then find other axes on which to compete. I want this to continue. As I mentioned in my DjangoCon keynote last year, I also want open source projects to spend more time thinking about how to be relevant in a cloud world.

My question on PaaS is this: Who will build a PaaS that consolidates innovations from the open source community, and will remain flexible enough to continue to integrate those innovations as they continue to happen?

ApacheCon US 2009

[This post is late because I came down with the flu right after I got back from ApacheCon. I guess next year I will get a flu shot]

Talks

This year I was unable to attend all of the conference due to some scheduling problems, so I can’t give an in-depth report on talks. I used some of the time that I might normally have spent in talks to catch up with people that I haven’t seen in a while. I was able to attend a good number of the talks in the Hadoop track. The track was larger than last year’s (due in part to a larger room), but I felt that last year’s track was stronger. It might also be that I’ve become a bit more familiar with Hadoop, making it harder to make a big impression. It’s definitely the case that there was a lot of interest in Hadoop, and I expect that to continue.


Unfortunately, I missed the NoSQL meetup during the Apache BarCamp. I think that there could/should have been an entire NoSQL track, especially given the fact that Cassandra and CouchDB are both frequently mentioned NoSQL technologies, and both are housed at Apache.

One talk that surprised me was Ross Gardler’s talk Teaching and Learning about Open Development. Originally I didn’t think that I would have time to stay for that talk slot, but a rearrangement of my return flight loosened my schedule so that I could stick around. Ross is the chairman of the newly created Community Development PMC at Apache. This is a new effort aimed at improving the experience of contributors and new committers. Some of the people on the PMC have been heavily involved in the ASF’s Google Summer of Code outreach, and will be bringing their experiences over with them. It seems like this PMC will also be a good place for people concerned about diversity issues to dig in and help in a concrete fashion.

Celebration

This year’s ApacheCons have been a celebration of the 10-year anniversary of the founding of the Apache Software Foundation. At Oakland, there was a cake, a proclamation from the Mayor of Oakland, and (I didn’t get to see this) a letter of congratulations from the Governor of California. Rather than try to describe the festivities in prose, I’ll leave you with some photos:

[Photos: ApacheCon US 2009 anniversary celebration]

The entire set of photos is up on Flickr.

10 Years of Apache


November is just around the corner, which means that once again it’s time for ApacheCon US. This year is a special year for the Apache Software Foundation – its 10-year anniversary. Since I got involved with Apache just a few months after the foundation was created, it is also my 10-year anniversary of being involved in open source software.

This year I am going to be speaking twice. On Wednesday I’ll be speaking on the Apache Pioneers Panel, and on Thursday I’ll be giving a talk titled How 10 years of Apache has changed my life. I owe a huge professional debt to the ASF and its members and committers, so in my talk I’ll be interweaving important events in the life of the foundation with my own personal experiences and lessons learned.

Unfortunately, I’m not going to be there for all of the conference this year – I’ll be arriving Tuesday afternoon and flying out on Thursday evening. If you want to meet up, I’m in the ApacheCon Crowdvine, and I’ll be around with camera in hand (and on the LumaLoop).

Design and Commons Based Peer Production

On Tuesday, Chris Messina wrote a post about open source and design, where he laments that open source (whatever that means nowadays) and design seem to be opposed to each other. The crux of the problem seems to be that good design requires a unity about it and that since in open source all voices are heard, you inevitably end up with something that has been glommed together rather than designed. This is something that Mimi Yin and I discussed in our 2007 OSCON talk about the challenges of the Chandler design process. Chris is gloomy on the prospects of open source design processes, because he doesn’t feel that there are any examples that have succeeded. I think that this is a legitimate place to be. I don’t really see any successful open source desktop application which was designed in the kind of open design process that Chris or we at OSAF had in mind.

Is organization the problem?

On the other hand, I think that I’m slightly more optimistic about the situation than Chris is. Chris holds up the idea that there ought to be a design dictator, who drives the design and preserves its unity. I’d point out that there are some open source communities where there are such people. Perhaps the best example that I can come up with is the programming languages. A good language is very hard to design. It needs to have the kind of unity that one expects to find in a good design. In some of the language communities, these designers have titles such as “Benevolent Dictator for Life”, and the community as a whole has recognized their giftedness and given them the ability to make final, binding decisions about design issues. This isn’t end-user-facing desktop or web software, but it’s also not bunches of libraries, or implementations of JSRs, IETF RFCs, W3C recommendations, or POSIX standards. These situations are very delicate mixes and their success is highly dependent on the particular individuals who are involved, so they tend to be rare. Nonetheless, I do think that it’s possible for communities to work even if there is a chief designer.

I also don’t think that there needs to be a single chief designer. Chris cited Luke Wroblewski’s description of the design process at Facebook. Very early in that post you read:

The Facebook design team works on product design, marketing, UI patterns, branding, and front-end code. The team consists of 15 product designers, 5 user interface engineers, 5 user experience researchers, 4 communication designers, and 1 content strategist. 25 designers in a company of 1,000.

Design can be done by teams. I think that we all know that, but in many of the discussions that I’ve had on this topic, the focus seems to be on the need for a dictator. The dictator model works, but so does a team model.

I think that the organizational challenges of design (dictator vs team) can be dealt with. If you bootstrap a community with a DNA that is committed to good design, and to the value of a good designer, and if the designer has won the respect of the community, then I can see a path forward on that front.   

The problems that I see

In my mind the problems are:

How do you find a designer or designers who want to work in this kind of environment? We know that not all developers are well suited to distributed development. I’d venture that the same is true for designers. It’s much easier for coders to self-select into a project than it is for all other types of contributors, including designers.

How can a non-coding designer win the respect of a (likely) predominantly coding oriented community? If you believe that open source projects should be organized around some notion of merit, then what are the merit metrics for designers? Who evaluates the designers on these metrics? Are the evaluators even qualified to do so? In my examples of communities with designers, those designers are all coders as well.

Can we actually do design using the commonly accepted tools of e-mail, version control, wikis, and bug trackers? The design process relies very heavily on visual communication. The code process (including the design and architecture of code) is predominantly text-based. It is very difficult to do design efficiently in a distributed setting using the existing stable of tools. This is going to be a challenge not just for designers but for many other problem domains that could benefit from commons-based peer production.

What’s with you and that long word?

I prefer to use Yochai Benkler’s term “commons-based peer production” instead of the term open source. The problem with the term open source is that everyone means something different when they use it. Some people just mean licensing. Some people think of a particular community’s set of practices. Others think that it means some kind of fuzzy democracy and mob rule.

One of the reasons that I went to work at OSAF was to see if it was possible to design a good end-user application via some kind of community-oriented design process. As far as I am concerned, the jury is still out.

OSCON 2009

It’s time again for the annual OSCON report.


The conference

Every other OSCON that I’ve been to (since 2003) has been in Portland, and in some ways the two have become synonymous for me. I’m not taking the move to San Jose very well. There are a variety of little things, like the fact that you could end up walking a quarter of a mile to get from one talk to another, only to reverse the trip for the next session. At the end of Thursday, I bagged going to a talk because I was tired of walking back and forth. I had a bad experience (much worse than usual) with the WiFi connection in the hotel where I was staying, something that I don’t tolerate very well. The fact that the hotel acknowledged the problem and then offered drink vouchers as an apology didn’t help any. I had to ask the checkout agent to remove the charges for the days that I got under 20kb/s. If you take the view (which I do) that OSCON really starts at 6pm and ends at 3am, then downtown San Jose doesn’t really hold a candle to downtown Portland. My understanding is that OSCON only has a one-year contract for San Jose, so maybe we’ll get something else next year. I hope so.

Another thing about OSCON relates to the attendees themselves. I was unsurprised to hear that attendance was down. The combination of the economy and the move away from Portland could explain some of that. The lunch hall seemed pretty full (and the food was very good for a conference lunch – maybe the best I’ve ever had), and it seemed a decent size to me. What I noticed was something else. Normally when I show up at OSCON, even on the first day of tutorials, it is pretty hard to go very far before I run into someone that I know. This year that was not the case, and I don’t feel that it improved that much once the conference proper began. In combination with the move to San Jose, this had a pretty major impact on the value that I got out of the conference.

The talks


This year I wound up all over the map, session-wise. I took in some sessions on tools: the SD distributed bug tracker, and Theo Schlossnagle’s talk on his new monitoring system, Reconnoiter. I also attended Tom Preston-Werner’s talk on GitHub, which ended up being much more about git in general. I was hoping that he would have more to say on the social/community behaviors that they’ve observed in projects on GitHub. There’s not a lot of data on how the use of DVCSs is impacting the social/community dynamics of open source projects, and the folks at GitHub are in a unique position to observe some of this. Maybe next year.

I also continued to gather more information on things related to cloud computing. In this case there was some storage stuff in the form of Neo4j and Cassandra. Adam Jacob’s talk on Chef was well attended despite being in the last session block of the conference, and people stayed well past the ending time for the Q&A. Reconnoiter also falls into the cloud tools space. I attended Kirrily Robert, Yoz Grahame, and Jason Douglas’ talk titled “Forking Encouraged: Folk Programming, Open Source, and Social Software Development“, hoping to glean some insight or data into “fork oriented” open source. That wasn’t really what I got. The talk was fairly philosophical for a while. The most interesting (and surprising) part of the presentation was a brief demonstration of Metaweb’s new Freebaseapps.com, which is a development environment for Freebase that embodies some of the principles discussed in the philosophical portion of the talk. From my cloud computing oriented point of view, it looks like an “IDE for the cloud”. I need to dig into this a bit more.

One topic which was brand new to me this year was R, a functional language for statistical computing and graphics. I’d been hearing a little bit of buzz about R via Twitter, and I was just invited to join the advisory board of REvolution Computing, a startup that is working to foster the R community and to support those users that want a more commercialized offering of R. Since I didn’t know much about R, I sought out Michael Driscoll’s talk “Open Source Analytics: Visualization and Predictive Modeling of Big Data with the R Programming Language“. Analytics of all kinds are going to become much more important as the amount of data in web applications grows. If you are interested in big data and don’t know about R, that seems like a problem. I know that I am going to rectify my own personal lack of knowledge.

My talk

As in previous years, I gave a talk at the conference. One of the presentations that I’ve done in several places has a large section about the problem of programming concurrent systems, motivated by the arrival of multicore processors. For OSCON, I took that section of the talk and expanded it into a session of its own. Despite two one-hour out-loud run-throughs, I still got the pacing a little bit wrong and had to rush at the end to get all the content in. If I’m not careful this is going to turn into a three-hour tutorial. I’ve embedded the Slideshare version for those of you that are interested.

Photography

OSCON is a significant event for me photographically, since OSCON 2005 happened days after I got my first digital SLR. It’s also one of the times that I usually see my friend James Duncan Davidson, who has been one of the people that has helped me along my photographic journey.   

This year things were a little different. Regular readers will know that I am getting a little burned out on conference photographs. I’ve been to a lot of conferences and shot a lot of pictures. After a while, they start to look and feel the same. It’s hard for me to both concentrate fully on the conference, the talks, the hallway track, etc, as well as concentrating on doing stuff that would be interesting photographically. All of which is a long way of saying, “I shot less. A lot less”.

One other reason that I don’t feel bad about shooting less at OSCON is that Duncan is there. Or normally he is. This year, he was absent because he got the nod to be the main stage photographer for TED Global 2009. Those of you who follow Duncan will know that when he needs a second camera he turns to Pinar Ozger. They’ve been working together for a while, but I’ve never met Pinar in person, because I don’t usually end up at the two camera events, and the one time that she, Duncan, and I were all in the same place, we just never ended up meeting. So this was the year that I got to meet Pinar – we bumped into each other at the OSCON speaker’s party and had a great chat. This was also my first time to really get a sense for her eye. When you second shoot for someone, you try to follow the lead of the main photographer. So I am the most familiar with Pinar’s work when she’s working with Duncan. Since she was the lead this year, I (and everybody else) got to see her eye at work. There are some wonderfully artistic shots in her coverage of the show.

One person that you’ll see in Pinar’s set is photographer Julian Cash, the head of the Human Creativity Project. I first met Julian at ApacheCon in San Diego back in 2005. At the time, I didn’t really know much about photography, and I didn’t really get to see much of what he had done with his light painting portraits. Today, I have a much better appreciation for his light paintings. He did one of me at the MySQL conference earlier this year, and he did a bunch at OSCON too.

The photographic tradition at OSCON is going strong.

MySQL Conference 2009

I spent most of this week at the MySQL Conference. I was giving a talk on Python and MySQL, which came about as a favor to some folks in the marketing department at Sun. This was a fair exchange, because I’ve been curious about the MySQL community. MySQL is at the other end of the open source spectrum from the ASF, so I wanted to see for myself what it was like. The MySQL conference is the MySQL community’s equivalent of ApacheCon. There is a mix of talks, some aimed at users of MySQL, and others aimed at developers of MySQL or related products.

There is a sizeable ecosystem around MySQL. There are extension patches from Google and Percona, which were mentioned in many talks that I was in. There’s MariaDB, Monty’s community oriented fork of MySQL. There’s the Drizzle project, which looks really interesting. There’s lots going on, and I got the feeling that there’s lots of innovation happening in various parts of the ecosystem. It feels energetic and fun, and what I would expect of a big open source community, despite it being a long way from Apache or Python.

I attended all kinds of talks. I went to a number of talks about analyzing performance and monitoring, including 3 talks on DTrace. Sadly, these talks were sparsely attended, which is a symptom of some of the problems that Solaris/OpenSolaris has been having. What was interesting was that all of these talks were given by former MySQL employees, and all of them were genuinely enthusiastic about DTrace. The best of these talks was Domas Mituzas’ Deep-inspecting MySQL with DTrace, where he showed some very cool MySQL specific DTrace scripts. If DTrace got ported to Linux as a result of the Oracle/Sun acquisition, that would be a good outcome for the world.

I also went to several cloud computing talks, where the topic was how to run MySQL in the cloud. These were pretty interesting because it turns out that there is a bunch of stuff that you need to do and be aware of when running the current versions of MySQL in a cloud environment. I hope that the Drizzle folks are aware of some of these issues and are able to solve some of these problems so that running in the cloud is pretty simple.

Here are my 3 favorite talks:

  • Don MacAskill’s The SmugMug Tale – I’m a photo guy, but not a SmugMug customer. Don’s been tweeting his experiences using Amazon Web Services to build SmugMug, and he’s been blogging his experiences with ZFS, the Sun Storage 7000, and so forth. I’ve been following his stuff for a while, so this was mostly a chance to see an in-person rendering of an online personality.
  • One talk that I didn’t expect to enjoy was Mark Madden’s Using Open Source BI in the Real World. I’m not really a Business Intelligence guy per se, but the world of blogging and twittering and so forth starts to make you attuned to the usefulness of various kinds of analytics. Anyone building any kind of non-trivial web software needs analytics capabilities, so having open source solutions for this is good. It probably also didn’t hurt that I talked to several BI vendors on the expo floor the night before. What I really enjoyed about the talk was the beginning sections on how to be an analyst and how to think about and project the future. I’m given to a bit of that now and then, so I found this part of the talk pretty interesting.
  • The best talk that I went to was Yoshinori Matsunobu’s Mastering the Art of Indexing. The speaker pretty much covered all the kinds of indexing in MySQL, which indexes work best in which conditions (both for selecting and inserting — there were some interesting surprises for insert), and even tested the differences between hard disks and solid state drives. Maybe I loved this talk because it brought back all the research that I did in query optimization back in graduate school. But that wouldn’t explain all the other people in the room, which was standing room only.

Based on what I saw this week, I’m not in any way worried about the future of MySQL.