TITLE OF PAPER: Python at Google
URL OF PRESENTATION: --not available--
PRESENTED BY: Greg Stein
REPRESENTING: Google
CONFERENCE: PyCon 2005
DATE: March 25, 2005
LOCATION: Marvin Theater
--------------------------------------------------------------------------
REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:

[ A new copy of the O'Reilly Python Success Stories booklet will be produced. Contact Stephan Deibel @ pythonology.org ]

"Python has been an important part of Google since the beginning, and remains so as the system grows and evolves. Today dozens of Google engineers use Python, and we're looking for more people with skills in this language."
-- Peter Norvig, Director of Search Quality at Google

My background
  Python developer for 10 years
    Contributed to Python itself
    Authored a number of modules and applications
      ViewCVS
  Open Source guy
    Contributed to numerous projects (including Python)
    Current chairman of the Apache Software Foundation
    ViewCVS, written entirely in Python
    Contributed to Subversion, Apache server

"We consider Python to be our 'secret sauce'"
-- Paul Everitt, talking about Digital Creations, circa 1996
This is a recognition of how Python can help a business.

My view of Python in the workplace
  Python at eShop
    1995: "What in the world is Python?"
    1996: "This is great stuff." (MS acquired eShop in '96)
  Python at Microsoft
    1996: "It's called what?"
    1997: "You actually shipped Python code?" (MerchantServer 1.0)
    1998: "Nice prototype. We'll rewrite it in the next version." And they did, in C++.

Python in the workplace (continued)
  Python at CollabNet
    2001: "No, we don't really use Python here." (they used Java)
    2003: "Definitely! Write that in Python." Python caught on here like a virus, moving from developer to developer.
  Python at Google
    2004: "Of *course* we use Python. Why wouldn't we?"
Changing attitudes over time
  Small companies eventually "got it", ahead of the curve
    A champion was needed
  Larger companies follow Python's growth curve
    A supporting environment was needed

A number of factors made Python possible in larger organizations. It is now possible; here's why:
  Python had to grow for it to become "business acceptable"
    Large enough talent pool - "where are we going to be able to find these people?"
    Support services: books, consulting, World Wide Web
  Follow the trailblazers
  Python passed the tipping point years ago
    Not a problem to incorporate it into your business; lots of support, consulting

Business advantage
"These are some of the reasons we use Python at Google"
  Highly adaptable
    Changing requirements - you need a language that is very flexible, so you can adapt your tools during development
    Changes in computing environment
  Rapid development
    For new and experienced developers
    The market moves very quickly; you want to be able to keep up with it. If it takes two years for you to respond to something that is needed today, you're behind the curve.
  Easy to maintain - the most important point in Greg's view
    You can come back a year later, look at that code, and understand what is going on

Google's programming environment
  Primary languages
    C++
    Java
    Python
  If you want to write a piece of something in another language, like Perl, you almost have to get special permission. (Exceptions in ops, but for actual product stuff, see above.)
  Miscellaneous
    Some Perl used by Operations (others almost have to get permission to use Perl)
    PHP creeps in for internal webapps
    Saw Ruby sneaking around
    Small amount of C#
  In actual product stuff: C++, Java, Python

SWIG is your friend
  SWIG: Simplified Wrapper and Interface Generator
  www.swig.org
  Started by David Beazley
  Multi-language environment
    A lot of people at Google don't know Python and produce C++ code.
    SWIG pulls these "islands" together - they have a lot of stuff lying around written in various languages.
  SWIG examines a C++ header file and auto-generates Python bindings
    So all of our libraries - for parsing HTML, crawling HTTP and so on - are made available to Python using SWIG.
  Good for Google programmers who use C++ but don't know Python
  Very fast mechanism for integration
  Integrated into the build system
    Makes it very easy for us to add a rule to our build system that just adds a library to our Python dependency module

Where do we use it?
  Across our internal network
  Across a system lifecycle
    Live services
    Basic network

Some usage to support development
  Wrappers for version control (Perforce)
    (JB note: Perforce can output marshalled Python objects -- very cool, extremely useful for scripting. Also see svn SWIG mention in Q&A)
    They improved branch management.
    Running unit tests on checkin
    People "earn" their ability to check in after they understand the code guidelines, etc.
    Automatically enforce style guidelines
  Build system (itself written in Python)
  Packaging
    We've got giant bundles of code and giant bundles of data which need to be delivered up to the servers.
    Packaging system is built in Python
    Third generation of this system
    Ability to roll back a version
    We can keep iterating and moving forward because we're building all this stuff in Python

Some usage in the network infrastructure
  Binary/data pusher
    Figures out the best way to send stuff from one place to another (dev to data center, etc.)
    We're on the third/fourth generation of this, and keep increasing the scale of the problem. Python's making that possible: we are able to iterate quickly
  Package repository

Some usage on production servers
  Monitoring
    Is this thing still alive? Is it running? Does it think it's healthy? Is it seeing problems with the hard disk? Is the CPU temperature fine?
    All of this information is gathered with a little Python program running on the server, then collected by another Python program.
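The agent/collector split described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Google's code: the report fields, the JSON encoding, the thresholds, and the function names `collect_status`, `encode` and `find_problems` are all hypothetical. Only disk usage is actually measured (via the standard library's `shutil.disk_usage`); hard-disk error and CPU temperature checks are left out because they are platform-specific.

```python
import json
import shutil
import socket
import time

# Hypothetical thresholds; the talk does not say what Google checks against.
MAX_DISK_USED_FRACTION = 0.90
MAX_REPORT_AGE_SECONDS = 120


def collect_status(path="/"):
    """Agent side (hypothetical): gather a few health facts about this machine."""
    usage = shutil.disk_usage(path)
    return {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "disk_used_fraction": usage.used / usage.total,
        # Placeholder: a real agent would ask the local service itself.
        "healthy": True,
    }


def encode(status):
    """Serialize one report so it can be shipped to the collector."""
    return json.dumps(status)


def find_problems(raw_reports, now=None):
    """Collector side (hypothetical): return {host: [reasons]} for suspect servers."""
    now = time.time() if now is None else now
    problems = {}
    for raw in raw_reports:
        report = json.loads(raw)
        reasons = []
        if now - report["timestamp"] > MAX_REPORT_AGE_SECONDS:
            reasons.append("stale report (is it still alive?)")
        if not report["healthy"]:
            reasons.append("service reports unhealthy")
        if report["disk_used_fraction"] > MAX_DISK_USED_FRACTION:
            reasons.append("disk nearly full")
        if reasons:
            problems[report["host"]] = reasons
    return problems


if __name__ == "__main__":
    # Local demo: one agent report fed straight into the collector.
    print(find_problems([encode(collect_status())]))
```

Running the file directly prints the problems found for the local machine, normally an empty dict. One reason to keep the agent simple and put the judgement in the collector is that thresholds can then change without redeploying every server.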
  Auto-restart

Complete the lifecycle
  Log reporting
    We generate a "large" amount of log information
    Data is pulled back from the servers
    Analyzed using lots of Python tools
    The ad group needs to spot fraudulent clicks. This is a constant cat-and-mouse game with the script kiddies writing fraudulent ad clickers.
    Easy to alter the reports based on ever-changing needs
    Every time we find some way people are fraudulently clicking our ads, we patch that hole. It's a continuous process.

Python-based services
  Google Groups
    "Python old-timers" David Jeske and Brandon Long (of eGroups and Neotonic/ClearSilver) are the leads on Groups.
    All built using Python code
    Highly pythonic
    They didn't use that giant mountain of C++ code.
  code.google.com
    Stein and DiBona
  Others? We have so much going on...

How code.google.com was built (block diagram, top to bottom)

        /\ \/
    Front end stuff
        /\ \/
    code.google.com
        SWIG
    Google stuff

  The funky front end stuff deals with denial of service attacks, reporting, and blocking IPs known to be bad
    We get to take advantage of it because we've wrapped this
  The HTTP server it's built on has all of the reporting and monitoring things on it - the "Google stuff"

code.google.com
  goopy package - support for functional style programming
    Functional stuff to start with
    Place to put future modules

Closing
  We have a lot of Python code, covering a broad range of needs.
  Python has helped Google for many, many years.
  SWIG is underrated. I saw a little rant on Guido's blog (Guido shakes head). It's kind of difficult to get your head wrapped around it, but when you need access to some library of functionality from Python you don't need to go and build it yourself; you can use SWIG to wrap it automatically. This fits the Python ideal of smart reuse.
  We are now starting to open-source some of the pile.

Questions and Answers (a good 25 minutes for these)

Q: When are you going to open source the build system? (Guido)
A: I don't know.
If I recall, Greg has talked about it.
Chris DiBona: We're thinking of releasing some of our wrappers around Perforce first.

Q: About SWIG, have you looked at the Boost::Python library?
A: I did see that come up recently; I don't think we use it a lot, but it has been mentioned. I'll take a closer look at it.

Q: What about ctypes?
A: I saw that a while ago on a different project. As far as I know we don't use it; SWIG works well with our build system.

Q: (elaborates on ctypes/SWIG differences) While SWIG will build a Python wrapper for a given C library, ctypes will let you dynamically load a C library and call its functions.
A: calldll does something similar for the Windows environment.

Q: Do you do anything in regard to network monitoring / SNMP with Python?
A: We do have a very large internal network with lots of traffic. The Ops guys do have monitors to watch the flow, and have to schedule moving large (100 GB or 1 TB-size) files.

Q: (Alex Martelli, who is starting at Google in three days) Back to the wrapping issue. SWIG and ctypes will not help at all with C++ templates; Boost is better in this regard.
SWIG has been extended to support templates recently.
A: We do use some templates, but we normally try to avoid them and use SWIG. In that sense, SWIG works well for us. Some of the template stuff I'd like better access to, and I end up having to do some extra goo to get things working.

Q: What is missing from the Python ecosystem?
A: (Anna Ravenscroft, Alex's wife, yells "Alex") But we've solved that problem.
Today they are mostly using Python 2.2, trying to figure out how to move to Python 2.3 -- big upgrade problem.

Q: How do you evangelize to people who are happy with C++ and SQL and don't seem to want to try Python?
A: We make it easy to use any of the languages, and don't really force people to use a different language. The different applications are based on what the team understands best.
We make it easy for all of these things to interact: if you have a server written in Java, we have a custom RPC system that helps bridge the gap and communicate with other servers.

Q: How many software engineers, roughly, does Google employ? (Steve Holden)
A: I do know that the public employee count is over 3,000 employees as of December, but I don't know the break-out in terms of numbers of engineers. It's hundreds of engineers, but I can't really say any more.
Some of the apps written in Java (Blogger) can communicate with C++ using RPC, so not using Python is not a problem.

Q: You must have masses of linguistic data (terabytes). How do you access that data so fast?
A: Yes. I don't know; I don't work in that area. As far as speed, "we just throw servers at it."

Q: Within Google, is there anything for which Python is considered inappropriate?
A: Is there anything where Python is not appropriate? Well, yeah: something like our indexing system, where we scan the web pages and produce an index. Python is good, and fast, and IronPython is even faster, but it's not fast enough. We use C for that. For other things, it's based on the engineering team. We make it possible for the teams to use whatever language they like. Personally, I'd like to see more Python, so some of the things I've been doing have been aimed at enabling that.

Q: What kind of bug-tracking system do you use?
A: Bug tracking. Our system is not that good. We have one; anybody in their right mind has one.
  A Bugzilla derivative
  MS has an awesome bug tracking system
  Even what I had at collab.net was better
  Google's looking at different options for fixing that system.

Q: I want to jump in with another comment on wrapping. I have a plotting library in C++ with heavy use of templates, and I tried wrapping it with three different tools (CXX, Boost, and SWIG). SWIG is actually pretty good now; its template support is much better than it used to be. Boost makes things way, way too big.
A: Based on this feedback, it seems like Boost is capable in certain environments and is definitely worth looking at. Need to evaluate before using.

Q: SWIG performance in a real-time environment?
A: It is a non-issue. However, I was challenged about this at MS: someone said "Python won't be fast enough!" I said, "How fast does it have to be? 1000 pages per second?" He couldn't say. So I said, "Then just don't worry about it unless it proves too slow." We did go ahead and rewrite some of the Python stuff into ActiveX COM objects and ASP and... it was slower (laughter and applause).
Much time in Python is spent outside the interpreter loop; for example, much time is spent in the string object, which is written in C.
[On code.google.com] There's still that Global Interpreter Lock in there, but I still saw some SERIOUS page performance on that thing.
Don't be afraid of bringing Python into your projects. Your bottleneck will be the network bandwidth (some person on a 56 kbps line), not Python.

Q: You mentioned a number of languages used at Google. We use Python because it's terser (among other reasons). Can you speculate on lines of code in various languages at Google? (Do you even know the total lines of code at Google?)
A: I have no idea. It's a LOT. (Joke from audience: the code counter is still running!) C++ is probably the majority, probably followed by Python. C++, Python, Java - gut feeling.

Q: Five years from now, if people are right about Moore's law, there will be more multiprocessor systems. What about the project you did a few years ago to get rid of the Global Interpreter Lock?
A: Wow. Yeah, that was a few years ago. Back in '96 I made a few patches to Python 1.4 to get rid of the GIL. We used that at MS to make free-threaded COM objects. We were getting a lot of lock contention. We had to protect different data structures; for instance, in Python there are pools of frame objects which had to be protected (??). Things were blocking around those pools.
For 2 processors there was a bonus, but for 3 or 4 it was actually slower.
Free threading: Python's thread state was one of the benefits from that set of patches; sys.exc_info was another.
The Global Interpreter Lock hasn't actually been a problem.

Q: Every once in a while, you are going to introduce a bug into the system. How do you guys debug across the language boundaries?
A: We don't have any particular tools or anything like that. We have libraries for logging. My favorite technique is adding print statements (applause/laughter). It would be wonderful if we had special tools, but we don't. Some people ask what IDE they should use for cross-language Java/Python development. Eclipse is quite good, but even that doesn't have any cross-language support.

Q: Do you have any current hobby projects that you are working on that you can talk about?
A: Stuff outside Google, which they can't tell me not to talk about:
  Subversion-based wiki (subwiki)
    svn exposes its libraries to Python via SWIG
    You could build a new svn client or interact with a server from Python
    ViewCVS does this
    subwiki uses the svn repository to store the wiki pages
  Googly stuff - mostly code.google.com

Q: What does Google have to say about web application frameworks?
A: It's a tough one. A lot of stuff is set up in C++. code.google.com was not built using an off-the-shelf framework; we used Google's custom HTTP server. GMail is not written in Python. I don't actually know if it's C++ or Java. (Chris DiBona: it's Java.)

Q: Followup: is there anything that Google can contribute (via open source) in the web framework arena?
A: We've got a lot of stuff we've been talking about moving into the open source arena. Stuff tends to build on itself; we're trying to get it untangled. Some of it relies on Google-specific stuff and won't be interesting outside of Google.

Q: Tim O'Reilly talked about Google redefining applications. In this view we're sort of moving away from Google 1.0. When you upgrade, what sort of staging environment do you have?
A: We definitely have staging environments. That's one of the things built in to the systems I talked about for moving things out. The main web server, www.google.com, is a BIG chunk of code and data, because we have translations and stuff for everything. In any case, they're called canary servers (chuckles from crowd): we put stuff on the canary servers and see if they're going to fall over. Also, because we get so much traffic, we can turn a knob and expose something like 1% of our traffic to those servers. If they don't fall over, we expose some more. The "turning the knob" is a little command-line tool written in Python.

Q: (Alex Martelli) Prompted by your mention of unwrapping pieces so they can be open sourced. It actually sounds like a very good software engineering exercise, because it forces decoupling from your proprietary stuff. Even if we never open source the actual pieces, just having done the unwrapping seems like a big advantage.
A: It would be a big advantage if we were distributing code. For us, a 50 MB executable is not a problem, though you'd never try to push that to a client too often. While it would be an interesting engineering exercise and would improve the code, it has not been a priority.
Chris DiBona followup: Opening your code tends to make it better. For example, in our (?)malloc library we said it worked faster for these situations, and when we looked at it we found a bug in our code.
--------------------------------------------------------------------------
REFERENCES:
http://www.swig.org
http://code.google.com
--------------------------------------------------------------------------
QUOTES:
"We don't do that at Microsoft; we ship C++ code"
"Python passed the tipping point years ago"
"[You can] read [Python] in 2 hours, program in it in 2 days, be productive for the company in 2 weeks."
"We use a LOT of SWIG"
"We've got quite a few servers..."
(laughter)
"I've worked in large environments before, but nothing on the order of this"
"We have a lot of log data"
"Today we're using primarily Python 2.2 deployed on our servers, but we're trying to work out how to move to Python 2.3."
"Our bug tracking system is not that good"
"Pushing bits out to some guy on a 56k modem IS your bottleneck. Pulling records out of a database is your bottleneck. It's very rarely going to be Python."
"I think we probably have more Python code than we have Java" - a guess
"I think we probably have more Python than we do Java, because of all of those tools and things for supporting the environment, wrappers and all these things."
"Mr. Ascher. That's Dr. Ascher, to you."
"My favourite debugging environment is PRINT."
--------------------------------------------------------------------------
CONTRIBUTORS:
Ted Leung
Linden Wright
Erik Rose
Andy Wright
Nicholas Riley
Simon Willison
Jonathan Blocksom
Abhay Saxena
--------------------------------------------------------------------------
Copyright shared between all the participants unless otherwise stated...