TITLE OF PAPER: _name_of_the_paper_here_ URL OF PRESENTATION: _URL_of_powerpoint_presentation_ PRESENTED BY: _names_of_the_participants_ REPRESENTING: _name_of_the_company_they_represent_ CONFERENCE: PyCON 2004 DATE: 20040325 LOCATION: _venue_and_room_in_venue_ -------------------------------------------------------------------------- REAL-TIME NOTES / ANNOTATIONS OF THE PAPER: {If you've contributed, add your name, e-mail & URL at the bottom} Pyrex is cool but not magic Pyrex very seldom makes pure-Python code faser just through a recompile But if you understand how Pyrex works, you can dramatically improve Python program performance: use static type checking, static binding C calling convertions, etc maybe less dynamic, less flexible, but better performance it gives you more choices in how you can choose to optimize your code - now if you want you can dip down into the C level to get that extra speed Bisect Module Originally code in Python Recoded in C recently for performance Pure Python version is between 2 and 3 times slower han C version Recompile as Pyrex improves performance by only a few percentage points Representative function [original Python code] def bisect_left(a, x, lo=0, hi-None): if hi is None: hi = len(a) while lo < hi: mid= (lo+hi)//2 Optmization step 1: Static types cdeef int internalbisect_left(a, x, int lo, int hi) except -1: cdef int mid while lo < hi: mid = (lo+hi)/ 2 if a[mid] < x: lo = mid + 1 else: hi = mid return lo Still slower than C: - partly because calling PyObject_Get/SetItem() rather than PySequence_Get/SetItem() - PySequence functions take ints instead of PyObject Optmization step 2: cheat import and use PySequence_GetIem directly from Python.h if PySequence_GetIem(a, mid) < x: lo = mid + 1 else: hi = mid Can we automate this? I propose to change Pyrex to have first-class support for C sequence types Pyrex with sequence types Now my code looks like this cdef int internal_bisect_left(sequence a, x, int lo, int hi) except -1 : cdef int mid while lo < hi: mid = (lo+hi)/2 One more twist the C bisect module has this code: if (PyList_check(list)) { if (PyList_insert(list, index, item) < 0) return NULL; } else { if (PyObject_CallMethod(list, "insert", .... Pyrex can do that too! if PyList_Check(a): by the way note how 2 lines of Pyrex equal approximately 6 lines of C code Result Even so, Pyrex seems to come out about 25% slower than C but half as many lines of code! no decrefs no goto statements autmatic exception propation Case study 2: fibonacci Be careful As always, you have to be careful benchmamrking and optimizing for instance: GCC compiler flags can make a big difference to the speed of the code [chart showing that the different optimizations can actually slow down pyrex code -- although highest levels of optimization show 3-4x improvement over no optimization] Moral: test different levels of optimization Heap Queue similar story to Bisect But even with sequence types added, Pyrex loses out heapq.c uses some highly optimized PyList APIs: #define PyList_GET_ITEM(op, i) (((PyListObject *) (op)) .... it's an inline macro call - so it could be imported into Pyrex itself and you can call the macro itself to gain the speedup example code showed that you can reference a C macro inside of pyrex and since Pyrex compiles to C the C preprocessor handles the expansion be carefule about refcounting Results With every cheat I could think of ... Crushed Python (as much as 6 times faster depending on the operation) Didn't quite rival C (30 - 60% slower). maybe somebody smarter than he can find some optimizations I missed... But no code like this: Case Study 5: Py(e)xpat: [Pyexpat in Pyrex -> pyxpat] Real-world example Years ago I was one of several people to help wrap Expat in C It wasn't rocket science but it was a pain. There were many niggly details E.g. a couple of 40 line functions to allow stack traces to propagate from call backs through Python??? Ugly macros everywhere Careful reference counting Plus all of the usual C extension ugliness I tried again Pyrex , yielding pyxpat. Results the Pyrex version is consistently faster than the C verion, but it may cheat a bit: the C version implements some optional features that the Pyrex version does not in C the benchmark doesn't /use/ the features in Pyrex, it doesn't even /have/ the feature that means a few less conditionals and branches either way, Pyrex is impressively fast! Statitistics I parsed a file with the folowing statistics: 400 mb of data 5 million elements 10 million startElement/endElement callbacks into pure Python code it took 3 minutes on 700mhz PowerBook this exercise is /not/ IO bound One Step further Pyxpat can explicitly expose a callback interface that uses Pyrex/C calling conventions rather than Python? Without changing out logic or interface, we can slash the time in half! It turns out that half of Pyexpat's time is spent on the Python function call dance and that even ignoring type coercion issues The result of the experiment was it was twice as fast using Pyrex callbacks instead of Python callbacks Where does Pyrex fit in? Pyrex code isn't quite as fast as hand-coded C. But you can focus on algorithm rather than pointers and buffer. (sound familiar) Pyrex code can live on a spectrum from almost as abstract as Python to almost as efficient as C the next best thing to the best of both worlds What are the obstacles? You will only get performance out of Pyrex if you understand implementation details. Need to understand C Helps to understand the cost model of the Python internals Pyrex should allow programmers to declare information relevant to the optmizer Pyrex ought to generate more efficent code out of the box Optimizing Pyrex itself Pyrex itself is ripe for optimization: More knowledge about Python container types better string interneing static binding of globals and builtins (yes, Pyrex looks up len() in __builtins__) bypass PyArg_parsetuple when there are not arguments PyObject_NEW instead of PyObject_CallObject My thoughts (Prescod's) Given how easy Pyrex is, I predict that Python+pyrex programs will typically go faster than Python+C programs /given the same amount of programmer effort./ If you are building a system that needs some performance...try Pyrex. It is probably fast enough for anything short of a database or a network stack Q: Numerically intensive code, matrices he needs are many and small, how ab out using Pyrex instead of Numeric A: he doesn't have any doubts that Pyrex would be good for that, you could create structs or other types to represent them, but that Pyrex doesn't know about templates you would have to create code for all variations -- when you are working with C types you don't get for free all of the Python exception handling for like overflows Q: slides will be avaiable online? A: yes - on his site and on the PyCon site -------------------------------------------------------------------------- REFERENCES: {as documents / sites are referenced add them below} A related paper: http://www.prescod.net/python/pyrexopt/optimization.html Bisect module recoded in Pyrex: http://www.prescod.net/python/pyrexopt/bisect/rxbisect0.pyx With static types added: http://www.prescod.net/python/pyrexopt/bisect/rxbisect1.pyx Calling PySequence_GetItem() directly: http://www.prescod.net/python/pyrexopt/bisect/rxbisect2.pyx With patch to Pyrex adding direct support for 'sequence' type: http://www.prescod.net/python/pyrexopt/bisect/rxbisect3.pyx Importing C macro as though it were a C function: http://www.prescod.net/python/pyrexopt/heapq/rxheapq4.pyx Expat wrapped with pyrex: http://www.prescod.net/python/pyrexopt/pyxpat/pyxpat.pyx -------------------------------------------------------------------------- QUOTES: -------------------------------------------------------------------------- CONTRIBUTORS: {add your name, e-mail address and URL below} Ted Leung, twl@osafoundation.org, http://www.sauria.com/blog Russell Finn, rsf@sprucehill.com, http://www.sprucehill.com/rsf/blog/ [future] Ian Jones, ianjones@umich.edu Mike Taylor, bear@code-bear.com -------------------------------------------------------------------------- E-MAIL BOUNCEBACK: {add your e-mail address separated by commas } -------------------------------------------------------------------------- NOTES ON / KEY TO THIS TEMPLATE: A headline (like a field in a database) will be CAPITALISED This differentiates from the text that follows A variable that you can change will be surrounded by _underscores_ Spaces in variables are also replaced with under_scores This allows people to select the whole variable with a simple double-click A tool-tip is lower case and surrounded by {curly brackets / parentheses} These supply helpful contextual information. -------------------------------------------------------------------------- Copyright shared between all the participants unless otherwise stated...