TITLE OF PAPER: Pulling Java Lucene into Python
URL OF PRESENTATION: _URL_of_powerpoint_presentation_
PRESENTED BY: Andi Vajda
REPRESENTING: Open Source Applications Foundation
CONFERENCE: PyCon 2005
DATE: 3/25/2005
LOCATION: Grand Ballroom

--------------------------------------------------------------------------
REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:
{If you've contributed, add your name, e-mail & URL at the bottom}

Why PyLucene?
Needed a probabilistic search engine, and Lucene (written in Java) is a good, open source one, but they were not willing to use Java.

A SWIG file contains all the interfaces. gcj compiles Java Lucene, SWIG produces a giant C++ file, and various magic ensues to create a Python extension.

[incomplete diagram]
Java Lucene --> gcj --lucene.o------> PyLucene_wrap.o
                        |                   |
                        |                   \/
                        \/
                   PyLucene.pyd

gcjh makes C++ headers

Impedance mismatches
SWIG thinks you're wrapping C++, but you're wrapping Java, so you have to match the memory models. Once you pass an obj from Java to Python, Java stops keeping track of it (its garbage collector may reclaim it); this has to be worked around. The same problem manifests when returning an obj via JNI from C++ to Java.

Thread/concurrency models differ. Python is more flexible. Java really wants to control threads, so thread management is delegated to it. He made a class called PythonThread that delegates thread management to Java so it won't freak out. This changed PyLucene from being kinda crashy to very stable.

Extending Java classes from Python
He made SWIG recognize that Python class X implements a protocol that lets it be wrapped by a Java class. It's SWIG in reverse: GIWS! ;-) The Java class then delegates to the Python methods by calling them via the JNI and Python's C interface.

Supporting downcasting

[SK Note: See Michel Salib's presentation "Indexing the US Patent DB with Python and Xapian" yesterday. They tried to use Lucene (and PyLucene) first, but had a lot of trouble with builds in their very multi-platform environment. Then they discovered Xapian, which was much easier to wrap and build.
They thought Xapian provided faster queries and better precision/recall. It comes out of Cambridge University (UK). They will be releasing their pyXapWrap (or something like that) project to open source.]

Cross-language error reporting
Looks like they wrapped it in both directions. I'm not sure, but it looks like Python exceptions are wrapped into Java exceptions and vice versa.

Samples

Future work
Pulling other libraries into Python in the same way

800 line makefile!

PyLucene may soon end up under the Apache Foundation, along with Lucene itself. (Yay!)

--------------------------------------------------------------------------
REFERENCES: {as documents / sites are referenced add them below}

There's a book about Java Lucene.
Jakarta Lucene project: http://lucene.apache.org/java/docs/
PyLucene project: http://pylucene.osafoundation.org/

--------------------------------------------------------------------------
QUOTES:

--------------------------------------------------------------------------
CONTRIBUTORS: {add your name, e-mail address and URL below}

Erik Rose
Sally Kleinfeldt

--------------------------------------------------------------------------
E-MAIL BOUNCEBACK: {add your e-mail address separated by commas }

--------------------------------------------------------------------------
NOTES ON / KEY TO THIS TEMPLATE:

A headline (like a field in a database) will be CAPITALISED
  This differentiates from the text that follows
A variable that you can change will be surrounded by _underscores_
  Spaces in variables are also replaced with under_scores
  This allows people to select the whole variable with a simple double-click
A tool-tip is lower case and surrounded by {curly brackets / parentheses}
  These supply helpful contextual information.

--------------------------------------------------------------------------
Copyright shared between all the participants unless otherwise stated...
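
The "Extending Java classes from Python" note above can be illustrated with a minimal, pure-Python sketch of the delegation idea. This is not PyLucene code: DelegatingAnalyzer is a hypothetical stand-in for the Java wrapper class, PythonAnalyzer is a hypothetical Python class, and the tokenStream signature is simplified to take a plain string. In the real system the forwarding call would cross the JNI and Python's C interface; here it is an ordinary method call.

```python
class PythonAnalyzer:
    """Hypothetical plain Python class; it inherits from no wrapper class."""

    def tokenStream(self, fieldName, text):
        # Simplified: split on whitespace and lowercase each token.
        return [token.lower() for token in text.split()]


class DelegatingAnalyzer:
    """Hypothetical stand-in for the Java-side class that wraps a Python object."""

    # The "protocol": methods the wrapped Python object must provide.
    REQUIRED_METHODS = ("tokenStream",)

    def __init__(self, delegate):
        # Check the protocol up front so a bad delegate fails early.
        for name in self.REQUIRED_METHODS:
            if not callable(getattr(delegate, name, None)):
                raise TypeError("delegate lacks method %r" % name)
        self._delegate = delegate

    def tokenStream(self, fieldName, text):
        # Forward the call to the wrapped Python object.
        return self._delegate.tokenStream(fieldName, text)


wrapped = DelegatingAnalyzer(PythonAnalyzer())
tokens = wrapped.tokenStream("body", "Pulling Java Lucene into Python")
print(tokens)
```

Checking the protocol in the constructor rather than at call time is a choice made for this sketch only; the notes do not say when PyLucene performs that check.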