Pulling Java Lucene into Python.notes

Friday, March 25, 2005

TITLE OF PAPER: Pulling Java Lucene into Python
URL OF PRESENTATION: _URL_of_powerpoint_presentation_
PRESENTED BY: Andi Vajda
REPRESENTING: Open Source Applications Foundation

CONFERENCE: PyCon 2005
DATE: 3/25/2005
LOCATION: Grand Ballroom
--------------------------------------------------------------------------

REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:
{If you've contributed, add your name, e-mail & URL at the bottom}

Why PyLucene? They needed a probabilistic search engine, and Lucene
(written in Java) is a good open source one, but they were not willing
to use Java.

A SWIG interface file declares all the interfaces. Java Lucene is
compiled to native code with gcj, SWIG produces a giant C++ wrapper
file, and various magic ensues to create a Python extension.



[incomplete diagram]
Java Lucene --> gcj --lucene.o------> PyLucene_wrap.o
|                                      |
|                                      \/
\/                                  PyLucene.pyd
gcjh makes C++ headers

Impedance mismatches
    SWIG thinks you're wrapping C++, but you're wrapping Java, so you have to
        match the memory models. Once you pass an object from Java to Python,
        Java stops tracking it; this has to be worked around. The same problem
        shows up when returning an object via JNI from C++ to Java.
    Thread/concurrency models differ. Python is more flexible; Java really
        wants to be in control, so delegate thread management to it.

        He made a class called PythonThread that delegates thread management to
            Java so it won't freak out.
        This changed PyLucene from being kinda crashy to very stable.
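A rough sketch of the kind of workaround the memory-model point implies: while the other runtime holds a reference to an object, keep a strong reference to it on the owning side, and drop that reference only when the last foreign handle is released. This is a pure-Python simulation under that assumption. HandleTable, pin, retain and release are hypothetical names, not PyLucene API, and the talk did not show how PyLucene itself does this.

```python
import itertools
import threading
import weakref


class HandleTable:
    """Hypothetical pin table: holds strong references to objects for as
    long as a foreign runtime (standing in for Java) holds handles to them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self._objects = {}  # handle -> pinned object
        self._counts = {}   # handle -> outstanding foreign references

    def pin(self, obj):
        """Pass obj across the boundary; returns a new handle (count 1)."""
        with self._lock:
            handle = next(self._ids)
            self._objects[handle] = obj
            self._counts[handle] = 1
            return handle

    def retain(self, handle):
        """The foreign side made another copy of the handle."""
        with self._lock:
            self._counts[handle] += 1

    def release(self, handle):
        """The foreign side dropped a copy; unpin once none are left."""
        with self._lock:
            self._counts[handle] -= 1
            if self._counts[handle] == 0:
                del self._counts[handle]
                del self._objects[handle]

    def get(self, handle):
        with self._lock:
            return self._objects[handle]

    def __len__(self):
        with self._lock:
            return len(self._objects)


class Doc:
    """Placeholder object to pin."""


def demo():
    table = HandleTable()
    doc = Doc()
    alive = weakref.ref(doc)
    handle = table.pin(doc)
    del doc                       # the owning side drops its local reference
    pinned = alive() is not None  # the table still keeps the object alive
    table.release(handle)         # the last foreign reference goes away
    freed = alive() is None       # CPython frees it as soon as it is unpinned
    return pinned, freed, len(table)
```

Under CPython, demo() returns (True, True, 0): the object outlives its last local reference while pinned and is freed once released. The per-handle count matters because the foreign side may copy a handle several times before letting go of it.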



Extending Java classes from Python
    He made SWIG recognize that Python class X implements a protocol that lets
        it be wrapped by a Java class. It's SWIG in reverse: GIWS! ;-) The Java
        class then delegates to the Python methods by calling them via the JNI
        and Python's C interface.
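The delegation idea can be simulated in plain Python. Here JavaSideProxy (a hypothetical name) plays the role of the generated Java class: it checks that the Python object provides every method the interface requires, then forwards each call by name. In PyLucene the forwarding hop goes through the JNI and Python's C interface; in this sketch it is an ordinary attribute lookup, and the one-method "score" interface is made up for illustration.

```python
class ProtocolError(TypeError):
    """Raised when a Python object lacks a method the interface requires."""


class JavaSideProxy:
    """Hypothetical stand-in for the generated Java wrapper class."""

    def __init__(self, interface_methods, target):
        # Check the whole protocol up front, when the object is wrapped.
        missing = [name for name in interface_methods
                   if not callable(getattr(target, name, None))]
        if missing:
            raise ProtocolError(
                f"{type(target).__name__} does not implement: {', '.join(missing)}")
        self._methods = frozenset(interface_methods)
        self._target = target

    def invoke(self, name, *args):
        """Forward an interface call to the Python object."""
        if name not in self._methods:
            raise AttributeError(f"{name!r} is not part of the interface")
        return getattr(self._target, name)(*args)


class LengthScorer:
    """Python class implementing the made-up 'score' interface."""

    def score(self, text):
        return len(text.split())


scorer = JavaSideProxy(("score",), LengthScorer())
print(scorer.invoke("score", "pulling java lucene into python"))  # prints 5
```

Checking the methods when the object is wrapped means a missing method fails immediately, rather than later in the middle of a call coming from the Java side.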


Supporting downcasting


[SK Note:  See Michel Salib's presentation "Indexing the US Patent DB with
Python and Xapian" yesterday.  They tried to use Lucene (and PyLucene) first,
but had a lot of trouble with builds in their very multi-platform
environment.  Then they discovered Xapian, which was much easier to
wrap and build.  They thought Xapian provided faster queries and better
precision/recall.  It comes out of Cambridge University (UK).
They will be releasing their pyXapWrap (or something like that) project
to open source.]

Cross-language error reporting
    Looks like they wrapped it in both directions. I'm not sure, but it looks
        like Python exceptions are wrapped into Java exceptions and vice versa.
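If that reading is right, the two directions look roughly like this pure-Python simulation. JavaThrowable stands in for an exception living on the Java side and WrappedJavaException for what Python code would catch. Both names, and the choice of java.lang.RuntimeException as the carrier for Python errors, are assumptions for illustration rather than PyLucene's actual classes.

```python
class JavaThrowable(Exception):
    """Stand-in for an exception object thrown on the Java side."""

    def __init__(self, class_name, message):
        super().__init__(message)
        self.class_name = class_name


class WrappedJavaException(Exception):
    """What Python code would catch when a Java call throws."""

    def __init__(self, throwable):
        super().__init__(f"{throwable.class_name}: {throwable}")
        self.throwable = throwable


def call_into_java(java_method, *args):
    """Python -> Java: re-raise a Java throw as a Python exception."""
    try:
        return java_method(*args)
    except JavaThrowable as t:
        raise WrappedJavaException(t) from t


def call_into_python(py_callable, *args):
    """Java -> Python: a Python exception must not escape into Java raw,
    so it is converted into a Java-side exception naming the Python type."""
    try:
        return py_callable(*args)
    except Exception as e:
        raise JavaThrowable("java.lang.RuntimeException",
                            f"{type(e).__name__}: {e}") from e


def failing_java_method():
    raise JavaThrowable("java.io.IOException", "index directory is locked")


try:
    call_into_java(failing_java_method)
except WrappedJavaException as e:
    print(e)  # prints "java.io.IOException: index directory is locked"

try:
    call_into_python(lambda: 1 / 0)
except JavaThrowable as t:
    print(t.class_name, t)
```

Keeping the original exception (via the throwable attribute and exception chaining) means the error can be reported with its source-language type and message on either side.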

Samples
    

Future work
    Pulling other libraries into Python in the same way



800-line makefile!

PyLucene may soon end up under the Apache Foundation, along with Lucene itself.
(Yay!)

--------------------------------------------------------------------------
REFERENCES: {as documents / sites are referenced add them below}
There's a book about Java Lucene.

Jakarta Lucene project:  http://lucene.apache.org/java/docs/
PyLucene project:  http://pylucene.osafoundation.org/


--------------------------------------------------------------------------
QUOTES:



--------------------------------------------------------------------------
CONTRIBUTORS: {add your name, e-mail address and URL below}
Erik Rose <corp@grinchcentral.com>

Sally Kleinfeldt <skleinfeldt@tnc.org>



--------------------------------------------------------------------------
E-MAIL BOUNCEBACK: {add your e-mail address separated by commas }



--------------------------------------------------------------------------
NOTES ON / KEY TO THIS TEMPLATE:
A headline (like a field in a database) will be CAPITALISED
    This differentiates from the text that follows
A variable that you can change will be surrounded by _underscores_
    Spaces in variables are also replaced with under_scores
    This allows people to select the whole variable with a simple double-click
A tool-tip is lower case and surrounded by {curly brackets / parentheses}
    These supply helpful contextual information.

--------------------------------------------------------------------------
Copyright shared between all the participants unless otherwise stated...