PyLucene, mail # dev - Changes to enable easy_install of packages using JCC


Chris Wilson 2012-02-01, 13:27
Andi Vajda 2012-02-01, 20:13

Re: Changes to enable easy_install of packages using JCC
Chris Wilson 2012-02-01, 22:35
Hi Andi,

Thank you for your quick and positive reply :)

On Wed, 1 Feb 2012, Andi Vajda wrote:

>>  I have been working on integrating Apache Tika (in Java) with our open
>>  source intranet application (in Python/Django) using JCC...
>
> Using Maven there helped considerably with getting all the pieces on the
> Java side.

Although I used Maven for an initial compile of Tika, I realised that it
would work just as well if I downloaded pre-built jar files, which I did
from http://repo1.maven.org/maven2/org/apache/tika/.
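Fetching pre-built jars instead of compiling can be scripted. The sketch below assumes the standard Maven 2 repository layout (dots in the groupId become directories, followed by artifactId/version/artifactId-version.jar). The helper name maven_jar_url and the tika-app 1.0 coordinates are illustrative, not taken from the thread. The repository root is a parameter because Maven Central has since switched to HTTPS-only access.

```python
# Hypothetical helper (not part of JCC or Tika): compute where a pre-built jar
# lives in a repository that follows the standard Maven 2 layout.
REPO_ROOT = "http://repo1.maven.org/maven2"  # root used in the message above


def maven_jar_url(group_id: str, artifact_id: str, version: str,
                  repo: str = REPO_ROOT) -> str:
    """Return the URL of <artifact_id>-<version>.jar for the given coordinates."""
    # Dots in the groupId map to directory levels: org.apache.tika -> org/apache/tika
    group_path = group_id.replace(".", "/")
    jar_name = f"{artifact_id}-{version}.jar"
    return "/".join([repo.rstrip("/"), group_path, artifact_id, version, jar_name])


if __name__ == "__main__":
    # Illustrative coordinates only; pick the Tika artifact and version you need.
    print(maven_jar_url("org.apache.tika", "tika-app", "1.0"))
```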

> Your remark about not needing JCC's shared library mode is probably
> correct right now but as soon as anyone brings in another JCC-built
> library into the same process as yours, shared mode is going to be
> required since the Java VM can only be initialized once per process.

I understand that, but I'm prepared to live with that limitation for now,
as this is likely to be the only Java library that I integrate into this
Python/Django application. I tried hard to find pure Python solutions, but
Tika is simply miles ahead of the competition.
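Because the VM can only be created once per process, code that might share a process with other JCC-built libraries should check for a running VM before starting one. This is a minimal sketch. It assumes a JCC-generated module that exposes initVM() and getVMEnv(), with getVMEnv() returning None while no VM exists. ensure_vm is a hypothetical name. As far as I understand it, this check only sees another library's VM when both libraries share JCC's runtime, and providing that shared runtime is what shared mode does.

```python
# Minimal sketch, assuming `jcc_module` is a JCC-generated extension module
# exposing initVM() and getVMEnv(), where getVMEnv() returns None until a VM
# has been started. ensure_vm is a hypothetical helper, not a JCC API.


def ensure_vm(jcc_module, **init_kwargs):
    """Start the Java VM only if none is running yet; return its environment."""
    env = jcc_module.getVMEnv()
    if env is None:
        # First caller in this process: create the single allowed VM.
        # init_kwargs (e.g. classpath=..., vmargs=...) are passed straight through.
        env = jcc_module.initVM(**init_kwargs)
    # Later callers reuse the existing VM instead of trying to create another.
    return env
```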

> No objections to these patches in principle but it would be easier for
> me to integrate them if you could provide patches computed from the svn
> repository of JCC:
> http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/
> Your patches seem small enough that I should be able to manage without,
> but it would be nicer if I didn't have to guess...

I think the patch that I attached was already based on trunk. The git
repository includes the .svn directories, points to trunk, and I generated
the patch using "svn diff".

> Also, please write small descriptions for these new command line flags to go
> into JCC's __main__.py file:
> http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/jcc/__main__.py

Done, new patch attached.

> This mess of setuptools patching was meant to be *temporary* until
> setuptools' issue 43 was fixed. As you can see, I filed this bug 3 1/2
> years ago, http://bugs.python.org/setuptools/issue43, and my patch for
> issue 43 still hasn't been accepted, rejected, integrated,
> anything'ed... Dormant. For over three years.

Sorry about that. I've had similar experiences with bugs reported against
Ubuntu, Hibernate, Rails... :(

>>  * Why does JCC use non-standard command line arguments like --build and
>>  --install? Can it be modified to make it easier to invoke from a
>>  setup.py-style environment, such as exporting a setup() function as
>>  setuptools does?
>
> What standard are you referring to?
> The Python extension module build/install/deploy story keeps evolving...
> Add Python 3.x support into the mix, and the mess is complete.
>
> Seriously, though, I think that the right thing to do to better integrate JCC
> with distutils/setuptools/distribute/pip/etc... is to make it into a
> distutils 'compiler'. This requires some work, though, and I haven't done it
> in all these years. Anyone with the itch to hack on distutils is welcome to
> take that on.

I'm afraid I don't fully understand how distutils works; it seems to be
sparsely documented, and I don't have a lot of time and energy to work on
refactoring JCC. I am a bit surprised that we can't just generate a source
distribution containing the jars, .cpp files and a setup.py which does the
rest like any other Python extension.
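Shipping a source distribution like that mostly means telling sdist which extra files to include. This is a hedged sketch of a MANIFEST.in generator that uses the standard include/recursive-include commands. The directory names jars and generated are hypothetical. The generated sources are deliberately kept out of build/, because sdist prunes the build directory by default.

```python
# Hypothetical helper: produce MANIFEST.in contents so that `setup.py sdist`
# ships pre-built jars and JCC-generated C++ sources alongside setup.py.
# Directory names are assumptions; adjust them to the project layout.
from pathlib import Path


def manifest_text(jar_dir: str = "jars", cpp_dir: str = "generated") -> str:
    """Return MANIFEST.in text including jars and generated C++ sources."""
    lines = [
        f"recursive-include {jar_dir} *.jar",
        # Keep generated sources outside build/: sdist prunes that directory.
        f"recursive-include {cpp_dir} *.cpp *.h",
    ]
    return "\n".join(lines) + "\n"


def write_manifest(project_dir: str, **kwargs) -> Path:
    """Write MANIFEST.in into project_dir and return its path."""
    path = Path(project_dir) / "MANIFEST.in"
    path.write_text(manifest_text(**kwargs))
    return path
```

setup.py could then list those .cpp files in an ordinary Extension(name, sources=[...]), together with whatever JNI include and library settings the build needs.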

> I have very little itch to dabble in configure scripts either so I've
> been dragging my feet. If someone were to step forward with a patch for
> that, I'd be delighted in ripping out all this patching brittleness.

How would a configure script solve the problem and what would it have to
do? Generate the .cpp files? How does it integrate with Python extensions?

> That is a whole different project. If I remember correctly, the JPype
> project is (or was) taking that approach: http://jpype.sourceforge.net

OK, thanks.

>>  * Could JCC generate a source distribution (sdist) that could be
>>    uploaded to pypi?
>
> You mean a source distribution that includes the Java sources of all the

I was thinking more of the jars. Something like
https://github.com/aptivate/python-tika that doesn't depend on JCC any
more.
http://packages.python.org/distribute/setuptools.html#development-mode

It seems that when invoked this way, my setup.py (from python-tika), which
calls JCC, ends up creating build/_tika as a file (not a directory).

For example, this command:

   sudo pip install -e git+https://github.com/aptivate/python-tika#egg=tika

(note the -e for editable mode) results in this:

   Running setup.py develop for tika
   ...
     Traceback (most recent call last):
       File "<string>", line 1, in <module>
       File "/tmp/src/tika/setup.py", line 108, in <module>
         cpp.jcc(jcc_args)
       File "/usr/local/lib/python2.6/dist-packages/JCC-2.12-py2.6-linux-i686.egg/jcc/cpp.py", line 587, in jcc
         os.makedirs(cppdir)
       File "/usr/lib/python2.6/os.py", line 157, in makedirs
         mkdir(name, mode)
     OSError: [Errno 17] File exists: 'build/_tika'
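In the traceback, JCC's cpp.py calls os.makedirs(cppdir) on build/_tika, and the call fails because a regular file with that name already exists. The sketch below shows a more defensive check. It is not JCC's actual code, and prepare_output_dir is a hypothetical name. The exist_ok argument requires Python 3.2 or later, which is newer than the Python 2.6 shown in the traceback.

```python
import os


def prepare_output_dir(path: str) -> str:
    """Ensure `path` is a directory, reporting clearly if a file is in the way."""
    if os.path.exists(path) and not os.path.isdir(path):
        # The situation in the traceback: build/_tika exists but is a file.
        raise RuntimeError(
            f"{path!r} exists and is not a directory; "
            "remove it or choose another output directory")
    # exist_ok=True lets reruns reuse an existing directory instead of failing.
    os.makedirs(path, exist_ok=True)
    return path
```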

That file appears to contain the source code for the JCCEnv.cpp wrapper.
Perhaps it would be useful to emit warnings (automatically) only for classes
in the JAR files included with --jar or named explicitly, but not for
those that are only on the classpath?
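The rule proposed here is to warn only about classes that come from jars passed with --jar or that were named explicitly, and not about classes found only on the classpath. That set could be computed up front. The sketch below assumes this rule. classes_in_jar and warnable_classes are hypothetical helpers, not JCC functions, and nested classes (names containing $) are skipped for simplicity.

```python
import zipfile


def classes_in_jar(jar):
    """Return dotted names of top-level classes in a jar (path or file object)."""
    names = set()
    with zipfile.ZipFile(jar) as archive:
        for entry in archive.namelist():
            # Skip resources and nested/anonymous classes such as Outer$Inner.class.
            if not entry.endswith(".class") or "$" in entry:
                continue
            names.add(entry[: -len(".class")].replace("/", "."))
    return names


def warnable_classes(jars, explicit_classes):
    """Classes that deserve warnings: those named explicitly or packaged in --jar jars.

    Classes reachable only through the classpath are deliberately left out.
    """
    selected = set(explicit_classes)
    for jar in jars:
        selected |= classes_in_jar(jar)
    return selected
```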
Thanks again for your help :)

Cheers, Chris.
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.
Andi Vajda 2012-02-01, 23:59
Andi Vajda 2012-02-04, 22:13
Chris Wilson 2012-02-04, 22:45
Bill Janssen 2012-02-02, 04:49
Andi Vajda 2012-02-02, 05:51
Bill Janssen 2012-02-02, 16:29
Andi Vajda 2012-02-02, 16:50
Bill Janssen 2012-02-02, 19:56
Andi Vajda 2012-02-02, 22:51
Bill Janssen 2012-02-02, 23:22
Andi Vajda 2012-02-03, 03:21