This puts both antiword and pdftotext in /usr/bin. I located them on my server to verify the paths, ran pdftotext and antiword from the command line and checked the execute permissions; all of that seems correct.
On the Global Search settings page, I've put:
Path to pdftotext command: /usr/bin/pdftotext -enc UTF-8 -eol unix -q
Path to doctotext command: /usr/bin/antiword
I left the following unchanged:
Environment settings for the MSWord converter: ANTIWORDHOME=/var/www/elo/lib/antiword/linux/usr/share/antiword
When I run the test indexer, I see lots of errors like:
Error with pdf to text converter command : exectuable not found.
Error with MSWord to text converter command : exectuable not found.
I know the error message is quite clear, but I can't see what's wrong here. Does anyone have an idea?
Hi Koen,
The fact is that the xxxtotext converters can only be invoked from within the Moodle distribution directory.
Installing antiword and xpdf as standalone packages does not comply with this rule. It was designed this way with a view to fully integrating those libraries into the /lib distribution in the future, and in case PHP or Apache security settings restrict access to executables located elsewhere.
This is a restriction I may drop, but we should first have a general developer and architecture discussion about it.
In the meantime, make sure the executables are within <%%Moodleroot%%>. I confess I currently have no installation running on Linux to evaluate deployment issues. This will come soon, I hope.
The rules are as follows:
I take the value given for the path of the executables, and the executable is then located using this construction:
$text_converter_cmd = "{$CFG->dirroot}/{$CFG->block_search_pdf_to_text_cmd} $file -";
for PDF, and
$text_converter_cmd = "{$CFG->dirroot}/{$CFG->block_search_word_to_text_cmd} $file";
for Word
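To see why an absolute setting such as /usr/bin/pdftotext ends in "executable not found", here is a small shell sketch that rebuilds the command string the same way as the PHP construction above and tests the resulting path. The dirroot value /var/www/elo is only an example (suggested by the ANTIWORDHOME setting earlier in the thread), and the assumption that the check applies to the first space-separated word of the command is mine, not taken from the search block code.

```shell
#!/bin/sh
# Sketch: rebuild "{$CFG->dirroot}/{$CFG->block_search_pdf_to_text_cmd}"
# and report which file would have to exist. Example values only.

# Join dirroot and the configured command the way the PHP string does.
build_cmd() {
  printf '%s/%s\n' "$1" "$2"
}

# Assumption: the executable is the first space-separated word of the command.
first_word() {
  printf '%s\n' "${1%% *}"
}

dirroot=/var/www/elo                                    # example $CFG->dirroot
setting='/usr/bin/pdftotext -enc UTF-8 -eol unix -q'    # value from the settings page

exe=$(first_word "$(build_cmd "$dirroot" "$setting")")
echo "Looking for: $exe"    # /var/www/elo//usr/bin/pdftotext
if [ -x "$exe" ]; then
  echo "found"
else
  echo "not found"
fi
```

The absolute path ends up nested under the Moodle root, which normally does not exist; a relative setting such as lib/xpdf/linux/pdftotext resolves inside the Moodle tree instead.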
I hope this helps you set up the search engine. I will add some notes on this point to Moodledocs.
I don't know whether it has been done already, but it might be a good idea to include them in 1.9 in one go.
I did a CVS checkout and copied the files to the right place, preserving the directory structure. I ran chmod 751 on the files to make them executable, but no luck yet:
Error with MSWord to text converter command : execution failed.
Error with pdf to text converter command : execution failed.
It's late. I'll look further into it tomorrow.
When it works, I'll try to write some docs on this.
Has anyone managed to get Global Search working on Linux? Are any "special tricks" needed?
I've just made it work with Moodle 1.8.3+ (current as of today) on Debian etch. Once I had the required packages installed (xpdf-utils, antiword, etc.), I executed the following commands as root:
export DIRROOT=/usr/share/moodle
mkdir -p ${DIRROOT}/lib/antiword/linux/usr/bin/
ln -s /usr/bin/antiword ${DIRROOT}/lib/antiword/linux/usr/bin/antiword
mkdir -p ${DIRROOT}/lib/xpdf/linux
ln -s /usr/bin/pdftotext ${DIRROOT}/lib/xpdf/linux/pdftotext
so as to use the Global Search default paths. I then launched the text indexer and it ran without problems. I also enabled 'file indexing', which ran without trouble too and indexed the only PDF file I had (this is a test setup that is almost empty).
Hope that helps.
Saludos. Iñaki.
Anyway, I am going to do something similar on Solaris. Solaris has the great Blastwave package repository (a lot like apt-get), and I have installed "xpdf" and "antiword" using Blastwave's pkg-get. On Solaris (at least on my Solaris), they end up in /opt/csw/bin.
So I'm gonna try a symbolic link as follows:
ln -s /opt/csw/bin/pdftotext <MOODLE>/lib/opt/csw/bin
(and the same for antiword)
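Iñaki's Debian commands generalize to any package location: only the source directory changes (/usr/bin on Debian, /opt/csw/bin for Blastwave on Solaris). The sketch below links both converters into the default Global Search locations used in his commands (the "linux" directory names are just those defaults) and then checks that each link points at something executable. The function name link_converters and its argument order are hypothetical.

```shell
#!/bin/sh
# Sketch: link system-installed converters into the default Global Search
# locations under the Moodle root. Usage: link_converters SRC_BIN DIRROOT
#   link_converters /usr/bin /usr/share/moodle        (Debian)
#   link_converters /opt/csw/bin /path/to/moodle      (Solaris/Blastwave)
link_converters() {
  src=$1
  dirroot=$2
  # Default locations, taken from the Debian commands earlier in this thread.
  mkdir -p "$dirroot/lib/antiword/linux/usr/bin" "$dirroot/lib/xpdf/linux" || return 1
  # -f replaces any existing file or link at the target path.
  ln -sf "$src/antiword"  "$dirroot/lib/antiword/linux/usr/bin/antiword" || return 1
  ln -sf "$src/pdftotext" "$dirroot/lib/xpdf/linux/pdftotext" || return 1
  # -x follows the symlink, so this fails if the target is missing or not executable.
  for exe in "$dirroot/lib/antiword/linux/usr/bin/antiword" "$dirroot/lib/xpdf/linux/pdftotext"; do
    if [ -x "$exe" ]; then
      echo "ok: $exe"
    else
      echo "NOT executable: $exe"
      return 1
    fi
  done
}
```

Symlinks keep the converters managed by the system package tool, so upgrades via apt-get or pkg-get are picked up without touching the Moodle tree.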
I'll post back to let the community know whether it worked for me. Now, my issue with the original response (that the libs have to be in the Moodle lib/ dir for reasons of future packaging) is that this logic seems to assume a uniform platform base. What if I'm installing on Mac, Windows, Solaris, etc.? The binary xpdf package, for example, can't be included with any Moodle codebase in those cases. Unless I'm missing something.
Okay, thanks. Helpful thread. Seeyalater...
Well, the current version of Global Search should no longer force you to use such a trick. There is an additional option in the search block's central configuration that lets you avoid prepending the Moodle root when the executable path is constructed. This should let the libs live anywhere else on the server.
About packaging: I looked for truly open-source and free converters, so as to match the GPL extensibility of Moodle in general. Some converters were found with two distributions, as generic Linux and generic Windows binaries. Other converters may have more complex distributions.
My opinion is that we should (restricted to GPL-compatible code) integrate libs for the majority cases, and point to the availability of other distributions where they exist. Small libs in particular cause no problem when added to lib, provided all developers find them valuable, but there would be some (understandable) resistance to integrating distributions of a couple of megabytes, as they are still "external code".
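The new option can be pictured as a switch around the same string construction. In this sketch the third argument stands in for that option; its real name in the search block configuration is not given in this thread, so treat converter_cmd and its yes/no flag as hypothetical.

```shell
#!/bin/sh
# Sketch of the path construction with an on/off switch for the Moodle root prefix.
# Usage: converter_cmd DIRROOT SETTING PREPEND(yes|no)
# PREPEND is a hypothetical stand-in for the search block option.
converter_cmd() {
  if [ "$3" = yes ]; then
    printf '%s/%s\n' "$1" "$2"   # old behaviour: path relative to the Moodle root
  else
    printf '%s\n' "$2"           # option enabled: setting used as given
  fi
}

converter_cmd /var/www/elo lib/xpdf/linux/pdftotext yes   # /var/www/elo/lib/xpdf/linux/pdftotext
converter_cmd /var/www/elo /usr/bin/pdftotext no          # /usr/bin/pdftotext
```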
Could you echo something for me, Koen?
In /search/documents/physical_doc.php, § 24,
add:
echo "{$CFG->dirroot}/{$CFG->block_search_word_to_text_cmd}";
and tell me what it prints.
Thanks. I'll try to help you as far as I can, given what I can guess of your system configuration.
I solved my problem. I copied the files installed by the Debian installer to the right place in the moodle/lib folder, replacing the ones I had downloaded from /contrib via CVS, and now it works. Very weird, because both are version 3.02.
On my laptop, running Ubuntu, it worked immediately with the files from CVS. The server has a 64-bit processor. I wonder whether that could be the problem...
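One way to test the 64-bit suspicion is to compare the word size of the bundled binary with that of the machine. This sketch reads the first five bytes of the ELF header with od: bytes 0 to 3 are the magic number and byte 4 (EI_CLASS) is 1 for 32-bit, 2 for 64-bit. The function name elf_class is hypothetical, and it only covers ELF binaries such as Linux executables. A 32-bit binary can still run on a 64-bit kernel when the 32-bit compatibility libraries are installed, so a mismatch is a hint rather than proof.

```shell
#!/bin/sh
# Report whether an ELF binary is 32- or 64-bit by reading its header:
# bytes 0-3 are 0x7f 'E' 'L' 'F', byte 4 (EI_CLASS) is 1 or 2.
elf_class() {
  # od prints the first five bytes as unsigned decimals; normalize the spacing.
  bytes=$(od -An -tu1 -N5 "$1" | tr -s ' ' | sed 's/^ //; s/ $//')
  case $bytes in
    "127 69 76 70 1") echo 32 ;;
    "127 69 76 70 2") echo 64 ;;
    *) echo "not an ELF file: $1" >&2; return 1 ;;
  esac
}

# Example: sh elfcheck.sh /path/to/moodle/lib/xpdf/linux/pdftotext
if [ $# -gt 0 ]; then
  echo "binary: $(elf_class "$1")-bit, machine: $(uname -m)"
fi
```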
I had to raise PHP's memory limit a lot, up to 150M, to get through the test indexer (I kept exceeding the allowed memory size of 96M).
Now the indexer is running (it does indeed take looong). The Apache error log is getting quite a lot of output. Please find attached the output of tail -n 100 on the log. Should I worry about that?
Thanks for the log.
It seems that Windows-style filenames containing parentheses break the Linux command line. I may add something to guard against that and transparently index all files, whatever their names.
Valery.
This is a big security hole. Using shell_exec() (or any other shell-invoking function) without first sanitizing the parameters with escapeshellcmd()/escapeshellarg() opens the door to shell injection attacks.
In this particular case, $file should be cleaned with escapeshellarg() before using it:
$file = escapeshellarg($CFG->dataroot.'/'.$resource->course.'/'.$resource->reference;);
$text_converter_cmd = "{$CFG->dirroot}/{$CFG->block_search_word_to_text_cmd} $file";
[...]
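The same idea can be tried from the shell. On Unix, PHP's escapeshellarg() wraps its argument in single quotes and turns each embedded single quote into '\'' so the shell sees one literal word. The sketch below is a shell re-implementation of that rule (the function name shell_quote is hypothetical) and passes a filename with parentheses, like the ones that broke the indexer, safely through sh -c.

```shell
#!/bin/sh
# Quote one argument for use inside a command string, following the rule
# PHP's escapeshellarg() uses on Unix: wrap it in single quotes and turn
# every embedded ' into '\'' (close quote, escaped quote, reopen quote).
# Note: command substitution drops trailing newlines from the argument.
shell_quote() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Demo with a Windows-style name containing parentheses and a quote.
tmp=$(mktemp -d)
name="Report (final) O'Brien.txt"
printf 'hello\n' > "$tmp/$name"
# Without quoting, sh -c "cat $tmp/$name" fails with a syntax error.
sh -c "cat $(shell_quote "$tmp/$name")"   # prints: hello
rm -rf "$tmp"
```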
Saludos. Iñaki.
Thanks for the heads-up, Iñaki!!
Fixed in CVS for HEAD, MOODLE_18_STABLE, MOODLE_19_STABLE.
Peace.
I hope you didn't check in the code I wrote in my previous post. I have just noticed that I didn't remove an extra ';' before the closing parenthesis in the escapeshellarg() call.
Anyway, it's a pleasure to collaborate on improving the code.
Saludos. Iñaki.
I saw it!
Je l'avais vu!! (I had seen it!)
Lo había visto!!! (I had seen it!)
Is this an international project or not!!!
Shouldn't there be something about this in http://docs.moodle.org/en/Development:Coding#Security_issues_.28and_handling_form_and_URL_data.29?
RLE
Yes, I'll add it there. Thanks
Saludos. Iñaki.
I'm using lynx on the server to trigger the script, so no network problems can disrupt the process.
Something good came out of this: I fixed the broken TeX filter the same way, by installing mimetex with the Debian installer and replacing mimetex.linux, distributed with Moodle, with the one from the Linux distribution. I'm beginning to wonder whether it is such a good idea to distribute these binaries together with Moodle.
This is a real question. The argument in favour is: keep Moodle simple to deploy, without tens of packages to fetch and install in the correct order. Shipping a suitable build of the additional libraries should work most of the time, once enough people have sent feedback to evaluate stability. The developer will often rely on the API offered by a specific version of a library. If you get the latest version, the API might have changed and the integration could suffer from this.
About memory: this is a real problem, and I don't yet know how to solve it. The first indexing run may need a huge amount of resources, but it only needs them once. I tried to see in Michael's code how to optimize and free some resources, but I don't yet have good enough memory inspection tools to find where the mess is.
statistics
datadirectory | /var/moodledata/search |
filesinindexdirectory | 9 |
totalsize | 7.8MB |
createdon | Tue, 04 Dec 2007 16:46:16 +0100 |
solutions
runindexertest | tests/index.php |
runindexer | indexersplash.php |
databasestate
database | mdl_block_search_documents |
documentsinindex | 6673 |
deletionsinindex | -47367 |
documentsindatabase | 54040 |
documentsfor 'Chats' | 98 |
documentsfor 'Databases' | 20 |
documentsfor 'Forums' | 44254 |
documentsfor 'Glossaries' | 1462 |
documentsfor 'Resources' | 8074 |
documentsfor 'modulenameplural' | 0 |
documentsfor 'Wikis' | 0 |
All these keys are in search/lang/en_utf8/search.php, which should be copied into the standard lang directory.
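A minimal sketch of that copy, assuming a Moodle 1.8/1.9 layout where the standard English strings live in lang/en_utf8/ under the Moodle root. It then checks that one of the keys shown raw in the statistics output (datadirectory) is defined, by looking for the $string['key'] form used in Moodle lang files. The function name install_search_strings is hypothetical.

```shell
#!/bin/sh
# Copy the search block's strings into the standard English language pack
# and check that a given key is defined there.
# Usage: install_search_strings DIRROOT [KEY]
install_search_strings() {
  dirroot=$1
  key=${2:-datadirectory}
  cp "$dirroot/search/lang/en_utf8/search.php" "$dirroot/lang/en_utf8/search.php" || return 1
  # -F: match the literal text $string['key'] rather than a regex.
  if grep -qF "\$string['$key']" "$dirroot/lang/en_utf8/search.php"; then
    echo "key '$key' found"
  else
    echo "key '$key' missing" >&2
    return 1
  fi
}
```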
I saw that the lang files had been updated in the distributions in CVS.
I am running it in my test environment, and consistently get the "execution failed" error.
I have tried using the moodle root option, and disabling it.
I want to see if anyone else has gotten this resolved in a similar environment.
Regards,
John
Version: 1.9.2
Environment:
PHP Version: 5.2.1
System: Windows NT MEDIA 5.2 build 3790
Build Date: Feb 7 2007 23:10:31
Configure Command: cscript /nologo configure.js "--enable-snapshot-build" "--with-gd=shared"
Server API: ISAPI
Virtual Directory Support: enabled