Here is some news about full-text search capabilities in Moodle.
I started reviewing the Global Search block and the experimental Lucene-based search features last Tuesday. The results look pretty good.
The revamped code now features:
- full internationalization of the search engine (front end only; some indexing traces are still in native English. Do we need to translate them?)
- major bug fixed (SQL queries were issued at the wrong level, through library entry points that were too deep)
- most XHTML-strict compliance bugs fixed
- new: fully pluggable indexing of uploaded files:
  - indexes uploaded files in moodledata
  - handles multiple formats through bound text extractors
  - currently handles:
    - PPT: Microsoft PowerPoint (tested with 97-compatible files)
    - XML, HTML and other true text formats
    - DOC: Microsoft Word, through doctotext from SILVERCODERS (open source)
    - PDF: Adobe's well-known format, through pdftotext (the open source xpdf converter)
  - for developers: new formats are easy to add once you know how to extract a full-text representation.
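To illustrate the idea of bound text extractors, here is a minimal sketch in Python. It is not the actual Moodle (PHP) code: the names `EXTRACTORS`, `register`, `command_extractor` and `extract_text` are hypothetical, and I am assuming that doctotext prints the extracted text to standard output when given a file name.

```python
import subprocess
from pathlib import Path

# Hypothetical registry: file extension -> function returning plain text.
EXTRACTORS = {}


def register(*extensions):
    """Decorator binding an extractor function to one or more extensions."""
    def bind(func):
        for ext in extensions:
            EXTRACTORS[ext.lower()] = func
        return func
    return bind


@register("txt", "xml", "html", "htm")
def extract_plain(path):
    # True text formats: read the file as-is, replacing undecodable bytes.
    return Path(path).read_text(encoding="utf-8", errors="replace")


def command_extractor(argv_prefix, argv_suffix=()):
    """Build an extractor that runs an external converter and captures stdout."""
    def extract(path):
        cmd = [*argv_prefix, str(path), *argv_suffix]
        result = subprocess.run(cmd, capture_output=True, check=True)
        return result.stdout.decode("utf-8", errors="replace")
    return extract


# pdftotext writes to stdout when the output file is given as "-".
EXTRACTORS["pdf"] = command_extractor(["pdftotext"], ["-"])
# Assumption: doctotext prints the extracted text to stdout.
EXTRACTORS["doc"] = command_extractor(["doctotext"])


def extract_text(path):
    """Return the text of a file, or None when no extractor is bound."""
    ext = Path(path).suffix.lstrip(".").lower()
    extractor = EXTRACTORS.get(ext)
    return extractor(path) if extractor else None
```

With a layout like this, supporting a new format only means binding one more function (or one more external command) to its extension; the indexer itself just calls `extract_text` and skips files for which it returns `None`.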
All of this works fine with Moodle 1.8.1 on a Windows XP Pro server running Apache 2.0.59 and PHP 5.2.3.
Still to do:
- As Martin said: take roles into account in the search engine.
- Still listening to Martin: security checks of the whole subsystem.
- Find a better way of packaging the whole set (currently it does not install completely through Moodle's automatic installation mechanisms). Integrating everything as dependencies of the search block could be nice.
- Check how to deliver it (a little more than 10 MB including extra libraries).
- Check and debug automated maintenance of indexes (I will do this in a while).
When will it be available?
As a pre-experimental release: next Wednesday (I hope), after deeper testing and an install on another site to check stability. (I'll post again here.)
As an experimental release: I need to discuss it with Martin to work that out.
Regards.