How do I make this function index the content, or whatever step seems to be missing?
You're not quite up to date!
Global search works like this. I confess the way content gets indexed is not very explicit yet. I promise I will fix that by writing documentation in MoodleDocs and revising the README.
1. Log in as administrator.
2. Edit the site-wide parameters of the search block. You should NOT have to change anything, unless the extra libraries for converting files to text have been deployed in an unusual place.
Note that for indexing physical files, you need additional converters, which are in CVS at contrib/patches/global_search_libraires. I collected these converters for Windows and Linux support. Some of them may have additional support for other OS distributions.
Here you can choose whether to activate indexing of physical files, and change path settings if needed.
3. Go to the block and perform a blank search.
4. Browse to the "statistics" page. As administrator, you will have additional links to perform the first-time indexing. Once that is done, the cron should update the indexes with deleted, updated and added keys.
Beware: if you have many documents, this process can be heavy and time-consuming. Try it at night if possible.
5. The indexer will report what has been indexed for each supported module.
6. Try a search.
7. It should be fine.
Server Time: Thu, 29 Nov 2007 18:27:34 -0400
Testing global search capabilities:
Success: PHP 5.0.0 or later is installed (5.2.0).

Clicking on the Indexersplash script gives me this:

Server Time: Thu, 29 Nov 2007 18:28:19 -0400
Warning: Indexing was not successfully completed last time, restarting.
Using C:\http\vhosts\brightwhite.ca\private\moodledata/search as data directory.
Database error. Please check settings/files.
I added the documentation page:
It is based on my previous post. I will update it to make it as clear as possible.
Block site-wide configuration is reached through the Administration->Modules->Blocks menu, by clicking the "parameters" link next to the Global Search block entry.
If the database tables do not seem to be set up, try uninstalling the block and installing it again, then run the indexer once more.
Well, both the /search dir and blocks/search must be updated to the latest published version in CVS.
The /search dir contains the search engine itself.
lib.php there should show the updated version:
* @version 2007110400
I didn't make it as clear for the search block, as I only fixed some internationalisation code and added a few parameters to the global config. You're right, I should also have changed the file header to make that upgrade more traceable.
Version of the block should show :
$this->version = 2007062900;
at the top of the block_search.php file.
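If checking these markers by hand feels error-prone, a few lines of PHP can confirm both at once. A minimal sketch, not part of the distribution; `$moodleroot` is a placeholder and the version strings are the ones quoted in this thread, so adjust them to your checkout:

```php
<?php
// Sketch only: confirm that both search components carry the versions
// mentioned above. $moodleroot is a placeholder for your Moodle root.
$moodleroot = '.';
$expected = [
    'search/lib.php'                 => '2007110400',
    'blocks/search/block_search.php' => '2007062900',
];
foreach ($expected as $file => $version) {
    $path = "$moodleroot/$file";
    $ok = file_exists($path)
        && strpos(file_get_contents($path), $version) !== false;
    echo "$file: " . ($ok ? "matches $version" : "does NOT match $version") . "\n";
}
```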
Are you OK with that?
The third thing to get is the library and converter set, but this is only needed for physical documents (which may be the largest part of the indexable content!).
Actually, the version is OK in CVS HEAD. I also committed and merged it into MOODLE_18_STABLE.
According to a recent message from Eloy Lafuente, who tracks CVS usage, some recent files were not correctly merged into MOODLE_19_STABLE; that could perhaps explain the issue.
Keep me informed. I also maintain a complete pack of the search engine distribution in:
Just give me a couple of minutes to translate most of the content into English and check the distribution...
I appreciate the stabilisation work you are doing around Global Search.
In fact I checked the Zend version on my copies of 18_STABLE, 19_STABLE and HEAD; as far as I could check, they are exactly the same, all recently updated from clean checkouts.
I know I ran into some issues with PHP 5.2 when I started working on 1.8.1 (as I tried to add eAccelerator). I haven't come back to this matter since.
I still get access denied on MOODLE_19_STABLE for the search engine. I will ask MD if he can change something there.
I had mistaken root files in my CVS markers. They were pointing to the old SourceForge server!
Hi kind helpers.
After carefully following this thread and the installation instructions, I only get this message when running the indexersplash script...
Warning: Indexing was not successfully completed last time, restarting.
Using /home/courses.kpublic.net/web/moodledata/search as data directory.
I ran my cron job.
I checked the moodledata/search dir and made sure it was writable by the webserver.
I installed the lib dir with antiword and xpdf for Linux.
I checked the database table block_search_documents (it is empty).
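One thing worth double-checking from the list above is that the antiword and xpdf binaries are actually where the indexer expects them. A minimal sketch, not Moodle code; the paths are assumptions, so point them at your actual install locations:

```php
<?php
// Sketch only: verify the external text converters are present and
// executable. These paths are guesses; adjust to your system.
$converters = [
    'doc (antiword)'  => '/usr/bin/antiword',
    'pdf (pdftotext)' => '/usr/bin/pdftotext',
];
foreach ($converters as $label => $path) {
    echo $label . ': '
        . (is_executable($path) ? 'OK' : "MISSING at $path") . "\n";
}
echo "converter check complete\n";
```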
When I ran search/tests/index.php I got...
Testing global search capabilities:
Checking activity modules:
9 modules to search in / 19 modules found.
0 blocks to search in / 33 blocks found.
1 additional to search in.
Success : 'assignment' has nothing to index.
Success : 'chat' has nothing to index.
Success: 'data' module seems to be ready for indexing.
Found 2 discussions to analyse in forum Social forum
Success: 'forum' module seems to be ready for indexing.
Success : 'glossary' has nothing to index.
finished label 2
Success: 'label' module seems to be ready for indexing.
Success : 'lesson' has nothing to index.
finished Web page
finished Simple Text
finished New page
Success: 'resource' module seems to be ready for indexing.
Success : 'wiki' has nothing to index.
Success: 'user' module seems to be ready for indexing.
Finished checking activity modules.
It's a rather new install. That's all I've got for now. Further troubleshooting steps would be appreciated. Thanks for your time!
Laren, global search is not bringing up results from lesson content.
It is finding results from other areas, like activities.
Did you manage to fix the global indexer? I would like to make it work. It is causing me a lot of bother, and I would appreciate any advice if you managed to get it working. Skype: ray.mizzi1
Thanks in advance.
Doing a quick review of this... Mark, Valery, great to see some action in describing the setup. What are the roadblocks to a generally usable global search? Is it even feasible to provide a global search to non-admins?
The stumbling blocks I see (that I'm unsure about):
"Is it even feasible to provide a global search to non-admins?"
In which way is it not?
If I understand you, you would like to make sure that users who are Moodle admins, but not physical admins of the server, can install and run the global search.
"Unbound memory use"
I guess this is because, for indexing purposes, PHP needs to process the entire document content, or at least its text-converted version. We get this text conversion in two ways: internal converters (XML, HTML) or externally invoked converters (PDF, DOC).
Either way, PHP itself must process the text content and cut it into pieces. The problem is that the amount of memory needed depends on the document size, not on the search engine's own code.
I tried to clean up most of the unneeded memory structures in the external code (by "external" I mean our code, as opposed to the ZF code, which I call internal). The issue should be really problematic on first indexing, when a huge set of existing documents is stored and the index is empty. We could imagine a command-line, server-side tool for indexing that document set outside the normal operation of Moodle, but that would prevent non-admins from setting up the search engine.
"How do we check for access rights?"
Do you mean access rights on indexed entries, or access rights on the external text converters?
On index entries, access rights are handled by the document callbacks, so they are part of each module's search API implementation. Each implementation should find its own way to reproduce the availability conditions of the target document. In other words, a well-designed document search API should know how each module behaves and encode the appropriate checks (I'm not sure I got it all right!).
Let's think deeper on it! Martin, it's a good way to find solutions. Thanks.
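As a rough illustration of the callback idea described above (the names and data here are made up; this is not the real search API): each module supplies a checker that decides whether one user may see one indexed document, and the results page filters every hit through it, one call per document.

```php
<?php
// Hypothetical sketch of per-document access callbacks; not the real
// Moodle search API. Each module registers a checker, and the results
// page filters hits through it, one call per hit.
$access_checks = [
    // A forum post is visible if the user is enrolled in its course.
    'forum' => function ($doc, $user) {
        return in_array($doc['courseid'], $user['courses']);
    },
];
$user = ['id' => 7, 'courses' => [2, 5]];            // made-up data
$hits = [
    ['doctype' => 'forum', 'docid' => 39, 'courseid' => 2],
    ['doctype' => 'forum', 'docid' => 40, 'courseid' => 9],
];
foreach ($hits as $hit) {
    $check = $access_checks[$hit['doctype']];
    echo "docid {$hit['docid']}: " . ($check($hit, $user) ? 'shown' : 'hidden') . "\n";
}
```

The weakness discussed below follows directly from this shape: the number of callback invocations grows with the number of hits.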
I am thinking of Moodle admins (not sysadmins, those can use grep!) vs normal Moodle users. What I suspect may not be feasible is to run the searches so that they are scalable and fast.
For a modern text-based indexing and searching system there is no good reason (that I know of!) to use unbounded memory. Documents are being processed linearly -- so we don't need the whole document in memory, ever. I am sure we can get the Moodle side of things to be memory-smart rather than memory-bound. But what worries me is the design of ZF - is ZF memory bound in itself? If it is, then it'll be hard to fix...
access rights are handled by the document callbacks, so they are part of each module search API implementation
Does this mean that if we find 10,000 documents we'll call 10,000 callbacks? Ouch!
We will need to steal some scalable techniques to do the checks in place. At least let each module do a bit of setup beforehand, so it gets a chance to read in the needed data in one go. (Here, having an OOP module API would help a bit, as it would give us a 'natural' persistence model, but we can work around it.)
Edit: we can probably reuse the programming techniques we have in accesslib. We used to have a ton of DB traffic; now we read some data up front with some smart SQL and don't touch the DB at all past initialisation. That means that no matter the number of calls to has_capabilities(), we run a constant number of DB queries.
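A toy sketch of that accesslib-style pattern, with entirely made-up data: one up-front lookup of the user's access context, then purely in-memory filtering of any number of hits, so the DB query count stays constant regardless of result size.

```php
<?php
// Illustrative only, not accesslib code. Imagine the user's course
// memberships were fetched once, with a single smart SQL query:
$usercourses = [2 => true, 5 => true];
// Search hits to filter (hypothetical rows):
$hits = [
    ['docid' => 39, 'courseid' => 2],
    ['docid' => 40, 'courseid' => 3],
    ['docid' => 41, 'courseid' => 5],
];
// In-memory check per hit: no further DB traffic, however many hits.
foreach ($hits as $hit) {
    if (isset($usercourses[$hit['courseid']])) {
        echo "docid {$hit['docid']}: visible\n";
    }
}
```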
Thinking about this... I don't think we can count on that optimisation:
So it is back to what I was saying earlier - we must find a scalable way to do this. Simple callbacks won't work.
This sounds quite reasonable to me.
This solution is affordable for small organisations that will not have thousands upon thousands of entries, but I agree it is not scalable.
I tried to think about storing enough data with the indexing record to let the initial selection do most of the filtering work, but it was also a bit hard to go deeper that way. The indexer works in a cron context, that is, unaware of the situation the resource author is in. That was the initial logic of having access resolved by the requester.
Maybe we could find a way to better segment the initial pick-out from the Zend engine, so that most of the macro-context rules are applied there? Another research direction would be precomputing reverse indexes of who is known to access what. Hmm, that would be a big waste of data, wouldn't it?
Caching access-query results for a user? => think about staleness and release timeouts...???
There is another fallback:
a new module may implement some local access strategy. We would prefer developers to rely only on capabilities and not implement other access control strategies, but we can't count on that.
... let's continue the discussion ...
Server Time: Thu, 07 Feb 2008 14:13:55 -0400
Warning: Indexing was not successfully completed last time, restarting.
Using C:\http\vhosts\brightwhite.ca\private\moodledata/search as data directory.
Database error. Please check settings/files.
This was obviously not the right block. The search block parameters should show:
The search version should show 2007081100 in blocks/search/block_search.php; there should be a README.txt in the distribution, as well as a lang dir with en, fr and nl packs.
Note that this code is now "official Moodle code" and is up to date in Moodle CVS from 1.8 onward.
Actually, Global Search only reindexes new resources when the Moodle cron job is launched. This depends on cron being correctly set up on your Moodle server, and on the period chosen for that cron.
If that is OK, we should check more precisely whether the cron on your platform indexes new resources or not.
You can check this by triggering the cron manually, browsing to <%%yourMoodleWWWRoot%%>/admin/cron.php just after having added new files.
You should see the cron report, with the Global Search attempts to index new content inside.
This is a first check that everything seems to be OK.
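If you save that cron report to a file, a small script can tell at a glance whether the indexer actually added anything, by comparing the before/after index sizes. A sketch only; the report text below is a stand-in for real admin/cron.php output:

```php
<?php
// Sketch: scan a saved cron report for the Global Search indexer lines.
// The heredoc stands in for the real output of admin/cron.php.
$report = <<<TXT
Starting index update (additions)...
Index size before: 28
Checking resource module for additions. Finished resource.
Index size after: 28
TXT;
preg_match('/Index size before: (\d+)/', $report, $before);
preg_match('/Index size after: (\d+)/', $report, $after);
if ($after[1] > $before[1]) {
    echo "cron indexed " . ($after[1] - $before[1]) . " new document(s)\n";
} else {
    echo "cron indexed nothing new\n";
}
```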
Thanks for your reply, Valery.
It seems that cron does not index new resources... This is the report from cron.php after uploading a PDF file.
Starting activity modules Processing module function assignment_cron ...
Processing module function forum_cron ...
Starting digest processing... Cleaned old digest records done.
Processing module function journal_cron ...done.
Processing module function workshop_cron ...done.
Finished activity modules
Starting blocks
Processing cron function
Starting clean-up of removed records...
Index size before: 28
Checking chat module for deletions. No types to delete. Finished chat.
Checking data module for deletions. Finished data.
Checking forum module for deletions. Finished forum.
Checking glossary module for deletions. Finished glossary.
Checking lesson module for deletions. Finished lesson.
Checking resource module for deletions. Finished resource.
Checking wiki module for deletions. Finished wiki.
Finished 0 removals.
Index size after: 28
Starting index update (updates)...
Checking chat module for updates. No types to update. Finished chat.
Checking data module for updates. Finished data.
Checking forum module for updates. Finished forum.
Checking glossary module for updates. Finished glossary.
Checking lesson module for updates. Finished lesson.
Checking resource module for updates. Finished resource.
Checking wiki module for updates. Finished wiki.
Finished 0 updates.
Starting index update (additions)...
Index size before: 28
Checking chat module for additions. No types to add. Finished chat.
Checking data module for additions. Finished data.
Checking forum module for additions. Finished forum.
Checking glossary module for additions. Finished glossary.
Checking lesson module for additions. Finished lesson.
Checking resource module for additions. Finished resource.
Checking wiki module for additions. Finished wiki.
Index size after: 28
------------ done done.
Updating languages cache
Removing expired enrolments ...
0 to delete, none found
Running backups if required...
Checking backup status...OK
Getting admin info
Deleting old data
Skipping deleted courses
ple Next execution: Sunday, 6 April 2008, 12:00 am
eleni Next execution: Sunday, 6 April 2008, 12:00 am
eleni2 Next execution: Sunday, 6 April 2008, 12:00 am
Next execution: Sunday, 6 April 2008, 12:00 am
Backup tasks finished.
Running rssfeeds if required... Generating rssfeeds...
assignment: ...NOT SUPPORTED (file)
chat: ...NOT SUPPORTED (file)
choice: ...NOT SUPPORTED (file)
data: generating ...OK
forum: generating ...OK
glossary: generating ...OK
hotpot: ...NOT SUPPORTED (file)
journal: ...NOT SUPPORTED (file)
label: ...NOT SUPPORTED (file)
lams: ...NOT SUPPORTED (file)
lesson: ...NOT SUPPORTED (file)
quiz: ...NOT SUPPORTED (file)
resource: ...NOT SUPPORTED (file)
scorm: ...NOT SUPPORTED (file)
survey: ...NOT SUPPORTED (file)
wiki: ...NOT SUPPORTED (file)
workshop: ...NOT SUPPORTED (file)
Ending rssfeeds......OK Rssfeeds finished
Running auth crons if required...
Cron script completed correctly
Execution took 6.259061 seconds
Well, here is a first piece of good news: cron is running, and the indexer updater is called and checks the document sources! I will dig deeper into the code to see if we can trap your problem.
I'll keep you informed...
... there is one VERY key query that could help me a lot:
In /search/add.php, around line 95, would you mind adding the following line for a test?
if ($mod->name == "resource") echo $query;
just before the get_records... statement. Try indexing a new file and post me the exact SQL shown there.
(This is the key query that looks for newly registered content in the Moodle database.)
Note: if no SQL is shown, that is still information! Remove the test line (or comment it out) after testing!
SELECT id, id as docid FROM mdl_resource WHERE id NOT IN ('7','8','9','11','12','13','14','15', '21','23','24','25','27','28','31','32','33','34','38', '39','40','41','42','43','44','45') and timemodified > 1206960814 AND ( (alltext != '' AND alltext != ' ' AND alltext != ' ' AND TYPE != 'file') OR TYPE = 'file' )

I don't really know what happened; maybe restarting my PC made some difference... Anyway, it works, and that's what matters.
SELECT id, docid FROM mdl_block_search_documents WHERE doctype = 'resource' AND itemtype = 'any' AND docid not in ('7','8','9','10','11','12','13','14', '15','16','18','21','23','24','25','27','28','29','30', '31','32','33','34','35','37','38','42','43','45')

The itemtype should be 'file'. Is that only in my database?
query: +docid:39 +doctype:resource +itemtype:file
parsed query: +(<EmptyQuery>) +(doctype:resource) +(itemtype:file)
What a nice and efficient issue review!
Very valuable for me. I will track down that resource type "wildcard" story ASAP. I thought it was done, but... never say it's finished!
I will reconsider the 'any' type and translate it correctly in the query.
query: +docid:39 +doctype:resource +itemtype:file
parsed query:+(<EmptyQuery>) +(doctype:resource) +(itemtype:file)
The query is not parsed correctly; therefore the file I have deleted
(docid:39) cannot be found and removed from the database.