Search engines: Elastic

search_elastic
Maintained by Matt Porritt, Catalyst IT
This plugin allows Moodle to use Elasticsearch as the search engine for Moodle's Global Search. The following features are provided by this plugin: Multiple versions of Elasticsearch, File indexing, Request signing, compatible with Amazon Web Services (AWS), Respects Moodle Proxy settings, Image recognition and webservices.
45 sites
160 downloads
10 fans

Moodle Global Search - Elasticsearch Backend

This plugin allows Moodle to use Elasticsearch as the search engine for Moodle's Global Search.

The following features are provided by this plugin:

  • Multiple versions of Elasticsearch
  • File indexing
  • Request signing, compatible with Amazon Web Services (AWS)
  • Respects Moodle Proxy settings
  • Image recognition and indexing
  • Webservices

Full plugin documentation can be found at: https://docs.moodle.org/33/en/Elasticsearch

Supported Moodle Versions

This plugin currently supports Moodle:

  • 3.1
  • 3.2
  • 3.3

Installation

NOTE: Complete all of these steps before trying to enable the Global Search functionality in Moodle.

  1. Get the code and copy or install it to: <moodledir>/search/engine/elastic
  2. This plugin also depends on local_aws: get the code from https://github.com/catalyst/moodle-local_aws and copy or install it into <moodledir>/local/aws
  3. Run the upgrade from the Moodle root directory: sudo -u www-data php admin/cli/upgrade.php (note: the web server user may be different from www-data on your system).
  4. Set up the plugin in Site administration > Plugins > Search > Manage global search by selecting elastic as the search engine.
  5. Configure the Elasticsearch plugin at: Site administration > Plugins > Search > Elastic
  6. Set the hostname and port of your Elasticsearch server.
  7. To create the index and populate Elasticsearch with your site's data, run this CLI script from the Moodle root directory: sudo -u www-data php search/cli/indexer.php --force
  8. Enable Global search in Site administration > Advanced features
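
Steps 1 to 3 only work if both plugin directories are in place before the upgrade runs. The following is a minimal preflight sketch, assuming a POSIX shell. `check_plugins` is a hypothetical helper (not part of Moodle or this plugin), and the Moodle path and www-data user in the usage comment are assumptions to adjust for your system.

```shell
#!/bin/sh
# Hypothetical helper: confirm the two plugin directories from steps 1 and 2
# exist under the Moodle root ($1) before running the upgrade in step 3.
# Prints one line per missing directory and returns non-zero if any are missing.
check_plugins() {
    moodledir="$1"
    missing=0
    for dir in search/engine/elastic local/aws; do
        if [ ! -d "$moodledir/$dir" ]; then
            echo "missing: $moodledir/$dir"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (path and web server user are assumptions):
# check_plugins /var/www/moodle && cd /var/www/moodle && \
#     sudo -u www-data php admin/cli/upgrade.php
```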

Elasticsearch Version Support

Currently this plugin is tested to work against the following versions of Elasticsearch:

  • 2.3.4
  • 2.4.4
  • 5.1.2
  • 5.5.0

Elasticsearch Setup

The following is the bare minimum to get Elasticsearch working in a Debian/Ubuntu operating system environment. Consult the Elasticsearch documentation for in-depth instructions, or for details on how to install on other operating systems.

NOTE: The instructions below should only be used for test and dev purposes. Don't do this in production.

Elasticsearch requires Java as a prerequisite. To install Java:


sudo apt-get install default-jre default-jdk

Once Java is installed, the following commands will install and start Elasticsearch.


wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.2.deb
sudo dpkg -i elasticsearch-5.1.2.deb
sudo update-rc.d elasticsearch defaults
sudo service elasticsearch start
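
Elasticsearch may need a few seconds after the service start command returns before it answers on port 9200, so a test run immediately afterwards can fail. A small retry sketch, assuming a POSIX shell with curl installed; `wait_for` is a hypothetical helper name, not part of Elasticsearch or Moodle.

```shell
#!/bin/sh
# Hypothetical helper: run a command once per second until it succeeds,
# giving up after the given number of attempts.
# Usage: wait_for ATTEMPTS COMMAND [ARGS...]
wait_for() {
    attempts="$1"; shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example usage: wait up to 30 seconds for Elasticsearch to answer.
# wait_for 30 curl -sf -o /dev/null 'http://localhost:9200' && echo "Elasticsearch is up"
```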

A quick test can be performed by running the following from the command line.


curl -X GET 'http://localhost:9200'

The output should look something like:


{
  "name" : "1QHLiux",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "mLRqIsnVRrGdgg2OfHWNrg",
  "version" : {
    "number" : "5.1.2",
    "build_hash" : "c8c4c16",
    "build_date" : "2017-01-11T20:18:39.146Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}
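
To compare a running server against the versions listed under Elasticsearch Version Support, the "number" field can be pulled out of that response. This is a rough sketch assuming a POSIX shell. It uses sed rather than a real JSON parser and relies on the response containing a single "number" key, as in the output above. The helper names are hypothetical.

```shell
#!/bin/sh
# Read the JSON from GET http://localhost:9200 on stdin and print the
# value of the "number" key (the Elasticsearch version).
es_version() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed only for the versions this plugin is listed as tested against.
is_tested_version() {
    case "$1" in
        2.3.4|2.4.4|5.1.2|5.5.0) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage:
# version=$(curl -s -X GET 'http://localhost:9200' | es_version)
# if is_tested_version "$version"; then echo "tested: $version"; else echo "untested: $version"; fi
```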


File Indexing Support

This plugin uses Apache Tika for file indexing support. Tika parses files, extracts the text, and returns it via a REST API.

Tika Setup

Setting up a Tika test service is straightforward. In most cases on a Linux environment, you can simply download the Java JAR and then run the service.


wget http://apache.mirror.amaze.com.au/tika/tika-server-1.16.jar
java -jar tika-server-1.16.jar

This will start Tika on the host. By default the Tika service is available on: http://localhost:9998
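
Once the server is running, you can check it and try an extraction by hand. A minimal sketch, assuming a POSIX shell with curl: it uses Tika server's /tika endpoint, which returns the extracted text of a file sent with an HTTP PUT. The TIKA_HOST and TIKA_PORT variables and the helper names are hypothetical, with defaults matching the service started above.

```shell
#!/bin/sh
# Hypothetical settings; defaults match a Tika server started locally as above.
TIKA_HOST="${TIKA_HOST:-localhost}"
TIKA_PORT="${TIKA_PORT:-9998}"

# Print the URL of Tika server's /tika text-extraction endpoint.
tika_url() {
    echo "http://${TIKA_HOST}:${TIKA_PORT}/tika"
}

# PUT a file to Tika and print the extracted plain text.
tika_extract() {
    curl -s -T "$1" -H 'Accept: text/plain' "$(tika_url)"
}

# Example usage:
# curl -s "$(tika_url)"     # a GET returns a short greeting if the server is up
# tika_extract example.pdf
```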

Enabling File indexing support in Moodle

Once a Tika service is available the Elasticsearch plugin in Moodle needs to be configured for file indexing support.
Assuming you have already followed the basic installation steps, to enable file indexing support:

  1. Configure the Elasticsearch plugin at: Site administration > Plugins > Search > Elastic
  2. Select the Enable file indexing checkbox.
  3. Set Tika hostname and Tika port of your Tika service. If you followed the basic Tika setup instructions the defaults should not need changing.
  4. Click the Save Changes button.

What is Tika

From the Apache Tika website:

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more. You can find the latest release on the download page. Please see the Getting Started page for more information on how to start using Tika.

Why use Tika as a standalone service?

It is common to see Elasticsearch implementations using an Elasticsearch file indexing plugin rather than a standalone service. Current Elasticsearch file indexing plugins are wrappers around Tika. (The Solr search engine also uses Tika.)
Using Tika as a standalone service has the following advantages:

  • Supports file indexing for Elasticsearch setups that don't allow file indexing plugins, such as AWS.
  • No need to change setup or plugins based on the Elasticsearch version.
  • You can share one Tika service across multiple Elasticsearch clusters.
  • Can run Tika on dedicated infrastructure that is not part of your search nodes.
  • Files stored using native Elasticsearch functionality are stored as separate records inside Elasticsearch; these are separate from the rest of the data stored relating to that file.
  • Ingesting files using native Elasticsearch functionality is very inefficient. Files are stored in the Elasticsearch internal database as base64 encoded strings. Base64 encodes every 3 bytes as 4 characters, so it takes up about 33% more space than the original binary. This is in addition to the content extracted from the file, which is also stored in Elasticsearch.
  • The Elasticsearch documentation also states:
    Extracting contents from binary data is a resource intensive operation and consumes a lot of resources. It is highly recommended to run pipelines using this processor in a dedicated ingest node.
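
The base64 overhead can be worked out directly: base64 turns every 3 input bytes into 4 output characters, padding the final group, so an n-byte file becomes 4 * ceil(n / 3) characters. A small shell sketch of that arithmetic; the helper names are hypothetical, and any line breaks an encoder inserts are ignored.

```shell
#!/bin/sh
# Size in characters of the base64 encoding of an n-byte input:
# 4 * ceil(n / 3), computed with integer arithmetic as 4 * ((n + 2) / 3).
base64_size() {
    echo $(( 4 * (($1 + 2) / 3) ))
}

# Size increase over the original n bytes, as a whole-number percentage.
base64_overhead_pct() {
    size=$(base64_size "$1")
    echo $(( (size - $1) * 100 / $1 ))
}

# Example usage:
# base64_size 1000000          # 1333336
# base64_overhead_pct 1000000  # 33
```

For example, a 1,000,000-byte file becomes 1,333,336 characters, roughly a third larger, before any extracted text is stored alongside it.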

Image Recognition and Indexing

This plugin can use the Amazon Web Services (AWS) Rekognition service (https://aws.amazon.com/rekognition/) to identify the contents of images. The identified content is then indexed by Elasticsearch and can be searched for in Moodle (cool huh?).

NOTE: Indexing of files by Moodle's core Global Search is currently limited to only indexing files from a couple of places. Tracker issue MDL-59459 has been raised to increase the coverage of the files indexed by Global Search.

Currently the best way to test image search functionality is to add an image via the Moodle course file resource.

Enabling image recognition and indexing support in Moodle

Once you have set up Elasticsearch in AWS, Moodle needs to be configured for image recognition.
Assuming you have already followed the basic installation steps and the file indexing steps, to enable Image Recognition:

  1. Configure the Elasticsearch plugin at: Site administration > Plugins > Search > Elastic
  2. Select the Enable image signing checkbox.
  3. Set Key ID, Secret Key and Region of your AWS credentials and Rekognition region.
  4. Click the Save Changes button.

NOTE: You will need a set of AWS API keys for an AWS IAM user with full Rekognition permissions. Setting this up is beyond the scope of this README. For further information see the AWS documentation.


Request Signing

Amazon Web Services (AWS) provides Elasticsearch as a managed service. This makes it easy to provision and manage an Elasticsearch cluster.
One of the ways you can secure access to your data in Elasticsearch when using AWS is to use request signing. Request signing allows only valid signed requests to be accepted by the Elasticsearch endpoint. Requests that are unsigned are not authorised to access the endpoint.

Enabling Request Signing support in Moodle

Once you have set up Elasticsearch in AWS, Moodle needs to be configured for request signing.
Assuming you have already followed the basic installation steps, to enable Request Signing:

  1. Configure the Elasticsearch plugin at: Site administration > Plugins > Search > Elastic
  2. Select the Enable request signing checkbox.
  3. Set Key ID, Secret Key and Region of your AWS credentials and Elasticsearch region.
  4. Click the Save Changes button.
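
AWS request signing (Signature Version 4) signs each request with a key derived from your secret key, the request date, the region, and the service name ("es" for the AWS Elasticsearch service). The sketch below illustrates only that key-derivation step of the published algorithm using openssl. It is not the plugin's own signing code, the helper names are hypothetical, and building and signing the per-request "string to sign" is omitted.

```shell
#!/bin/sh
# HMAC-SHA256 of message $2 using the hex-encoded key $1; prints the digest as hex.
# awk takes the last field so it works with both OpenSSL 1.x and 3.x output formats.
hmac_hex() {
    printf '%s' "$2" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$1" | awk '{print $NF}'
}

# Derive the SigV4 signing key: secret key, date (YYYYMMDD), region, service.
sigv4_signing_key() {
    secret="$1"; day="$2"; region="$3"; service="$4"
    # The initial key is the string "AWS4" followed by the secret, hex-encoded for openssl.
    k_secret=$(printf 'AWS4%s' "$secret" | od -An -tx1 | tr -d ' \n')
    k_date=$(hmac_hex "$k_secret" "$day")
    k_region=$(hmac_hex "$k_date" "$region")
    k_service=$(hmac_hex "$k_region" "$service")
    hmac_hex "$k_service" "aws4_request"
}

# Example usage (AWS's documented example secret key, not a real credential):
# sigv4_signing_key 'wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY' 20170804 us-east-1 es
```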

Test Setup

In order to run the PHPUnit tests for this plugin you need to set up and configure an Elasticsearch instance, as well as supply the instance details to Moodle. You need to define:

  • Hostname: the URL of the host of your Elasticsearch instance
  • Port: The TCP port the host is listening on
  • Index: The name of the index to use during tests. NOTE: Make sure this is different from your production index!

Setup via config.php

To define the required variables via your Moodle configuration file, add the following to config.php:


define('TEST_SEARCH_ELASTIC_HOSTNAME', 'http://127.0.0.1');
define('TEST_SEARCH_ELASTIC_PORT', 9200);
define('TEST_SEARCH_ELASTIC_INDEX', 'moodle_test_2');

Setup via Environment variables

The required Elasticsearch instance configuration variables can also be provided as environment variables. To do this at the Linux command line:


export TEST_SEARCH_ELASTIC_HOSTNAME=http://127.0.0.1
export TEST_SEARCH_ELASTIC_PORT=9200
export TEST_SEARCH_ELASTIC_INDEX=moodle_test
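
Because the tests write to the configured index, it can help to check the variables before starting PHPUnit. A minimal sketch, assuming a POSIX shell. `check_test_env` is a hypothetical helper, and you pass it the name of your production index, which this README does not specify.

```shell
#!/bin/sh
# Hypothetical pre-test check: all three TEST_SEARCH_ELASTIC_* variables must be
# set, and the test index must differ from the production index name given as $1.
check_test_env() {
    prod_index="$1"
    for var in TEST_SEARCH_ELASTIC_HOSTNAME TEST_SEARCH_ELASTIC_PORT TEST_SEARCH_ELASTIC_INDEX; do
        # Indirect lookup of the variable named in $var (empty if unset).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set"
            return 1
        fi
    done
    if [ "$TEST_SEARCH_ELASTIC_INDEX" = "$prod_index" ]; then
        echo "TEST_SEARCH_ELASTIC_INDEX must not be the production index ($prod_index)"
        return 1
    fi
    return 0
}

# Example usage (my_production_index is a placeholder):
# check_test_env my_production_index && echo "test environment looks OK"
```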


Contributors

Matt Porritt (Lead maintainer)

Comments
  • Plugins bot
    Mon, 13 Feb 2017, 12:00 PM
    Approval issue created: CONTRIB-6748
  • Grzegorz Ziółek
    Fri, 19 May 2017, 6:12 PM
    What are the benefits (or pitfalls) over using the Solr plugin?
  • Matt Porritt
    Mon, 22 May 2017, 6:06 AM
    @Grzegorz Ziółek
    In terms of user experience and searching for content in Moodle, both plugins should behave exactly the same. There is an edge case where the Elasticsearch plugin has better behaviour with paging when there are many pages of search results.

    The main reason Catalyst IT made this plugin is that Elasticsearch better suits our infrastructure setup. For example on our Amazon Web Services (AWS) hosting infrastructure we can take advantage of the Elasticsearch managed service AWS provides. This takes away a lot of the burden of managing a search engine cluster. Personally I find Elasticsearch easier to setup and manage than Solr. Whether this benefits you will depend on your setup.

    If you have any other questions please let me know.
  • Jeff White
    Wed, 26 Jul 2017, 3:31 AM
    Tested this tool out and it is much easier to install than Solr. The service seemed to index a very large Moodle instance faster than Solr did.
  • Steve Pollock
    Fri, 4 Aug 2017, 8:45 AM
    I have this working with the latest (AWS) ES 5.3 as of today on Moodle 3.2.2 -- just FYI. Thank-you for sharing the work!

    I noticed I had to run the indexer by hand in the steps above. I just want to verify: will it keep everything indexed going forward, or do I need to build a job to run the indexer every so often?

    thanks again,
    -Steve
  • Matt Porritt
    Fri, 4 Aug 2017, 9:15 AM
    Hi Steve,
    Core global search has a core scheduled task that runs to keep the index up to date, so you shouldn't need to set anything else up. You can check the status of the scheduled tasks here: yourmoodlesitedomain/admin/tool/task/scheduledtasks.php. You can also get more detailed information about the index here: yourmoodlesitedomain/admin/searchareas.php

    Hope this helps
  • Steve Pollock
    Sat, 19 Aug 2017, 5:12 AM
    Perfect Matt, this has been working very well, thank-you!!

    Now, I have a question about extending this a bit. I am able to push a lot of other data into ES with Logstash and others, but of course the data is placed in different indices: logstash*, twitter, email etc. Would it be possible to have the site index into "moodle" as it does today, but instead search on "_all" indices?

    I don't know if that's really difficult or an easy change. It would be great if we could provide a central search right out of Moodle.

    Thanks
    -Steve
  • Matt Porritt
    Sun, 20 Aug 2017, 2:18 PM
    Hi Steve,
    Searching across multiple indices with Elasticsearch is straightforward, in that Elasticsearch supports it. Integrating with Moodle would be harder. The main hurdles would be access control and the display of non-Moodle results in Moodle.
    If you want to talk about this more, please feel free to reach out to me directly.

    Matt P