Link crawler robot

Admin tools ::: tool_crawler
Maintained by Catalyst IT, Brendan Heywood, Daniel Thee Roperto
An admin tool robot crawler which scans your Moodle for broken, large or slow links.

It is an admin tool with a Moodle cron task, but it reaches into your Moodle via curl, effectively from outside Moodle, scrapes each page, parses it and follows links. With this architecture it only finds broken links that actually matter to students. Because it comes in from outside, it needs to authenticate, and so has a dependency on the moodle-auth_basic plugin. It is recommended that you set up a dedicated 'robot' user who has read-only access to all the site pages you wish to crawl. You should give the robot capabilities similar to those real students have.
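
The crawl step described above (authenticate from outside with HTTP Basic credentials, parse a fetched page, collect the links to follow) can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions, not the plugin's actual code: the function and class names, the `robot` / `secret` credentials, and the example URLs are all hypothetical, and the network fetch itself is left out so the sketch runs offline.

```python
import base64
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def basic_auth_header(username, password):
    """Build an HTTP Basic 'Authorization' header value (hypothetical helper)."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and ignore all other content."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return (absolute_url, is_external) pairs for the links found in html."""
    collector = LinkCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar
        links.append((absolute, parsed.netloc != site_host))
    return links


if __name__ == "__main__":
    # Hypothetical page content and site URL, for illustration only.
    sample = (
        '<a href="view.php?id=2">Course</a> '
        '<a href="https://example.org/doc">External doc</a>'
    )
    print(basic_auth_header("robot", "secret"))
    for url, external in extract_links("https://moodle.example.com/course/", sample):
        print(url, "external" if external else "internal")
```

A real crawler would send the header with each request, record the status code, size and response time of every fetched URL, and queue the internal links for further crawling.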




Catalyst IT (Lead maintainer)
Daniel Thee Roperto: Coder at Catalyst IT Australia

Comments
  • Brendan Heywood
    Tue, Jun 13, 2017, 7:40 AM
    Hi Mary-Anne,

    There should be no privacy issues, as the crawler results are only visible to admins and course managers by default. What the robot can see is also completely configurable via Moodle's capabilities, so if anything sensitive is being scraped you can turn it off. The robot is only interested in links; it ignores all other content. So the only real privacy issue could be the visibility of an external link, but either way all content that is scraped is visible to normal course admins / students anyway.

    The main security issue is making sure the robot's credentials aren't leaked, as then someone could gain access to whatever the robot can see. But this is exactly the same as managing credentials for any other user. The robot user should not have any write permissions to anything at all. If you need to roll the password for the robot, this is trivial.

    If you, or anyone, identify any privacy or security issues I've missed please ensure they are logged here:


  • Gero Lückemeyer
    Tue, Sep 12, 2017, 5:36 PM
    Hi Brendan,

    Great idea, and something I assume many people were waiting for. Thanks for providing the plugin!

    Your solution technically seems to work from the comments here, but I think at least for some it answers the wrong (business) question. In our university, responsibility for maintaining course content such as external links does not lie with the (poor) administrator or cron job running the plugin, but with the trainer of the respective course.

    For full business utility, the plugin should sort the dead links by trainer and course and send an email to every trainer with his or her broken links, ordered by course and - if possible - give the title in the course along with the link target.

    I assume that due to the limited capabilities of individual Moodle plugins the email functionality would require an additional plugin. Anyway, the individual email per trainer solves any data privacy issues that might arise from sending the full report to all trainers - even in the very critical Germany.

    Please let me know what you think of this idea.

    Thank you very much, best regards,

  • Brendan Heywood
    Wed, Sep 13, 2017, 8:08 AM
    hi Gero,

    This plugin currently provides reports at the site level and also adds filtered reports at the course level, so that each course coordinator can see just what affects their courses. See Course > Course administration > Reports > Link crawler robot > (4 new reports).

    If they find issues they can fix them on the spot, and then from those course-level reports flag a URL for recrawling; flagged URLs get a higher priority than the background en-masse crawling. We could very easily add an email report within this plugin if we wanted to, but for the client that sponsored this plugin that was actually undesirable. Broadly, I would like to evolve this plugin to have a similar set of features and reports to the Google webmaster tool, aka Search Console.

    I'd be happy to implement some sort of email-based reporting, as long as it was opt-in and configurable at the site level. If you want to sponsor this new feature please contact us:

  • Rajas Joshi
    Sun, Dec 10, 2017, 3:23 PM
    Will the latest version of this plugin work with Moodle 3.3.2?
  • Brendan Heywood
    Mon, Dec 11, 2017, 6:45 AM
    This plugin requires the basic auth plugin, which in turn needs to be updated to support the new auth settings API. This isn't much work if you wanted to sponsor it. I have another client interested in this, so support should be added in a couple of months if you are happy to wait.
  • Ricardo Caiado
    Sat, Mar 17, 2018, 11:26 PM

    Any update to Moodle 3.4?

  • Mathew Gancarz
    Tue, Jul 17, 2018, 5:19 AM
    Or 3.5?
  • Steve Pollock
    Sat, Nov 10, 2018, 9:14 AM
    Should this be working in 3.3 or 3.4 now? I see you were doing some work on the basic auth so wanted to check in.

    A couple of other questions:
    1. Is a valid cert required? Your curl command fails due to the self-signed cert on my dev machine. I can override this with -k.
    2. We are running OKTA/SAML as the default login, and again your curl test command returns the OKTA login rather than going to the page. The user is set for basic auth but our default is SAML.

  • Ricardo Caiado
    Tue, Jan 8, 2019, 9:16 AM

    Any updates to M3.6?

  • Ben Haensel
    Thu, Jan 10, 2019, 11:54 PM
    It would be great to see if this could be updated for 3.6! I can see that the BB community is advocating to get this added to their plugin set as well: - Ben, BlueSky Online School, MN
  • dhirendra singh
    Tue, Sep 10, 2019, 8:59 PM
    Can anyone help me understand why the Progress ETA in the robot status tab is more than 5 years from the crawl start date?

    Progress 1.53% ETA in Monday, 21 April 2025, 4:55 AM | Reset Progress
  • Adam Gogo
    Tue, Feb 4, 2020, 3:51 AM

    I've installed the plugin in my Moodle 3.4.5 environment, which uses a SQL Server database, and I'm getting errors from the plugin. Looking into the code, I found SQL that is specific to MySQL and not supported by SQL Server. I also found that the table mdl_tool_crawler_url has a field called "external", which is a keyword in SQL Server.

    Has anyone got this plugin to work with a SQL Server DB? Did you run into the same issues I have?

  • Muhammad Sajjad Hussain Abid
    Thu, Mar 26, 2020, 9:26 PM
    Hello Friends,

    Any update for the latest Moodle version, please?

    Sajjad Hussain
  • Phineas Gomez
    Sat, Jul 11, 2020, 11:29 PM
    Does this plugin work with URLs located in quiz feedback?
    I'm testing it but it is not working :(
  • Greg Myles
    Tue, Jan 5, 2021, 11:52 PM
    Would I be right in thinking that the crawler is unable to check H5P content?