Admin tools: Link crawler robot

tool_crawler
Maintained by Catalyst IT, Brendan Heywood, Daniel Thee Roperto
An admin tool robot crawler which scans your Moodle site for broken, large or slow links.
118 sites
69 downloads
14 fans

It is an admin tool with a Moodle cron task, but it reaches into your Moodle via curl, effectively from outside Moodle: it scrapes each page, parses it and follows links. With this architecture it only finds broken links that actually matter to students. Because it comes in from outside it needs to authenticate, so it depends on the moodle-auth_basic plugin. It is recommended that you set up a dedicated 'robot' user who has read-only access to all the site pages you wish to crawl. You should give the robot capabilities similar to those real students have.
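
The crawl loop described above can be illustrated with a minimal Python sketch. This is not the plugin's code (the plugin is a Moodle admin tool that uses curl); it only shows the idea of fetching a page as a robot user over HTTP basic authentication, extracting links, and flagging broken, slow or large responses. The function names and the SLOW_SECONDS and LARGE_BYTES thresholds are hypothetical and chosen for illustration only.

```python
import base64
import time
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical thresholds for illustration; not the plugin's real settings.
SLOW_SECONDS = 5.0
LARGE_BYTES = 1_000_000


class LinkParser(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they were found on.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in an HTML string, as absolute URLs."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def basic_auth_header(username, password):
    """Build the HTTP basic authentication header for the robot user."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def classify(status, seconds, size):
    """Label a response as broken, slow and/or large."""
    problems = []
    if status >= 400:
        problems.append("broken")
    if seconds > SLOW_SECONDS:
        problems.append("slow")
    if size > LARGE_BYTES:
        problems.append("large")
    return problems


def fetch(url, headers):
    """Fetch one URL from outside the site; return (status, seconds, body)."""
    request = urllib.request.Request(url, headers=headers)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read()
            status = response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions by urllib.
        body, status = b"", error.code
    return status, time.monotonic() - start, body
```

A crawler built on this would call fetch() with the robot's header, pass the body to extract_links(), and queue each new link for the same treatment, recording whatever classify() reports for it.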

Contributors

Catalyst IT (Lead maintainer)
Daniel Thee Roperto: Coder at Catalyst IT Australia

Comments

  • Dan Marsden
    Tue, 19 Jul 2016, 7:30 AM
    Cool tool! - David Mudrak reviewed a previous version of this plugin implemented as a local plugin - I can see the issues he raised have been resolved and it works well for me, nice work!
  • Jon Bolton
    Sun, 7 Aug 2016, 2:27 AM
    It does work well, but I narrowed this plugin down to causing a fatal error in a completely different area of Moodle. In Site Admin > Plugins > Activity Modules > Glossary, there is the option to edit and hide/show Display formats. Both the update and hide/show icons result in the message "Coding error detected, it must be fixed by a programmer: PHP catchable fatal error". I uninstalled all my additional plugins, one by one, and this one was the only one that caused this error. No idea why, I’m not a developer, but happy to test it / replicate it if needed.

    Running Moodle 3.1+ (Build: 20160616) on php5/apache2 with mysql (5.5.47-0+deb6u1)
  • Brendan Heywood
    Mon, 7 Nov 2016, 6:34 AM
    FYI the bug found by Jon above was fixed a while back
  • Max Linzmeier
    Wed, 14 Dec 2016, 3:41 PM
    There's currently a bug in file 'auth.php' on line 174. The key should be 'auth_basic', not 'auth_saml2'. Otherwise it isn't possible to save the values on the settings page.
  • Daniel Thee Roperto
    Wed, 14 Dec 2016, 4:14 PM
    Hi Max,

    Thank you for reporting this, we updated the auth_basic plugin.

    You can find it here:
    https://moodle.org/plugins/auth_basic

    Cheers,

    Daniel
  • Max Linzmeier
    Thu, 15 Dec 2016, 4:27 PM
    Hi Daniel,

    Cool, thank you! But there is another problem in the function crawler->reset_for_recrawl. I created a new issue on GitHub. Thank you!

    -Max
  • Allison Soo
    Tue, 2 May 2017, 10:51 AM
    Great tool with huge potential!
    The report is currently displayed on screen only. Is there any plan to add an export option so that users could download a copy of the report in CSV/Excel format for offline analysis and data manipulation? This would make it much easier to search for and identify broken links in a particular course without scrolling through the report page by page. Also, any part of the report could then be distributed to the respective course developers, who do not have administrator access, so they can review the broken links offline.
  • Brendan Heywood
    Wed, 3 May 2017, 8:26 AM
    thanks Allison,

    I've logged that new feature idea here:

    https://github.com/central-queensland-uni/moodle-tool_crawler/issues/23
  • Mary-Anne Camillo
    Fri, 9 Jun 2017, 7:11 AM
    Does anyone have any privacy/security issues using this?
  • Brendan Heywood
    Tue, 13 Jun 2017, 7:40 AM
    hi Mary-Anne,

    There should be no privacy issues, as the crawler results are only visible to admins and course managers by default. What the robot can see is completely configurable via Moodle's capabilities, so if anything sensitive is being scraped you can turn it off. The robot is also only interested in links; it ignores all other content. So the only real privacy issue could be the visibility of an external link, but either way all content that is scraped is visible to normal course admins / students anyway.

    The main security issue is making sure the robot's credentials aren't leaked, as then someone could gain access to whatever the robot can see. But this is exactly the same as managing credentials for any other user. The robot user should not have any write permissions to anything at all. If you need to roll the password for the robot, this is trivial.

    If you, or anyone, identify any privacy or security issues I've missed please ensure they are logged here: https://github.com/central-queensland-uni/moodle-tool_crawler/issues

    thanks!
