Discussions started by Tim Hunt


We have been having these really strange problems where our main Moodle site has been getting deadlocked, and we may finally have worked out why.

I am going to write about it here for two reasons. First, it still does not make complete sense to me, so perhaps someone will be able to explain further. Second, other people might be affected by the same thing, so increasing awareness of it can't hurt.

What was happening was that lots of Apache processes were stuck handling requests for

POST /webservice/soap/server.php?wstoken=[TOKEN HERE]

and they were stuck because they were waiting for a database lock on the external_tokens database table (which in our database only has 11 rows).

What happens in webservice/lib.php is that every time webservice::authenticate_user or webservice_server::authenticate_by_token checks a token, it then does $DB->set_field to update the lastaccess column.

This is made worse by the implementation of webservice_soap_server. For every single request served (because it hard-codes ini_set('soap.wsdl_cache_enabled', '0'); !?), it makes an HTTP request to itself ($CFG->wwwroot/webservice/soap/server.php?token=[...]&wsdl=1) to compute the WSDL. There is no reason at all why an HTTP request should be involved there. In any case, fetching the WSDL verifies the token, and so does another write to the lastaccess column.

Even so, the total number of web service calls our system is handling is quite low (only about 20 per minute, while the server is handling over 50 page views per second), so it is not clear why this is enough to completely lock up the server, but it is.

Does this make sense to anyone?

Then, what can we do to fix this? There seem to be some obvious wins (which should probably have been done as part of MDL-52208):

  • Only disable the cache_wsdl option if DEBUG_DEVELOPER is on. (In live use, the code won't be changing, so the list of available methods won't change.)
  • Change it so that it does not make an HTTP request to get the WSDL. Instead, generate it once (not on every single request) and save it in a temp file.
  • The lastaccess value is only used in one place, for one type of token (lib/classes/task/session_cleanup_task.php). However, it is always written for any type of token. Perhaps we should only set it for tokens of type EXTERNAL_TOKEN_EMBEDDED?
  • Does anything actually use external_create_service_token, or are embedded tokens a dead concept?
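
To make the second suggestion concrete, here is a rough sketch of what "generate it once and save it in a temp file" could look like. It is only an illustration: get_cached_wsdl() and all of its parameters are made-up names, not existing Moodle APIs, and the $debugging flag just stands in for whatever check core would really use to decide when caching should be bypassed.

```php
<?php
// Sketch only: get_cached_wsdl() and its parameters are hypothetical, not Moodle APIs.

/**
 * Return the WSDL for a service, generating it at most once per cache key.
 *
 * @param string   $cachedir  Directory for cached files (assumed to exist and be writable).
 * @param string   $cachekey  Identifies the service definition, e.g. a hash of its function list.
 * @param callable $generate  Builds the WSDL XML string; only called on a cache miss.
 * @param bool     $debugging When true, skip the cache so code changes show up immediately.
 * @return string The WSDL XML.
 */
function get_cached_wsdl(string $cachedir, string $cachekey, callable $generate, bool $debugging = false): string {
    $path = $cachedir . '/wsdl_' . sha1($cachekey) . '.xml';

    // Cache hit: no generation and no HTTP request to ourselves.
    if (!$debugging && is_readable($path)) {
        $cached = file_get_contents($path);
        if ($cached !== false) {
            return $cached;
        }
    }

    $wsdl = $generate();

    if (!$debugging) {
        // Write to a temporary name, then rename, so that concurrent
        // requests never read a half-written file.
        $tmp = tempnam($cachedir, 'wsdl');
        if ($tmp !== false) {
            if (file_put_contents($tmp, $wsdl) === false || !rename($tmp, $path)) {
                @unlink($tmp);
            }
        }
    }

    return $wsdl;
}
```

The cache key would need to change whenever the service definition changes (for example after an upgrade), otherwise a stale WSDL would be served. How best to derive that key is left open here.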
Average of ratings: Useful (2)

This is a rather technical request, but I am looking for someone who:

  • Has a moderately large Moodle site running on MySQL
  • Is able to take some code from git, and try it out. (So, ideally, you have a test copy of your live system where you can try things like this.)

If so, it would be really helpful if you could test the proposed fix to MDL-61348. I am pretty sure that, with the fix, it will now compute the right numbers. (They were only wrong in quite obscure cases, so it would be useful if you could verify that the averages are the same before and after applying the fix.) However, I am most worried about whether the altered DB queries will perform OK.


Average of ratings: -

At the moment, it is quite hard to test time-related functionality in Moodle. For example, the OU's course format highlights the current week, and we want to test that with Behat.

One workaround is to make your test set-up use relative dates/times, e.g. something like

And the "C100" course start date is "2 weeks ago"

However, a better idea (see e.g. the Time section in https://martinfowler.com/articles/nonDeterminism.html) is to put a wrapper around the system clock, so that, instead of calling the PHP time() function directly, you call something like core_time::get(), which, outside test scenarios, just calls time(). Then you can have a Behat step like

And the OU study planner thinks the date is "2018-02-08"
and an equivalent API for PHPUnit.
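
To show the idea, here is a minimal sketch of such a wrapper. core_time::get() is the name suggested above; set_for_testing() and reset() are hypothetical helper names made up for this example, and none of this is existing core code.

```php
<?php
// Sketch only: set_for_testing() and reset() are hypothetical names.

final class core_time {
    /** @var int|null Fixed timestamp used in tests, or null to use the real clock. */
    private static $override = null;

    /**
     * Drop-in replacement for time().
     *
     * @return int Unix timestamp: the override if one is set, otherwise the real time.
     */
    public static function get(): int {
        return self::$override ?? time();
    }

    /**
     * Freeze the clock at a given timestamp. Intended for test code only.
     */
    public static function set_for_testing(int $timestamp): void {
        self::$override = $timestamp;
    }

    /**
     * Go back to using the real system clock.
     */
    public static function reset(): void {
        self::$override = null;
    }
}

// Example: make code under test believe it is 8 February 2018 (UTC).
core_time::set_for_testing(gmmktime(0, 0, 0, 2, 8, 2018));
$today = gmdate('Y-m-d', core_time::get()); // '2018-02-08'
core_time::reset();
```

A static property like this is enough for PHPUnit, where the test and the code under test run in the same process. For Behat, the step and the page being tested run in different processes, so the override would have to be stored somewhere the web request can read it. That part is not shown here.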

We have done this in OU code, but we were wondering if we should put this into Moodle core.

Of course, it would be a large change, but one that could mostly be done with search-and-replace, changing time() to core_time::get() everywhere. Then we could add a CodeChecker rule to help ensure that all new code uses core_time::get().

Anyway, what do people think?

Average of ratings: Useful (1)