We have deployed Moodle in a Server Cluster configuration using the following structure:
- Load Balancer
- Several application front-end servers (latest Moodle 2.8 release)
- MySQL shared database
- NFS to host "moodledata"
When we test this setup with multiple users, a number of HTTP requests to Moodle consistently get their response from the Load Balancer (LB) rather than from the application: the LB returns a 504 GATEWAY_TIMEOUT error for those requests. After investigating further, we have found the following:
- The requests that generate the errors are properly forwarded to the Moodle server by the LB.
- Those requests cause an Apache child process to crash due to a fatal error at the PHP level.
- In our view, this error is related to concurrent access to "moodledata" from Moodle instances running on different front-end servers. We believe this type of concurrency is not properly handled in the Moodle code.
***************
Error found in the Apache logs:
[core:notice] [pid 32656] AH00051: child pid 308 exit signal Bus error (7), possible coredump in /etc/apache2
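To see how often this happens on each front end, the error log can be scanned for these AH00051 notices. The following is only a minimal sketch: the function name `count_child_crashes` is our own, and the regular expression assumes the log lines have the same shape as the one quoted above.

```python
import re
from collections import Counter

# Matches Apache 2.4 child-exit notices shaped like the line quoted above, e.g.
# "... AH00051: child pid 308 exit signal Bus error (7), possible coredump in ..."
CRASH_RE = re.compile(
    r"AH00051: child pid (?P<pid>\d+) exit signal (?P<name>.+?) \((?P<num>\d+)\)"
)

def count_child_crashes(lines):
    """Return a Counter mapping (signal name, signal number) to its number of occurrences."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            counts[(match.group("name"), int(match.group("num")))] += 1
    return counts

# Example with the log line from this post plus an unrelated line.
sample_lines = [
    "[core:notice] [pid 32656] AH00051: child pid 308 exit signal "
    "Bus error (7), possible coredump in /etc/apache2",
    "[mpm_prefork:notice] [pid 32656] some unrelated notice",
]
for (name, number), occurrences in count_child_crashes(sample_lines).items():
    print(f"{name} ({number}): {occurrences}")
```

To run it against a real log, pass an open file object instead of `sample_lines`; the location of the error log depends on the Apache configuration of each server.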
***************
Coredump Info
ubuntu@ip-172-31-15-136:~$ sudo gdb apache2 -core core
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from apache2...Reading symbols from /usr/lib/debug//usr/sbin/apache2...done.
done.
[New LWP 5173]
warning: .dynamic section for "/usr/lib/php5/20121212/memcache.so" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib/php5/20121212/memcached.so" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib/php5/20121212/mysqli.so" is not at the expected address (wrong library or version mismatch?)
warning: Could not load shared library symbols for 4 libraries, e.g. /usr/lib/php5/20121212/sasl.so.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGBUS, Bus error.
#0 lex_scan (zendlval=<error reading variable: can't compute CFA for this frame>) at Zend/zend_language_scanner.c:1091
1091 Zend/zend_language_scanner.c: No such file or directory.
(gdb)
***************
As we already said, we believe the issue is related to concurrent access to NFS. Running the cluster with a single front-end server, using the exact same configuration, does not generate any errors.
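For anyone who wants to compare the two setups, a rough way to do it is to send a batch of concurrent requests through the LB and tally the status codes. This is only a sketch under assumptions: `fetch_status` and `tally_statuses` are hypothetical helper names, the URL is a placeholder, and it only issues anonymous GET requests, which is simpler than a test with logged-in users.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def fetch_status(url, timeout=120):
    """Return the HTTP status code of one GET request, or a label on network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 504 from the load balancer end up here.
        return err.code
    except OSError:
        # Covers URLError, connection failures and socket timeouts.
        return "network-error"

def tally_statuses(url, total_requests, concurrency, fetch=fetch_status):
    """Issue total_requests GETs with up to `concurrency` in flight and count the results."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch, [url] * total_requests))

# Hypothetical usage (replace the placeholder URL with the LB address):
# print(tally_statuses("http://lb.example.invalid/moodle/", 200, 20))
```

Running the same tally once with one front end behind the LB and once with several should show whether the 504 responses only appear in the multi-server case.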
Has anyone experienced the same issue? Any hints?