Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by José Antonio Omedes Capdevila -

We have deployed Moodle in a Server Cluster configuration using the following structure:

  • Load Balancer
  • Number of App Front Ends (Moodle 2.8 latest)
  • MySQL shared database
  • NFS to host "moodledata"

When we test this setup with multiple concurrent users, a number of HTTP requests consistently get their response from the Load Balancer (LB) instead of from Moodle: the LB returns a 504 GATEWAY_TIMEOUT error for them. After investigating further, we have found the following:

  • The requests generating the errors are properly forwarded to the Moodle server by the LB.
  • The requests generating the errors cause an Apache child process to crash due to a fatal error at PHP level.
  • In our view, this error is related to concurrent access to "moodledata" from Moodle instances running in different FrontEnd Servers. We believe this type of concurrency is not properly handled at Moodle code level.

***************

Error found on Apache Logs:

[core:notice] [pid 32656] AH00051: child pid 308 exit signal Bus error (7), possible coredump in /etc/apache2
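To see how often this happens, crashes like the one above can be counted per signal with a small shell helper. This is only a minimal sketch: the function name `count_child_crashes` is made up for illustration, and the log path in the usage note is an assumption (the usual Ubuntu location), not something confirmed for this setup.

```shell
# Count Apache child crashes per signal.
# Assumes log lines shaped like the AH00051 notice quoted above.
count_child_crashes() {
    # $1: path to an Apache error log (location is an assumption)
    grep 'AH00051' "$1" |
        sed -n 's/.*exit signal \(.*\) (\([0-9]*\)).*/\1 (\2)/p' |
        sort | uniq -c | sort -rn
}
```

Usage: `count_child_crashes /var/log/apache2/error.log` prints one line per signal, prefixed by the number of crashes.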

***************

Coredump Info

ubuntu@ip-172-31-15-136:~$ sudo gdb apache2 -core core
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from apache2...Reading symbols from /usr/lib/debug//usr/sbin/apache2...done.
done.
[New LWP 5173]
warning: .dynamic section for "/usr/lib/php5/20121212/memcache.so" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib/php5/20121212/memcached.so" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib/php5/20121212/mysqli.so" is not at the expected address (wrong library or version mismatch?)
warning: Could not load shared library symbols for 4 libraries, e.g. /usr/lib/php5/20121212/sasl.so.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGBUS, Bus error.
#0 lex_scan (zendlval=<error reading variable: can't compute CFA for this frame>) at Zend/zend_language_scanner.c:1091
1091 Zend/zend_language_scanner.c: No such file or directory.
(gdb)
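The session above stops at frame #0. A full backtrace can be pulled from the same core file without an interactive session, using gdb's `-batch` and `-ex` options. This is a minimal sketch; the helper name `print_core_backtrace` is an assumption, and the paths in the usage note are only inferred from the log and gdb output quoted above.

```shell
# Print a backtrace from a core file without opening an interactive gdb session.
print_core_backtrace() {
    # $1: binary that produced the core, $2: core file
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb is not installed" >&2
        return 127
    fi
    # -batch exits once the commands have run; -ex 'bt' prints the backtrace
    gdb -batch -ex 'bt' "$1" "$2"
}
```

Usage, as a user who can read the core file: `print_core_backtrace /usr/sbin/apache2 /etc/apache2/core`.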

***************

As we already said, we believe the issue is related to concurrent access to NFS: a single server in the cluster using the exact same configuration does not generate any errors.

Has anyone experienced the same issue? Any hints?


In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Howard Miller -

Moving to Hardware & Performance...

I can only note (at this stage) that there are plenty of people out there (me included) running similar setups with no issues. Is this all Ubuntu server software via apt-get?

Also, can you explain your reasoning behind "we believe the issue is related to concurrent access to NFS"?

In reply to Howard Miller

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by José Antonio Omedes Capdevila -

Our setup works fine when there is a single server accessing the NFS. We experience no issues at all. 

The moment we have two or more servers using this configuration, we start experiencing the issues described.  We believe:

1. Apache is crashing due to a PHP error. (this is for us a fact as it has been clearly seen in the logs).

2. The "core" file shows a failure in "zend_language_scanner.c". This, combined with the fact that it only happens when there is concurrent access to NFS, makes us think the issue is related to NFS resources that are somehow locked or inaccessible while another Moodle instance is accessing them.

When we have, for example, 2 servers running Moodle in a cluster configuration against the same NFS instance, around 25-30% of the requests generate the error while the remaining ones work fine.

We are using Moodle on top of Ubuntu. We are using apt-get.

In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Howard Miller -

Is this you? http://stackoverflow.com/questions/30753656/elb-generating-504-gateway-timeouts-w-2-ec2-instances-packets-not-reaching-se

Which version of Ubuntu is this? Which version of PHP is this? Do you have OpCache enabled (or some other accelerator)? Is there anything out of the ordinary that you are using or have configured? However unrelated it may first seem?


In reply to Howard Miller

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by José Antonio Omedes Capdevila -

Correct. This is me.

- Nothing special or unusual as far as I understand.

- Ubuntu:  14.04.2 LTS

- PHP: 5.5.9

- Opcache config: 

Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies

Opcode Caching Up and Running 
Optimization Enabled 
Startup OK 
Shared memory model mmap 
Cache hits 1088 
Cache misses 830 
Used memory 39663144 
Free memory 27184304 
Wasted memory 261416 
Cached scripts 825 
Cached keys 875 
Max keys 3907 
OOM restarts 
Hash keys restarts 
Manual restarts 0

Initially we also thought that it could be related to mmap, but the investigation did not go any further in that direction.

In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Howard Miller -

I would try disabling opcache if you haven't already.

In reply to Howard Miller

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by José Antonio Omedes Capdevila -

I thought about it, but I saw this statement in the Moodle documentation that made me think it would not be a good idea in terms of performance:

"The standard OPcache extension is strongly recommended; since Moodle 2.6, it is the only solution officially supported by PHP developers. The benefits are increased performance and significantly lower memory usage"

I will try and see.


In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Howard Miller -

Well... it's just a hunch but you need to do something to attempt to isolate the problem. 

If it *is* the problem then we'll go from there...

In reply to Howard Miller

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by José Antonio Omedes Capdevila -

We have disabled OPCACHE. We did it at the Moodle config level by adding the following statement to the config.php file:

ini_set('opcache.enable', 0);

After disabling OPCACHE the error is gone.

Something is not properly handled when using OPCACHE in a configuration where moodledata is shared among different Moodle Servers.

OPCACHE caches php files for faster execution. This is the list of php files existing on moodledata:

./lang/es/....php  (big list of files)

./cache/core_component.php

./muc/config.php

Maybe something goes wrong when those files are accessed on a shared disk.
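A list like the one above can be produced on any node with `find`. The following is a minimal sketch that counts the PHP files under each top-level directory of moodledata; the function name `list_dataroot_php_dirs` and the path in the usage note are assumptions for illustration.

```shell
# Count PHP files per top-level directory of a moodledata tree.
list_dataroot_php_dirs() {
    # $1: path to moodledata (the dataroot)
    ( cd "$1" && find . -type f -name '*.php' ) |
        # paths look like ./lang/es/x.php, so field 2 is the top-level directory
        awk -F/ '{ n[$2]++ } END { for (d in n) print n[d], d }' |
        LC_ALL=C sort -k2
}
```

Usage: `list_dataroot_php_dirs /var/moodledata`. Directories that never appear in the output (filedir, for example) hold no PHP files and so are never compiled by PHP.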

We are not able to evaluate the performance impact yet, as this requires further testing of all the scenarios we have been using for benchmarking.



In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Howard Miller -

It's progress :)

In reply to Howard Miller

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by José Antonio Omedes Capdevila -

Correct. It is progress. But my question is:


- Is this a bug or a feature? :) If it is a bug, how do we move forward?

- What is the impact on performance for my architecture? I need to rerun the tests from A to Z and compare the data.

- Is there any way to move forward with OPCACHE active while sharing files on an NFS system, given the way Moodle is architected?


I am not sure I will be able to answer all these questions ...

In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Albert Ramsbottom -

I have configured 4 web servers, using the Apache balancer manager and an Apache reverse proxy to forward requests from the outside through to the inside, and I am using NFS-shared Moodle data and Opcache.

And it works fine

So.....I am not sure at the moment but you might want to look at your opcache configuration in your php.ini


Albert


In reply to Albert Ramsbottom

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by José Antonio Omedes Capdevila -

Trying to join some pieces together:


********** PIECE ONE **********

Server Clustering Improvement Proposal: https://docs.moodle.org/dev/Server_clustering_improvements_proposal linked to MDL-40979

"The goal is to use opcaching without checking of file modification times - problem here are dynamically generated PHP files that are stored in dataroot (component cache, lang packs, local lang modifications, muc config, etc.), we need to invalidate the op code cache explicitly."

According to my understanding, this sentence implies:

1. There are currently problems with dynamically generated PHP files stored in dataroot and OPCACHE.

2. These problems require invalidating OPCACHE (disabling it, if we want to avoid the issues, I guess).

The Moodle issue is still open.


********** PIECE TWO **********

We have performed two types of additional tests:

1. Disabling OPCACHE globally at config.php. The issue is gone. Pending performance impact evaluation.

2. Having only "filedir" at NFS while keeping all the other directories local to each of the servers forming the cluster. The issue is gone.


********** PIECE THREE **********

Looking at the error we are getting (Apache dying) and the core dump analysis, we can see that the failure happens in "zend_language_scanner.c". This is the lexer, the part of PHP that scans a PHP file when it has to be compiled, for example once OPCACHE believes its in-memory cached version is no longer valid.

This points, once again, to a problem with OPCACHE.


********** PIECE FOUR **********

The issue is not easy to spot. It only affects 15-20% of the requests served by the Moodle cluster. We did extensive profiling of our architecture with JMeter, and it took us a while to follow the symptoms until we realized that Apache was dying. We could see the Load Balancer replying to the client, but we had to trace the requests all the way from the client to the server.
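For reference, a failure rate like this can be computed from a results file. This is a minimal sketch, assuming the JMeter results were saved as CSV with a header row that contains a `responseCode` column and with no commas inside field values. The function name `gateway_timeout_rate` is hypothetical.

```shell
# Share of requests answered with 504 in a JMeter CSV results file.
gateway_timeout_rate() {
    # $1: results file; the header row must contain a "responseCode" column
    awk -F, '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "responseCode") col = i; next }
        col     { total++; if ($col == "504") failed++ }
        END     { if (total) printf "%d/%d (%.1f%%)\n", failed, total, 100 * failed / total }
    ' "$1"
}
```

Running it against the results of the same test plan with one front end and then with two makes the comparison between both setups explicit.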


My next two steps:

1. Analyze performance impact of disabling OPCACHE

2. Understand whether OPCACHE could be configured in a different way that minimizes this impact. (Albert, if you do not see the issue, could you share your active OPCACHE config with us?)


In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Howard Miller -

Where are your Moodle program files? They are not on the NFS are they?

Do you have the MUC properly set up using memcache(d) or whatever?

In reply to Howard Miller

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by José Antonio Omedes Capdevila -

My Moodle program files are on local disk of each of the servers running the cluster.

The issue shows up whether or not memcached is configured. I have tried both options.

In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Matteo Scaramuccia -

Hi José,
filedir is where Moodle hosts its pool of deduplicated files, but it is not the only folder you should share among the nodes. More details in https://docs.moodle.org/29/en/Server_cluster#Related_config.php_settings.

BTW, you need an opcode cache, otherwise scalability will be affected. You could share your strace output (search Google for examples of using strace with PHP).
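A minimal sketch of how such a trace could be captured on one front end. The strace options used (`-f`, `-tt`, `-e trace=file`, `-o`, `-p`) are standard; the helper name `strace_pid_args`, the process name `apache2` and the output path are assumptions.

```shell
# Turn a list of PIDs (one per line on stdin) into "-p PID" arguments,
# so strace can attach to every running Apache process at once.
strace_pid_args() {
    awk '{ printf "%s-p %s", (NR > 1 ? " " : ""), $1 } END { print "" }'
}

# Example, run as root while reproducing the failing requests:
#   strace -f -tt -e trace=file -o /tmp/apache.strace $(pgrep apache2 | strace_pid_args)
```

Limiting the trace to file-related system calls keeps the output focused on what each process opens and stats under moodledata just before it dies.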

Please post also your opcache configuration to compare it with the recommended settings: https://docs.moodle.org/29/en/OPcache.

My guess is that you're hitting a bug in your environment and not in Moodle.

HTH,
Matteo

In reply to Matteo Scaramuccia

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by José Antonio Omedes Capdevila -

I agree with you Matteo. "filedir" is not the only directory that should be shared, but as soon as I share any "moodledata" directory containing PHP files, the issue shows up. Just wanted to test what happens if I do not share any PHP file.

I am evaluating the performance impact of not using OPCACHE on our configuration. I agree with you: an opcode cache should be enabled. I will try to get the strace output.

Opcache configuration is shown in a previous message. I attach it again for your reference:


Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies

Opcode Caching Up and Running 
Optimization Enabled 
Startup OK 
Shared memory model mmap 
Cache hits 1088 
Cache misses 830 
Used memory 39663144 
Free memory 27184304 
Wasted memory 261416 
Cached scripts 825 
Cached keys 875 
Max keys 3907 
OOM restarts 
Hash keys restarts 
Manual restarts 0


You are more than welcome to help me find the "bug in my environment", but to me it is revealing that there is an open issue in Moodle stating that there could be problems with OPCACHE in a Server Cluster configuration.

In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Howard Miller -

Interestingly, I have a very similar configuration to you - except it's running on CentOS with PHP from Webtatic - and we've had no bother at all. 

My test box is actually Ubuntu, again load-balanced, but only "for fun" as it's quite lightly loaded. Still, no issues like this at all.

I'm trying to think what might be significantly different.

My feeling is that you may have a broken library somewhere. I'm not being defensive about Moodle particularly, but this isn't a PHP *source code* issue per se. You shouldn't be able to get a failure like this in PHP even if you wanted to. Not that it helps much.

In reply to Howard Miller

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by José Antonio Omedes Capdevila -

Many thanks, Howard. If you can think of anything else, it is more than welcome. In the meantime, I am going to try contacting Peter, the Moodle developer who created the "Cluster Server improvements" document, to find out what he was referring to when he was concerned about OPCACHE in such an environment.

In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Matteo Scaramuccia -

Hi José,
are you talking about Petr and https://docs.moodle.org/dev/Server_clustering_improvements_proposal ?
That was the original idea proposed to the community, and it has actually been almost entirely implemented, including the opcache invalidation for those PHP files stored in the dataroot as part of some caching.

 

In reply to José Antonio Omedes Capdevila

Re: Moodle 2.8.1. Server Cluster configuration. Apache Crashing with "exit signal Bus error (7)"

by Matteo Scaramuccia -

Hi José,
that is not the configuration I'm talking about: I'd like to read your plain php.ini settings for opcache, i.e. what is described in https://docs.moodle.org/29/en/OPcache#Configuration.
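A minimal sketch for collecting those settings: it prints every active (uncommented) `opcache.*` directive found in the ini files under a directory. The function name `show_opcache_settings` is hypothetical, and the directory in the usage note is an assumption about where a stock Ubuntu 14.04 package keeps the Apache PHP configuration.

```shell
# Print the active opcache.* directives from all ini files under a directory.
show_opcache_settings() {
    # $1: directory holding the PHP ini files used by Apache
    # -R follows symlinks, which matters when conf.d entries are symlinked
    grep -RhE '^[[:space:]]*opcache\.' "$1" |
        sed 's/^[[:space:]]*//' |
        LC_ALL=C sort
}
```

Usage: `show_opcache_settings /etc/php5/apache2`. Lines starting with `;` are comments and are skipped. Values overridden at runtime, such as an `ini_set()` call in config.php, do not show up here.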

strace will be the key to understanding what is broken in your environment: there are many clusters out there running MUC on a file system shared between the front-end nodes and happy with opcache.

As Howard said, it could be a library, but to find the culprit you need to go into the internals of the broken requests: it's time-consuming and stressful, but when you find the root cause your patience will be rewarded.

You could also post the configuration of the NFS mount point (assuming that it is identical on each front-end node). See an example here: https://moodle.org/mod/forum/discuss.php?d=310501.
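A minimal sketch for gathering that information on a Linux node: it reads the kernel's mount table and prints the mount point and effective options of every NFS mount. The function name `show_nfs_mount_options` is made up for illustration; `/proc/mounts` is the standard Linux location.

```shell
# Print "mountpoint: options" for every NFS mount in a mounts table.
show_nfs_mount_options() {
    # $1: mounts table to read, defaults to /proc/mounts
    # columns: device, mount point, filesystem type, options, dump, pass
    awk '$3 ~ /^nfs/ { print $2 ": " $4 }' "${1:-/proc/mounts}"
}
```

Running `show_nfs_mount_options` on each front end and comparing the output shows whether the nodes really mount moodledata with the same options.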

HTH,
Matteo