Hello, I am tuning my Moodle site. It's hosted on AWS with the following configuration:
- Moodle v3.7.3
- Frontend web server fleet with nginx 1.14 (proxy_cache on a ramdisk) + php-fpm 7.2.24 (with the Zend OPcache engine)
- Database: Aurora MySQL 5.6.10 with read-replica failover
- Redis: for MUC caching with igbinary serializer enabled
- Memcache: for session data and lock factory
- EFS: for moodledata (EFS is NFS-based)
My specific problem comes from the temp folders shared between the web servers in the fleet: $CFG->tempdir, $CFG->cachedir and $CFG->backuptempdir. These folders live inside moodledata (on EFS), which slows down certain operations and yields disastrous results in the benchmark test: read file performance = 6.57 seconds and write file performance = 54.69 seconds.
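Redirecting them is just a config.php change; here is a minimal sketch, assuming the faster volume is mounted at /mnt/moodletemp on every frontend (that path, and /var/cache/moodle-local, are just example locations):

```php
<?php
// Excerpt from config.php (these lines must come before the require of lib/setup.php).
// /mnt/moodletemp is a hypothetical mount point for the faster shared volume;
// per config-dist.php, tempdir and cachedir MUST be shared by all cluster nodes.
$CFG->tempdir       = '/mnt/moodletemp/temp';
$CFG->cachedir      = '/mnt/moodletemp/cache';
$CFG->backuptempdir = '/mnt/moodletemp/backup';
// localcachedir, by contrast, must NOT be shared: pointing it at each node's
// local disk keeps per-server caches off NFS/EFS entirely.
$CFG->localcachedir = '/var/cache/moodle-local';
```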
I'm testing alternative configurations that add a dedicated node to host these folders:
- Option 1: NFS server (this avoids the EFS metadata overhead on small files); see the mount sketch after this list.
- Option 2: GlusterFS volume with the Gluster native (FUSE) mount on the web frontends; see the sketch after the results below.
- Any other suggestions?
In addition, this dedicated node could be used for cron and backup tasks.
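For option 1, the setup I'm testing looks roughly like this; the hostname (tempnode), export path, client subnet and mount options are examples, not a tested recipe:

```bash
# On the dedicated node (hypothetical host "tempnode"): export a fast local disk.
sudo mkdir -p /export/moodletemp
echo '/export/moodletemp 10.0.0.0/16(rw,async,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# On each web frontend: mount with relaxed attribute caching, since the files
# are volatile; noatime and a higher actimeo cut metadata round-trips.
sudo mkdir -p /mnt/moodletemp
sudo mount -t nfs -o rw,noatime,async,actimeo=10,nfsvers=4.1 \
  tempnode:/export/moodletemp /mnt/moodletemp
```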
With option 1 I achieved a write file performance of 9.8 secs (compared to 58.67 secs) and a read file performance of 0.048 secs (compared to 6.57 secs). I verified it with a course backup that previously took ~94 secs and now takes ~45 secs.
On the other hand, with option 2 I achieved a write file performance of 3.262 secs (compared to 58.67 secs) and a read file performance of 0.673 secs (compared to 6.57 secs). Likewise, the course backup that previously took ~94 secs now takes ~37 secs.
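For completeness, option 2 is roughly the following: a single-brick volume with no replication, since the data is disposable (again, the hostname and paths are examples):

```bash
# On the dedicated node (hypothetical host "tempnode"): a single-brick volume.
# "force" allows a brick on the root filesystem (fine for a throwaway example).
sudo gluster volume create moodletemp tempnode:/bricks/moodletemp force
sudo gluster volume start moodletemp

# On each web frontend: the Gluster native (FUSE) mount.
sudo mkdir -p /mnt/moodletemp
sudo mount -t glusterfs tempnode:/moodletemp /mnt/moodletemp
```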
Are these three shared temp folders a critical point of failure? My understanding is that they hold very volatile files, so if this dedicated node failed it could be replaced by a fresh one, with no consequence beyond the web servers having to regenerate the shared temp content. Am I right?
Initially I considered creating a replicated GlusterFS volume across several dedicated nodes, but that seems a waste of resources if the data is not critical.
I also had a third option in mind: a replicated GlusterFS volume spanning the frontend web servers themselves. But the results are predictably worse, because every write has to be replicated to all the nodes.
What is your opinion? Any other suggestions?
Thanks