I'm new here, and I would like to ask whether you can help me solve a problem that has been occurring in the Moodle application at the company where I work.
In March 2020 we migrated our application from a VM to Kubernetes. Since then there had been no problems, but in the past two weeks the same issue has occurred twice.
The problem is that many users start to receive 504 (Gateway Timeout) errors and can no longer access the application. For them the page keeps loading and never opens, while other users still seem to be able to access it.
We checked the database and the Kubernetes resources (memory, CPU, etc.), and everything looked normal while the problem was occurring.
However, as soon as we removed some folders inside "moodledata" (localcache, cache and sessions), everyone was able to access the application normally again.
Below is some information about the application configuration:
- For the application server we are using a PHP 7.2 Docker image with Apache.
- For application and session caching we are using Redis.
- The database server is PostgreSQL.
- The moodledata folder is on an NFS volume on the server, which is bind-mounted into the application folder, so it sits in the same folder as the Moodle files.
- We are using version 3.8.1+ (Build: 20200117) of Moodle.
- In Kubernetes we have 3 containers, each with 1.5 GB of RAM and 1.5 vCPU.
The interesting thing is that when this happens, the application works again right after we remove the localcache, cache and sessions folders inside moodledata, even though we have it configured to use Redis as the application and session cache server.
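In case it helps with diagnosis, here is a minimal Python sketch that could be run inside one of the containers the next time this happens, before deleting anything. It counts the files in the three folders and times how long the walk over the NFS volume takes. The path `/var/www/moodledata` and the function names `scan_dir` and `report` are hypothetical placeholders, not part of Moodle, and the sketch only measures the folders; it does not show what is causing the timeouts.

```python
import os
import time
from pathlib import Path

# Assumption: hypothetical mount point; replace with the real moodledata path.
MOODLEDATA = Path("/var/www/moodledata")
SUBDIRS = ("localcache", "cache", "sessions")


def scan_dir(root):
    """Walk root and return (file_count, total_bytes, seconds_taken)."""
    start = time.monotonic()
    count = 0
    size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size += os.lstat(os.path.join(dirpath, name)).st_size
                count += 1
            except OSError:
                # The file vanished or is unreadable (likely on a busy cache); skip it.
                continue
    return count, size, time.monotonic() - start


def report(base=MOODLEDATA, subdirs=SUBDIRS):
    """Return {subdir: (file_count, total_bytes, seconds)} for subdirs that exist."""
    results = {}
    for sub in subdirs:
        path = Path(base) / sub
        if path.is_dir():
            results[sub] = scan_dir(path)
    return results


if __name__ == "__main__":
    for sub, (count, size, secs) in report().items():
        print(f"{sub}: {count} files, {size / 1e6:.1f} MB, scanned in {secs:.2f}s")
```

If the sessions or cache folders turn out to hold many files, or the scan itself is slow, that might suggest some requests are still going to the NFS volume rather than to Redis, but that is only a guess on our side.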
Do you have any idea what may be happening?