This one is proving to be a real head-scratcher!
Our current Moodle setup consists of two servers on our DMZ (VMs on VMware vSphere infrastructure): one for Moodle and one for Moodle Archive (last year's Moodle). Both use databases stored on another VM in our internal server range (with address translation from the DMZ). The Moodle servers run Moodle 3.7.4+ with the latest Adaptable theme on Debian 10, Nginx and php7.3-fpm. The database server runs MariaDB 10.3, and the Moodle servers talk to it through an SSH tunnel. This setup has been working fine for around two years.
However, sometime early last week I noticed a slowdown, initially when viewing user profiles. When I checked, those pages had a TTFB of around 15 seconds, while most other pages were at around 1.5 seconds. Over the last week I have determined that the same thing happens whenever any editing form is opened (editing a user, a resource, etc.). This is just opening the edit form, not saving the edits (although saving has the same problem).
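To put repeatable numbers on the slow pages outside a browser, a minimal Python sketch like the one below times how long the server takes to return the response headers for a page. It is only a sketch under assumptions: the environment variable names `MOODLE_TTFB_URL` and `MOODLE_COOKIE` are made up for this example, and the cookie would be a `MoodleSession=<value>` pair (Moodle's default session cookie name, possibly with a site-specific suffix) copied from a logged-in browser, since edit and profile pages need a session.

```python
import http.client
import os
import statistics
import time
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def measure_ttfb(url: str, cookie: Optional[str] = None, timeout: float = 120.0) -> float:
    """Seconds from sending a GET request until the status line and headers arrive."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    headers = {"Cookie": cookie} if cookie else {}
    try:
        conn.connect()  # open TCP (and TLS) first so connection setup is not counted
        start = time.perf_counter()
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()  # blocks until the headers have been read
        elapsed = time.perf_counter() - start
        response.read()  # drain the body before closing
        return elapsed
    finally:
        conn.close()


def summarize(samples: List[float]) -> Dict[str, float]:
    """Basic statistics for a list of TTFB samples in seconds."""
    if not samples:
        raise ValueError("no samples")
    return {
        "count": float(len(samples)),
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Hypothetical environment variables: the page to test (e.g. a user profile)
    # and a "MoodleSession=<value>" cookie copied from a logged-in browser.
    target = os.environ.get("MOODLE_TTFB_URL")
    if target:
        cookie = os.environ.get("MOODLE_COOKIE")
        samples = [measure_ttfb(target, cookie) for _ in range(5)]
        print(summarize(samples))
```

Running the same script against a live server and against the test VM for equivalent pages gives figures that can be compared directly.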
Some pages are taking even longer: opening the marketing blocks in the Adaptable theme, for example, has been taking 90 seconds TTFB today!
The load average on both the Moodle servers and the database server is around 0.32, and there is plenty of RAM: both the database server and the main Moodle server have around 32GB, and less than half of that is in use (more on the database server, understandably). PHP-FPM and Nginx have plenty of servers/workers/children configured.
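One way to double-check the "plenty of workers" point while a slow page is loading is PHP-FPM's own status page, which reports counters such as `listen queue`, `max children reached` and `slow requests`. The sketch below assumes `pm.status_path` has been enabled in the pool configuration and exposed through Nginx (it is off by default), and it only parses the plain-text output, so the page can be fetched any convenient way (e.g. with curl). The `slow requests` counter only moves if `request_slowlog_timeout` is set.

```python
from typing import Dict, List


def parse_fpm_status(text: str) -> Dict[str, str]:
    """Parse PHP-FPM's plain-text status output ("key: value" per line)."""
    status: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like "start time" survive intact.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def warnings_for(status: Dict[str, str]) -> List[str]:
    """Flag counters that would suggest the pool really is saturated."""
    warnings = []
    if int(status.get("listen queue", "0")) > 0:
        warnings.append("requests are waiting in the listen queue")
    if int(status.get("max children reached", "0")) > 0:
        warnings.append("pm.max_children has been hit since the pool started")
    if int(status.get("slow requests", "0")) > 0:
        warnings.append("slow requests recorded (see request_slowlog_timeout)")
    return warnings
```

If all three counters stay at zero while a profile or edit page takes 15 seconds, the wait is happening inside the PHP request itself rather than in a queue for a free worker.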
OPcache is working, and the cache management page shows a good hit rate, between 50% and 99% depending on what I am looking at. I am not getting any notable errors in either the Nginx logs or the PHP-FPM logs.
I have updated both Moodle servers with git pull to the latest 3.7.x, updated any plugins that needed it, updated the OS on all three servers, and rebooted them.
As Sophos AV is running on the Moodle servers, and we HAVE had some issues with it, I tested temporarily disabling it with systemctl stop sav-protect; systemctl stop sav-rms, but this had no effect.
The really puzzling thing is that this was all working with no problem until a week ago. I have checked the Moodle performance overview and can confirm that theme designer mode is off and statistics are disabled (I have been caught by statistics before!). Everything in the performance overview is green.
I am only using file caching, but it reports times of 0.12 at most, so I do not think this is the issue. Something like Memcache would likely be better, but it is not what is causing this, and the caching has been set up like this for years.
Over the last two days I have spun up two VMs in VirtualBox on my work PC. One is a database server using the same configuration (my.cnf, mariadb.conf.d/50-server.cnf, etc.), onto which I restored the previous day's database backup. The other is a Debian 10 box with the same Nginx/PHP configuration. I set Moodle on this VM to connect to the database server via a similar SSH tunnel and tested its performance. It was slower than ideal, as both VMs are running on my PC, which had little memory left after starting them, but the performance was still significantly better: TTFB on the same operations was around 1-3 seconds.
Today I pointed the local Moodle VM at the LIVE database by changing the authentication details and redirecting the SSH tunnel to the live database server. Again the performance was perfectly acceptable, considering the web server was a VM on my PC. Importantly, even though it was talking to the SAME database server and database as our live Moodle server, it did not suffer from the same performance degradation.
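To compare the tunnel on each web server more directly than through page timings, one option is to time how long MariaDB takes to send its initial handshake ("greeting") packet, which the server sends as soon as a client connects. This is a minimal sketch under stated assumptions: the default arguments assume the tunnel listens on 127.0.0.1:3306 on each web server (adjust to whatever local port the ssh -L forward actually uses), and it never logs in; it only reads the greeting and disconnects.

```python
import socket
import time
from typing import Tuple


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the connection closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the handshake was complete")
        buf += chunk
    return buf


def parse_handshake(packet: bytes) -> Tuple[int, str]:
    """Return (protocol_version, server_version) from a MySQL/MariaDB greeting.

    Layout: 3-byte little-endian payload length, 1-byte sequence id, then the
    payload: a protocol version byte followed by a NUL-terminated version string.
    """
    if len(packet) < 5:
        raise ValueError("packet too short")
    length = int.from_bytes(packet[0:3], "little")
    payload = packet[4:4 + length]
    if payload[0] == 0xFF:
        raise ConnectionError("server sent an error packet instead of a greeting")
    end = payload.index(b"\x00", 1)
    return payload[0], payload[1:end].decode("ascii", errors="replace")


def time_greeting(host: str = "127.0.0.1", port: int = 3306,
                  timeout: float = 10.0) -> Tuple[float, float, str]:
    """Return (connect_seconds, greeting_seconds, server_version)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.perf_counter()
        header = _recv_exact(sock, 4)
        length = int.from_bytes(header[0:3], "little")
        packet = header + _recv_exact(sock, length)
        greeted = time.perf_counter()
    _, version = parse_handshake(packet)
    return connected - start, greeted - connected, version
```

For example, calling `print(time_greeting("127.0.0.1", 3306))` a few times on a live server and on the VM shows whether the tunnel path from the live servers is noticeably slower. Each probe disconnects before authenticating, so MariaDB counts it in its `Aborted_connects` status, and depending on `max_connect_errors` and how the server sees the tunnel's source address, too many interrupted connections could get a host blocked; a handful of runs is enough.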
As this degradation affected both the Moodle server and the Moodle Archive server, I was previously leaning toward the database server as the cause, but since it performs fine with another web server pointing at it, I think that has been ruled out.
That leaves me with a VM that works fine and two live servers that do not. As the VM is configured as closely as possible to the live servers (and the live servers were working fine until last week), I am at a loss.
The degradation DOES coincide with an update of Moodle (git pull), the plugins and theme, and an OS update, so I would be willing to put it down to an incompatible "something". However, the VM has exactly the same Moodle, plugins and OS, so I do not think that stands up.
My current thought is that SOMEWHERE a configuration change in one of the updated components has reset something, but as I have tried to keep the configuration identical (copying the config files to the VM), I am not sure what that could be.
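Because copying config files by hand can miss things (drop-in directories, pool files, per-SAPI php.ini overrides), one way to test the "something was reset" theory is to fingerprint the relevant directories on a live server and on the VM, then compare the two fingerprints. A minimal sketch, assuming the Debian 10 default locations listed in the hypothetical `CONFIG_DIRS` list (adjust it to wherever the tunnel and web server configuration actually live); the script name `confhash.py` in the comments is also just an example.

```python
import hashlib
import json
import os
import sys
from typing import Dict, List

# Assumed Debian 10 locations; edit to match the servers being compared.
CONFIG_DIRS = ["/etc/nginx", "/etc/php/7.3", "/etc/ssh"]


def build_manifest(roots: List[str]) -> Dict[str, str]:
    """Map every regular file under the given roots to its SHA-256 hex digest."""
    manifest: Dict[str, str] = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path):
                    continue  # skip broken symlinks, sockets, etc.
                try:
                    with open(path, "rb") as fh:
                        manifest[path] = hashlib.sha256(fh.read()).hexdigest()
                except OSError:
                    manifest[path] = "<unreadable>"  # e.g. root-only key files
    return manifest


def diff_manifests(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, List[str]]:
    """List paths only in a, only in b, and in both but with different contents."""
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "changed": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }


if __name__ == "__main__":
    # Hypothetical usage:
    #   python3 confhash.py > live.json           (on a live server)
    #   python3 confhash.py > vm.json             (on the VM)
    #   python3 confhash.py live.json vm.json     (compare the two)
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            print(json.dumps(diff_manifests(json.load(fa), json.load(fb)), indent=2))
    else:
        print(json.dumps(build_manifest(CONFIG_DIRS), indent=2, sort_keys=True))
```

Hashing rather than diffing file contents keeps the manifests small enough to copy between the DMZ servers and the VM; any path listed under "changed" can then be compared by hand.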
Importantly, this does not affect ALL of Moodle. Many areas still perform adequately, until I go to edit anything (or students upload anything), or view or edit a profile.
I realise there is not much to go on here, but I have tried to include as much as I can (hence the wall of text). If anyone has ANY suggestions, they would be gratefully received! My only idea going forward is to rebuild the OS on a new infrastructure VM, reattach the data disk (moodledata is on a separate disk) and see if that fixes it. However, as this was working until last week, and this option would still not identify the cause, the problem could come back. It would also require a couple of hours of downtime while the data disk was copied over.