Hi Alex,
Your questions are very relevant — we've been doing systematic load testing on a Moodle 4.5 quiz deployment and our findings challenge some of the assumptions in the standard documentation. I'll share what we observed, though keep in mind our context is quiz-only (no SCORM), so some points may not transfer directly.
*RAM is not the bottleneck — CPU is*
With PHP-FPM + OPcache and a separated DB server, we found that RAM consumption was surprisingly modest. At 2000 simultaneous virtual users completing a 16-question quiz, peak used RAM on the app server (12 vCPU / 12 GB) was around 1.6 GB. The working set per PHP-FPM worker is roughly 60–70 MB of private RSS, with OPcache shared once across all workers. Swap use was zero throughout. Keep in mind that some RAM goes to OS buffers and caches, which matters in our setup because we use the default file cache for MUC. In our environment, 12 GB was enough for everything on the app server.
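To make that RAM model concrete, here is a back-of-the-envelope sketch. The per-worker RSS comes from our measurements above; the OPcache size and OS overhead figures are placeholder assumptions you would replace with your own pool settings:

```python
import math

# Rough RAM model for a PHP-FPM app server: per-worker private RSS
# scales with the pool size, while OPcache shared memory and OS
# overhead are paid once. Figures other than the measured RSS are
# illustrative assumptions, not measurements.
def app_server_ram_mb(workers, rss_per_worker_mb=65, opcache_mb=256,
                      os_overhead_mb=512):
    """Estimate peak app-server RAM in MB.

    workers           -- pm.max_children in the PHP-FPM pool
    rss_per_worker_mb -- private RSS per worker (we measured 60-70 MB)
    opcache_mb        -- OPcache shared memory, counted once
    os_overhead_mb    -- kernel, buffers, and miscellaneous daemons
    """
    return workers * rss_per_worker_mb + opcache_mb + os_overhead_mb

# Even with a generous 20-worker pool, the estimate stays in the
# low single-digit GB range -- nowhere near the "1 GB per 50 users" rule.
print(app_server_ram_mb(20))  # 20*65 + 256 + 512 = 2068 MB
```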
The actual bottleneck was CPU — specifically during the synchronized exam start spike, when all students click "Start attempt" within a short window.
That's when load average spiked to ~8 on 12 vCPUs (~65% saturation) at 2000 VUs, recovering within 2 minutes. All subsequent quiz navigation remained stable.
This matters for your sizing model: the ~1 GB per 50 users guideline (and even your more conservative 10–15 users/GB) appears to be a legacy of mod_php architectures without OPcache, where each Apache process carries a full PHP interpreter. With PHP-FPM + OPcache, that model significantly overestimates RAM needs. (Maybe it's time to update the Moodle docs.)
*What actually becomes the bottleneck first*
In our architecture (PHP-FPM 8.3, OPcache,
MySQL on a separate server):
CPU on the app server was the first constraint, driven by the synchronized exam start. The DB server was not a functional bottleneck. The key to DB performance was ensuring the working dataset fits in the buffer pool; our DB server has 16 GB of RAM, which was sufficient.
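A quick way to verify that the working set fits in the buffer pool is the InnoDB hit ratio, computed from the counters MySQL exposes via SHOW GLOBAL STATUS. A small sketch (the sample numbers are illustrative, not from our tests):

```python
# InnoDB buffer pool hit ratio from SHOW GLOBAL STATUS counters:
#   Innodb_buffer_pool_read_requests -- logical page reads
#   Innodb_buffer_pool_reads         -- reads that missed the pool
#                                       and went to disk
def buffer_pool_hit_ratio(read_requests, disk_reads):
    """Fraction of page reads served from memory; close to 1.0 means
    the working set fits in the buffer pool."""
    return 1.0 - disk_reads / read_requests

# Sample counters (assumed values for illustration):
ratio = buffer_pool_hit_ratio(read_requests=50_000_000, disk_reads=25_000)
print(f"{ratio:.4%}")  # 99.9500%
```

Sustained values well below ~99% during an exam would suggest the buffer pool is too small for the dataset.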
*When to split the DB*
We had the DB on a separate server from the start, so we can't give you a "we hit a wall at X users and then split" data point. What we can say is that with a separated DB and adequate buffer pool, the DB was a non-issue at 2000 VUs. At 3000 VUs the app server CPU was clearly stressed during the exam start spike, but the DB server still had significant headroom.
Separating the DB early seems worthwhile mainly because it gives you independent scaling levers — not because RAM on a combined server runs out.
*The exam start spike is the critical design point*
For quiz-heavy deployments, the worst-case scenario isn't sustained concurrent load — it's the moment when hundreds of students all start an exam simultaneously. This produces a very sharp, short CPU spike that is disproportionate to the steady-state load. A system that handles 2000 users comfortably during exam navigation can still struggle at the synchronized start. Worth factoring this into your capacity planning,
especially if you run large scheduled exams.
We also analyzed entry patterns from real exam data. Looking at exams with 200–1500 participants, we found that in roughly 10% of cases more than 67% of students started within the first 30 seconds. That's the figure we used to calibrate the Gaussian Random Timer in our JMeter script — so the simulated spike reflects an actual worst-case observed in production, not just a theoretical scenario.
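As a sanity check on that calibration, here is a rough simulation. It assumes entry delays drawn as |N(0, σ)| seconds with σ = 30 s — a simplification of JMeter's Gaussian Random Timer with a zero constant offset, not our actual script:

```python
import random

# If entry delays follow |N(0, sigma)| with sigma = 30 s, roughly 68%
# of simulated students start within the first 30 seconds -- in line
# with the 67% worst case observed in the production exam data.
random.seed(42)  # fixed seed so the run is reproducible
sigma = 30.0
delays = [abs(random.gauss(0.0, sigma)) for _ in range(100_000)]
within_30s = sum(d <= 30.0 for d in delays) / len(delays)
print(f"{within_30s:.1%}")  # ~68%
```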
*On your sizing progression*
Your numbers seem very conservative for RAM if you're using PHP-FPM + OPcache + Redis. You may be significantly over-provisioning RAM while under-provisioning CPU cores. I'd suggest thinking in terms of vCPU count relative to expected concurrent exam starts, rather than GB of RAM per user. Adding Redis for object/session caching is a good call and should reduce PHP processing per request noticeably.
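To illustrate what "sizing by vCPU" could look like, here is a hypothetical heuristic. The ~250 starts-per-vCPU constant is extrapolated from a single data point (2000 VUs driving 12 vCPUs to ~65% saturation during the start spike), so treat it as a starting guess, not a validated model:

```python
import math

# Assumption: extrapolating from one observation (2000 synchronized
# starts at ~65% saturation on 12 vCPUs), a vCPU handles on the order
# of 250 exam starts at full saturation. Calibrate with your own tests.
STARTS_PER_VCPU = 250

def vcpus_for_exam(simultaneous_starts, headroom=1.5):
    """Suggest a vCPU count for the start spike, with a safety factor."""
    return math.ceil(simultaneous_starts / STARTS_PER_VCPU * headroom)

print(vcpus_for_exam(1000))  # 6 vCPUs for a 1000-student synchronized start
```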
*Load testing*
We used Apache JMeter 5.6.3 with a custom script that covers the full quiz flow: login, course navigation, exam start (with a Synchronizing Timer to simulate the simultaneous start spike), answering all questions, and finishing the attempt. The synchronized start is the most important thing to simulate — generic load tests that don't model it will give you an unrealistically optimistic picture.
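For anyone unfamiliar with what the Synchronizing Timer does, the behavior can be sketched with a thread barrier — each virtual user blocks until all of them are ready, then everyone is released at once (this is an illustration of the concept, not JMeter's implementation):

```python
import threading

# Sketch of a Synchronizing Timer: each virtual user blocks at the
# barrier until N_USERS have arrived, then all are released together,
# producing the synchronized "Start attempt" spike.
N_USERS = 50
barrier = threading.Barrier(N_USERS)
started = []
lock = threading.Lock()

def virtual_user(uid):
    barrier.wait()   # hold until all N_USERS virtual users are ready...
    with lock:       # ...then everyone "starts the attempt" at once
        started.append(uid)

threads = [threading.Thread(target=virtual_user, args=(i,))
           for i in range(N_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(started))  # 50
```

A test that ramps users up gradually instead of gating them like this will miss the CPU spike entirely.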
We've documented our test setup, script, and results here if it's useful:
https://www.proyectos.udelar.edu.uy/redmine/projects/moodleperf/wiki/
The results page covers infrastructure metrics (CPU, RAM, PHP-FPM pool, DB I/O) across multiple load levels, and the setup page includes the JMeter script and quiz backup for anyone who wants to replicate the tests.
One caveat: our tests cover quiz workloads only. SCORM tracking adds different patterns (more frequent small write requests for xAPI/SCORM data, different DB access patterns) that we haven't characterized. Your conservative approach there is probably warranted until you have workload-specific measurements.
The script covers the main interactions well enough for capacity planning purposes, but there is room for improvement — in particular, better coverage of
AJAX calls and a more realistic static resource model. I'm planning a revised version for Moodle 5 when time allows. If you have suggestions or feedback for that future version, I'd appreciate hearing them.
Beyond the load tests, we also have 6 years of production experience with this infrastructure. We have run exams with up to 1500 simultaneous students without issues. That real-world data point aligns well with what the load tests show: the system handles that scale comfortably, with headroom to spare.
Hope this helps — happy to answer questions about the test methodology or results.