Moodle LMS for 15,000 users (Win+IIS) - 500 internal server error

by Petr Ovsenák -
Number of replies: 10

Hi all, 
we are running a Moodle instance with approximately 15,000 users, and we are currently facing a technical issue that we have been unable to resolve on our own (unexpected 500 internal server errors).

Main issue:

Users get 500 internal server errors (up to 2% of all requests in the IIS logs).
When about 130 concurrent users are logged in, almost any Moodle page can occasionally end with a 500 internal server error.
But the error sometimes appears even when there are only a few concurrent users.
Many of the 500 errors are related to opening SCORM packages or some SCORM content, but not all of them.
The problem is that there is no regularity or rule in when the 500 errors appear.

Environment:

  • Windows Server + IIS + SQL SERVER
  • PHP 8.3 NTS
  • Moodle 4.4.X (and 4.5.X on another, single-node installation with the same error)
  • 2 nodes architecture:
    • 2 application servers (RAM 24GB, CPU 8 cores)
    • shared file cluster for Moodledata storage (changed from a network file cluster to a shared local disk, but it didn't help)
    • one technical Windows domain account for the DB connection (Windows authentication) and for access to /moodle and /moodledata
  • main settings of FastCGI, PHP, Opcache and AppPool: see attached file
  • config.php: see attached file
  • database is used for session storage
  • DB locks are used

What is different from other implementations:

  • 2 nodes (but I don't believe that is the problem, because we get the same errors on a different, single-node Moodle LMS)
  • cache folders (/cache, /localcache, /temp) are placed on the application server instead of in the /moodledata folder. If we place them in /moodledata, Moodle is very slow; see benchmark results in the attached file.
  • high score in the Benchmark report tests; see benchmark results in the attached file
  • specific environment: maybe something in the infrastructure, TCP/IP, HTTP, firewall... But we are not strong in these areas.

Logs:

  • PHP error log is clean (no errors)
  • IIS logs show errors with these codes:
    • 500 0 2147500037 (if cache folders are placed on the application server)
    • 500 0 64 (if cache folders are inside /moodledata)
  • Failed Request Tracing is enabled (hundreds of XML files are generated each day)
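
For readers less familiar with IIS logs: the three numbers are sc-status, sc-substatus and sc-win32-status. A minimal Python sketch for making them readable is below. The function name and the small lookup table are illustrative only; the table covers just the two codes quoted in this thread (2147500037 is 0x80004005, the generic E_FAIL "unspecified error", and 64 is ERROR_NETNAME_DELETED, "the specified network name is no longer available").

```python
# Meanings of the two sc-win32-status values reported in this thread.
# This table is deliberately tiny; it is not a full Windows error list.
KNOWN_WIN32_STATUS = {
    0x80004005: "E_FAIL (unspecified error)",
    64: "ERROR_NETNAME_DELETED (the specified network name is no longer available)",
}


def describe_status(status_triple: str) -> str:
    """Turn an IIS 'sc-status sc-substatus sc-win32-status' triple into text."""
    status, substatus, win32 = (int(part) for part in status_triple.split())
    meaning = KNOWN_WIN32_STATUS.get(win32, "not in this small lookup table")
    return f"HTTP {status}.{substatus}, win32 0x{win32:08X}: {meaning}"


print(describe_status("500 0 2147500037"))
print(describe_status("500 0 64"))
```

The second code points at a dropped network or file-share connection, which would fit the case where the cache folders sit inside /moodledata on shared storage; the first is generic and says little on its own.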

I can provide all logs needed.

We appreciate any help!!

Thanks.

In reply to Petr Ovsenák

by Sergio Rabellino -

It's probably not the main cause of the 500s but, if I understood correctly, your placement of the cache directory is wrong (cf. https://docs.moodle.org/500/en/Server_cluster#%24CFG-%3Ecachedir): the documentation says "This directory MUST be shared by all cluster nodes. Locking is required.", so it should not be stored on each application server node.

A 500, in turn, tells me that PHP execution is going wrong. Saying why is usually hard, but the first place to look is the PHP log.

HTH

In reply to Sergio Rabellino

by Petr Ovsenák -
Thanks for the response.

PHP log is empty - no errors.

I know about this requirement: "This directory MUST be shared by all cluster nodes. "

But we struggle with the same error (500 0 2147500037) on a ONE-NODE solution:
/moodledata on the file cluster
/temp, /cache, /localcache folders on the application server
In reply to Petr Ovsenák

by Sergio Rabellino -

PHP timeouts, e.g. max_execution_time? Or IIS timeouts? (It's a long time since I played with IIS, so no specific hints.)

In reply to Petr Ovsenák

by Howard Miller -
If you can't find the 500 error in the logs then you are looking in the wrong log. Additional information that (we hope) is logged along with the 500 will give you a vital clue. A 500, on its own, tells you little or nothing.
In reply to Howard Miller

by ovsiknela . -
Thanks for your responses.

We get plenty of 500 errors in the IIS log. Up to 2% of all requests.

E.g.
2025-05-26 07:22:33 10.197.124.11 GET /pluginfile.php/87399/mod_scorm/content/5/res/data/sound1.mp3 - 443 - 10.192.8.6 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/136.0.0.0+Safari/537.36+Edg/136.0.0.0 https://....../pluginfile.php/87399/mod_scorm/content/5/res/index.html 500 0 2147500037 1 XX.XXX.XX.XXX

Some with timeTaken = 0 (crashed immediately), some with long timeTaken, e.g. 25 seconds.
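
To check whether there really is no pattern, the 500s can be grouped by script and by time-taken. Below is a rough Python sketch, assuming the logs are in the W3C extended format with a `#Fields:` header line (the column names come from that line, so extra custom columns are fine). The function names and the bucket limits are arbitrary choices for illustration.

```python
from collections import Counter


def iter_entries(lines):
    """Yield one dict per request, using the '#Fields:' line for column names."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split(" ")
            if len(values) == len(fields):  # skip truncated or odd lines
                yield dict(zip(fields, values))


def summarize_500s(lines):
    """Return (total requests, 500 count, 500s per script, 500s per duration)."""
    total = errors = 0
    by_script = Counter()
    by_duration = Counter()
    for entry in iter_entries(lines):
        total += 1
        if entry.get("sc-status") != "500":
            continue
        errors += 1
        # '/pluginfile.php/87399/...' is counted under '/pluginfile.php'
        stem = entry.get("cs-uri-stem", "")
        script = stem.split(".php")[0] + ".php" if ".php" in stem else stem
        by_script[script] += 1
        raw = entry.get("time-taken", "")
        ms = int(raw) if raw.isdigit() else 0  # time-taken is in milliseconds
        bucket = "<100ms" if ms < 100 else "<5s" if ms < 5000 else ">=5s"
        by_duration[bucket] += 1
    return total, errors, by_script, by_duration
```

To try it on a real file, something like `summarize_500s(open(path, encoding="utf-8", errors="replace"))` should work, where `path` is a placeholder for one of the IIS log files. If almost all failures land in one script or one duration bucket, that narrows the search; if they are spread evenly, it supports the "no rule" observation.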

As I wrote before, THERE IS NO RULE in the 500 errors, NO REGULARITY.
Yes, with more concurrent users we get more errors, but there is no direct correlation.

We also get plenty of XML files from IIS Failed Request Tracing.
We tried analysing several of them with AI, with no success.

The PHP error log is empty.
Thus, FastCGI crashes before the request is processed by PHP.

We run more than 100 Moodles and I have never seen anything similar.

So my suspicion is that the error is caused by some element / limit on the server (or network) side that we have overlooked. But which one?

Something that can kill FastCGI processes with no warning / waiting.
In reply to ovsiknela .

by Howard Miller -
Unless I'm missing something... there's no Error 500 in the log line you have shown. It's a successful HTTP GET request...
In reply to Howard Miller

by ovsiknela . -
No, it ends with the code "500 0 2147500037" = error (unspecified error).
 
Now I get this:
 2025-06-30 14:48:05 10.197.85.25 GET /admin/purgecaches.php - 443 - 10.192.8.6 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/137.0.0.0+Safari/537.36 - 500 0 2147500037 2
 
In reply to ovsiknela .

by Howard Miller -
Ok - that's just the access log. What we *really* need is the log of whatever caused that 500. It should be in the error log. Unfortunately, I know nothing about Windows/IIS so can't help with the specifics. I have a very vague recollection that IIS doesn't log errors by default - but I could easily be talking complete nonsense. Without knowing the actual error message, we're really struggling here.
In reply to Howard Miller

by Ken Task -

A clue? ...

@ovsiknela
First, I don't run Windows but have had sites with SCORM issues:

Petr posted ... clip
"Many of 500 errors are related to opening SCORM, or some SCORM content."

In one of your postings:
".... /mod_scorm/content/5/res/data/sound1.mp3 .."

SCORMs are mini-apps shipped to the client requesting them.
They 'play' on the client machine with no interaction with the server until the client clicks a finished/done button. The user can, however, close the SCORM window, and Moodle will then record that they did not finish. The next time the same client requests that SCORM, the user is asked whether they would like to continue where they left off.

In the meantime, the SCORM being played might send a 'checknet' request back to your server to let your server know they are still out there and working on the SCORM mini-app.

See if your logs contain a 'checknet' request, and IF those were made right before a client's 500 error.
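
One way to run that check over an IIS log is sketched below in Python. It assumes the W3C extended format with a `#Fields:` header that includes date, time, cs-uri-stem and sc-status, entries in chronological order, and that the request in question can be recognised by the substring "checknet" in the URL (the exact path is not given in this thread). The function names, the 60-second window and the default client field are illustrative choices.

```python
from datetime import datetime, timedelta


def parse_entries(lines):
    """Yield one dict per request, with a parsed 'when' timestamp added."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split(" ")
            if len(values) != len(fields):
                continue  # skip truncated or odd lines
            entry = dict(zip(fields, values))
            if "date" in entry and "time" in entry:
                stamp = entry["date"] + " " + entry["time"]
                entry["when"] = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
                yield entry


def errors_after_checknet(lines, window_seconds=60, client_field="c-ip"):
    """Return (500s preceded by a checknet request from the same client, all 500s)."""
    window = timedelta(seconds=window_seconds)
    last_checknet = {}  # client -> time of its most recent checknet request
    preceded = total = 0
    for entry in parse_entries(lines):
        client = entry.get(client_field, "")
        if entry.get("sc-status") == "500":
            total += 1
            seen = last_checknet.get(client)
            if seen is not None and entry["when"] - seen <= window:
                preceded += 1
        # Recorded after the check so a failing checknet does not match itself.
        if "checknet" in entry.get("cs-uri-stem", ""):
            last_checknet[client] = entry["when"]
    return preceded, total
```

If requests pass through a proxy or load balancer, c-ip may be the proxy's address; in that case pass whichever logged field identifies the real client as `client_field`. A high ratio would support the guess, a ratio near zero would rule it out.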

Total guess! But it is based on prior experience with a site that used SCORMs exclusively. There was no resolution to the issue other than the suggestion to make the SCORM interaction shorter, which was made by the project lead on SCORMs. The sad part was that the entity refused to do that, and I think they eventually moved to Canvas LMS or started using some other LMS that specialized in SCORMs.

'SoS', Ken

In reply to ovsiknela .

by Brett Dalton -
Given the earlier comments about the cache files and this, I would suggest 2 things. Investigate any load issues on your NFS volumes. It feels to me that files are intermittently not loading. It's not likely to be a DB issue, as you would see lines in the PHP log from incomplete SQL queries or failures to get a DB connection, so the issue occurs before PHP is run.