Hello everyone,
I'm new to Moodle, so this is my first post. Please be patient if I get something wrong or if my question is confusing.
I am experiencing extremely high latency when writing to an NFS share in my Moodle installation, which is severely affecting performance. Below I summarize the tests and benchmarks we have run so far, in the hope of getting some advice on how to resolve the issue.
Here is my problem:
Setup:
Web Servers: 3x (web01, web02, web03)
Database: Postgres Patroni Cluster
Redis Server: redis01, used for the session and application caches. I don't know what the application cache actually stores, or whether it is a good idea to leave the request cache on the filesystem.
File Server: file01
Hardware: 8x 2.5 GHz CPU, 16 GB RAM per server
NFS Mounts:
moodledata is mounted over NFS to all web servers.
tempdir and cachedir are also mounted over NFS.
localcachedir and localrequestdir are stored locally on the web servers.
The NFS server is a cloud system hosted by our provider. The disk on the file server is mounted with the following options:
/dev/disk/by-id/scsi-0HC_Volume_XXX /data ext4 discard,nofail,defaults 0 0
On the web servers, the NFS mounts are configured with the following options:
file01:/moodledata /moodledata nfs4 _netdev,auto,noacl,nocto,rsize=32768,wsize=32768,noatime,nodiratime,ac 0 0
We also tested using async, but this didn’t improve performance.
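To make the client mount options easier to compare between runs, here is a minimal sketch that splits the option string into name/value pairs. The helper name `parse_mount_options` is made up for illustration; the option string is copied from the fstab line above.

```python
def parse_mount_options(options: str) -> dict:
    """Split an fstab/mount option string into a {name: value} dict.

    Flags without a value (for example "noatime") map to True.
    """
    parsed = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        parsed[name] = value if sep else True
    return parsed


# Option string copied from the web server fstab entry.
opts = parse_mount_options(
    "_netdev,auto,noacl,nocto,rsize=32768,wsize=32768,noatime,nodiratime,ac"
)
rsize_kib = int(opts["rsize"]) // 1024
wsize_kib = int(opts["wsize"]) // 1024
print(f"rsize={rsize_kib} KiB, wsize={wsize_kib} KiB, sync set: {'sync' in opts}")
```

As far as I understand nfs(5), when rsize and wsize are omitted the Linux client and the server negotiate the largest value both support, so the 32 KiB here is an explicit cap. The values actually in effect can be read from /proc/mounts on the web servers.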
Tests we did:
File writing:
Extremely high latency of 49.773 seconds (acceptable: < 1 second, critical: > 1.25 seconds).
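For cross-checking that number outside Moodle, a rough sketch like the following could time small synchronous file writes in a given directory. It is not the benchmark that produced the 49.773 second figure; the function name, file count and file size are assumptions chosen for illustration.

```python
import os
import statistics
import tempfile
import time


def measure_small_writes(directory: str, count: int = 100, size: int = 4096) -> dict:
    """Create `count` small files under `directory`, fsync each one, and time it."""
    payload = b"x" * size
    latencies = []
    # The scratch directory (and the probe files in it) is removed on exit.
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        for i in range(count):
            path = os.path.join(workdir, f"probe_{i}.tmp")
            start = time.perf_counter()
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # force the data out to the server/disk
            latencies.append(time.perf_counter() - start)
    return {
        "count": count,
        "total_s": sum(latencies),
        "mean_ms": statistics.mean(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }


# Example call (paths taken from the Moodle configuration in this post):
#   print(measure_small_writes("/moodledata/temp"))
#   print(measure_small_writes("/moodledata-local/temp"))
```

Running it once against an NFS-backed directory and once against a local one gives a like-for-like comparison of per-file write latency, which is closer to Moodle's access pattern than a single large dd stream.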
Other tests:
Database operations, PHP functions, and file reading are within an acceptable range.
Moodle directory configuration:
$CFG->tempdir = '/moodledata/temp'; → NFS
$CFG->cachedir = '/moodledata/cache'; → NFS
$CFG->localcachedir = '/moodledata-local/localcache'; → local on web servers
$CFG->localrequestdir = '/moodledata-local/temp'; → local on web servers
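To double-check which of these directories really sit on NFS, a small sketch can match each path against /proc/mounts-style text. The function names and the sample mount table below are hypothetical; on a real web server the text would come from reading /proc/mounts.

```python
def parse_mounts(mounts_text: str) -> list:
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path: str, entries: list) -> str:
    """Return the fs type of the longest mount point that contains `path`."""
    best_point, best_type = "", "unknown"
    for point, fs_type in entries:
        prefix = point.rstrip("/") + "/"
        inside = path == point or path.startswith(prefix)
        # ">=" lets a later entry for the same mount point win.
        if inside and len(point) >= len(best_point):
            best_point, best_type = point, fs_type
    return best_type


# Hypothetical mount table, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
file01:/moodledata /moodledata nfs4 rw,noatime 0 0
"""

entries = parse_mounts(SAMPLE_MOUNTS)
for directory in (
    "/moodledata/temp",
    "/moodledata/cache",
    "/moodledata-local/localcache",
    "/moodledata-local/temp",
):
    print(directory, "->", fs_type_for(directory, entries))
```

The prefix is compared with a trailing slash so that /moodledata-local is not mistaken for a subdirectory of /moodledata.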
Server-side tests:
Writing with dd on the NFS mount (from web01):
5.4 GB in 23.66 seconds (~227 MB/s)
Fio test on the NFS mount (from web01):
IOPS: 6176, Bandwidth: 24.1 MiB/s
Average latency: 662 ms, Maximum: 3488 ms.
Fio test directly on the NFS server:
IOPS: 4196, Bandwidth: 16.4 MiB/s
Average latency: 975 ms, Maximum: 13 seconds.
Fio test on the local moodledata directory (web01):
IOPS: 273k, Bandwidth: 1066 MiB/s
Very low latencies (Average 15 µs).
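The fio figures above can be sanity-checked with a little arithmetic. The sketch below derives the implied I/O size from IOPS and bandwidth, and uses Little's law (requests in flight = throughput x average latency) to estimate how many requests were queued on average. The fio job file is not shown here, so the block size and queue depth are inferences and not measured values.

```python
KIB = 1024
MIB = 1024 * 1024

# (label, IOPS, bandwidth in MiB/s, average latency in seconds), as reported above.
RESULTS = [
    ("NFS mount from web01", 6176, 24.1, 0.662),
    ("directly on file01", 4196, 16.4, 0.975),
    ("local disk on web01", 273_000, 1066, 15e-6),
]


def implied_block_size_kib(iops: float, bandwidth_mib_s: float) -> float:
    """Average bytes per I/O, in KiB, implied by bandwidth / IOPS."""
    return bandwidth_mib_s * MIB / iops / KIB


def implied_in_flight(iops: float, avg_latency_s: float) -> float:
    """Little's law: average requests in flight = throughput x average latency."""
    return iops * avg_latency_s


for label, iops, bandwidth, latency in RESULTS:
    print(
        f"{label}: ~{implied_block_size_kib(iops, bandwidth):.1f} KiB per I/O, "
        f"~{implied_in_flight(iops, latency):.0f} requests in flight"
    )
```

All three runs work out to roughly 4 KiB per I/O. The two NFS-related runs imply on the order of 4,000 requests in flight, which would mean the 662 ms and 975 ms averages are mostly queueing time under a very deep queue and not the latency of a single isolated write. The local run implies only about 4, so either the queue depth or the latency metric differed for that test; the numbers alone cannot say which.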
Network tests:
iperf (web01 → file01):
Stable bandwidth of 7.33 Gbit/s, no bottlenecks detected.
I also ran nfsstat on the file server:
Server RPC statistics:
calls        badcalls   badfmt   badauth   badclnt
2750390491   957        231      726       0
nfsstat shows a high share of getattr (18%), putfh (27%), and sequence (26%) operations, even though the share is mounted with the ac option in fstab.
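To put the nfsstat counters in proportion, here is a minimal sketch that computes the bad-call ratio from the RPC statistics and ranks operations by share. The per-operation counts are hypothetical, chosen only so that they reproduce the percentages quoted above; the real counters from nfsstat would go in their place.

```python
def bad_call_ratio(calls: int, badcalls: int) -> float:
    """Fraction of RPC calls the server rejected as bad."""
    return badcalls / calls


def op_shares(op_counts: dict) -> dict:
    """Percentage share per operation, largest first."""
    total = sum(op_counts.values())
    ranked = sorted(op_counts.items(), key=lambda kv: kv[1], reverse=True)
    return {op: 100 * count / total for op, count in ranked}


# Values copied from the "Server RPC statistics" output.
ratio = bad_call_ratio(calls=2750390491, badcalls=957)
print(f"bad calls: {ratio:.2e} of all RPC calls")

# Hypothetical counts that only mirror the reported percentages.
HYPOTHETICAL_OPS = {"putfh": 27, "sequence": 26, "getattr": 18, "other": 29}
for op, share in op_shares(HYPOTHETICAL_OPS).items():
    print(f"{op:10s} {share:5.1f}%")
```

The bad-call ratio is well under one in a million, so RPC errors look negligible. As far as I understand the protocol, every NFSv4.1+ compound starts with a SEQUENCE operation and most compounds also carry a PUTFH to select the file handle, so those two counters grow with overall request volume and are not, on their own, a sign that attribute caching is failing. The getattr share is the one that relates to the ac option.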
Question:
Where should I start to troubleshoot the write speed issues? The high write latency to the NFS share is slowing down the entire system.
Does anyone have suggestions on how to fix or optimize this?
Thank you in advance for your help!