In case anyone is curious, I finally found the problem! I've also included a script I wrote that helped me find it.
I did some extra research (largely thanks to this thread!) and found that basically I'd been reading iostat all wrong. I'd just been looking at r/s, w/s, and await and trying to figure out what values seemed reasonable. The trick, as most of you probably know, is to watch for differences between await and svctm (as explained here): await includes the time requests spend waiting in the queue, while svctm is just the device service time, so a big gap means requests are piling up faster than the disk can handle them. Once I started looking for it, I started noticing differences of literally 100x!
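In case it helps, here's roughly how you can spot that gap in `iostat -x` output with an awk one-liner. The column positions ($10 = await, $11 = svctm) match the classic sysstat layout on my box, and the 10x ratio cutoff is just my own choice, so treat both as assumptions; the sample line stands in for a real device line.

```shell
#!/bin/sh
# Flag devices where await dwarfs svctm in `iostat -x` output.
# ASSUMPTIONS: $10 = await and $11 = svctm (classic sysstat column
# layout; newer sysstat versions drop svctm entirely), and the 10x
# ratio threshold is arbitrary. The sample line below stands in for
# a real `iostat -x` device line.
sample='sda 0.00 2.00 1.00 3.00 16.00 40.00 28.00 0.50 120.00 1.10'
echo "$sample" | awk '$11 > 0 && $10 / $11 > 10 {
    printf "%s: await=%.1f svctm=%.1f (%.0fx)\n", $1, $10, $11, $10 / $11
}'
# In practice you would pipe live output instead, e.g.: iostat -x 5 | awk '...'
```

Run against the sample line, that prints `sda: await=120.0 svctm=1.1 (109x)`, which is exactly the kind of 100x gap I was seeing.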
As an experiment, I wrote 500 MB to the disk on another server I control. It took about 8 seconds without the CPU breaking a sweat, as expected. On the Moodle server it took 8 MINUTES, with the CPU pegged the entire time. My IT folks didn't say exactly what the issue was, but my guess is the VM was attached to some kind of storage array that was optimized for read-heavy workloads and pretty crap at anything else.
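If you want to run the same kind of test, a plain dd write is enough. This is a sketch rather than my exact command: the `/tmp/ddtest` path is a placeholder (point it at the filesystem you actually want to test), and `conv=fdatasync` makes dd flush to disk before reporting so you're timing the storage rather than the page cache (`oflag=direct` is an alternative if your filesystem supports it).

```shell
#!/bin/sh
# Rough reproduction of the 500 MB write test described above.
# conv=fdatasync forces the data to disk before dd reports its
# throughput, so the number reflects the storage, not RAM.
# /tmp/ddtest is a placeholder path; aim it at the disk under test.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=500 conv=fdatasync
rm -f /tmp/ddtest
```

GNU dd prints the elapsed time and MB/s on its status line, so you don't even need to time it yourself.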
Anyway, I sent them some iostat logs taken during the test and they immediately moved me to a new storage back-end, where the same test now completes in < 1 second, and I haven't seen the slightest slowdown from Moodle in days. Success!
Thanks for all the ideas, folks. Even when they didn't turn out to be the problem, they at least got me thinking about it.
In case it helps anyone out, here's a script I wrote to run as a regular cron job. It checks for a load average > 0.95, and if it sees one, generates a report on the state of the system. If I'd known what to look for, the data this was sending me would have solved my mystery long ago!
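A minimal sketch of that idea looks something like the following. To be clear, this is not my exact script: the threshold, the report path, and the commands included in the report are all placeholders, and reading `/proc/loadavg` makes it Linux-only.

```shell
#!/bin/sh
# Sketch of a cron-driven load watchdog: if the 1-minute load average
# exceeds THRESHOLD, dump a snapshot of system state to a report file.
# THRESHOLD, REPORT, and the report contents are placeholders;
# /proc/loadavg makes this Linux-specific.
THRESHOLD=0.95
REPORT="/tmp/load-report-$(date +%Y%m%d-%H%M%S).txt"

load=$(cut -d' ' -f1 /proc/loadavg)

# awk does the float comparison, since plain sh only handles integers
if [ "$(awk -v l="$load" -v t="$THRESHOLD" 'BEGIN { print (l > t) }')" -eq 1 ]; then
    {
        echo "Load average $load exceeded $THRESHOLD at $(date)"
        echo "--- iostat -x ---"
        iostat -x 1 3 2>/dev/null || echo "(iostat not installed)"
        echo "--- top CPU consumers ---"
        ps aux | sort -rn -k3 | head -15
    } > "$REPORT"
    # From here you could mail "$REPORT" to yourself instead of leaving it on disk.
fi
```

Dropped into cron every few minutes, something like this would have handed me the await/svctm evidence on day one.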