How far along is the implementation, compared with the planned architecture?
Does it improve the DB query counts and page-creation times on large, busy sites? Does it scale well as the database gets large? Are the invalidation strategies working well?
I am starting to instrument a large Moodle hosting setup using MySQL at Remote-Learner to get good stats. So I guess I will have some hard numbers relatively soon (though AIUI there's a limited number of 2.4 installs, and I'm not sure about their size & traffic).
So I am interested in other people's notes -- to inform my analysis and perhaps add instrumentation to previously unexplored corners.
Of course I'll post my findings, analysis... and patches.
Looking at the cache settings available in 2.5dev, there is much more caching implemented than in 2.4, so to my understanding caching in 2.4 is still partial; however, a MUC developer may be much more helpful about current and future status.
About performance: we are currently working with Moodle 2.4 LAMP setups using memcached on some test environments, and so far the results appear encouraging. However, we still need to get figures from stress tests and actual production environments, so it may be premature to say anything about it.
1. The OU has not yet done load-testing of 2.4 vs 2.3, but we are going to soon. Hopefully we can share what we learn.
2. As you do Moodle development, I assume you have all the debugging options turned on on your development machine. If you do, you will see information in the page footer about which caches were used by that page, and how many reads/writes there were to each.
3. By default, most caches go to files on the file system (shared FS, moodledata). So, we are trading DB reads for FS reads. You probably know more than me whether that is a good trade-off.
4. If you look in the admin settings, you can change most caches to use a different store (e.g. memcache(d)). Good measurements of the trade-offs there would be very valuable.
5. Roughly speaking, 2.4 put the cache infrastructure in place, but converted only a few places to use it. 2.5 converts many more old ad-hoc caching solutions, and new never-before-cached things, to use MUC.
6. One place that should be a big win in 2.4 is attempting quizzes. I used MUC (which turned out to be a very easy API to use) to replace the many DB queries required to load question definitions with fewer loads from a cache. I would love to see confirmation that what I did was actually a big win. I know I should have measured myself, but there has not been time.
7. The person who has done most of the work on this is Sam Hemmelryk, who lives in Nelson, NZ. You should probably talk to him if you have not already.
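The quiz caching described in point 6 is the classic cache-aside pattern: check the cache first, batch-query the database only for the misses, then write the results back so later requests skip the queries. A minimal sketch in Python (the real code is Moodle PHP using the MUC API; all names here are illustrative):

```python
def load_question_definitions(question_ids, cache, db_fetch):
    """Cache-aside load: return {id: definition}, querying the DB
    only for ids missing from the cache (illustrative sketch)."""
    found, missing = {}, []
    for qid in question_ids:
        if qid in cache:
            found[qid] = cache[qid]
        else:
            missing.append(qid)
    if missing:
        fetched = db_fetch(missing)   # one batched query, not one per id
        cache.update(fetched)         # warm the cache for later requests
        found.update(fetched)
    return found

# Usage: a dict stands in for the cache store, a function for the DB layer.
cache = {1: 'cached-def-1'}
queries = []

def db_fetch(ids):
    queries.append(sorted(ids))
    return {i: 'def-%d' % i for i in ids}

first = load_question_definitions([1, 2, 3], cache, db_fetch)
second = load_question_definitions([1, 2, 3], cache, db_fetch)
```

The second call never reaches the database, which is exactly the saving being claimed for question definitions.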
To complete the picture given by Andrea and Tim, I'd like to add some notes from the Tracker:
- MUC Stage 2: here you can read about the areas/components that will be covered by MUC support, and track the progress of the work.
- There are already some contributions of other cache stores: see here for details. In short, I've created XCache support, and Sam has already created support for both XCache (my fault for not asking Sam before starting my work... BTW, now we have two implementations, and one could be used to review the other, i.e. using Sam's at the very end) and APC.
- There's no real-life documentation on MUC configuration yet, just some posts in the Community. A good starting point is: https://moodle.org/mod/forum/discuss.php?d=225617#p980157
I think the executive summary is that the infrastructure is there, but actual use of caches was limited in 2.4. More caches are continually being added, and there are more in 2.5, but as yet I haven't seen much in the way of real-world analysis of the impact. Now we have the infrastructure to add caching in a way we didn't before. I hope to see more of this when real-world data comes to us.
Nobody has really mentioned the README file in /cache/, which is helpful from a developer POV. The caching component in the tracker is probably a good place to look to get an idea of the current state of play. Matteo and Sam have really been the major drivers in this area.
Yesterday we had a brief tour of v2.5 with Justin Filip. Notes and observations from a couple of runs on his nice idle MBP...
* Logged in as admin, an _empty_ Moodle site took ~2s to load the homepage. DB queries were around 30, which is low, but 2s is an eternity, which makes me suspect that the caches themselves are expensive.
* Disabled caches completely -- whoopee! >600 dbqueries! -- we used xhprof and got an excellent callgraph. There are clearly things to work on there -- >300 calls to get_module_list(). I know I've been out of the Moodle world for a while, but... how about disabling this caching facility during development (or on master), so the dbquery inflation is kept in check?
* Got a callgraph with caches on, which quickly pointed at a problem in the file cache, which pointlessly MD5()s the same data twice for no gain. Look for md5 in cache/stores/file/lib.php -- there is no way that check will ever catch anything. If the kernel/FS code is so broken as to fail it, Moodle will never execute this far. The exception is perhaps ENOSPC, but we can catch that by checking for errors from write() and fflush(). As it stands, md5_file() was ~17% of exec time.
* The callgraph also shows that we were rewriting most/all of the caches, even when nothing had changed.
* Caching mdl_config has a silly chicken-and-egg problem because the caching configuration is stored in... oh, never mind. (Hint: simplify/automate the caching configuration, and move the tunables to defines in config.php.)
* Looking at the "files" caching scheme... it tries to do hashing for the directories... but it is defeated by having a prefix on the filename!
* It seems, at very first blush, to be caching global things on a per-session basis. This is probably a mistake in my observations - why would we cache mdl_config keyed on user session?
* Relatedly, using the session key to derive a filename does not seem like a good idea at first blush.
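On the md5_file() point above: the cheaper check is to trust the byte count that write() reports and treat a short write (e.g. on ENOSPC) as a failure, rather than re-reading and re-hashing the whole file. A hedged Python sketch of that idea (not the actual file-store code):

```python
import os

def write_cache_file(path, data):
    """Write a cache entry, detecting short writes (e.g. ENOSPC) from
    the return value of write() instead of re-hashing the whole file."""
    with open(path, 'wb') as f:
        written = f.write(data)   # number of bytes actually written
        f.flush()
        os.fsync(f.fileno())
    if written != len(data):
        os.unlink(path)           # don't leave a truncated entry behind
        return False
    return True
```

This keeps the error handling without touching the data a second time, which is where the ~17% was going.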
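On the directory-hashing point above: if a file store shards entries into subdirectories by the first characters of the filename, a common filename prefix puts every entry into the same directory. Hashing the key before sharding restores the spread. An illustrative Python sketch (not the actual MUC code):

```python
import hashlib

def shard_dir_naive(filename):
    """Shard by the filename's first two chars -- defeated by a prefix."""
    return filename[:2]

def shard_dir_hashed(key):
    """Shard by the first two hex chars of a hash of the key, so entries
    spread across up to 256 directories even with a common prefix."""
    return hashlib.md5(key.encode()).hexdigest()[:2]

keys = ['sess_%04d' % i for i in range(100)]
naive_dirs = {shard_dir_naive(k) for k in keys}    # all in one directory
hashed_dirs = {shard_dir_hashed(k) for k in keys}  # spread over many
```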
These are early notes I wanted to share. I do intend to dig more deeply into this, but it may take me some time.
At this point the MUC code flow & call graph leaves me scratching my head plenty; it will take some more time to wrap my head around it.
Tim Hunt has been doing some research on this same track, and I like his listing of bugs (yesterday I had the right tracking bug, today I lost it -- /me shakes fist at Jira).
Thanks for this comment, interesting.
By the way, I am not an expert on this but I wanted to share a few points.
First, just to correct one thing - the configuration for MUC is not actually stored in mdl_config. It's stored in a separate, rather confusing config file in dataroot. So caching mdl_config (and config_plugins please) does make sense and should be one of the quick-win things it achieves.
My main concern about caching is that there is currently no good way to support local caches in a system with multiple webservers. (Large systems where scalability is most critical always have multiple webservers.)
When caching something derived from a database query, if it is a simple database query that executes quickly and entirely from memory (most are), then a significant part of the time cost is probably the network transaction. (Webserver -> database server -> webserver.)
If the cached data is on shared storage (e.g. NFS server, or memcache), then this instead translates into... er... the same amount of network traffic. (Webserver -> shared storage server -> webserver.) It removes load from the database, which is good, but adds it to another single point of failure bottleneck (the NFS/memcache server), which is less good. These other storage servers may be faster and easier to cluster than the database but it's still not a fundamental improvement.
Using local storage (on the same hardware as the webserver) would potentially (and in our testing, does actually) achieve large performance and capacity improvements. Local disk is very fast because disk caching means that frequently-read data comes directly from RAM so there is no I/O - no network traffic, no disk - at all. (Local memcache servers may be even faster.)
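The argument above can be made concrete with a back-of-the-envelope model: per-page cache cost ≈ number of lookups × per-lookup latency. The latencies below are assumed orders of magnitude for illustration, not measurements from this thread:

```python
def page_cache_cost_ms(lookups, per_lookup_us):
    """Total time spent on cache lookups for one page, in milliseconds."""
    return lookups * per_lookup_us / 1000.0

# Assumed, illustrative latencies (orders of magnitude only):
LOCAL_RAM_US = 5     # local file in the OS page cache, or APC-style shm
NETWORKED_US = 500   # round trip to a shared memcache/NFS/DB server

networked_cost = page_cache_cost_ms(600, NETWORKED_US)
local_cost = page_cache_cost_ms(600, LOCAL_RAM_US)
```

Under these assumptions, 600 lookups against a shared store is ~300 ms of pure network waiting per page, while the same lookups served from local RAM cost ~3 ms -- which is why local storage wins so decisively.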
Moodle currently basically works if you use local storage for most of the cache but there are two problems:
- When the cache is flushed, e.g. on upgrade, the webserver running the upgrade doesn't know how to flush the cache on all the other servers. I.e. there ought to be some way to specify that a cache is local, and provide a web service on all the machines, and have all the addresses in config, so that it can clear the cache across everything.
- I have a feeling there may be a need for extra metadata in terms of which cache systems can deliver what. For example, most of the cached data is ephemeral meaning that it can safely be stored in RAM and the system will build it up again on next request. But there may be some that isn't - either because it is actually critical data or (perhaps more likely) because it takes a very long time to calculate. Additionally, with regard to the above point, there might be a need to have certain cache backends (local caching) which only support 'flush all' type unsynchronized operations rather than being able to reliably clear a specified key across all systems. (Most cached data should be utilised in a way so that this will be good enough, because otherwise that imposes a high performance cost.)
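A sketch of the first point: if every webserver's address is listed in config, the node running the upgrade could broadcast a purge to each node's purge endpoint. The `/cache/purge.php` endpoint and `area` parameter below are hypothetical names invented for illustration; no such web service exists yet:

```python
def build_purge_requests(node_urls, cache_area):
    """Build one purge URL per webserver node, so an upgrade node can
    flush local caches cluster-wide. The /cache/purge.php endpoint and
    'area' parameter are hypothetical."""
    return ['%s/cache/purge.php?area=%s' % (url.rstrip('/'), cache_area)
            for url in node_urls]

urls = build_purge_requests(['http://web1/', 'http://web2'], 'lang')
```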
So to summarise, for a highly scalable system you need:
- Most cached information should be stored on the web server hardware - this can scale (nearly) infinitely because you just put in more web servers if you need, and is fast because all the information is already where it's needed.
- Only cached information that absolutely needs to be shared, such as user session information or information that is exceedingly slow to calculate, should be stored on a shared cache instance. This reduces performance because of the need to wait for network traffic and could be a bottleneck, so this type of cache should only be used where fundamentally necessary.
There ought to be a way to distinguish the types of information so that you can set up a system and have it automatically do the right thing with the highest performance depending on the available caches you've set up. It kind of looks like the infrastructure for this is nearly in place but with some missing pieces.
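That "automatically do the right thing" idea amounts to a small router that matches each cache definition's metadata against the stores the admin has configured. A hedged Python sketch (the field names are invented for illustration, not MUC's actual definition format):

```python
def choose_store(definition, stores):
    """Pick the first suitable store for a cache definition.

    definition: {'shared_required': bool, 'persistent_required': bool}
    stores: list of {'name', 'local', 'persistent'}, ordered local-first
    so cheap on-webserver stores win whenever the data allows it.
    """
    for store in stores:
        if definition['shared_required'] and store['local']:
            continue   # must be identical across nodes: skip local stores
        if definition['persistent_required'] and not store['persistent']:
            continue   # too slow to rebuild: needs a durable store
        return store['name']
    return None

STORES = [
    {'name': 'apc',       'local': True,  'persistent': False},
    {'name': 'memcached', 'local': False, 'persistent': False},
    {'name': 'database',  'local': False, 'persistent': True},
]
```

Listing stores local-first encodes the policy from the summary above: ephemeral, node-safe data stays on the webserver, and only data that genuinely must be shared pays the network cost.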
One other note - possibly the busiest cache is the lang string cache and this caches information which in some installations (which don't use foreign languages) can be derived entirely from local disk (PHP files in the codebase). In fundamental terms, caching this on shared storage would be expected to cause a performance decrease - although PHP code interpreting is currently so slow, that isn't a given.
I mostly agree with your notes. Thanks for clarifying about mdl_config and mdl_config plugins.
there is currently no good way to support local caches in a system with multiple webservers
AIUI, there's an APC cache plugin hiding somewhere (in a Catalyst repo?). I haven't seen it myself. Should not be hard to implement if it doesn't exist.
I also intend to test using the APC cache plugin (if I can find it / implement it) against local files, after fixing a couple of bugs in the local files implementation and putting the local files in a ramdisk.
Only cached information that absolutely needs to be shared (...) should be stored in a shared cache instance
Yes. The data in Moodle has different characteristics: whether it must be in sync across nodes (or minor short-lived differences are tolerable), the invalidation mechanism (TTL / generation id / LRU...), the size of the objects to be cached, the frequency of use, etc.
Because of this, as I am looking at the MUC code I am not sure about the "unified" part.
IMHO, a complex system like Moodle needs several different cache facilities.
For example APC and eaccelerator have low-latency local caches for tiny serialized objects with TTLs. It makes sense to give it an arbitrary name ("blue cache") and a small API dealing explicitly with this case... and make "blue cache plugins" implementing it with APC, eaccelerator and files-in-ramdisk.
Cluster-synchronized caches for small objects can be another cache type, and a cache API modelled after memcached. It may be able to reuse the APC/eaccelerator caches for single-webserver configs.
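A minimal sketch of the proposed small "blue cache" API: tiny serialized objects with per-entry TTLs, behind a pluggable backend. A plain dict stands in here for APC, eaccelerator, or files-in-ramdisk; the class and method names are invented for illustration:

```python
import time

class BlueCache:
    """Tiny TTL cache of the kind proposed: small objects, per-entry
    TTL; the dict backend stands in for APC/eaccelerator/ramdisk."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock   # injectable for deterministic testing

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._store[key]   # lazy expiry on read
            return None
        return value
```

The same narrow interface could then be implemented by each low-latency local backend, which is the point of naming the cache *type* rather than the technology.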
(Here I'll note that in my testing, memcached used locally is slower than the APC/eaccelerator caches. mmap() wins over sockets.)
Cluster-synchronized caches that can handle larger content (MongoDB?) are yet another case.
And essentially callers in moodle code should call into the appropriate cache type. This would hopefully simplify configuration and generally allow us to match the task to the most appropriate cache "type" at coding time...
Great post!
Hoping that Sam-The-MUC-Master will have a chance to read it, via this post in the Community or via a pointer in the Dev Chat: it contains great food for thought for enhancing MUC by adding a new decentralized (and limited in terms of responsibility, as per your post) cache module.
New MUC feature for 2.6? If I could vote: +1 .
Thank you for your insights. I agree with you about the issue of caching in a multiple-webserver setup; we are testing multiple webservers with local memcache to avoid a single point of failure and network traffic between the webservers and a memcache server. However, the problem of sessions, as well as cache invalidation when it comes to updates, is something we still need to deal with.
Language string caching is another aspect that could be of great advantage in a shared cache server, especially when you have several Moodle instances. However, with thousands of strings, especially in a multi-language environment, there are more cons, and we too measured a decrease in performance.
Moodle 2.6 could be a good release where to summarize many useful improvements to the caching architecture.
Speaking of those callgraphs...
This was on a site with a default install (guest and admin user only) and a single course with nothing extra added to it (caution, links are to VERY large XHProf callgraph PNGs):