Hi Dan,
Thanks for your comments!
I understand we have different emphases here. Here's how I'd characterise the different focuses:
- I want to make it feasible to use Moodle (at all) in a large system without infinite hardware budget. (This applies to our existing main system in that it's running fine now but we'd like it to continue to work as usage increases; there are also some potential large systems in future where we are considering whether it is possible to use Moodle or not.)
- You want to make it feasible to use Moodle in a large system without turning off the existing log/statistics analysis features in Moodle.
- Martin wants to actually improve log information for teachers (and others) within Moodle.
Obviously, I think mine is the most important requirement. But I think it's also true that the system I'm proposing actually helps with all requirements:
- Everything keeps working if you don't change settings. Even if you do want to use more efficient logging, nearly everything will keep working, assuming plugins are developed to support some level of read access (as the new file plugin I've proposed for core would).
- As for the 'nearly': I've somewhat copped out on recent activity and some of the other items on the 'Logging 2' page, which I'm suggesting will become unavailable if you choose file logging. Even there, this system provides a way forward by clearly identifying those areas of code, so that others can take them on as independent development steps.
- The 'Logging 2' proposals include moving some things out of the log table, which my proposal supports as described above, and also adding more logging, which my proposal (corresponding to the plugin element listed on the 'Logging 2' page) makes feasible.
Basically, I think this proposed development is reasonably restricted in scope (actually, its scope is wider than I'm really happy with, and wider than I initially expected). It can be done independently without causing any problems for current Moodle users, who can continue to use database logging if necessary, while still enabling the large-scale use I'm after. As we all know, it's better to divide problems into small, independent developments that you can complete and release in their entirety without breaking anything. I think this is one of those.
About your specific issue: my proposal also allows institutions that don't want to use the in-Moodle log analysis and statistics to write a plugin that doesn't support them, which is what we would want here for any plugins we might write. You're right that we have no intention of using logs or statistics within Moodle on our systems. That isn't directly related to this new development, but I can explain why.
Why we don't use Moodle statistics
Back in 1.x days, we turned off Moodle statistics because generating them entirely killed the system. I can't remember whether it ran out of memory and failed, or just monopolised the database for pretty much the entire day while calculating the previous day's stats, but either way, it was no use to anyone. In addition, the statistics weren't organised the way we wanted. I can't remember the exact details, but we needed different data to support the people running and designing courses.
So anyway, I wrote an entirely new statistics analyser that was designed for performance. It did a similar thing to Moodle statistics but much faster, without running out of memory, and generating more useful (for our purposes) numbers. There was also a nice interface with pretty graphs and whatever.
We don't use this system in 2.x. Here's why: as usage built in our 1.x system, even this new high-performance stats calculation, when accessing the ginormous log tables, started to take too long (causing performance problems for other areas while it was running). Then it started to run out of memory and fail. Toward the end, we had to turn that system off too, leaving users without any statistics.
What are we doing in 2.x? Shipping mdl_log data out of our database and passing it to an external system. There are two reasons for doing this:
- Performance: The minimum of work (reading the new data out of the database once a day) takes place on our live infrastructure. We're doing all the calculation, analysis, and reporting on a separate system. Even if it goes wrong or takes forever, failures of that other system cannot affect students.
- Consistency: Other OU systems also provide data to the same external system, which (at least theoretically) allows us to answer data-mining type questions such as 'did people who registered late for the course also visit the website less regularly than those who registered in plenty of time?' (Moodle doesn't have the first part of that data, so it wouldn't be possible to get that from Moodle data alone.) I'm not sure that particular question is any use, but you get the point.
Both of these are good things in my view, and I don't think Moodle should prevent large institutions from doing this. Yes, it would be nice if Moodle statistics were fast enough to run on large systems (we didn't actually try them in 2.x; maybe they are better now). I think some of the other proposed Moodle changes, like splitting up cron so that tasks can run in parallel, might make that possible. But I think people who don't want to use Moodle statistics should be allowed not to.
Why we don't use Moodle log display
We do use Moodle logs, mainly to investigate problems, but we generally access them either directly in the database (so that system administrators can run custom queries) or through other custom queries for specific reporting (using the 'Custom SQL' report plugin).
As far as possible we don't use Moodle log display for non-admin users; we've basically got it turned off in nearly all cases.
There are really two reasons for that:
- Performance worries. In 1.x we were concerned about almost any access to the log table, because it is so huge and there were performance issues. I don't think this is actually a problem nowadays. We've got the database infrastructure working (using Postgres features, the log table is split into a new table each month, the monthly tables are combined through a view, and we can delete tables older than the previous month), and Moodle log views use the indexes, so it works OK.
- Data protection. Moodle logs tend to give information that you don't really need, such as student IP addresses, and unless you're really careful with permissions they tend to give access to people who don't really need it. This is both a legal concern and a matter of general good practice: we don't want tutors examining everything students do in minute detail, because that's a waste of their time, and students might not like it.
Hopefully that explains where I'm coming from. But regarding this specific proposal, my position is that what I've proposed is a good, independent first step toward what everybody wants. Which is why I think you should all approve it. ;)
--sam