Logging API (option to move logs out of database)

by sam marshall -
Number of replies: 6

Hi,

I'm proposing to implement a change to the logging API for 2.5 (or 2.6 if too late).

The summary of the change is that by default, behaviour will be identical to the present, but you'll be able to change it to log to a file (or do something else via a custom plugin) instead of to the mdl_log database table.
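
To make the shape of this concrete, here is a minimal sketch in Python (rather than Moodle's PHP). It is not the API from the proposal page: every class and function name below is hypothetical, and the in-memory backend only stands in for the mdl_log table. The point it illustrates is that writing is mandatory for a backend while reading is optional.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class LogBackend(ABC):
    """Hypothetical backend interface: writing is required, reading is optional."""

    @abstractmethod
    def write(self, entry: dict) -> None:
        ...

    def read(self, **filters):
        # Backends that do not support log viewing leave this unimplemented.
        raise NotImplementedError("this backend does not support reading")


class MemoryTableBackend(LogBackend):
    """Stand-in for the default behaviour (rows in a table such as mdl_log)."""

    def __init__(self):
        self.rows = []

    def write(self, entry):
        self.rows.append(dict(entry))

    def read(self, **filters):
        # Return rows whose fields equal every supplied filter value.
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in filters.items())]


class FileBackend(LogBackend):
    """Appends one JSON object per line to a file; write-only in this sketch."""

    def __init__(self, path):
        self.path = path

    def write(self, entry):
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry, sort_keys=True) + "\n")


def add_to_log(backend: LogBackend, userid: int, course: int, action: str) -> None:
    """Single entry point; callers do not know which backend is configured."""
    backend.write({"userid": userid, "course": course, "action": action})
```

Calling code would only ever use `add_to_log`, so switching from the table-style backend to the file backend would be a configuration change rather than a code change.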

At the moment, logging to the database is a major performance headache when configuring large systems.

http://docs.moodle.org/dev/Logging_API_proposal

--sam

In reply to sam marshall

Re: Logging API (option to move logs out of database)

by Dan Poltawski -

I think that the problem statement at the start of the doc frames it perfectly, and I got quite excited because I thought you had the solution to all our problems. But sadly, it seems not quite... :)

"Because there is no requirement for plugins to implement any of the ‘read’ features, it will be very easy to write custom plugins (for an institution that does not intend to use the standard Moodle log viewing features)."

I wonder how much of a step forward this is? It seems to me that logging/tracking is a fairly fundamental part of a learning management system, and what is being proposed is to outsource this problem from Moodle completely. I understand the attractiveness of this from a system administration point of view, but I worry that we'll end up with 'toy' tracking features in Moodle which are never used by 'real' sites.

In the future, I think we should have a 'recent activity' system that scales both for the OU with 500,000 users and for a school with 20 students. Moodle core should be able to provide teachers with a way to see which resources are being used by their students, regardless of the logging plugin installed. My worry is that this proposal moves us away from that goal, not closer to it.

But we don't have that at the moment, so maybe this is the right direction. ;) At the hackfest in October, we discussed trying to move everything out of the log table and introducing specific tables for the individual aspects of tracking. Moving in this direction turns the log table into fire-and-forget logging and encourages us to use individual tables for specific reporting needs.

(As you might tell, I'm not quite sure myself.)

In reply to Dan Poltawski

Re: Logging API (option to move logs out of database)

by Martin Dougiamas -

This is the developing spec to look at and make suggestions on: http://docs.moodle.org/dev/Logging_2

Most (all?) of the "realtime" needs that we currently use logging for can in fact be moved into new tables for that purpose, making them very fast. That should free up the long-term logs to be huge and slow if they want to be.
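
As a rough illustration of that split, here is a sketch in Python with an in-memory SQLite database. The table and function names are made up for the example and are not taken from the Logging 2 spec: every event is appended to a long-term log, while a small separate table keeps only the latest access per user and course, so the "realtime" lookup never touches the big table.

```python
import sqlite3

# Hypothetical schema: one append-only table for long-term logs, plus one
# small table that holds only the latest access per user and course.
SCHEMA = """
CREATE TABLE long_term_log (
    id INTEGER PRIMARY KEY,
    userid INTEGER, courseid INTEGER, action TEXT, time INTEGER
);
CREATE TABLE user_last_access (
    userid INTEGER, courseid INTEGER, time INTEGER,
    PRIMARY KEY (userid, courseid)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_event(conn, userid, courseid, action, time):
    # Fire-and-forget append; the realtime lookup below never reads this table.
    conn.execute(
        "INSERT INTO long_term_log (userid, courseid, action, time) "
        "VALUES (?, ?, ?, ?)",
        (userid, courseid, action, time),
    )
    # Keep exactly one row per (user, course), so lookups hit the primary key.
    conn.execute(
        "INSERT OR REPLACE INTO user_last_access (userid, courseid, time) "
        "VALUES (?, ?, ?)",
        (userid, courseid, time),
    )


def last_access(conn, userid, courseid):
    row = conn.execute(
        "SELECT time FROM user_last_access WHERE userid = ? AND courseid = ?",
        (userid, courseid),
    ).fetchone()
    return row[0] if row else None
```

The small table stays bounded by the number of user and course pairs, however large the long-term log grows.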

In reply to Martin Dougiamas

Re: Logging API (option to move logs out of database)

by sam marshall -

Thanks Martin! I looked but hadn't found that page, must have been using the wrong search keywords.

I think my proposed system fits OK with the 'Logging 2' thing - i.e. it's an incremental step in that direction that can be implemented without hurting anything. Basically my proposal is the 'MUC-like plugins to interface between our logging calls and log backends' part of that page. It also fits in with the default (database logging) given on that page.

--sam

In reply to Dan Poltawski

Re: Logging API (option to move logs out of database)

by sam marshall -

Hi Dan,

Thanks for your comments!

I understand there's a different emphasis here. Basically, my characterisation of the different focuses would be:

  • I want to make it feasible to use Moodle (at all) in a large system without infinite hardware budget. (This applies to our existing main system in that it's running fine now but we'd like it to continue to work as usage increases; there are also some potential large systems in future where we are considering whether it is possible to use Moodle or not.)
  • You want to make it feasible to use Moodle in a large system without turning off the existing log/statistics analysis features in Moodle.
  • Martin wants to actually improve log information for teachers (and others) within Moodle.

Obviously, I think mine is the most important requirement. :) But I think it's also true that the system I'm proposing actually helps with all requirements:

  • Everything keeps working if you don't change settings. Even if you do want to use more efficient logging, nearly everything will keep working, assuming plugins are developed to support some level of read access (and the new file plugin I've proposed for core would be).
  • Regarding the 'nearly' everything: I had a bit of a cop-out regarding recent activity and some of the other items on the 'Logging 2' page, which I'm suggesting will become unavailable if you choose to use file logging. This system still provides a way forward by clearly identifying those areas of code, which others can then take forward as independent development steps.
  • The 'Logging 2' proposals include moving some things out of the log table, which my proposal supports as above, and also adding more logging, which my proposal (as the plugin element listed on the 'Logging 2' page) makes feasible.

Basically, I think this proposed development, which is reasonably restricted in scope (though it has actually got wider scope than I'm really happy with, or than I expected initially), can be done independently without causing any problems for current Moodle users (who can continue to use database logging if necessary) while enabling the large-scale use I'm after. As we all know, it's better to divide problems into small independent developments (that you can complete and release in their entirety without breaking anything). I think this is one of those.

About your specific issue: my proposal also allows institutions that don't want to use the in-Moodle log analysis and statistics to write a plugin that doesn't support them. :) That is something we would want here for any plugins we might write. You're right that we have no intention of using logs or statistics within Moodle on our systems. It's not directly related to this new development, but I can explain why.

Why we don't use Moodle statistics

Back in 1.x days, we turned off Moodle statistics because generating them entirely killed the system (I can't remember if it ran out of memory and failed, or just monopolised the database for pretty much the entire day when calculating the previous day's stats, but either way, it was no use to anyone). In addition, the information from statistics wasn't organised in the way we wanted. I can't remember the exact detail but we wanted different data in order to support the people running/designing courses.

So anyway, I wrote an entirely new statistics analyser that was designed for performance. It did a similar thing to Moodle statistics but much faster, without running out of memory, and generating more useful (for our purposes) numbers. There was also a nice interface with pretty graphs and whatever.

We don't use this system in 2.x. Here's why: as usage built in our 1.x system, even this new high-performance stats calculation, when accessing the ginormous log tables, started to take too long (causing performance problems for other areas while it was running). Then it started to run out of memory and fail. Toward the end, we had to turn that system off too, leaving users without any statistics.

What are we doing in 2.x? Shipping mdl_log data out of our database and passing it to an external system. There are two reasons for doing this:

  • Performance: The minimum of work (reading the new data out of the database once a day) takes place on our live infrastructure. We're doing all the calculation, analysis, and reporting on a separate system. Even if it goes wrong or takes forever, failures of that other system cannot affect students.
  • Consistency: Other OU systems also provide data to the same external system, which (at least theoretically) allows us to answer data-mining type questions such as 'did people who registered late for the course also visit the website less regularly than those who registered in plenty of time?' (Moodle doesn't have the first part of that data, so it wouldn't be possible to get that from Moodle data alone.) I'm not sure that particular question is any use, but you get the point.
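
The "minimum of work" step can be sketched roughly as follows. This is not our actual export job, just a Python illustration of the idea under some assumptions: the table is called `log` here, ids only ever increase, and the function name and the "high-water mark" bookkeeping are made up for the example.

```python
import sqlite3


def export_new_rows(conn, last_exported_id):
    """Return (rows, new_mark) for log rows with id greater than last_exported_id."""
    rows = conn.execute(
        "SELECT id, userid, action FROM log WHERE id > ? ORDER BY id",
        (last_exported_id,),
    ).fetchall()
    # If nothing is new, keep the old mark so the next run starts from the same place.
    new_mark = rows[-1][0] if rows else last_exported_id
    return rows, new_mark
```

A daily job would store the returned mark somewhere durable and hand the rows to the external system; the only work done on the live database is a single indexed range read.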

Both of these are good things in my view, and I don't think Moodle should prevent large institutions from doing this. Yes, it would be nice if Moodle statistics were fast enough to run on large systems (we didn't actually try them in 2.x; maybe they are better now). I think some of the other proposed Moodle changes, like splitting cron up so that things can run in parallel, might allow this type of thing. But I think people who don't want to use Moodle statistics should be allowed not to. :)

Why we don't use Moodle log display

We do use Moodle logs, mainly to investigate problems, but we generally do so by manual database access (so we can do custom queries) for system administrators; or by other custom queries for specific reporting (using the 'Custom SQL' report plugin).

So far as possible we don't use Moodle log display for non-admin users; we've basically got it turned off in nearly all cases.

There are really two reasons for that:

  • Performance worries. In 1.x we were concerned about almost any access to the log table because it is so huge and there were performance issues. I don't think this is actually a problem nowadays; we've got the database infrastructure working (using Postgres features, the log table is split into new tables each month which are combined in some kind of view, and we can delete the tables older than the previous month) and Moodle log views use the indexes, so it works OK.
  • Data protection. Moodle logs tend to give information that you don't really need, such as student IP addresses, and unless you're really careful with permissions they tend to give access to people who don't really need it. This is both a legal concern and a general good-practice concern: we don't want tutors examining everything students do in minute detail, because that's a waste of their time and students might not like it.
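
For anyone curious about the monthly split mentioned in the first point, here is a rough sketch of the general idea. It is not our Postgres setup: it uses Python with SQLite and a plain UNION ALL view in place of the Postgres features, and the table names, view name, and functions are all hypothetical.

```python
import sqlite3


def rebuild_view(conn, months):
    """Recreate the combined view over the given monthly tables."""
    conn.execute("DROP VIEW IF EXISTS log_all")
    if months:
        union = " UNION ALL ".join(f"SELECT * FROM log_{m}" for m in months)
        conn.execute(f"CREATE VIEW log_all AS {union}")


def add_month(conn, months, month):
    """Create a new monthly table (month is a trusted string such as '2013_01')."""
    conn.execute(f"CREATE TABLE log_{month} (userid INTEGER, action TEXT)")
    months.append(month)
    rebuild_view(conn, months)


def drop_oldest(conn, months):
    """Discard the oldest month by dropping its whole table, not by deleting rows."""
    oldest = months.pop(0)
    # Drop the view first so it never refers to a table that no longer exists.
    conn.execute("DROP VIEW IF EXISTS log_all")
    conn.execute(f"DROP TABLE log_{oldest}")
    rebuild_view(conn, months)
```

Readers query `log_all` as if it were one table, while old data disappears by dropping a table, which is far cheaper than a mass delete from one huge table.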

Hopefully that explains where I'm coming from. But regarding this specific proposal, my position is basically that what I've proposed is a good, independent first step toward what everybody wants. Which is why I think you should all approve it. ;)

--sam

In reply to sam marshall

Re: Logging API (option to move logs out of database)

by Aaron Barnes -

Hi guys,

One thing I'm noticing absent from this discussion is the use case for Moodle logs as an audit trail.

Expanding on logging in a well thought-out way could ultimately allow for precise audit logs (for data forensics, diagnostics or post-mortems) and perhaps even some form of generic "undo" functionality in the future.

Either way I see the idea of losing the ability to "easily" query logs as a step backwards - but maybe I'm too used to a non-education environment and this just isn't a priority for most users?

Thanks for your time,
Aaron

In reply to Aaron Barnes

Re: Logging API (option to move logs out of database)

by Tim Hunt -

Moving the logs from the DB to files, or to another back-end, just changes your query tool from SQL to grep, or something else.
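
As a small illustration, suppose (purely as an assumption, since no file format has been fixed) that a file backend wrote one JSON object per line. A query along the lines of `SELECT * FROM mdl_log WHERE userid = 7 AND action = 'view'` then becomes a scan like this Python sketch, where the function name is made up:

```python
import json


def grep_log(lines, **filters):
    """Return parsed entries from JSON-lines log text whose fields match all filters."""
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if all(entry.get(k) == v for k, v in filters.items()):
            matches.append(entry)
    return matches
```

The trade-off is that a scan like this reads every line, whereas the SQL version can use an index.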

I guess it depends on what sort of queries you need to run, and what you consider easy.