The Moodle HQ Back-end team is working on a new approach to logging. We have been busy redesigning the Events system to accommodate this and now we have a draft specification for Logging ready for public comment.
Your feedback is welcomed.
To add comments, questions and suggestions, please respond to this forum post.
Jokes aside, there is one thing that I don't see discussed there, and I think is quite central: differentiated handling of old logs vs recent logs (vs very old logs?).
We want to write them out fast and low-overhead. Whatever this storage is, it won't be query-able, or will have restrictions.
So at least we should say: our read/query log API is not guaranteed to see the very latest entries. In fact, you probably can't see the last 5m of activity - deal with it.
Anything that wants to see the very latest entries needs perhaps a different approach. Perhaps the logging layer has some pre-cooked hardcodes bits that are tailored to run fast (keep a list of recently seen usernames, keep a tally of pages loaded in the last N minutes), or modules could register a callback.
Either way, these are hot paths. A bit of, ahem, not entirely thought through code here can throw the handbrake on. Big time.
From a scalability perspective, mdl_log is a big bottleneck. Ot makes sense to log somewhere else, somewhere where we don't have to contend for a lock, no need to maintain indexes, etc.
Options include logging to a file, logging to in-memory tables (all our RDBMSs have some support), splitting the logging into several files or tables (to reduce contention), etc.
All of these options are supplemented by a cronjob or daemon that feeds the data to a database table (where it gets all the benefits of indexes, etc) in a way that is more DB-friendly.
The data in that short-term pool isn't easily query-able. If we put demands on it being readable, then we paint ourselves into a corner...
I agree with what you say. The API we are proposing should be suitable for any mechanism of log storage because the reading and writing can be fully independent. Nothing with the exception of reports should be reading the data from log storages, that should imho help when dealing with any delay between writing and reading.
Nothing with the exception of reports should be reading the data from log storages
Well, that's the easy case. But we have two cases I know off the top of my head that make this a bit more interesting.
- "Live logs", which is only useful if it can read the recent logs, so it will need some form of API, or get axed. And I do think it is useful.
- Recent activity block. Also useful, can perhaps cope with a short delay.
i echo the concern with performance.
also, maybe i just missed it, but i'm wondering about how the existing logging system will co-exist with the new one. i did see the note about doubling the hits on the db if both are enabled, but nothing about how the code and admin interfaces will be set up. and maybe that will happen a bit later ...
I may be showing my Java background here, but...
...why not use an existing Enterprise grade logging frame work?
assuming it exists of course
A large part of what is being proposed seems to be present already in a frame work like log4php.
I would expect a logging API to use an existing robust framework, or at the least enable easy tie in of the logging framework of choice.
So in the new system, piece of software that will want to access logs, will need to query logging retrieval plugin. This plugin will implement interface, so retrieving logs will be the same, no matter where they are actually stored.
Do you have some ideas how actual methods for querying could look like? I'm a bit worried that compared to SQL, they will be very limited.
There is some info on that here http://docs.moodle.org/dev/Logging_2#Public_API ... but I hope there can be more detail added by the BACKEND team working on the spec.
However, it's worth noting that there will be no need for most Moodle code to ever access logs this way. Moodle code that currently accesses or joins to the log table (eg Recent Activity) will usually listen to events directly and capture what it needs in the most efficient format.
I also expect any really heavy future analytics processing will take place externally to Moodle.
Just seen this new bit added to the spec:
- $selectwhere is supposed to be a limited sql predicate.
- The set of operators allowed in $selectwhere are :- =, <>, >, >=, <, <=, !=
- The set of keywords allowed in $selectwhere are :- IN, BETWEEN, LIKE, IS NULL, IS NOT NULL
- Use of anything besides the mentioned keywords/operators is strictly forbidden.
I assume that means something like a string "userid = 123 AND value > 3 AND time BETWEEN 1234567890 AND 1234567900".
Note you, have not said AND is an allowed operator! Also, I see no need or advantage in allowing both <> and !=. <> is the SQL standard operator. Let us insist on that.
Instead of invention some new thing (pseudo-SQL string that then each implementation has to parse) why not instead do something like we already do in $DB, e.g. the above would become
array('userid' => 123, 'value' => array('>', 3), 'time' = array('BETWEEN', 1234567890, 1234567900))
This would be nicer if we used the new PHP array syntax:
['userid' => 123, 'value' => ['>', 3], 'time' = ['BETWEEN', 1234567890, 1234567900]]
If we decided that this format is nice, we could even start using in in $DB methods like get_records as well as here.
Bump. This would make it much simpler to implement storage on top of something like elasicsearch, mongo or whatever without attempting to parse pesudo-SQL.
Maybe it is even possible to handle 'interesting' aggregations (groupings/count/avg) this way too and therefore avoid so much hassle making some of the complex reports work there.
It will not be acceptable to find 2.7 means half the reports work or half the DB performance are our only choices.
I guess we have to re-implement a bunch of core reports etc because logging2 doesn't actually achieve any of the original purpose...
Not sure if it is worth posting here. I did not get a response to the last question from any of the developers working on this.
However, I have another concern about this. MDL-41266 (being reviewed now) has been implemented using a trendy new PHP language feature called traits. There has already been concern expressed about this. E.g. https://tracker.moodle.org/browse/MDL-41266?focusedCommentId=271925&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-271925
So far the justification given has been
- This is the only way to implement this without duplicating code.
I agree that duplicating code is bad, and should be avoided, but people have been avoiding duplication in object-oriented code for years in many computer languages without using trendy language features like traits, so I would like to see a proper justification why traits are the only solution here, otherwise, the blind assertion from the developers just makes me think one of
- they are just having fund playing with a trendy new language features; or
- they just don't know the standard object-orented techniques (design patterns) that would like them do this without traits.
The traits have been discussed multiple times internally and also have been improved, since their inception. It is a simple and very useful concept, and we are using it as it was designed to be used. If you some objective arguments against it's use, please explain those, instead of patronising us. We would be happy to consider alternate solutions that you propose, in this regard.
If you follow the commits in MDL-41266, you can clearly notice the improvements in the traits. Abstract methods were added, renaming was done to achieve more clarity, etc. Again, if there are some objective arguments against this specific implementation of traits, please feel free to propose improvements. The argument that we should not use traits, just because they are new (new = 2 years old) PHP feature is pointless imho.
I agree that is a poor justification on its own, but it isn't a pointless discussion.
We are an open source community of hundreds of developers. There is a lot to be said to be conservative about the language features we adopt so that we have the best opportunity for our community of developers to be able to work with our code. The problem is especially exasberated given our documentation often lags behind the evolution of our code (just look on the hundreds of posts on the general developer forum and you will get a feel for it).
Another point: as you probably know, one of my main worries about the new Events / Logging API is the extreme verboseness of the code you need to write.
Yesterday, I was teaching a Moodle development course, and I used this example code to show people everything you need in a basic Moodle script: https://github.com/timhunt/moodle-local_greet/blob/master/index.php#L27
Would someone who understands the new Logging API like give me the minimal amount of code required to update line // 7. Thanks.
here is the documentation for add_to_log() migration: http://docs.moodle.org/dev/Migrating_logging_calls_in_plugins
By the way, please stop using PARAM_TEXT and full user name in GET parameters in any examples, it is confusing and often leads to serious security problems, instead pass around user id if necessary.
I agree that my existing example code was not perfect. However, it was an attempt to hilight most of the key points that you need to know about in more detail to write a Moodle script, and to do by way of an example that acutally ran.
Note that log everythign is an important rule in the Security guidelines: http://docs.moodle.org/dev/Security#Log_every_request. It would be wrong to leave it out of my example.
I still think that my request "show me the minimal possible working log code" is a good question to ask, even if it only really serves as a pointer to the things you need to go and learn about in more depth. It would be nice if someone could acutally write the code. It strikes me as a better use of your time than telling me I am asking the wrong question.
In current logging implementation, if a record could not be INSERT, there was an email triggered with
Subject: Insert into log failed at your moodle site ...
and content: Insert into log table failed at ..... It is possible that your disk is full. The failed SQL is: INSERT INTO mdl_log....
Is there something similar in the new implementation? That is - some fall-back when log message could not be stored?