Performance performance performance!
Jokes aside, there is one thing that I don't see discussed there, and I think is quite central: differentiated handling of old logs vs recent logs (vs very old logs?).
We want to write recent logs out fast and with low overhead. Whatever this storage is, it won't be query-able, or will have restrictions.
So at least we should say: our read/query log API is not guaranteed to see the very latest entries. In fact, you probably can't see the last 5 minutes of activity - deal with it.
Anything that wants to see the very latest entries perhaps needs a different approach. Perhaps the logging layer has some pre-cooked, hardcoded bits that are tailored to run fast (keep a list of recently seen usernames, keep a tally of pages loaded in the last N minutes), or modules could register a callback.
Either way, these are hot paths. A bit of, ahem, not entirely thought through code here can throw the handbrake on. Big time.
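To make the "pre-cooked bits" idea concrete, here is a minimal sketch in Python, purely for illustration. Nothing in it is an existing Moodle API: the class name `HotPathStats`, its methods and the default window are all made up. It keeps a bounded list of recently seen usernames, a tally of page loads in the last N seconds, and lets modules register a callback.

```python
import time
from collections import OrderedDict, deque


class HotPathStats:
    """Tiny in-memory summaries updated on every log write (hypothetical)."""

    def __init__(self, window_seconds=300, max_users=50):
        self.window_seconds = window_seconds
        self.max_users = max_users
        self._recent_users = OrderedDict()  # username -> last seen timestamp
        self._page_loads = deque()          # timestamps of recent page loads
        self._callbacks = []                # hooks registered by modules

    def register_callback(self, fn):
        """Register fn(username, timestamp); it runs on the hot path, so keep it cheap."""
        self._callbacks.append(fn)

    def record(self, username, now=None):
        """Called once per logged page load."""
        now = time.time() if now is None else now
        # Move the user to the most-recent end and cap the list size.
        self._recent_users.pop(username, None)
        self._recent_users[username] = now
        while len(self._recent_users) > self.max_users:
            self._recent_users.popitem(last=False)
        self._page_loads.append(now)
        self._expire(now)
        for fn in self._callbacks:
            fn(username, now)

    def _expire(self, now):
        # Drop page-load timestamps that fell out of the window.
        cutoff = now - self.window_seconds
        while self._page_loads and self._page_loads[0] < cutoff:
            self._page_loads.popleft()

    def recent_usernames(self):
        """Most recently seen first."""
        return list(reversed(self._recent_users))

    def pages_in_window(self, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        return len(self._page_loads)
```

Each call to `record` does a bounded amount of work plus whatever the callbacks cost, which is exactly where a careless callback would throw the handbrake on.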
In reply to Martín Langhoff
Re: Feedback requested on new Logging specification
by Martín Langhoff -
To add an example or two.
From a scalability perspective, mdl_log is a big bottleneck. It makes sense to log somewhere else, somewhere where we don't have to contend for a lock, don't need to maintain indexes, etc.
Options include logging to a file, logging to in-memory tables (all our RDBMSs have some support), splitting the logging into several files or tables (to reduce contention), etc.
All of these options are supplemented by a cronjob or daemon that feeds the data to a database table (where it gets all the benefits of indexes, etc) in a way that is more DB-friendly.
The data in that short-term pool isn't easily query-able. If we put demands on it being readable, then we paint ourselves into a corner...
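As a rough sketch of the "log to a file, feed the database later" option, assuming a JSON-lines spool file and using SQLite as a stand-in for the real database: the function names, the spool format and the table layout below are all hypothetical, not anything from the spec.

```python
import json
import os
import sqlite3


def log_event(spool_path, event):
    """Hot path: append one JSON line. No indexes, no table lock."""
    with open(spool_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def drain_spool(spool_path, conn):
    """Cron/daemon side: move spooled events into an indexed table in one batch."""
    if not os.path.exists(spool_path):
        return 0
    # Rename first so new writes land in a fresh spool file.
    batch_path = spool_path + ".draining"
    os.replace(spool_path, batch_path)
    with open(batch_path, encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log (ts REAL, userid INTEGER, action TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS log_ts ON log (ts)")
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO log (ts, userid, action) VALUES (:ts, :userid, :action)",
            rows,
        )
    os.remove(batch_path)
    return len(rows)
```

Until `drain_spool` runs, the events exist only in the spool file and are invisible to any query against the table, which is the read delay discussed above. A real implementation would also need to handle a writer racing the rename, and a crash between the insert and the remove; this sketch ignores both.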
Hello!
I agree with what you say. The API we are proposing should be suitable for any mechanism of log storage because the reading and writing can be fully independent. Nothing with the exception of reports should be reading the data from log storages; that should, imho, help when dealing with any delay between writing and reading.
Nothing with the exception of reports should be reading the data from log storages
Well, that's the easy case. But we have two cases I know off the top of my head that make this a bit more interesting.
- "Live logs", which is only useful if it can read the recent logs, so it will need some form of API, or get axed. And I do think it is useful.
- Recent activity block. Also useful, can perhaps cope with a short delay.
hi
i echo the concern with performance.
also, maybe i just missed it, but i'm wondering about how the existing logging system will co-exist with the new one. i did see the note about doubling the hits on the db if both are enabled, but nothing about how the code and admin interfaces will be set up. and maybe that will happen a bit later ...