Observability / monitoring

by Matt Wallace
Number of replies: 6

Hi all,

I'm in the process of setting up Moodle as our learning platform for our makerspace and I'm struggling to find anything in the docs about how you monitor the platform.

I've found https://docs.moodle.org/dev/Status_API but that seems to talk only about Nagios and Icinga, which haven't really been adopted by newer organisations for over 7 years now.

Are there any plans to implement something like OpenTelemetry (https://opentelemetry.io/docs/instrumentation/php/) or, at the very least, to expose performance metrics via a Prometheus-compatible endpoint?

Ideally I'd want parity with my other applications where I can see full application traces around how long functions have taken to run, how various parts of the system link together, and how much of a given request is held up by database queries etc.

I don't know PHP anymore (the last time I used it was rolling out PHP 4.3 to hundreds of thousands of installs on a shared hosting platform), but I do know Observability and Monitoring like the back of my hand and would be willing to contribute advice around that area if that is of use?

Thanks in advance for your replies!

In reply to Matt Wallace

Re: Observability / monitoring

by Martin Božič
Hi!

If you take a look at the issue mentioned right at the beginning of https://docs.moodle.org/dev/Status_API you'll see there has been a lot of conversation about Prometheus, but not much progress has been made in the last three years.

We use Prometheus, in conjunction with ELK, to monitor our infrastructure and parts of the Moodle application. We have also recently started implementing a plugin for Sentry that should cover observability of the application's behaviour: https://github.com/1katoda/moodle-local_sentry (not production ready!)
In reply to Martin Božič

Re: Observability / monitoring

by Matt Wallace
Thanks for the heads-up, that does make sense, and it's a shame that more progress hasn't been made.

I know (and love!) Sentry, and it's great for application failures. However, implementing something like OpenTelemetry would immediately make Moodle compatible with Elastic, AppDynamics, New Relic, Datadog, Splunk, Grafana (full disclosure, I started working for Grafana just over three weeks ago!), Honeycomb, and just about every other Observability/Monitoring provider out there.

It would also allow for the provision of metrics such as "how quickly is that function call responding?" and "how many users are logged in?", logs like the ones you are already using, and application traces ("When a user logs in, which parts of the code and database are called or queried, and how long does each step take? Now show me where the bottlenecks are"). Together these give you full end-to-end insight into what your Moodle installation is doing, why it is doing it, and how quickly it is doing it.
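
To make the idea of an application trace a little more concrete, here is a minimal sketch in Python (not PHP, and not the OpenTelemetry SDK). It is a toy in-memory tracer written only for illustration: the `Tracer` class, its `span` and `slowest_child` methods, and the span names `login`, `db.fetch_user` and `render_dashboard` are all hypothetical. It records nested spans with their parent and duration, then asks which child of a request took longest.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Toy in-memory tracer (illustrative only, not the OpenTelemetry API)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._stack = []    # names of the spans that are currently open
        self.finished = []  # (name, parent name or None, duration) tuples

    @contextmanager
    def span(self, name):
        # The parent is whichever span is open when this one starts.
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = self._clock()
        try:
            yield
        finally:
            duration = self._clock() - start
            self._stack.pop()
            self.finished.append((name, parent, duration))

    def slowest_child(self, parent):
        """Return the longest-running finished span directly under `parent`."""
        children = [s for s in self.finished if s[1] == parent]
        return max(children, key=lambda s: s[2])


if __name__ == "__main__":
    tracer = Tracer()
    with tracer.span("login"):                 # hypothetical request
        with tracer.span("db.fetch_user"):     # hypothetical database step
            time.sleep(0.02)
        with tracer.span("render_dashboard"):  # hypothetical rendering step
            time.sleep(0.005)
    name, parent, duration = tracer.slowest_child("login")
    print(f"bottleneck under {parent}: {name} ({duration:.3f}s)")
```

A real OpenTelemetry setup would additionally give every span trace and span IDs and export them to a backend. The parent/child timing structure is the part this sketch is meant to show.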

The other nice thing that OpenTelemetry gives you is insight into interoperability between platforms. In our case this means that an attempt to log in to Moodle from our OAuth2-based membership server would be shown as a trace across both platforms, allowing us to see any errors in communications between the two!

The entire industry appears to be moving towards OpenTelemetry as the standard for monitoring/observability, so it may be worth focusing efforts there rather than implementing for a specific vendor and having to refactor in future?

As I said in my original post, unfortunately I don't have the skills to code the PHP, but I certainly have the knowledge of the observability side of things to support others if they have the desire to implement it, and would be keen to see it in place!
In reply to Matt Wallace

Re: Observability / monitoring

by Job Céspedes Ortiz
I would say that Prometheus is the preferred tool for monitoring and for metrics when scaling. I have not heard of Prometheus metrics inside Moodle code yet. Of course, you can already obtain resource usage metrics for the underlying infrastructure. This does not depend on Moodle code exporting metrics, but on your environment and other exporters. Alternatively, you could develop a Moodle exporter with custom metrics. By the way, there is a Moodle Prometheus exporter by SysBind.
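
As a rough illustration of what a custom exporter has to produce, the sketch below renders one gauge in the Prometheus text exposition format. It is written in Python rather than PHP and is not taken from the SysBind exporter or from Moodle. The function `render_gauge`, the metric name `moodle_logged_in_users` and the `site` label are assumptions made up for the example.

```python
def escape_label_value(value):
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    `samples` is a list of (labels, value) pairs, where `labels` is a dict
    of label names to string values (it may be empty).
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    # The format is line-based and the final line ends with a newline too.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric name, label and value, for illustration only.
    print(render_gauge(
        "moodle_logged_in_users",
        "Users with an active session.",
        [({"site": "makerspace"}, 12)],
    ), end="")
```

An exporter would serve this text over HTTP (conventionally at `/metrics`) for Prometheus to scrape. How the numbers are actually collected from Moodle is left out here.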

I concur with the original poster (OP); OpenTelemetry seems, at the moment, to be a helpful integration for traces at all levels, including the application, especially considering the current state of the observability landscape. In my opinion, there are currently two ways to implement it. One is at the Moodle code level (using an SDK), and the other is at the PHP level (using a PHP extension or agent). The former could be richer in functionality, while the latter could be simpler and more transparent for Moodle.

An open-source alternative, similar to the second option, is Apache SkyWalking. While it does not enable OpenTelemetry from Moodle, you could install and enable its PHP agent and start using SkyWalking as an Application Performance Monitor for metrics, tracing and logging. Please note that you have to host an Apache SkyWalking server first.

Moodle instances in our managed service rely on Prometheus for metrics and alerts, Loki for logging, and Grafana for visualizations. As none of these tools currently have integration at the Moodle code level, observation and monitoring primarily extend to the layers beneath the Moodle level, ranging from PHP to the infrastructure resources. We are contemplating the integration of Apache SkyWalking for tracing purposes.

In reply to Job Céspedes Ortiz

Re: Observability / monitoring

by Matt Wallace

"An open-source alternative, similar to the second option, is Apache SkyWalking. While it does not enable OpenTelemetry from Moodle, you could install and enable its PHP agent and start using SkyWalking as an Application Performance Monitor for metrics, tracing and logging. Please note that you have to host an Apache SkyWalking server first."

SkyWalking looks like an excellent platform, but I'd argue that if you're going to host an Open Source solution then you probably want to be looking at Grafana or Elastic as you're more likely to find the skills required in the wider industry for your team.

OpenTelemetry (OTEL) also means that you can start off with something like New Relic, Datadog, or Grafana Cloud (all of which offer a "free tier") and then, if you want to migrate to another provider, it's as easy as updating the config in your code/agent to point to the new OTEL endpoint, and everything will switch over. You can even run multiple providers side-by-side whilst you migrate if that makes sense to do so.
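
To sketch why switching provider is mostly a configuration change: OTLP exporters read their target from environment variables such as `OTEL_EXPORTER_OTLP_ENDPOINT`, with optional per-signal overrides such as `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`. The Python below is a simplified re-implementation of that lookup for OTLP over HTTP, written for illustration and not copied from any SDK. The helper name `otlp_url` and the `.example` hostnames are hypothetical.

```python
import os

# Paths appended to the base endpoint for each signal (OTLP over HTTP).
SIGNAL_PATHS = {"traces": "v1/traces", "metrics": "v1/metrics", "logs": "v1/logs"}
DEFAULT_BASE = "http://localhost:4318"  # default OTLP/HTTP endpoint


def otlp_url(signal, env=None):
    """Resolve the export URL for 'traces', 'metrics' or 'logs'."""
    env = os.environ if env is None else env
    # A signal-specific variable wins and is used exactly as given.
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        return specific
    # Otherwise the signal path is appended to the shared base endpoint.
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_BASE)
    return base.rstrip("/") + "/" + SIGNAL_PATHS[signal]


if __name__ == "__main__":
    # Hypothetical provider hostnames; only the environment changes.
    for base in ("https://otlp.provider-a.example", "https://otlp.provider-b.example"):
        print(otlp_url("traces", {"OTEL_EXPORTER_OTLP_ENDPOINT": base}))
```

In practice most providers also need credentials, typically supplied through `OTEL_EXPORTER_OTLP_HEADERS`, so a migration usually means changing the endpoint and the headers together.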

Prometheus metrics have been, and continue to be, the standard for most platform-level metrics and some application ones, but as OTEL matures I suspect that the Prometheus format will remain while the OTEL transport layers take over.
In reply to Matt Wallace

Re: Observability / monitoring

by Job Céspedes Ortiz
Elastic has great products. Let me point out one aspect related to their licensing: Elastic has not referred to Kibana and Elasticsearch as open source since the 2021 license change.
In reply to Job Céspedes Ortiz

Re: Observability / monitoring

by Matt Wallace
The "community edition" of Elastic products is good. I have some fairly significant reservations about their cloud offering, based on experiences with major finance organisations as part of my day job, but apart from the licensing it's fine.

One of the reasons I'm recommending OpenTelemetry is that it doesn't matter what the backend is.

If you really wanted to (although I've no idea why you would!), you could use OpenTelemetry to send your metrics to all three of Elastic, Datadog, and AppDynamics, your logs to Splunk and Grafana Cloud, and application traces to New Relic and Honeycomb.io. You'd only need one library in the codebase, and you wouldn't need to alter any of the formatting of the data.

OpenTelemetry gives far more flexibility as to where people send their observability data, is truly open source, and is a key project of the Cloud Native Computing Foundation rather than being owned by a specific vendor, so it really is as open as it gets!