Looking into containerized installation.
Assuming AWS Fargate/ECS containers, what kind of horsepower can be expected (assuming the container is the bottleneck) from a typical 1 vCPU/2GB container?
What would be a reasonable number of online users such a container could support, assuming typical usage for a Moodle site?
Qualifications for this response ... not an Amazon Consultant/Engineer, etc., but ... I do admin Moodles on Google Compute Engine, Rackspace, under VMware, under Azure (yuck!) and a couple of physical/standalone Linux boxen.
Don't know that anyone could define ... 'reasonable number of online users' - what do you guess?
Also 'typical usage' ... is probably determined by what entity uses it and how they use it ... so what industry is in mind?
Video/audio, all SCORM, etc. etc.
Devil is in the details.
'spirit of sharing', Ken
I have 500 sites... can you handle this with a single image ?
Are you looking into launching 500 instances ?
Might I suggest checking out:
Dated Feb. 2017 - and about Idaho State University (ISU) running Moodle on Amazon ... multiple services.
You may as well try to talk to someone at ISU who is doing it, has done it, or was involved.
I'm not presently at a place launching 500+ instances, no, but I've been there before.
Are your 500 instances because you are reselling - so each is its own customer - or because you have 500 different schools, etc.?
Multi-tenancy is an interesting challenge; ideal solutions are heavily influenced by the reason you need it. For example, we have the latter, with significant morning loads when all the schools sign in.
So you are planning 500 containers, each with its own config file? Have you tried T2 Unlimited?
I have many Moodle sites because we operate per school site, and we have xxx of those. We are looking for horizontally scalable infrastructure. Yes, your morning load is a perfect example... we would like a flexible number of containers running off a cluster of machines, and the ability to adjust the number of machines and containers dynamically per the load at any point in time. T2 Unlimited is just a variant of the T2 instance... we do not care about the type of instances running the cluster, we care about efficiency; we would like to pay just for the resources we consume.
There is no "official" Docker installation for Moodle
So, like all other variations on one-click installers, it's all going to depend on the decisions made by the person creating the installer - how they have optimised the various components (or not).
I don't really understand the advantages of Docker - nobody has ever managed to explain to me why it's a good thing - but if I were responsible for 500+ Moodle sites, I would write my own maintenance scripts so that I understood exactly what they did.
There is no need for "official" Docker installation - Docker is a tool for packaging and shipping. So for whatever "official" installation there is, it can be executed inside a docker container. In order to understand the added value of containers please see https://www.docker.com/what-container.
Sure, we will take care of scripting our environment; however, the problem I am addressing here is performance. I do not like unused resources "hanging" there - I just want the horsepower required to handle the current load to be provisioned, and then, when the load fluctuates, allocate more resources as needed. This can be addressed via a cluster of processors running containers: if the system becomes crowded you can add processors and containers on the fly to accommodate the load or balance it over the cluster.
If this was a single site then no problem, you can easily achieve this goal. However, when there are many sites involved, with the current architecture of Moodle it is not possible to share resources, which leads to an inefficient cluster. To overcome this, the solution could be to have a single instance of the Moodle code which is able to handle any of the Moodle sites - this is called a multi-tenant structure - so whenever a call comes in it is routed to an idle container which will be able to serve it.
This is the purpose of this thread... to find out with the guru Moodlers how we can achieve a stateless mode of operation so that all containers in the cluster will be able to serve any call.
Hope this clears the subject.
Not that I claim to have the know-how. But I think even if I had, I don't really understand what you want to achieve. Let me repeat what I have understood:
a. You want to be able to run a large number (500) of Moodle instances for different customers on the same Moodle code base.
b. They share the infrastructure (virtual hardware and the software stack) so that individual Moodle instances can take high loads as long as the sum total is within a certain limit.
c. Once the total load starts hitting the limit, the infrastructure is able to expand itself (in cloud style).
Am I right? If so, this reminds me of MoodleCloud https://moodlecloud.com/. I don't know how it is implemented, but to my understanding the multi-tenant path and the "cloud" paths (virtualization, containerization, ...) are mutually exclusive.
BTW, Howard was just being modest, he understands something about multi-tenancy.
You are spot on!!!
Not sure why virtualization and multi-tenancy should be conflicting.
It all boils down to the question whether the Moodle stack is stateless.
If it is, then there should be no problem... each request can be handled by any of the processors. If it is not, and there is affiliation between a processor and a request, then all you can do is load balance per tenant; you cannot load balance the whole cluster.
Being stateless incurs costs (restoring the state between calls), however this might be worth it to achieve optimization of the whole cluster.
Now to your questions:
> Not sure why virtualization and multi-tenancy should be conflicting.
Neither do I. My browser tracked what I have been writing and proposed https://www.computerworld.com/article/2517005/data-center/multi-tenancy-in-the-cloud--why-it-matters.html. (I know, it is from 2010. Which is positive, there must be a follow up.)
> It all boils down to the question whether the Moodle stack is stateless.
> If it is, then there should be no problem...
Are you not familiar with Moodle? Simplified: its code, the $moodle directory, is read-only for the web server. But Moodle needs write permission on a second directory, which we call $moodledata. This is unique for each Moodle instance; Moodle stores all sorts of things there, it is delicate, and it needs to be fast. And there is a third thing: where Moodle puts its session data. This is configurable - a parameter, either in $moodle/config.php or in the database, decides whether the session is stored in $moodledata or in the database.
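For anyone following along, those three pieces all surface in Moodle's config.php. A minimal sketch - the hostnames, credentials and paths below are placeholders, not a recommendation:

```php
<?php  // $moodle/config.php -- minimal sketch; all values here are placeholders
unset($CFG);
global $CFG;
$CFG = new stdClass();

// 1. The database.
$CFG->dbtype    = 'mariadb';
$CFG->dbhost    = 'db.example.com';
$CFG->dbname    = 'moodle';
$CFG->dbuser    = 'moodleuser';
$CFG->dbpass    = 'secret';

// 2. The writable $moodledata directory, unique per instance.
$CFG->dataroot  = '/var/moodledata';

// 3. Where sessions live: true stores them in the database instead of
//    $moodledata, which matters once more than one web node is involved.
$CFG->dbsessions = true;

$CFG->wwwroot = 'https://moodle.example.com';
require_once(__DIR__ . '/lib/setup.php');
```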
> each request can be handled by any of the processor, if it is false and there is affiliation between a processor and a request, then all you can do is load balance per tenant, you cannot load balance the whole cluster.
> Being stateless incurs costs... (of restoring the state between calls) however this might be worth to achieve the optimization of the whole cluster.
Sorry, no idea what you are talking about. The dockerized people seem to have a language of their own, just like the Amazon Elastic Band group. Same thing with your other post https://moodle.org/mod/forum/discuss.php?d=365847#p1475589. Well it has more problems. Do you write on a game console or something? What are the sentences? What are paragraphs, what are enumerations? Why the big space at the end? For the reply? I find the subject interesting, but the communication very difficult.
Sorry... I used Gmail for the reply... it has bad formatting.
I'm not an expert on Moodle; this is why I am asking for your help.
I am aware of the four pillars of Moodle: Data, Code, Session and SQL.
Let's assume, for the purpose of this discussion, that all my customers are using the same Moodle with the same plugins; only MoodleData and SQL differ.
My question is: is it practical to implement a wrapper that accepts pointers to Data, Session and SQL in the form
Response = Request($MoodleData, $Session, $SqlConnection)
If true, I can share the same container between all my customers; otherwise I need a container per customer, which is not an efficient solution when you have many of those.
Hope this is clearer now.
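One way to approximate that Request($MoodleData, $Session, $SqlConnection) wrapper without patching Moodle is a single config.php shared by all tenants that selects the per-tenant values at request time. A sketch, assuming the tenant can be identified from the hostname - the tenant map, hostnames and credentials here are all hypothetical:

```php
<?php  // Shared config.php for all tenants -- a sketch, not a tested design
unset($CFG);
global $CFG;
$CFG = new stdClass();

// Hypothetical tenant map; in production this could be a central table or file.
$tenants = [
    'school-a.example.com' => ['db' => 'moodle_a', 'data' => '/nfs/moodledata/a'],
    'school-b.example.com' => ['db' => 'moodle_b', 'data' => '/nfs/moodledata/b'],
];

// CLI scripts (cron, admin tools) have no HTTP_HOST, so allow an override.
$host = getenv('MOODLE_TENANT') ?: ($_SERVER['HTTP_HOST'] ?? '');
if (!isset($tenants[$host])) {
    die('Unknown tenant: ' . $host);
}

$CFG->dbtype     = 'mariadb';
$CFG->dbhost     = 'rds.example.com';       // the one central SQL server
$CFG->dbname     = $tenants[$host]['db'];
$CFG->dbuser     = 'moodle';
$CFG->dbpass     = 'secret';
$CFG->wwwroot    = 'https://' . $host;
$CFG->dataroot   = $tenants[$host]['data']; // per-tenant folder on shared NFS
$CFG->dbsessions = true;                    // no container/session affinity

require_once(__DIR__ . '/lib/setup.php');
```

Note the cron caveat: every scheduled task would have to be run once per tenant with the right override set.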
Response = Request($MoodleData, $Session, $SqlConnection)
yes, that could be possible with a dirty hack:
but you're missing $MoodleCode here, since unless you offer the same thing to all of your customers that should be a variable too - easily managed through the creation of packages of features, packaged into different Docker images.
Thanks for the pointers... looks super easy.
I have omitted the $code intentionally to simplify the discussion.
BTW: can you tell me if for WordPress it is the same?
Am a customer of yours ... one of the 500 sites ... current Moodle version is whatever you've set up initially - let's say that's 3.3 ... I am customer #349 and I want to upgrade my site to 3.5. How are you going to provide that (with one code base)?
As customer #349 I want to run autobackups and keep all backups. Possible?
There's another customer (#150) that wants to use SAML2 authentication and yet another (#400) that wants to use OAuth2 authentication - one code base. Possible?
Are the databases per site unique?
Are the data directories per site unique?
'spirit of sharing', Ken
- Moodle Code folder
- Moodle Data folder
- Connection string to the central SQL database

On every call, the reverse proxy facing the call identifies the tenant (based on her URL) and fetches from a central configuration table, for that customer, a pointer to her code folder, a pointer to her MoodleData, and a connection string to the central SQL database, then routes the request to the container (or process) that handles the call.
So any configuration/customization which is "SQL" based is resolved (per the connection string).
Any private data is resolved (per MoodleData).
Code can be private per customer (no big deal) or even clustered in a way that all customers running the same version are pointed to the same folder.
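If the reverse proxy really does inject those pointers into each request, the shared config.php could read them back out of request headers. A hedged fragment - the X-Tenant-* header names are inventions for this sketch, and the proxy must be the only party able to set them (strip them from incoming traffic at the edge):

```php
<?php  // Fragment of a shared config.php fed by the proxy -- header names invented
// Apache/PHP expose a "X-Tenant-Dataroot" request header as
// $_SERVER['HTTP_X_TENANT_DATAROOT']. Strip these headers at the edge so
// only the trusted reverse proxy can set them.
$CFG->dataroot = $_SERVER['HTTP_X_TENANT_DATAROOT']
    ?? die('request did not come through the tenant-aware proxy');
$CFG->dbname   = $_SERVER['HTTP_X_TENANT_DBNAME']
    ?? die('request did not come through the tenant-aware proxy');
$CFG->wwwroot  = 'https://' . $_SERVER['HTTP_HOST'];
// ...remaining $CFG keys (dbhost, credentials, sessions) as in any config.php.
```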
So it's not one code base but one code base per customer - one data dir per customer. Hmmmm ...
Maybe what you need to do is setup a 'proof of concept' ... not 500 but 20? Then try to imagine how those 20 'customers' would differ.
The 'customer' who wants unlimited autobackups, keep all, for example, had better have an Amazon or Google bucket because a 2TB data dir won't be enough in a short time frame.
Or the customer who has their own internal ID management system and wants the 'cloud based' moodle site to use that IDM - very securely (to the point of paranoia).
One might need to hire RackSpace techs to take on 'customer support'!
'spirit of sharing', Ken
No worries... we are working on it... the current design has an image per customer, which is fine, however bad in terms of consuming resources, as you cannot load balance the whole cluster. The missing link is "code sharing" - without it we must run a container per customer... we cannot share containers between customers... this is why I'm here.
This is not an open service to the public; it is a controlled environment for our customers, whose Moodles we are managing, so the control is there and all AWS resources are attached.
I'm not 100% sure I have understood your use case but I suspect you are looking for Moodle to work in the way you would like rather than the way it does.
If you have 500 Moodles then you have 500 databases and 500 'moodledata' directories. If these are all running the same course (or even if some of them are) it's going to be dreadfully inefficient. Working around this means fighting against some fundamentals of Moodle's design.
Sure... I have 500 databases, and 500 moodledata directories...
All 500 databases are implemented on a single SQL server (RDS) - which is scalable.
All 500 moodledata directories are folders on a single NFS/EFS - which is scalable.
They have nothing in common... every one of them is running its own courses, with its own teachers and its own students...
The only thing in common is the same version of Moodle/plugins... and they are all my customers.
In which case, what's the question?
I'm clearly failing to get to the bottom of what you want to know.
Is your question that you don't want to pay for 500 instances of all this when (for example) only 20 particular ones are in use at a particular time? A perfectly reasonable question in that case.
Exactly... I do not want to allocate an instance per customer... I want a "bank" of instances, each able to serve any request.
What about 50K users online? Will you be able to handle all the coming load?
What about 5000 instances?
The point is this is not scalable... those days are over... we are in the cloud era...
The cloud is just somebody else's computers enshrined in a whole bunch of marketing.
I am in the happy position of knowing how many users I have. You, clearly, are not. It sounds like fun. A cloud solution may well be your answer but I wouldn't get obsessed by it. I'm just old, I've seen the next great thing several times over. Sometimes it is, often it isn't.
Wrong... very wrong... I am also not so young (I have been programming since the 8080/Z80) and have been through some computing generations.
The cloud is a totally different paradigm, which was not possible before. It could serve as yesterday's old-fashioned machines on steroids, however that is just for beginners... go to the cloud providers and see what kinds of services are out there... Look at serverless technology... Life is just beginning... (again)
Unless I am missing something, so what if you have one installation or 500? Each process is a web server thread. Even if you wrap this in fancy "cloud speak" the principle is the same. One actual user doing a *thing* creates a thread and this consumes resources.
If all you need is multi-tenancy then standard Moodle doesn't do that. Have a look at www.iomad.org (which is Moodle plus multi-tenancy)
"...This can be addressed via a cluster of processors running containers, if the system becomes crowded you can add processors and containers on the fly to accommodate the load or balance it over the cluster."
You are presumably describing some witchcraft on your chosen host. I don't know anything about this, but it doesn't sound like a Moodle issue per se.
I'm aware that standard Moodle has no support for multi-tenancy, and this is actually what I'm looking to achieve.
Did not say there is a Moodle issue.
I'm pretty sure we can scale per customer's Moodle; however, we are looking to scale across ALL our customers... which means utilizing ALL allocated resources and not having idle processors sitting there waiting for their customers to come to work while other processors are stressed taking care of the customers that are working.
To summarize, we do not want affiliation between customers and processors - be they containers, VMs or processes... all the processors must be identical and able to serve each incoming request, no matter which school originated it - this is the key to the efficiency and scalability of the cluster.
I'm struggling to understand why 'processors' would be sitting idle. Maybe you and I mean something different by 'processors'.
When the web server is called upon to do something in Moodle it creates a process. If nobody is using it - no process. I know it's not *quite* as simple as that but close enough. What am I missing?
If you had all this lot sitting on a single (old fashioned?) linux box then it would all be easy. Unused Moodle instances 'cost' you nothing. Are you sure that you're not making this more complicated than it needs to be? Again, I'm happy to be corrected if I'm missing something fundamental.
I'm the first to admit that I simply don't understand containerisation. However, I think it's solving a problem I don't have.
It was an idiom... in container terms it means there is a unit of work that consumes resources and is ready to serve requests (you pay for it even if it is doing nothing, just like you pay for a VM instance which is not getting requests). A container is like a VM inside a VM: it shares the resources of the hosting OS but isolates the application totally from other applications on the same box.
True, the old-fashioned box is fine, but it is not scalable and it is not resilient. To achieve scalability and resilience you need a cluster of machines so that you can spread the load among them. Containers allow you to fine-grain this process further, into the machines themselves. So if a container on instance A is broken, you can route the request to another container, either on that machine or on another instance. You gain more efficiency in resource consumption, as well as higher resilience, on top of other goodies related to managing the configuration.
So containerization will be more efficient (less to operate) and will address resilience in a way the old-fashioned boxes cannot.
Ok I (finally) understand your issue. Unfortunately, I can't help. You are working in a world of which I have zero experience.
How do other applications achieve this? How do they automagically switch on/off a container as required?
I'm wondering if this is even a Moodle "thing"... as opposed to a PHP or even web server "thing". "Thing" can probably be defined more accurately!
it is called orchestration. Some refs in the containers' world:
BTW, @Nahum, Howard is right: the discussion should cover how to dockerize a PHP web app, regardless of some key points which are requirements for clustering (i.e. state management). A PHP web app consists of static files and PHP files, so you could consider the unit to be the sum of web server plus PHP interpreter, or consider PHP the main bottleneck to be scaled - e.g. one web server serving several domains under a reverse proxy that manages each domain, and several PHP interpreters, like PHP-FPM, balanced for the PHP workload.
Moodle is stateful thanks to the so-called dataroot, which contains data, cache and session files (by default): you need to start thinking about how you want to expose the PHP files and the other ones, since the dataroot will require shared storage for each domain (== tenant).
Besides the more you'll share about what would be the Orchestrator in your infrastructure the more helpful will be our contributions - hopefully.
Thanks for jumping in....
I think that dataroot is not an issue: we can map a drive of the container to our NFS root folder. Assuming some parameter ($moodledata) which the reverse proxy will append to the request, the Moodle code can map itself off that root.
The SQL connection should be easy as well - instead of loading host/database/user/password from some PHP config file, just use the values passed by the reverse proxy.
I am not sure about the browser's state management, however a cookie can be passed in as well...
Is there a concurrency issue? Do we need to sequence the calls for a tenant?
What else is there that stops us?
Is there anything else that Moodle/PHP/Apache caches?
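On the last question: PHP's opcache holds compiled code per container (harmless while the code is read-only), and Moodle keeps several cache directories whose placement matters once containers are interchangeable. A sketch of the relevant config.php knobs - $CFG->cachedir, $CFG->localcachedir, $CFG->tempdir and the Redis session settings are real Moodle options, while the hosts and paths are placeholders:

```php
<?php  // Cache/session placement for interchangeable containers -- a sketch
// Cluster-shared caches: every web node must see the same files.
$CFG->cachedir      = '/nfs/moodledata/tenant-a/cache';
$CFG->tempdir       = '/nfs/moodledata/tenant-a/temp';

// Node-local cache: explicitly allowed to differ per node, so it can live
// on fast container-local disk and simply be rebuilt if a container dies.
$CFG->localcachedir = '/var/local/moodle-cache';

// Sessions in Redis instead of on disk, so any container can serve any call.
$CFG->session_handler_class = '\core\session\redis';
$CFG->session_redis_host    = 'redis.example.com';  // placeholder
```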
I think that dataroot is not an issue: We can map a drive of the container to our NFS root folder
Yes, that's the reason why I told you that dataroot is not an issue here.
For your questions read my comment in https://moodle.org/mod/forum/discuss.php?d=365847#p1475629 and be more confident with:
- https://docs.moodle.org/34/en/Performance_recommendations, including https://docs.moodle.org/34/en/Performance_recommendations#X-Sendfile
Then you'll have all the bits to define your way of managing multiple Moodle instances in your infrastructure, including deciding how to build it using containers for the "web" workload.
Not sure... whether you have deep familiarity with Moodle...
Managing the containers is not the issue.. there are very powerful tools out there to orchestrate the operation of containers.
In our case, every unit of work is an Apache server in front of PHP code (Moodle), which lives inside the container. So assume the request is a normal Moodle GET/POST request; however, it carries additional parameters which were appended by the reverse proxy that fronts the container. This information will include an SQL connection string (specific to the customer), the name of that customer's folder (MoodleData), and session information (a cookie?) sent by the browser.
The challenge is to "override" Moodle's native initialization code and use the passed-in values... all the rest should be transparent.
Seems to me this will be a breeze for you
Moodle? I heard of it once.
I think your issue is that this is not a nice, clean MVC-style application. There are hundreds, if not thousands, of entry points. I'm not sure if this is a problem or not, but I rather suspect it is. Even if you pass 'client==acme' or whatever through to Moodle, there's no simple way (that I can think of - and I've thought about this a lot) of doing something useful with it.
Did you look at www.iomad.org? The core changes for multi-tenancy have been made. I think it's a simpler starting point. Maybe...
But I'm completely out of my depth.
Perhaps having many entry points is not an issue if there is a pattern we can automate and follow...
I will definitely take a look at IOMAD.
Wondering if there are any updates to your findings?
I'm going down a similar path too, and am finding ways to optimize our hosting of multiple Moodle tenants.
Yes, I was able to set up a POC of multi-tenant Moodle, using central NFS and SQL servers off a cluster running containers (behind a reverse-proxy tier).
Unfortunately, at this stage each tenant runs from its own image - this still needs to be researched. As said, this is just a POC; we have not gone through scaling it up beyond 2 servers and a few tenants. This project is currently on hold, to be resumed in a few months due to other priorities of ours.
If you would like to discuss further or join forces on this challenge, you are welcome to email me at firstname.lastname@example.org
If I am not mistaken, when I worked for BT Lancashire (Lancashire County Council), they had all this set up (which I changed), and it was originally set up by a Dan Poltawski??
He might still be around; in fact he might actually work for Moodle.
Dan has moved on from Moodle, sadly. The setup we originally built for CLEO, before it was taken over by BT/LCC, was not dockerised. I don't think Docker existed when we built that platform.
It was first set up in about 2006, and we ran the sites on bare metal. If memory serves, when we handed over to BT, it was running a set of five high-thread frontends, a pair of MySQL DB servers running multi-master, and a hot-spare NFS backend.
The service was providing Moodle 1.9 (Moodle 2.0 had only recently come out when the transfer process began).
When we handed it over it was still bare metal, with no virtualisation.
Out of interest, what kind of changes did you end up making to it?