from M3.1 to M3.5 : database issue ?

from M3.1 to M3.5 : database issue ?

by Michel DENIS -
Number of replies: 18

Hello,

I dumped a database running on MySQL 5.7.18-0ubuntu0.16.04.1 on Moodle 3.1, and restored the dump on a Moodle 3.5 platform running 10.1.26-MariaDB-0+deb9u1. Of course I changed the config.php accordingly.

The M3.5 platform running the restored database is now running pretty well, except that when I disconnect for some hours and want to login again, often the client freezes/hangs while the mysqld process takes 100% CPU. After some minutes, some kind of timeout seems to allow me to login...

Do you think it could be due to compatibility isses between MySQL 5.7 and MariaDB 10.1 ? Should I have been doing some kind of upgrade of the database or else ?

Thank you in advance for your help,

-michel


Average of ratings: -
In reply to Michel DENIS

Re: from M3.1 to M3.5 : database issue ?

by Howard Miller -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

Moving to Performance and hardware forum...


So what is the database doing? Have you run 'show processlist' at the time? Do you have any kind of monitoring running (you should)?

In reply to Howard Miller

Re: from M3.1 to M3.5 : database issue ?

by Michel DENIS -

Sorry for the delay but I had to move back to previous version due to a 35+ K users production environment. Unfortunately I cannot leave the problematic situation online nor an associated monitoring ..

So I temporarily put back online the problematic version and produced the list you suggested. Please see attached. That's a very strange output result to me !

Best regards,

-michel

Attachment processlist.JPG
In reply to Michel DENIS

Re: from M3.1 to M3.5 : database issue ?

by Ken Task -
Picture of Particularly helpful Moodlers

Since you asked about anything 'special' ... no ... but did you forget to ... ?

https://mariadb.com/kb/en/library/upgrading-from-mysql-to-mariadb/

"You should run mysql_upgrade (just as you would with MySQL) to finish the upgrade. This is needed to ensure that your mysql privilege and event tables are updated with the new fields MariaDB uses. Note that if you use a MariaDB package, mysql_upgrade is usually run automatically."

BTW, doesn't hurt to run mysql_upgrade I don' t think ... even week or so after the move.   Think I've had to do that to resolve some 'slowness', etc..

But will defer to any true DB Admin with creds that jumps in here and corrects. smile

'spirit of sharing', Ken


In reply to Ken Task

Re: from M3.1 to M3.5 : database issue ?

by Michel DENIS -

Thank you Ken.

I installed mariaDB using standard apt install mariadb-client mariadb-server on a fresh environment, ie which did not have any previous version of mysql, and loaded a mysql dump coming from another platform, that's why I did not apply mysql_upgrade. Should I really ?

-michel

ps : I did another trial and again the mysqld process is running a similar request at 100% cpu and blocking the login process during a few minutes (see attached). Wondering what is the relationship between 'grades' and the login process here ?



Attachment Capture2.JPG
In reply to Michel DENIS

Re: from M3.1 to M3.5 : database issue ?

by Ken Task -
Picture of Particularly helpful Moodlers

Ok, here's a dumb question ... is your DB on a dedicated DB server or is your setup using 'localhost'?

With 35K users would assume your Moodle is busy from time to time and would require a dedicated DB host to perform.

As far as mysql_upgrade ... like I said ... had issues with slowness on a server once and mysql_upgrade did fix it as I had forgotten to do that when upgrading (had migrated server at the same time so the DB server was new ... data came from sql dump from old server/old version).

'spirit of sharing', Ken


In reply to Ken Task

Re: from M3.1 to M3.5 : database issue ?

by Michel DENIS -

Moodle here is on a VPS with lots of RAM and CPU, but the DB is not on a dedicated DB server. Actually, on that new server there is currently about 0% utilization- the team are testing the Moodle 3.5 but no users yet.

OK will launch mysql_upgrade and see what happens ..

Thanks Ken; will let you know.

-michel

In reply to Ken Task

Re: from M3.1 to M3.5 : database issue ?

by Michel DENIS -

Hi Ken,

Just did a mysql_upgrade, and still same 100 issue with no way to login. See attache same message. Again that grade-oriented message.

Any idea what else I could do to debug ?

Thank you,

-michel

Attachment Capture3.JPG
In reply to Michel DENIS

Re: from M3.1 to M3.5 : database issue ?

by Ken Task -
Picture of Particularly helpful Moodlers

Think Howard had suggested that something else was going on to cause the issue and suggested processlist ... maybe he was thinking something else still running or became a zombie process related to database.   Maybe Monday Howard can clarify.   In the meantime (am the 'weekend help') ...

Noticed that in each of the processlist you have provided that there was a different course ID.

Yes, if there is some heavy process in DB it would slow down or even prevent the login process from completing quickly ... example: from OS making a mysqldump (MySQL or MariaDB) without putting site in maintenance mode, would exhibit those symptoms.

So what's in your scheduled task list ...

There is a task for "Background processing for gradebook \core\task\grade_cron_task" and it's set to run ASAP in the 3.5.highest am looking at right now.

One could run that via the 'run now' link in the task list ... which would use apache service .... or one could run that same task via CLI ... /admin/tool/task/cli/.

Think I'd try running from CLI ...

php schedule_task.php --execute="\core\task\grade_cron_task"

*and* in another teminal session logged onto server in the mysql/mariadb client, run the processlist ... several times ... to see if that task ... is running through a series of courses.

Have you ever tuned the DB?

Think mysqltuner ... a perl script .. is compat with MariaDB.

You might discover DB server is in need of some tweaking ... the one I find most often in need of tweaking ... initially ... after things settle in ... less so.

InnoDB buffer pool / data size

And for reference: https://mariadb.com/kb/en/library/show-processlist/

There there is always a possibility that the mdl_logstore_standard_log  table is huge ... that table is written to on every login .... also assuming using files for sessions rather than DB ....

So there's still some stuff to explore! sad

'spirit of sharing', Ken

In reply to Ken Task

Re: from M3.1 to M3.5 : database issue ?

by Michel DENIS -

Thank you Ken, you are very nice and helpful.

I will try mysqltuner later today and see.

I tried grade_cron_task but it completes almost immediately - see attached.

Sessions : the defaut is used, ie infos not recorded in DB (guess files are used instead).

In DB, grades history and logstore are actually huge I think:

mdl_grade_grades_history 2,9GioAfficher -
mdl_logstore_standard_log
2,1Gio
 Afficher 
mdl_grade_grades
1,7Gio
 Afficher 

Should I try emptying these 3 tables ? is it secure ?


Many thanks again,

-michel



Attachment Capture 4.JPG
In reply to Michel DENIS

Re: from M3.1 to M3.5 : database issue ?

by Ken Task -
Picture of Particularly helpful Moodlers

Be careful ... best not to manually truncate tables unless one knows what one is doing.

There are settings in moodle that would reduce the size of those large tables, but I'd say with 35K users none of those tables look too large.

Install and run .... mysqltuner.

Now you've said you had plenty of memory .... on this single server you have.

How about some true specs?

How much memory? (free - that's a command)

Is the server using SWAP space? (see above command)

How are you running apache?

How much memory is used by Apache and how much memory + swap space is used by the DB server and the DB's for Moodle?  (mysqltuner - top).

'spirit of sharing', Ken


In reply to Ken Task

Re: from M3.1 to M3.5 : database issue ?

by Michel DENIS -

Hi Ken, see attached the tuner's log. It does not tell me much ...

I've also attached a "top" display, with some data you asked for.

Here is the free:

              total        used        free      shared  buff/cache   available

Mem:       24112616      522168     3484132      302068    20106316    22891336

Swap:             0           0           0

That new server is : Debian,  4 vCore(s) @2,4 GHz, 24 GB RAM,    600 GB RAID disk

BTW: I did another test some time ago : instead of dumping the DB from the old server, I re-started from scratch (but same code, Apache, Php and MariaDB setups) on the new server by restoring 1 by 1 all 30 courses coming from the old server - and that way everything is OK, but the restore took about 5 days and nights, especially due to courses that have 20+ K enroled users (with their progress data etc...). So not sure the hardware sizing is the issue.

Regarding the old server : is running ok with these users and that DB from which I took the dump ... 3 times, with same problem each time.

Thank you,

-michel


In reply to Michel DENIS

Re: from M3.1 to M3.5 : database issue ?

by Ken Task -
Picture of Particularly helpful Moodlers

Hmmmm ... really?  Nothing stands out or useful info from tuner?

[OK] Maximum reached memory usage: 342.1M (1.45% of installed RAM)
[OK] Maximum possible memory usage: 752.7M (3.20% of installed RAM)

Server not using much memory for DB is it - could use more!!!

But ... A Biggy:

[!!] InnoDB buffer pool / data size: 128.0M/40.8G

Your DB has 40.8 Gig of data, but server buffer pools configured to use only 128Megs.

That's got to translate as DB slow - which then translates to site slow .. in certain areas.

Pay attention to the !! in this section:

-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[!!] InnoDB buffer pool / data size: 128.0M/40.8G
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (75 %): 48.0M * 2/128.0M should be equal 25%
[!!] InnoDB buffer pool <= 1G and Innodb_buffer_pool_instances(!=1).
[--] InnoDB Buffer Pool Chunk Size not used or defined in your version
[OK] InnoDB Read buffer efficiency: 94.19% (78517692 hits/ 83359832 total)
[!!] InnoDB Write Log efficiency: 83.06% (51791 hits/ 62350 total)
[OK] InnoDB log waits: 0.00% (0 waits / 10559 writes)


Be sure to read and understand the recommendations Tuner makes at the end:


General recommendations:
 **   MySQL was started within the last 24 hours - recommendations may be inaccurate
 **   Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
    Adjust your join queries to always utilize indexes
    Increase table_open_cache gradually to avoid file descriptor limits
    Read this before increasing table_open_cache over 64: http://bit.ly/1mi7c4C
    This is MyISAM only table_cache scalability problem, InnoDB not affected.
    See more details here: https://bugs.mysql.com/bug.php?id=49177
    This bug already fixed in MySQL 5.7.9 and newer MySQL versions.
    Beware that open_files_limit (16364) variable
    should be greater than table_open_cache (2000)
    Performance schema should be activated for better diagnostics
    Consider installing Sys schema from https://github.com/mysql/mysql-sys
**    Before changing innodb_log_file_size and/or innodb_log_files_in_group read this: http://bit.ly/2wgkDvS

Variables to adjust:
    query_cache_size (=0)
    query_cache_type (=0)
    query_cache_size (> 16M)
    join_buffer_size (> 256.0K, or always use indexes with JOINs)
    table_open_cache (> 2000)
    performance_schema = ON enable PFS
**    innodb_buffer_pool_size (>= 40.8G) if possible.
**    innodb_log_file_size should be (=16M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
**    innodb_buffer_pool_instances (=1)

Is this from production server or clone?  Clone won't  tell you much as far as traffic.

It's ok to run tuner on production as well.    Just don't do any of the recommended tweaks without some research.

Important on production server:

[OK] Highest usage of available connections: 3% (5/151)
[OK] Aborted connections: 0.77%  (5/648)

The first, on production server, might show a number higher than 151 - has to do with concurrent connections to the DB.  Server your size/usage might have more than 150 users on it at one time - which then makes the second .. Aborted connections ... an indication of user problems.

The top command ... one is interested in the 'top' of 'top' for memory usage:- the following from an 8Git Memory box.

top - 15:37:28 up 32 days, 17:19,  1 user,  load average: 0.00, 0.01, 0.05
Tasks: 145 total,   2 running, 143 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.1 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8004216 total,   366352 free,  1468188 used,  6169676 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  5722852 avail Mem

Notice it isn't using SWAP space ... example above NOT a true production server in use all the time but a sandbox.  Your production server was using some SWAP ... hard to make out how much with that you shared.

The rest of the display of top shows realtime ... what process kicks in and goes to the top of all the processes which means it's using the most memory.

The one that says mysqld is your DB server.

Are you using Solr Search?   Then one will see Java ... another biggy memory wise ... how about document conversions?

Now a caution ... while DB servers run best when it can load the DB into memory - 80% - one might be able to attain that on a dedicated DB server ... not one that is also the web service.  Memory tweaking for an all in one box becomes a balancing act  between Apache web service and MySQLD.

Bottom line on an all in one ... no can give you values here in forums.  Get the DB to use as much memory as it can AND yet have enough to run other services server needs to run + (and a biggy) enough Apache clients to handle connections by students.

Clear as mud, huh? :|

'spirit of sharing', Ken



In reply to Ken Task

Re: from M3.1 to M3.5 : database issue ?

by Michel DENIS -

Thank you Ken, for the full analysis, I've learnt a lot about DB thanks to you.

Yes the new server is not a production but a pre-prod ... until we are sure that users can log in smoothly ...

As a first step I've changed the pool from 128M to 4G (and restarted mariadb), and we'll see how it goes - that's 30 times the previous/defaut value now.

Concerning std log and grades history, I'm trying to reduce them by first just changing values in moodle (from infinity to 6 months). I guess they are too big and this produces errors like when running tasks manually (see attached that Error Writing to DB). I hope that with the larger pool that error will disappear ...

Thanks again,

-michel

Attachment trait carnet.png
In reply to Michel DENIS

Re: from M3.1 to M3.5 : database issue ?

by Ken Task -
Picture of Particularly helpful Moodlers

Strongly suggest running tuner on the production server - during non-prime time usage of Moodle.

Running on production server, you will see the same thing about buffer pools/data, but the one thing you can use for sure is the information from:

[OK] Highest usage of available connections: 3% (5/151)
[OK] Aborted connections: 0.77%  (5/648)

151 is the max number of connections to the DB server ... that is standard default for a fresh install of a DB server - MySQL or MariaDB.   It is one of the first items that needs to be tweaked.

Since the above metric was on your pre-production server, only 5 connections ... makes sense ... those are your testing folks.

Aborted connections .... there were 648 connections and of those 5 aborted ... logs in Apache will say "Server has gone away" - that's the MySQL (MariaDB) server that went away ... ie, couldn't take any more connections.

Running tuner on production server now will give you the current setting of 'max_connections'.

Then you will know what that setting should be in your pre-production server.

Now to the error ... mind translating the French for me - can't highlight text in an image to put into Google Translator.   Task failed.   See it's about grades ... cache on notes?

I don't have access to a 3.5 that's using grades, but in the sandbox 3.5 DB this:

mysql> show tables like '%grades%';
+----------------------------------+
| Tables_in_moodle35ssl (%grades%) |
+----------------------------------+
| mdl_assign_grades                |
| mdl_grade_grades                 |
| mdl_grade_grades_history         |
| mdl_lesson_grades                |
| mdl_quiz_grades                  |
| mdl_quiz_overview_regrades       |
| mdl_workshop_grades              |

In other words, my system doesn't have a couple of those tables you see ...

The clues .... notes? ... and the tables ...  so think I'd do an 'explain' on those tables to see what columns there are - notes? ... and then a query or two on that table to see what data is contained there in.

How often is your cron job running?   Both servers ... production as well as pre-production?

'spirit of sharing', Ken


In reply to Ken Task

Re: from M3.1 to M3.5 : database issue ?

by Michel DENIS -

Hi Ken,

I did now run the tuner on the old production server (M3.1), the result is attached. As soon as the login hanging does not happen on the pre-production server (M3.5), I'll copy (dump) the current live DB into it and give the go-ahead turning the pre-production into production.

After :
- putting 180 days instead of infinity for std log and grades history (instead of infinity on old server, which was running for about 1.5 year) 
- setting innodb_buffer_pool_size=4G (instead of 128M)
- running a few times the cron on the web,
now the manual execution of the scheduled task for cleaning grades is completing normally !!
Yes, "notes" means grades in french and the task is about cleaning the gradebook, "tache de fond" is background task, it was failing with error when writing to the DB ... but now that's completing normally.

The cron is running every minute on the new M3.5 pre-prod server, and every 5 minutes on the old M3.1 prod server.

When I have time I'll exercize the login back and forth to see if login hanging has also disappeared or not, hopefully yes. Will let you know.

Thanks again Ken,

-michel

In reply to Michel DENIS

Re: from M3.1 to M3.5 : database issue ?

by Michel DENIS -

The login hanging might have disappeared as I cannot reproduce it after a few minutes quick login/logout using multiple tabs in parallel ! I cross my fingers ...

Many thanks again Ken !

-michel

In reply to Michel DENIS

Re: from M3.1 to M3.5 : database issue ?

by Ken Task -
Picture of Particularly helpful Moodlers

You can see differences ... from production ... one being used ... and staging ... not yet announced.

[!!] InnoDB buffer pool / data size: 128.0M/5.5G

You can see the db on prod is even larger than staging ... but you've reset things on staging to reduce the size of some of the largest tables.

Additional tweaking ... have read that for every 1 GIG on has for buffer pool, instances needs to equal:

Set to 128M, this one is ok for now

[OK] InnoDB buffer pool instances: 1

but increase to 6 gig on that server and instances should now be 6.

Also, from latest sharing of tuner run on production:

[OK] Highest usage of available connections: 11% (17/151)
[OK] Aborted connections: 0.00%  (4/1352505)

17?   Wow!  Would have expected that to be higher.   Bet that will increase.

Keep an eye on that metric.   When students have trouble logging on, it could mean 151 needs to be increased.

The aborted connections does show only 4 out of 1.3x million - so that's good.

Have noticed that there are burst ... like right before finals or something like that.

Bottom line .... start monitoring using tuner more often and one will have fewer issues.

'spirit of sharing', Ken


Average of ratings: Useful (1)
In reply to Ken Task

Re: from M3.1 to M3.5 : database issue ?

by Michel DENIS -

Yes Ken.

I guess that reducing largest tables together with increasing the pool solved the login hanging problem.

Again many thanks to you Ken,

-michel