find the root of this panic error

by Danny Wahl -

Alright, here's the backstory.  We have a mahoodle setup: Moodle 2.4+ (Build: 20121208) (weekly), pulled from git, and Mahara likewise.  Apache/APC runs on one Ubuntu 12.04 server and PostgreSQL 9.1 runs on another Ubuntu 12.04 server.  Moodle and Mahara are configured to talk to the database by its hostname (FQDN), and the database only accepts connections from the web server.

Up until yesterday, this all worked great.  Yesterday at 2:30 PM our core firewall died.  We have no rules set up that filter Moodle traffic through the firewall, apart from external traffic passing in and out of the web server.  Otherwise, no loopbacks, etc.

I have no reports of Moodle NOT working yesterday afternoon on campus (intranet) but I don't know that it DID either.  The team installed a new firewall and say they tested logging into Moodle with no error some time in the evening.

Today at the beginning of school I started receiving support tickets that Moodle wasn't working.  I assumed firewall, but there are no rules in the new firewall for it either.  AND Mahara still works perfectly (same web host, same db host - different database)

Between 2:30 yesterday and 8:30 this morning nothing changed in Moodle, its config, its plugins, or on the web host or the db host.  But here are the errors I'm getting (oh yeah, Moodle is set to httpslogin):

Debug info: PANIC: corrupted item pointer: offset = 18960, size = 16
SSL SYSCALL error: EOF detected
UPDATE mdl_user SET lastlogin = $1,currentlogin = $2,lastaccess = $3,lastip = $4 WHERE id=$5
[array (
'lastlogin' => '1355296646',
'currentlogin' => 1355454321,
'lastaccess' => 1355454321,
'lastip' => '10.75.10.1',
0 => '2',
)]
Error code: dmlwriteexception
Stack trace:

    line 429 of /lib/dml/moodle_database.php: dml_write_exception thrown
    line 243 of /lib/dml/pgsql_native_moodle_database.php: call to moodle_database->query_end()
    line 1014 of /lib/dml/pgsql_native_moodle_database.php: call to pgsql_native_moodle_database->query_end()
    line 1054 of /lib/dml/pgsql_native_moodle_database.php: call to pgsql_native_moodle_database->update_record_raw()
    line 3364 of /lib/moodlelib.php: call to pgsql_native_moodle_database->update_record()
    line 4262 of /lib/moodlelib.php: call to update_user_login_times()
    line 178 of /login/index.php: call to complete_user_login()
    

Debug info: PANIC: corrupted item pointer: offset = 18960, size = 16
SSL SYSCALL error: EOF detected
SELECT * FROM mdl_context WHERE contextlevel = $1 AND instanceid = $2
[array (
0 => 50,
1 => '1',
)]
Error code: dmlreadexception
Stack trace:

    line 426 of /lib/dml/moodle_database.php: dml_read_exception thrown
    line 243 of /lib/dml/pgsql_native_moodle_database.php: call to moodle_database->query_end()
    line 753 of /lib/dml/pgsql_native_moodle_database.php: call to pgsql_native_moodle_database->query_end()
    line 1382 of /lib/dml/moodle_database.php: call to pgsql_native_moodle_database->get_records_sql()
    line 1354 of /lib/dml/moodle_database.php: call to moodle_database->get_record_sql()
    line 1333 of /lib/dml/moodle_database.php: call to moodle_database->get_record_select()
    line 6535 of /lib/accesslib.php: call to moodle_database->get_record()
    line 2328 of /lib/navigationlib.php: call to context_course::instance()
    line 1062 of /lib/navigationlib.php: call to global_navigation->add_course()
    line 2918 of /lib/navigationlib.php: call to global_navigation->initialise()
    line 766 of /lib/pagelib.php: call to navbar->has_items()
    line 27 of /media/data/moodlethemes/zebra/layout/header.php: call to moodle_page->has_navbar()
    line 25 of /media/data/moodlethemes/zebra/layout/general.php: call to require_once()
    line 804 of /lib/outputrenderers.php: call to include()
    line 734 of /lib/outputrenderers.php: call to core_renderer->render_page_layout()
    line 2362 of /lib/outputrenderers.php: call to core_renderer->header()
    line ? of unknownfile: call to core_renderer->fatal_error()
    line 1416 of /lib/setuplib.php: call to call_user_func_array()
    line 366 of /lib/setuplib.php: call to bootstrap_renderer->__call()
    line 366 of /lib/setuplib.php: call to bootstrap_renderer->fatal_error()
    line ? of unknownfile: call to default_exception_handler()

The above error appears when a user logs in.  Stripping /login/login.php from the URL will generally get the user back to the homepage, successfully logged in.  However, searching, refreshing, and visiting various activities also trigger the error at random.  Refreshing again clears it.

The server(s) are not under load, not out of CPU, RAM, or Disk space.  Here's a paris-traceroute output from web to sql based on hostname:

sudo paris-traceroute tissql.tis.com
traceroute [(192.168.0.188:33456) -> (192.168.0.189:33457)], protocol udp, algo hopbyhop, duration 0 s
 1  tissql.tis.com (192.168.0.189)  0.409 ms    0.326 ms    0.360 ms

You can see that it's clearly NOT being routed through the firewall.

Here's the log output from postgres about the error:

2012-12-14 12:32:25 CST LOG:  database system is ready to accept connections
2012-12-14 12:32:25 CST LOG:  autovacuum launcher started
2012-12-14 12:33:26 CST PANIC:  corrupted item pointer: offset = 18960, size = 16
2012-12-14 12:33:26 CST STATEMENT:  UPDATE mdl_user SET lastip = $1,lastaccess = $2 WHERE id=$3
2012-12-14 12:33:26 CST LOG:  server process (PID 2439) was terminated by signal 6: Aborted
2012-12-14 12:33:26 CST LOG:  terminating any other active server processes
2012-12-14 12:33:26 CST WARNING:  terminating connection because of crash of another server process
2012-12-14 12:33:26 CST DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2012-12-14 12:33:26 CST HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2012-12-14 12:33:26 CST LOG:  all server processes terminated; reinitializing
2012-12-14 12:33:26 CST LOG:  database system was interrupted; last known up at 2012-12-14 12:32:25 CST
2012-12-14 12:33:26 CST LOG:  database system was not properly shut down; automatic recovery in progress
2012-12-14 12:33:26 CST LOG:  redo starts at 6/878092F0
2012-12-14 12:33:26 CST LOG:  unexpected pageaddr 6/83930000 in log file 6, segment 135, offset 9633792
2012-12-14 12:33:26 CST LOG:  redo done at 6/8792E150
2012-12-14 12:33:26 CST LOG:  last completed transaction was at log time 2012-12-14 12:33:25.581158+08

That repeats for every attempt.
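
To see at a glance which statements keep hitting the PANIC, the log can be scanned with something like the following.  This is only a rough sketch: it assumes each log entry sits on its own line with the "timestamp TZ LEVEL:" prefix shown above, and the names panic_statements and tables_hit are made up for illustration, not part of PostgreSQL or Moodle.

```python
import re
from collections import Counter

# Prefix format assumed from the excerpt above, e.g.
# "2012-12-14 12:33:26 CST PANIC:  corrupted item pointer: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ (?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)
# Very rough guess at the first table named in a statement.
TABLE_RE = re.compile(
    r"\b(?:UPDATE|INSERT INTO|DELETE FROM|FROM)\s+([A-Za-z_][A-Za-z0-9_]*)",
    re.IGNORECASE,
)


def panic_statements(log_text):
    """Return (timestamp, panic message, statement or None) per PANIC entry."""
    results = []
    pending = None  # PANIC entry still waiting for its STATEMENT line
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue
        level, msg = match["level"], match["msg"]
        if level == "STATEMENT" and pending is not None:
            results.append((pending[0], pending[1], msg))
            pending = None
            continue
        if pending is not None:
            # Some other entry followed the PANIC, so no statement was logged.
            results.append((pending[0], pending[1], None))
            pending = None
        if level == "PANIC":
            pending = (match["ts"], msg)
    if pending is not None:
        results.append((pending[0], pending[1], None))
    return results


def tables_hit(entries):
    """Count which tables appear in the statements that triggered a PANIC."""
    counts = Counter()
    for _ts, _msg, statement in entries:
        found = TABLE_RE.search(statement) if statement else None
        if found:
            counts[found.group(1)] += 1
    return counts


SAMPLE = """\
2012-12-14 12:32:25 CST LOG:  autovacuum launcher started
2012-12-14 12:33:26 CST PANIC:  corrupted item pointer: offset = 18960, size = 16
2012-12-14 12:33:26 CST STATEMENT:  UPDATE mdl_user SET lastip = $1,lastaccess = $2 WHERE id=$3
2012-12-14 12:33:26 CST LOG:  server process (PID 2439) was terminated by signal 6: Aborted
"""

if __name__ == "__main__":
    for ts, msg, statement in panic_statements(SAMPLE):
        print(ts, "|", msg, "|", statement)
    print(dict(tables_hit(panic_statements(SAMPLE))))
```

If the same table shows up for every PANIC, that points at damage in that table or one of its indexes rather than at the network path.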

Click here for debug5 with everything logging

-----

What I'm really hoping to prove is that this is NOT Moodle.  But if it is, then I'm assuming it's a core bug because, as I said, nothing has changed on my setup (but that's also what makes me think it's not Moodle).

In reply to Danny Wahl

Re: find the root of this panic error

by James Richardson -

Hello Danny!

Your Moodle may be fine. This may be related to a corrupt table in your database. I asked our systems support here at InMotion and they recommend running:

mysqlcheck -c --auto-repair <database_name>

This should repair any corrupted tables for MySQL. In the following part of the error:

The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Our systems support says that this implies a transactional DB such as InnoDB. You can try a repair for PostgreSQL to see if that resolves it. It's possible that when the firewall was set up, a query corrupted a table while a user was logging in. It's difficult to say. I hope this was helpful.
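
Since mysqlcheck only works against MySQL, a PostgreSQL-side check would look different. One commonly suggested approach is to scan the suspect table and then rebuild its indexes. Below is a minimal sketch that only builds the psql command lines and does not run them. The database name "moodle" and the function name build_check_commands are assumptions for illustration; mdl_user and the hostname are taken from the log and traceroute above. Take a file-level backup of the data directory before running anything like this, and note that REINDEX only helps if the damage is in an index. If the table itself is damaged, restoring from backup may be the only safe option.

```python
import shlex


def build_check_commands(database, tables, host=None):
    """Build psql command lines that scan each table, then rebuild its indexes.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    base = ["psql", "-X", "-v", "ON_ERROR_STOP=1", "-d", database]
    if host:
        base += ["-h", host]
    commands = []
    for table in tables:
        # Guard against anything that is not a plain identifier.
        if not table.replace("_", "").isalnum():
            raise ValueError(f"unexpected table name: {table!r}")
        # Reading the table typically surfaces the error if the table is damaged.
        commands.append(base + ["-c", f"SELECT count(*) FROM {table};"])
        # Rebuilding the indexes discards any damaged index pages.
        commands.append(base + ["-c", f"REINDEX TABLE {table};"])
    return commands


if __name__ == "__main__":
    # "moodle" is a hypothetical database name.
    for cmd in build_check_commands("moodle", ["mdl_user"], host="tissql.tis.com"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands first makes it easy to review them and run them one at a time while watching the PostgreSQL log.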

Regards,

James R

 

 

In reply to James Richardson

Re: find the root of this panic error

by Danny Wahl -

James, thank you very much for your reply.  I had eventually determined that an active transaction (or maybe a dozen) had been broken when the firewall went down.

It's good to hear that it might be repairable, and yes we are running PostgreSQL!

You just made my Saturday, and tell your systems guys "thanks" from us too.