So after all of that, it seems that the shutdown manager is working correctly. The first time I checked to see if rollback was running, the page did not a have a DB delegated_transaction, it only had a regular query, therefore the check was never going to work.
I added logging to the functions that create, commit and rollback a savepoint. Every time a savepoint was started, either a commit occurred properly, or the roll back was executed every time the request was shut down with the savepoint still there.
The issue with the PostgreSQL server (we are using 9.2.8) is that the memory used by the checkpointer process is constantly increasing (1 GB after 24 hours, 2.1 GB after 72 hours). We are worried that eventually the server will run out of memory and start swapping (and become useless). We don't know when/if the server will crash but this is a very important time for our customers to have uninterrupted access. The server crashed a couple of weeks ago due to running out of memory, and we have been able to reduce the memory dramatically by using connection pooling and limiting the max connections to 300.
When we used Moodle 2.1, the memory used by the checkpointer process never behaved like this (so we do not think that the pgsql version has a bug and instead focused on the changes to Moodle). We are also using memcached and opcache.
We saw in the code that Moodle 2.6 uses SAVEPOINTs and Moodle 2.1 does not, and our error log was full of messages every 5 minutes about DB transactions being aborted, therefore this is the area we focused on. It was suggested to us that our custom parts were the cause of this problem (assuming that Moodle would not have a problem causing large memory leaks) and therefore we thought that a redirection we made was causing a SAVEPOINT to be left unreleased.
We started the database logging and I asked to be sent the database logs and we will analyse them as soon as they arrive.
Also we are getting another error message (far less frequently): 2014-07-10 10:56:20: (mod_fastcgi.c.2676) FastCGI-stderr: PHP message: Cannot find session record hgugllcqdq5eq22frahjn69737 for user 90716, creating new session.
This error message was recorded 28 times at 2014-07-10 10:56:20. From my understanding this leads back to \core\session\manager::start(); in setup.php. We have updated the execution of the error logging function to include a backtrace, so we can get some idea of where this is actually happening.
It seems that neither of these 2 errors are causing a database memory leak, but the available memory is still disappearing.