So, during quiz backup/restore performance testing I discovered that large temporary backup tables in PostgreSQL weren't performing very well: queries were taking around 2 seconds each rather than 20-30ms. See MDL-39725 and related issues for further details. This leaves us needing to handle statistics collection on temporary tables to ensure reasonable performance when those tables get large.
The analysis I've completed so far suggests PostgreSQL is the only database affected. However, it seems to be the choice of many of the larger installations (including the one I work on), so I think something needs to be done.
Other databases would get a simple empty function that just returns success, as at this point there is no need for these updates in anything except PostgreSQL.
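To make that split concrete, here is a rough sketch, written in Python purely for illustration. The class names, the method name update_temp_table_stats and the execute callback are all hypothetical and are not Moodle's actual database API; the only real thing in it is PostgreSQL's ANALYZE command.

```python
class GenericDriver:
    """Stand-in for a non-PostgreSQL driver (hypothetical name)."""

    def __init__(self, execute):
        # execute: any callable that takes one SQL string and runs it
        self.execute = execute

    def update_temp_table_stats(self, table):
        # Nothing to do on this database; just report success.
        return True


class PostgresDriver(GenericDriver):
    """Stand-in for the PostgreSQL driver (hypothetical name)."""

    def update_temp_table_stats(self, table):
        # ANALYZE refreshes the planner statistics for a single table.
        # Autovacuum cannot access temporary tables, so the session
        # that owns the table has to issue this itself.
        quoted = '"' + table.replace('"', '""') + '"'
        self.execute("ANALYZE " + quoted)
        return True


if __name__ == "__main__":
    sent = []  # collects the SQL that would be sent to the database
    PostgresDriver(sent.append).update_temp_table_stats("backup_ids_temp")
    GenericDriver(sent.append).update_temp_table_stats("backup_ids_temp")
    print(sent)  # only the PostgreSQL driver issued a statement
```

Calling code would not need to know which database it is talking to; it would just call the one method and the driver decides whether anything happens.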
Some discussion has happened on the tracker and a few general ideas have come up:
1. Update the statistics with specific calls at times that seem appropriate, e.g. just after larger data loads or alterations. In backup that might be just after loading the backup_ids_temp table.
2. Create infrastructure to automatically track when changes have been made to a temporary table and call statistics updates when we consider the changes to be enough to warrant it.
3. Write some magic for PostgreSQL so it can use its own statistics to track whether it should update the statistics for the temporary table.
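For comparison, option (1) is just a direct call to the statistics update right after the bulk load, whereas option (2) needs some bookkeeping. A minimal sketch of that bookkeeping, again in Python for illustration only: the class name, the threshold of 1000 rows and the analyze callback are assumptions I made up for the example, not anything that exists in Moodle.

```python
class TempTableChangeTracker:
    """Counts rows changed per temp table and decides when to re-analyze.

    Hypothetical helper: the name, the threshold and the callback are
    assumptions made for this sketch only.
    """

    def __init__(self, analyze, threshold=1000):
        self.analyze = analyze      # callable taking a table name
        self.threshold = threshold  # rows changed before stats are refreshed
        self.pending = {}           # table name -> rows changed since last analyze

    def record_change(self, table, rows):
        """Call after each insert/update/delete; returns True if stats were refreshed."""
        self.pending[table] = self.pending.get(table, 0) + rows
        if self.pending[table] < self.threshold:
            return False
        self.analyze(table)
        self.pending[table] = 0
        return True


if __name__ == "__main__":
    analyzed = []  # records each table the tracker asked to analyze
    tracker = TempTableChangeTracker(analyzed.append, threshold=1000)
    for _ in range(25):
        tracker.record_change("backup_ids_temp", 100)  # 2500 rows in total
    print(analyzed)
```

The hard part is not the counter but picking a threshold that suits both small and very large temp tables, which is part of why the explicit call in (1) is simpler to reason about.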
I am in favour of (1). It might be the easiest to implement, but I don't think that's critically relevant. As a developer you use temporary tables for specific purposes and have to manage them carefully over their short lives, so you should know when you are adding or changing lots of data. It can be said that the developer should not need to worry about those things, but I've yet to see that as a reality when we spend much of our time optimizing SQL to ensure it's fast on all platforms.
Eloy has expressed interest in (2), but I don't think he expressed the reasons for that on the tracker. Maybe he could elaborate here?
(3) was just a wild crazy idea to avoid reimplementing PostgreSQL's statistics collection and just reuse it, so that it's less work to do something like (2). However, you would still need to decide when to check those stats and when to trigger the update.
I'm open to other ideas, as well as votes on and changes to the proposed ones.
Thanks
Russell