The problem is illustrated by this call graph from the profiler, taken from the view in which a single question is presented to a student.

1. I do note that the run time is doubled by a separate block module re-instantiating the activity. I know how to deal with that, but I am hoping for an improvement of more than 50% ...
2. The second problem is that question_attempt::load_from_records is called once for each attempt (yes, 19966 is exactly twice the number of attempts) by question_usage_by_activity::load_from_records. This seems superfluous, as these attempt records are not used when a single question is presented.
3. Most of the actual work is done inside STACK, so the superfluous loading is not necessarily a problem for other question types. I must admit that STACK is used without pre-instantiated variants, so we do not benefit from the cache. However, monitoring with top(1) shows that the CPU is used by apache and not by maxima, so I am not convinced that the missing cache is the problem.
4. The reason why pre-instantiation is not used is that I need such a large number of variants (as the students get many variants each) that it is very tedious to set up, and the exported XML and backup files become huge. I simply have not found a practical and efficient way to do it.
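
To make point 2 concrete, here is a toy model of what I mean, written in Python rather than Moodle's PHP. None of these classes exist in Moodle; `Attempt`, `EagerUsage`, `LazyUsage` and `count_loads` are hypothetical names, and I am only assuming that the cost sits in building one object per attempt record. It is a sketch of the idea, not a proposed patch, and I do not know whether the question engine could actually defer the loading in this way.

```python
class Attempt:
    """Hypothetical stand-in for one question attempt."""
    loads = 0  # counts how many attempts have been built from records

    def __init__(self, record):
        Attempt.loads += 1  # assumed to be the expensive step
        self.record = record


class EagerUsage:
    """Builds every attempt up front (what the call graph seems to show)."""

    def __init__(self, records):
        self.attempts = [Attempt(r) for r in records]

    def current(self):
        return self.attempts[-1]


class LazyUsage:
    """Keeps the raw records and builds an attempt only when asked for it."""

    def __init__(self, records):
        self._records = records
        self._cache = {}

    def attempt(self, index):
        if index not in self._cache:
            self._cache[index] = Attempt(self._records[index])
        return self._cache[index]

    def current(self):
        return self.attempt(len(self._records) - 1)


def count_loads(usage_cls, n_attempts):
    """Return how many attempts get built when only the current one is shown."""
    Attempt.loads = 0
    usage = usage_cls([{"slot": i} for i in range(n_attempts)])
    usage.current()
    return Attempt.loads


if __name__ == "__main__":
    n = 19966 // 2  # number of attempts, from the profiler count above
    print("eager:", count_loads(EagerUsage, n))
    print("lazy: ", count_loads(LazyUsage, n))
```

In this toy model the eager version builds one object per attempt, while the lazy one builds only the attempt that is actually displayed. Whether the real code can skip the other attempts safely is exactly what I am asking.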