There's a new series of function calls in datalib that were introduced shortly after 1.5: get_recordset()
and friends. MartinD mentioned today that it was not clear when and how to use them effectively -- in fact, whether we want them at all. So I rolled up my sleeves and did some testing and profiling. And though it takes a bit of work to make good use of them, they are definitely useful.
The function calls themselves take similar parameters to the get_records() family, and return an AdoDB recordset. In most situations, the right thing is to use get_records(), making sure that you select a limited number of records (use LIMIT, of course) and fetch only the fields you need. That is enough for 99% of the cases. Stop reading now, use get_records(), and you're done.
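For instance, here is a sketch of what that advice looks like in practice. The field list and the LIMIT of 500 are made up for illustration, and I am using get_records_sql() so the LIMIT can go straight into the SQL:

```php
// Illustrative only: fetch just the columns we need, capped at 500 rows.
// $CFG->prefix is the standard Moodle table prefix.
$users = get_records_sql('SELECT id, username, email
                            FROM '. $CFG->prefix .'user
                        ORDER BY id
                           LIMIT 500');
```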
Easy, huh?
Now, there are some cases where we need to perform huge selects. And the downside of the get_records() machinery is that it allocates everything in RAM -- and if the records take more memory than we have, we are in trouble. This mostly affects batch processing in admin/cron.php (backups, for instance) and the silly sync scripts we have in some auth and enrolment plugins.
In those cases, where you have no option but to SELECT all the users, or all the courses, get_recordset() really shines. As you get an AdoDB recordset, you can iterate through it very fast and with very low memory usage. I have prepared and profiled some samples, and found that to keep that advantage you have to be very careful about how you use it.
I started testing with a large mdl_user
table with 46K users, stored in PostgreSQL locally (socket connection). Here is a rundown of the code and rough timings...
The traditional get_records()
is fast (~14s), but it consumes 296MB of memory. It looks like:
$users = get_records('user');
So the next thing was to check that just triggering the SELECT was fast, and that fetching and re-shaping all that data in memory was the expensive step. get_recordset() in itself never fetches the data into PHP-space, so it takes only 4.8MB and 1.4s:
$users = get_recordset('user');
Next I tried using get_recordset() with FetchNextObj(), which looked quite convenient because it yields each record as an object, similar to the objects that get_records() returns inside its array. The memory usage stayed the same (4.8MB), but this is very slow (~65s):
$users = get_recordset('user');
while ($v = $users->FetchNextObj()) {
    // do something
}
Internally, get_records() uses GetAssoc(), so I looked in there to see faster ways of getting at the data. Apparently, the trick is in reading the $rs->fields array and using MoveNext(). So this loop still takes 4.8MB, and walks the 46K user records in 4s:
$users = get_recordset('user');
while (!$users->EOF) {
    $user = $users->fields;
    // do something
    $users->MoveNext();
}
Naturally, as soon as you start doing useful things in the loop, it will take longer to execute. The core advantage, however, is that it gives you performance comparable to get_records(), but with very low memory usage.
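Putting it together, here is a sketch of how I would use this in a batch job. process_one_user() is a hypothetical per-record worker, and this assumes Moodle's usual associative fetch mode, so that $rs->fields is keyed by column name:

```php
// Sketch of a cron-style batch walk over all users, low memory usage.
$rs = get_recordset('user');
while (!$rs->EOF) {
    $user = (object)$rs->fields;   // cheap cast if you prefer object syntax
    process_one_user($user);       // hypothetical per-record worker
    $rs->MoveNext();
}
$rs->Close();                      // release the cursor when we're done
```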
I ran these tests with the attached script, doing
time php4 ./admin/selecttest.php get_records
time php4 ./admin/selecttest.php get_recordset
time php4 ./admin/selecttest.php get_recordset:fetchnextobj
time php4 ./admin/selecttest.php get_recordset:movenext