The strange get_recordset() conspiracy

The strange get_recordset() conspiracy

by Martín Langhoff -
Number of replies: 5

There's a new series of function calls in datalib that were introduced shortly after 1.5: get_recordset() and friends. MartinD mentioned today that it was not clear when and how to use them effectively -- in fact, that it was not clear whether do we want them. So I rolled up my sleeves, and did some testing and profiling. And though it takes a bit of work to make good use of them, they are definitely useful.

The function calls themselves take similar parameters to the get_records() family, and return an AdoDB recordset. In most situations, the right thing is to use get_records(), making sure that you are selecting a limited number of records (use LIMIT of course wink ), and that you are clearly fetching only the fields you need. That is enough for 99% of the cases. Stop reading now, use get_records(), and done.

Easy, huh? big grin

Now, there are some cases where we need to perform huge selects. And the downside of the get_records() machinery is that it allocates it all in RAM -- and if indeed records > than RAM, we are in trouble. This mostly affects batch processing in admin/cron.php (backups for instance) and the silly sync scripts we have in some auth and enrolment plugins.

In those cases, where you have no option but to SELECT all the users, or all the courses, get_recordset() really shines. As you get an AdoDB recordset, you can iterate through it very fast, and with very low memory usage. I have prepared and profiled some samples and found that to keep that advantage you have to be very careful how you use it.

I started testing with a large mdl_user table with 46K users, stored in PostgreSQL locally (socket connection). Here is a rundown of the code and rough timings...

The traditional get_records() is fast (~14s), but it consumes 296MB of memory. It looks like:

  $users = get_records('user');

So the next thing was to check that just triggering the SELECT was fast, and that fetching and re-shaping all that data into memory was an expensive step. get_recordset() in itself never fetches the data into PHP-space, so it takes only 4.8MB and 1.4s:

  $users = get_recordset('user');

Next I tried using get_recordset() with FetchNextObj() which looked quite convenient, because it will yield each record in an object, similar to the objects that get_records() returns inside the array. The memory usage stayed the same (4.8M), but this is very slow (~65s):

  $users = get_recordset('user');
  while($v = $users->FetchNextObj()){
      // do something 
  }

Internally, get_records() is using GetAssoc(), so I looked in there to se faster ways of getting at the data. Apparently, the trick is in reading the $rs->fields array and using MoveNext(). So this loop still takes 4.8MB and walks the 46K user records in 4s:

  $users = get_recordset('user');
  while(!$users->EOF) {
      $user = $users->fields;
      // do something
      $users->MoveNext();
  }

Naturally, as soon as you start doing useful things, this will take longer to exec. The core advantage, however, is that it can give you comparable performance to get_records() but with very low memory usage.

I ran these tests with the attached script, doing

  time php4 ./admin/selecttest.php get_records
  time php4 ./admin/selecttest.php get_recordset
  time php4 ./admin/selecttest.php get_recordset:fetchnextobj
  time php4 ./admin/selecttest.php get_recordset:movenext
Average of ratings: -
In reply to Martín Langhoff

Re: The strange get_recordset() conspiracy

by Gustav W Delius -
Very interesting. Martin, please put that in the developer docs at MoodleDocs.
In reply to Martín Langhoff

Re: The strange get_recordset() conspiracy

by Dan Stowell -
Martin, thanks for all this. I ran a less thorough test, prompted by MD, because he questioned why I was using get_recordset() when updating image-thumbnails in the database module. The conversation is in bug 5011. Since MD preferred get_records() and my test showed very little difference, we stuck with get_records().
In reply to Dan Stowell

Re: The strange get_recordset() conspiracy

by Martín Langhoff -
Yes, he mentioned that bug, which got me worried. And I did some initial tests that didn't look too good either, so I got even more worried that I had introduced these get_recordset() functions and not demonstrated clearly that they were valuable.

So I went away to figure out how to use them correctly. Given that it turned out that using them well isn't easy, I thought I'd post a bit of a walkthrough and demo code.

It's also given me some ideas to optimise get_records() -- I suspect that we can craft a cut-down version of GetAssoc() tailored to Moodle's use that should be faster, and possibly lower mem usage. Not now though -- 1.6 needs to be cut! -- but if you are keen on tinkering...