Databases: Proposed change to get_records_sql_menu()

Databases: Proposed change to get_records_sql_menu()

by Tim Hunt -
Number of replies: 28
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
Here is a suggested change to the function get_records_sql_menu($sql) from datalib.php. This suggestion is either a good idea, or very horrible, but I can't decide which.

Currently, if you pass this function a SELECT that produces a two-column result, it gives you back an associative array with entries col_1 => col_2.

I want to change this so that if your select produces a one-column result, you get an array that is basically a list of those values.

The change is around line 940 or datalib.php which currently looks like:
     if ( $rs->RecordCount() > 0 ) {
while (!$rs->EOF) {
$menu[$rs->fields[0]] = $rs->fields[1];
$rs->MoveNext();
}
return $menu;

} else {
return false;
}
and my proposal changes it to
     if ( $rs->RecordCount() > 0 ) {
if ($rs->NumCols() == 1) {
return $rs->GetArray();
} else {
return $rs->GetAssoc(false, true);
}
} else {
return false;
}
that is, an if statement for the special case of one column, and also changing to using the library routines, rather than manually doing a while loop.

Comments?

Thanks,

Tim.

P.S. the attached patch makes this change, and also updates all the necessary comments.

Average of ratings: -
In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Eloy Lafuente (stronk7) -
Picture of Core developers Picture of Documentation writers Picture of Moodle HQ Picture of Peer reviewers Picture of Plugin developers Picture of Testers
Hi Tim,

IMO your suggested change has sense, saving us of some extra iterations. But I don't understand the objective of this part of your patch:
if ($rs->NumCols() == 1) {
return $rs->GetArray();
}
, mainly because it returns an "incompatible" structure (from the expected one). Wouldn't be better to return "false" if the number of columns is < 2?

Ciao smile
In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Martín Langhoff -

Hi Tim,

all the get_records* functions follow a convention -- not very well explained anywhere -- of returning a hash-like array (as opposed to a stack-like array). So the first column from the SELECT is expected to be unique.

Your patch is good, but it will break a standing convention... perhaps we could introduce a set of functions that deal with this better... something like get_values_sql() and get_values(). What do you think?

(You won't find me defending the get_records*() functions as they are, but it is a convention that a lot of code relies on. And Moodle hackers rely on that too.)

BTW, I've finally written a get_recordset_sql() and get_recordset() pair of functions. I'll land them in HEAD and STABLE soon. They return an ADODb recordset and you are free to do as you want with it. They are meant for those cases where there's no option but to get_recordset('users') and silly things like that wink

Actually wrote this in Canary Islands right after the MoodleMoot, but haven't had the time to implement it anywhere -- and I think it's worthless until such time that I change a few relevant parts of the code to use it...

In reply to Martín Langhoff

Re: Databases: Proposed change to get_records_sql_menu()

by John Papaioannou -
I 've transferred some of my Moodle habits to another project I 'm involved with at the moment. Since I 'm using Smarty over there, and Smarty in 99% of cases does not like id => data arrays (it's simply very hard to code what would be simple for vector-like arrays without explicitly creating auxiliary data structures), I 've been using exactly what you describe as get_values_etc(). After a couple seconds of careful consideration, that family of functions has been named "get_bulk_etc()". wink

Speaking of new datalib functions, here's a couple that I think could be potentially very useful in Moodle (it already is in other places). Do you think I should land one or both?

function get_next_sequence($table, $sequencefield, $field = NULL, $value = NULL) {
    global $db, $CFG;
    $sql = 'SELECT MAX('.$sequencefield.') FROM '.$CFG->prefix.$table;
    if($field !== NULL && $value !== NULL) {
        $sql .= ' WHERE '.$field.' = '.$db->qstr($value);
    }
    $result = $db->GetOne($sql);
    return empty($result) ? 1 : $result + 1;
}

function get_count($table, $column, $field = NULL, $value = NULL) {
    global $db, $CFG;
    $sql = 'SELECT COUNT('.$column.') FROM '.$CFG->prefix.$table;
    if($field !== NULL && $value !== NULL) {
        $sql .= ' WHERE '.$field.' = '.$db->qstr($value);
    }
    return $db->GetOne($sql);
}



                                    
In reply to John Papaioannou

Re: Databases: Proposed change to get_records_sql_menu()

by Penny Leach -
I'm not sure I understand the first one.. we're using auto_increment (mysql) and serial primary key (postgres) anyway.

As far as the second one, doesn't that do the same thing as count_records (datalib) ?

In reply to John Papaioannou

Re: Databases: Proposed change to get_records_sql_menu()

by Martín Langhoff -

Good to hear that you're spreading the Moodle habits wink ... the Catalyst gang may or may not be guilty of lingering moodlisms in various projects out there...

I'm not sure of the name get_bulk_sql() -- it's not very descriptive (hey, get_values_sql() isn't either!), and it gives the wrong impression that it is geared to fetch data in large volumes. The only reasonably way to grab large sets of data (if you absolutely must) is get_recordset_sql(). Would like to avoid new hackers learning Moodle internals and assuming that a function meant for small/medium sized lists of values is for bulk data.

Hmmm. get_list_sql() perhaps?

WRT the functions you propose, we already have get_field_sql() which does count(), max(), min(), avg() and all the aggregates we could care about. I like it.

And get_next_sequence() isn't a bad idea, except that I suspect that it is a bad concept for Moodle. On MySQL it is a huge race condition if you ever use it to get the next id value before inserting a new record (is that what you want it for?). There is no safe way of doing this with MySQL, at least with ISAM/MyISAM tables. Perhaps the InnoDB backend uses sequences like everyone else? I dunno mixed

On PostgreSQL it is safe as we do have proper sequences on which you can get the next number and if a separate record is inserted in a table that was using that sequence, the number you got is skipped. It would be a it trickier to implement -- you have to guess the sequence name but on 7.4 onwards that's quite easy.

(Edit: Of course! Penny points out count_records() -- I wasn't sure we had one, so was grepping for function get_count and wondering whether it had been a figment of my imagination.)

(Edit II: A Google resultset showing good hits for sql insert max race condition )

In reply to Martín Langhoff

Re: Databases: Proposed change to get_records_sql_menu()

by John Papaioannou -
+1 for get_list_etc(), as I mentioned get_bulk_ was the brainchild of a complete two seconds of thought. smile

count_records() I had missed, I confess. For some peculiar reason it seems that in Moodle only I want to count records less often than in anything else, otherwise its absence would 've been felt earlier...

As for get_next_sequence(): granted, the name is misleading, but I 'm actually not using it for sequences (correction: sometimes using it for sequences, but it's an intranet application with definitely "low traffic"). It's for situations like this:

A widget can have 0...N gadgets. We need to be able to refer to the ith gadget of a widget deterministically, so the gadget table has id, widgetid, gadgetnumber among others. When adding a new gadget there is no other way AFAIK that can be used to determine the newly added gadget's gadgetnumber.

Now that I think of it, it's the same situation exactly with the "weight" column in mdl_block_instance.


In reply to John Papaioannou

Re: Databases: Proposed change to get_records_sql_menu()

by Martín Langhoff -

When adding a new gadget there is no other way AFAIK that can be used to determine the newly added gadget's gadgetnumber.

PHP's MySQL driver has lastinsertid and Pg has a similar trick using sequences. In any case, insert_record() returns the inserted id. If it doesn't it's bug! wink

In reply to Martín Langhoff

Re: Databases: Proposed change to get_records_sql_menu()

by John Papaioannou -
It's not about the gadget id, it's about the gadget number for a given widget. In mdl_block_instance, it's not about "what's the new instance's id" but "what weight should a new instance have to go at the bottom of a page with id X and type Y" (i.e., the max(weight) for all instances in page(X, Y) plus one). wink
In reply to John Papaioannou

Re: Databases: Proposed change to get_records_sql_menu()

by Martín Langhoff -
Ahhhhhhhhhhh! You had me worried for a second! wink

So if the race condition is triggered, and two inserts happen with the same weight, nothing bad actually happens (other than a little bit of indeterminism, but hey, it's web app, not an ICBM controller).
In reply to Martín Langhoff

Re: Databases: Proposed change to get_records_sql_menu()

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
I didn't explain why I wanted to do this. I wanted a library function that, given a query that returned a result with one column and many rows, returned that as an array.

I agree that does not match the normal convention as returning rows of results as an assoc array col_name => value.

However, Looking for the function I wanted in datalib, I found the (badly named) get_records_*_menu() family, which given a query that leats to a two-column result, give you an assoc array col1 => col 2. So there are already routines in datalib the break the convention.

So I could either write the library routines I wanted from scratch, which would be lots of code, or I could modify the existing methods in a weird way and get the same result, but that is confusing becuase then one method returns different sorts of results depending on how it is called.


Anyway, I think your get_recordset_*() routines are a much better approach, especially since I you then use the $rs->GetArray() and $rs->GetAssoc() routines if you really want the functionality I am after. How soon do you think you will be able to get these chagnes into datalib?

Tim.

In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Martín Langhoff -

How soon do you think you will be able to get these chagnes into datalib?

I've attached a patch (comments & fixes welcome). I'll be committing it to STABLE and HEAD as soon as I have a practical application for it (for instance, fixing a block of code that fetches too much data wink

Cheers!

In reply to Martín Langhoff

Re: Databases: Proposed change to get_records_sql_menu()

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
There is a lot of duplicated code in datalib which is just crying out for a refactoring. I think that without really trying, I have just saved 200 lines (out of about 3000).

The two types of common code I got rid of was
  1. Use the new get_recordset method in all the places a recordset was got with copy and pasted code.
  2. Move all the places where argments like $field1, $value1, $field2='', $value2='', $field3='', $value3='' were processed into a common method.
That is the attached patch, which includes Martins patch, so it should apply to HEAD does this. Note that this patch is untested, do not use yet! I just wanted to get this circulated before I left work for the eveing, so I could get your comments.

Of course, changing the code is easy. The difficult bit is being absolutely certain that nothing has broken. Given that, should I persue this? I hate duplicated code and I hate 3000 line source files, so I would really like to clean up some of the mess. If you think it is worth persuing, I'll produce a good patch, but I don't want to spend more time of this if no one is interested and it will become wasted effort.

A couple of questions:
  1. Do the count_records_*() methods actually do what the name suggests? if so how? The code looks broken to me.
  2. We could simmilarly refactor the comments. I don't think it helps to duplicate essentially the same comment on lots of different methods. Instead, we could pick a few methods and write really good detailed comments on those, and then on all the other methods just say @see other method for what these arguments mean. Is this a good or bad idea?
Tim.

In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Penny Leach -
I've just had a small look at your patch over my morning coffee and I think it's good.

Refactoring to remove duplication is good, I like it. I initially panicked when I saw the middle part of the patch that changes function names, but then I see that you've made the existing functions again further down.

Anything that changes function names is bad, as even if we audit all of moodle to change calls to them, a lot of people will have custom code that relies on them.

So as long as you're careful to not change any function names or calls, I'm happy.

Also as long as all the debug and PERF stuff is still in there, I'm happy.
In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Martín Langhoff -

I like it too -- though you'll have to test it a bit wink

(Datalib is one of those things that you just can't take for granted.)

diff got confused with the new get_recordset*() functions and the diff reads a bit funny. Oh, well. One thing to consider: in the new recordset_to_array(), there's an AdoDB function that returns something object-ish. And it's even in the AdoDB we ship with Moodle, methinks. Let me look... here it is: FetchObject($toupper=true). http://phplens.com/lens/adodb/docs-adodb.htm#fetchobject . If that works, we probably won't need recordset_to_array().

count_records() works, and the code even makes sense to me (why do you think it's broken?). The other count_records_*() look weird API-wise (I would use get_field_sql() instead) and I wonder if anything in Moodle uses them.

Refactoring comments to say "this is a thin wrapper around bla()" sounds reasonable to me. Still -- I do expect people to read the source and see that the action is happening elsewhere wink

In reply to Martín Langhoff

Re: Databases: Proposed change to get_records_sql_menu()

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
OK, then I'll do a proper version of this patch.

Actually, the right way to do this is proabaly as a sequence of patches that each eliminate pattern of common code. That would make it easier to see by inspection that each patch does not change any behavior. Of course, testing will also be important.

Tim.

In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
OK, here is a sequence of 6 good patches that do various cleanups to datalib. There is a readme that says what each patch does.

I have not had time to test these changes properly yet, but since they are cleaner patches, I was hoping that someone would have time to inspect them and see if they look OK. I'll try to do some testing next week, and I've checked them into our local repository, so my colleagues at the OU will be testing them a bit!

Tim.

In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Martín Langhoff -
Tim, thanks for the patch series. Reviewing them now, and I think I'll merge them into HEAD (and some things into STABLE) after testing them a bit.

BTW, what technique/tools are using to keep track of the different patches? If you have a vendor branch in CVS you can use cvsps to extract the commits as patchsets (which are easy for me to review & apply in an automated fashion, importing your commit msgs). If you are using something else, I'd be interested in figuring out a straightforward patch exchance mechanism we can use.

(Shameless plug: if you use cogito/git, it has great support for preparing series of patches for an upstream!) wink
In reply to Martín Langhoff

Re: Databases: Proposed change to get_records_sql_menu()

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
Thanks for reviewing my patches.

Regrettably, I am not using any tools for managing the patches. I am just saving each patch before committing to our local CVS repository. Rod is looking into what it would take to get Eclipse to talk to a git repository (seemingly not too much), and then we can adopt better tools here.

Tim.

In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
Warning, the last patch in the patch set I sent is broken. It breaks get_field, becuase it removes the line

gobal $db, $CFG;

It shoudl change it to

gobal $CFG;

instead.

Tim.


In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
Two other errors: two of the get_recordset_* functions need the keyword 'return' adding at the start of the last line.

Doh! how stupid does that make me feel.

Anyway, with those fixes in, this modified version of datalib seems to work.

In debugging, I noticed a strange thing. Methods like "get_records" return each record as an object. Dumping those objects gives:

stdClass Object
        (
            [id] => 13
            [0] => online_users
            [name] => online_users
            [1] => 2004111600
            [version] => 2004111600
            [2] => 0
            [cron] => 0
            [3] => 0
            [lastcron] => 0
            [4] => 1
            [visible] => 1
            [5] => 0
            [multiple] => 0
        )

With two versions of each data item, one with a numeric key and one with the fieldname. Do we want or need this. (I don't think my patch changes this behaviour, it was like that before.)
In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Martín Langhoff -
> With two versions of each data item, one with a numeric key
> and one with the fieldname. Do we want or need this. (I don't
> think my patch changes this behaviour, it was like that before.)

AFAIK, it is a behaviour of PHP arrays. PHP's implementation of arrays mixes stack-like arrays and named arrays (known as "hashes" in Perl), so you can access the same array both ways.

Confusing? You bet! It's one of my least favourite PHP features. But I don't know how to get PHP to "typecast" a given array to be of one type only. If there's a way, I'm interested...

In reply to Martín Langhoff

Re: Databases: Proposed change to get_records_sql_menu()

by John Papaioannou -
Not a behavior of PHP arrays, but of AdoDB (and it's configurable). See the doc for $ADODB_FETCH_MODE and SetFetchMode().

Obviously this is an unnecessary performance hit (how big? no idea, but testing a but with the PERF logging you guys introduced might tell), and "obviously" everyone would be using the fieldname keys in these arrays instead of the numerals. So we could remove them, but...

One caveat is that we may have external stuff in /lib/somelib/ (I have not checked) that use AdoDB and rely on $ADODB_FETCH_MODE having a certain value (or even change it themselves to ensure that they work correctly). This has happened to me in the past.

And of course, it is not inconceivable that we are using the numerals in Moodle code somewhere... this will be trickier to grep for.
In reply to John Papaioannou

Re: Databases: Proposed change to get_records_sql_menu()

by Martín Langhoff -

Whoa! I didn't know of that AdoDB magic. Yuck. It's definitely duplicating our memory footprint when we fetch large recordsets all together.

And of course, it is not inconceivable that we are using the numerals in Moodle code somewhere...

Bad news: we are -- just apply this little patch and watch things break big time... for instance, links to courses from the index page are missing the id parameter mixed

diff -u -r1.147 setup.php
--- lib/setup.php       13 Jan 2006 14:26:53 -0000      1.147
+++ lib/setup.php       16 Jan 2006 07:04:29 -0000
@@ -143,6 +143,13 @@
         die;
     }

+    /// Set the AdoDB fetchmode to assoc
+    // define('ADODB_FETCH_DEFAULT',0);
+    // define('ADODB_FETCH_NUM',1);
+    define('ADODB_FETCH_ASSOC',2);
+    // define('ADODB_FETCH_BOTH',3);
+    $db->SetFetchMode(ADODB_FETCH_ASSOC); 
+
     /// Set the client/server and connection to utf8 if necessary
         if ($dbconnected && $CFG->unicodedb) {
         if ($db->databaseType == 'mysql') {

Hmmm.

In reply to Martín Langhoff

Re: Databases: Proposed change to get_records_sql_menu()

by Samuli Karevaara -
I was also pondering on this last summer, but a bit later noticed the same thing: changing it breaks a lot of places. At that point I mentioned MySQL specifically, but apparently this happens (via ADODB) in PostgreSQL too?

Maybe now's the time to clean up those magic number references, where ever they might hide?

In reply to Samuli Karevaara

Re: Databases: Proposed change to get_records_sql_menu()

by Martin Dougiamas -
Picture of Core developers Picture of Documentation writers Picture of Moodle HQ Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers
I would very much like to get this fixed, and clean up any code that is relying on those numbers. One copy is enough.
In reply to Tim Hunt

Re: Databases: Proposed change to get_records_sql_menu()

by Martin Dougiamas -
Picture of Core developers Picture of Documentation writers Picture of Moodle HQ Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers
Just to explain why things were that way: originally I was trying to reduce function calls there to help speed up the processing, because these functions get called a lot.  However, as things have grown over the years this is saving is probably outweighed by just loading that large library, so this sort of refactoring is welcome, thanks.

Like others have said, it's very important not to break this core API.  As long as these patches get tested well I'm happy to see them checked into 1.6 ASAP.