use of SELECT * ... GROUP BY

use of SELECT * ... GROUP BY

by Penny Leach -
Number of replies: 5
Dear Moodle developers!

This is going to be a rant about mysql, please do not be offended! wink

I want everyone who writes SQL for moodle to consider the following table, 'foo':

 
+------+------+------+

| f1 | f2 | f3 |

+------+------+------+

| 1 | 2 | 3 |

| 1 | 4 | 5 |

| 1 | 6 | 7 |

+------+------+------+


Now, if we do

SELECT * FROM foo GROUP BY f1;

What do we expect to see returned? f1 is going to be 1, sure. What about in f2 and f3? Postgres doesn't allow this query; it says:

ERROR: column "f1" must appear in the GROUP BY clause or be used in an aggregate function

Which makes sense to me - what else would we expect to see in the other columns? f1 is distinct, but f2 and f3 have three results between them that need to fit in our 1 result from f1.

If we replaced f2 and f3 with aggregate functions that returned exactly 1 result, it would be fine. Example:

SELECT f1,count(f2),sum(f3) FROM foo GROUP BY f1;

it makes perfect sense.

Now, mysql happily allows the first query and doesn't complain about it, and this piqued my curiosity - what would it actually return? The answer:

 
+------+------+------+

| f1 | f2 | f3 |

+------+------+------+

| 1 | 2 | 3 |

+------+------+------+


Conclusion - mysql happily allows this query, even though it doesn't make sense, and not only does it not complain, it just randomly decides to return whatever the **** it wants! In this case, the first row of results for f2 and f3. Now, I'm not convinced that this is actually the result we want most of the time - surely it's better to complain bitterly rather than do it silently and return data we don't actually want?

Penny (viva la postgres! evil evil evil )
Average of ratings: -
In reply to Penny Leach

Re: use of SELECT * ... GROUP BY

by Martin Dougiamas -
Picture of Core developers Picture of Documentation writers Picture of Moodle HQ Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers
Well, I really only use this form when I know f2 and f3 are the same for all f1 anyway, or when I really only wanted f1 in the first place. Thus it's never been a problem and I haven't needed to put up with a database complaining at me. wink
In reply to Penny Leach

Re: use of SELECT * ... GROUP BY

by Gavin McCullagh -
For what it's worth, this is the MySQL manual explanation for this:

http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-fields.html

12.10.3. GROUP BY with Hidden Fields

MySQL extends the use of GROUP BY so that you can use columns or calculations in the SELECT list that don't appear in the GROUP BY clause. This stands for any possible value for this group. You can use this to get better performance by avoiding sorting and grouping on unnecessary items. For example, you don't need to group on customer.name in the following query:

mysql> SELECT order.custid, customer.name, MAX(payments)
-> FROM order,customer
-> WHERE order.custid = customer.custid
-> GROUP BY order.custid;

In standard SQL, you would have to add customer.name to the GROUP BY clause. In MySQL, the name is redundant if you don't run in ANSI mode.

Do not use this feature if the columns you omit from the GROUP BY part are not unique in the group! You get unpredictable results.


In reply to Penny Leach

Re: use of SELECT * ... GROUP BY

by John Papaioannou -
Heh... I remember making this mistake when refactoring the quiz overview report screen sometime ago. It was Penny I think who pointed it out at the time because Postgres wouldn't run it. wink

Now that I have learned to code fine GROUP BYs, should the credit go to Postgres or Penny? (I think Penny would be the social constructivist answer tongueout).