Databases: Non-Indexed JOIN in 1.9.6+

Databases: Non-Indexed JOIN in 1.9.6+

by Breno Jacinto -
Number of replies: 11

Hello,

I have tried the Hardware and Performance Forum  and the General Problem forums- but I believe this is the appropriate forum for my question.

I have upgraded from version 1.8.10+ to version 1.9.6+, and I'm getting an impressive amount of queries using JOINs that do not use indexes. Here's the mysqltuner output:


-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.0.81-community-log
[OK] Operating on 32-bit architecture with less than 2GB RAM

-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB -Federated -InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 930M (Tables: 218)
[OK] Total fragmented tables: 0

-------- Security Recommendations -------------------------------------------
ERROR 1142 (42000) at line 1: SELECT command denied to user 'moodle_mdluser'@'localhost' for table 'user'
[OK] All database users have passwords assigned

-------- Performance Metrics -------------------------------------------------
[--] Up for: 1d 23h 29m 9s (5M q [30.350 qps], 286K conn, TX: 776M, RX: 1B)
[--] Reads / Writes: 74% / 26%
[--] Total buffers: 704.0M global + 20.4M per thread (500 max threads)
[!!] Allocating > 2GB RAM on 32-bit systems can cause system instability
[!!] Maximum possible memory usage: 10.7G (547% of installed RAM)
[OK] Slow queries: 0% (193/5M)
[OK] Highest usage of available connections: 6% (32/500)
[OK] Key buffer size / total MyISAM indexes: 512.0M/360.4M
[!!] Key buffer hit rate: 92.6% (102M cached / 7M reads)
[OK] Query cache efficiency: 68.1% (2M cached / 3M selects)
[!!] Query cache prunes per day: 2024
[OK] Sorts requiring temporary tables: 0% (11 temp sorts / 231K sorts)
[!!] Joins performed without indexes: 3701
[!!] Temporary tables created on disk: 35% (227K on disk / 639K total)
[OK] Thread cache hit rate: 99% (32 created / 286K connections)
[!!] Table cache hit rate: 4% (365 open / 7K opened)
[OK] Open file limit used: 20% (710/3K)
[OK] Table locks acquired immediately: 99% (2M immediate / 2M locks)

-------- Recommendations -----------------------------------------------------
General recommendations:
Adjust your join queries to always utilize indexes
When making adjustments, make tmp_table_size/max_heap_table_size equal
Reduce your SELECT DISTINCT queries without LIMIT clauses
Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
*** MySQL's maximum memory usage is dangerously high ***
*** Add RAM before increasing MySQL buffer variables ***
query_cache_size (> 64M)
join_buffer_size (> 16.0M, or always use indexes with joins)
tmp_table_size (> 32M)
max_heap_table_size (> 32M)
table_cache (> 1500)



At fist sight, that would happen because the indexes are not set correctly on the tables, right? But, when running the XML DB Editor for checking indexes, it scans the whole db and claims that all indexes are fine. Other checks are shown at http://moodle.org/mod/forum/discuss.php?d=137322

So, what can possibility be causing this? Is that the version o Moodle which still needs optimization to such queries?

In addition, everyday I get some fragmented tables. Everyday I'm running OPTIMIZE table to fix that.

If anyone on the forum the give me a hint on these issue - Thanks in advance.

regards,

- Breno

Average of ratings: -
In reply to Breno Jacinto

Re: Databases: Non-Indexed JOIN in 1.9.6+

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
Err??? Does that tell is what the SQL is that is doing the non-indexed join? That is the key piece of information.
In reply to Tim Hunt

Re: Databases: Non-Indexed JOIN in 1.9.6+

by Breno Jacinto -
Hello Tim,


Thanks for replying. There're a number of queries, according to the mysql-slow log.
Some of them are:

# Time: 091110 13:05:04
# User@Host: moodle_mdluser[moodle_mdluser] @ localhost []
# Query_time: 9 Lock_time: 0 Rows_sent: 200 Rows_examined: 7093
SELECT u.id, u.firstname, u.lastname, u.picture, u.imagealt,
 s.id AS submissionid, s.grade, s.submissioncomment,
 s.timemodified, s.timemarked,
 COALESCE(SIGN(SIGN(s.timemarked) + SIGN(s.timemarked - s.timemodified)), 0) AS status FROM mdl_user u LEFT JOIN mdl_assignment_submissions s ON u.id = s.userid
 AND s.assignment = 942 WHERE u.id IN (2031,2032,2033,1651,2034,1511,1528,2035,1699,1933,1667,2024,1514,1611,1567,2025,1743,1686,1629,2026,1600,1759,2027,1688,1468,2029,2030,1707,1643,1597,1595,1685,1721,1692,1714,1745,1608,1732,1756,1572,1614,2028,1496,1478,1916,2064,1655,1750,1617,1760,1915,1508,1680,1536,1671,1575,1610,1502,1700,1615,1548,1690,1593,1633,1687,1623,1742,1491,1564,1620,1648,1582,1668,1672,8882,1085,1640,1730,65,1661,1920,1498,1675,1731,1918,1487,1660,1533,1606,27,1931,1696,1677,3011,1071,1544,1725,1739,1703,1557,1077,1913,1708,1928,1570,1470,1510,1591,1639,1663,1728,1653,1914,1751,1715,1518,1637,1477,1578,1919,1753,12,1634,5333,7567,1618,1616,1647,1678,1522,1650,301,1752,1521,1726,1585,1560,1681,1482,1554,1547,1927,1603,1930,1609,1727,1568,1757,1702,1740,1579,1758,1494,1576,1484,1642,1741,1577,1628,1698,1682,1553,1717,1479,1486,1912,1654,1613,1588,1592,1472,1542,1509,1649,1917,3047,1706,1676,1665,1733,1729,1749,1523,1695,1565,1551,52,1492,1594,1574,1598,1926,1532,1507,1641,1563,1701,1761,1683,1517,1990,1724,1488,1666,1527,8972,1607,1530,1659,1480,1929,1558,1583,1704,1932,1466,1463,1520,1559,1604,1513,18,1689,1746,1464,1748,1744,1539,1566,1656,1646,1505,1589,1684,1519,1922,1485,1630,1645,1662,1664,1716,1705,1723,1537,5353,1737,1709,1556,1638,1735,1546,1712,1632,1711,1586,1738,1923,1626,1720,1619,1669,1483,1624,1631,1713,1534,1602,1515,1506,1474,1535,1587,1652,75,1465,1481,1755,1621,1580,1601,1573,1612,1549,1697,1605,1512,1722,1734,1467,1635,1590,1718,1562,1473,1497,1504,1538,1501,1636,1657,1925) ORDER BY lastname ASC LIMIT 200;

# Time: 091110 0:30:11
# User@Host: moodle_mdluser[moodle_mdluser] @ localhost []
# Query_time: 119 Lock_time: 0 Rows_sent: 0 Rows_examined: 16189155
INSERT INTO mdl_stats_daily (stattype, timeend, courseid, roleid, stat1, stat2)

 SELECT 'activity', timeend, courseid, roleid, SUM(statsreads), SUM(statswrites)
 FROM (
 SELECT 1257822000 AS timeend, pl.courseid, pl.roleid, sud.statsreads, sud.statswrites
 FROM mdl_stats_user_daily sud,
 (SELECT DISTINCT ra.userid, ra.roleid, c.id AS courseid
 FROM mdl_role_assignments ra JOIN mdl_context ctx
 ON ctx.id = ra.contextid
 CROSS JOIN mdl_course c
 JOIN mdl_role_capabilities rc
 ON rc.roleid = ra.roleid LEFT OUTER JOIN mdl_course_categories cc1
 ON cc1.id = c.category
 WHERE c.id <> 1 AND
 ra.roleid <> 6 AND
 ra.userid <> 8925 AND
 ((rc.capability = 'moodle/course:view' OR rc.capability = 'moodle/site:doanything')
 AND rc.permission = 1 AND rc.contextid = 1
 AND (ctx.contextlevel = 10
 OR (c.id = ctx.instanceid AND ctx.contextlevel = 50) OR (cc1.id = ctx.instanceid AND ctx.contextlevel = 40)))
 ) pl
 WHERE sud.userid = pl.userid AND
 sud.courseid = pl.courseid AND
 sud.timeend = 1257822000 AND
 sud.stattype='activity'
 ) inline_view
 GROUP BY timeend, courseid, roleid
 HAVING SUM(statsreads) > 0 OR SUM(statswrites) > 0;
Another:

# Time: 091105 22:16:23
# User@Host: moodle_mdluser[moodle_mdluser] @ localhost []
# Query_time: 5 Lock_time: 0 Rows_sent: 520 Rows_examined: 15909
SELECT f.id
 FROM mdl_forum f
 LEFT JOIN mdl_forum_subscriptions fs ON (fs.forum = f.id AND fs.userid = 8203)
 WHERE f.forcesubscribe <> 3
 AND (f.forcesubscribe = 1 OR fs.id IS NOT NULL);

The second one last 119 seconds! There are many others at I have posted it at http://www.moodleufal.com.br/mysqlslow


regards,


- Breno
In reply to Breno Jacinto

Re: Non-Indexed JOINs in 1.9.6+

by Breno Jacinto -
Is anyone else experiencing this? Anyone with mysql-slow enabled that could post it in order to compare the results?

I'm thinking about filing a bug report on this, since it's been more than a week and no answer at all - either my question is very silly or no moodle user on the planet is having the same issue as I am.

regards,

- Breno
In reply to Breno Jacinto

Re: Non-Indexed JOINs in 1.9.6+

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
Well, can you do an explain on that query and find out why it is taking so long?
In reply to Tim Hunt

Re: Non-Indexed JOINs in 1.9.6+

by Breno Jacinto -
Hello Tim,

Running an EXPLAIN on the second query shows this:

SQL result

Host: localhost
Database: moodle_dbbackup
Generation Time: Nov 17, 2009 at 03:13 AM
Generated by: phpMyAdmin 2.11.9.5 / MySQL 5.0.81-community-log
SQL query: EXPLAIN SELECT 'activity', timeend, courseid, roleid, SUM(statsreads), SUM(statswrites) FROM ( SELECT 1257822000 AS timeend, pl.courseid, pl.roleid, sud.statsreads, sud.statswrites FROM mdl_stats_user_daily sud, (SELECT DISTINCT ra.userid, ra.roleid, c.id AS courseid FROM mdl_role_assignments ra JOIN mdl_context ctx ON ctx.id = ra.contextid CROSS JOIN mdl_course c JOIN mdl_role_capabilities rc ON rc.roleid = ra.roleid LEFT OUTER JOIN mdl_course_categories cc1 ON cc1.id = c.category WHERE c.id <> 1 AND ra.roleid <> 6 AND ra.userid <> 8925 AND ((rc.capability = 'moodle/course:view' OR rc.capability = 'moodle/site:doanything') AND rc.permission = 1 AND rc.contextid = 1 AND (ctx.contextlevel = 10 OR (c.id = ctx.instanceid AND ctx.contextlevel = 50) OR (cc1.id = ctx.instanceid AND ctx.contextlevel = 40))) ) pl WHERE sud.userid = pl.userid AND sud.courseid = pl.courseid AND sud.timeend = 1257822000 AND sud.stattype='activity' ) inline_view GROUP BY timeend, courseid, roleid HAVING SUM(statsreads) > 0 OR SUM(statswrites) > 0;
Rows: 8

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1082 Using temporary; Using filesort
2 DERIVED <derived3> ALL NULL NULL NULL NULL 24234
2 DERIVED sud ref mdl_statuserdail_cou_ix,mdl_statuserdail_use_ix,mdl_statuserdail_tim_ix mdl_statuserdail_use_ix 8 pl.userid 56 Using where
3 DERIVED rc range mdl_rolecapa_rolconcap_uix,mdl_rolecapa_rol_ix,mdl_rolecapa_con_ix,mdl_rolecapa_cap_ix mdl_rolecapa_cap_ix 767 NULL 11 Using where; Using temporary
3 DERIVED c range PRIMARY PRIMARY 8 NULL 330 Using where
3 DERIVED cc1 eq_ref PRIMARY PRIMARY 8 moodle_dbbackup.c.category 1 Using index
3 DERIVED ra ref mdl_roleassi_conroluse_uix,mdl_roleassi_rol_ix,mdl_roleassi_con_ix,mdl_roleassi_use_ix mdl_roleassi_rol_ix 8 moodle_dbbackup.rc.roleid 1576 Using where
3 DERIVED ctx eq_ref PRIMARY,mdl_cont_conins_uix,mdl_cont_ins_ix PRIMARY 8 moodle_dbbackup.ra.contextid 1 Using where; Distinct


Well, I'm still puzzled by the amounts of JOINs in moodle 1.9.6 that are not using indexes. Is that normal or is it just me?

regards,

- Breno
In reply to Breno Jacinto

Re: Non-Indexed JOINs in 1.9.6+

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
This is the point where I have to admin I have never learned to read MySQL explains. I like the pretty diagrams in PgAdmin. Nor am I that familiar with the stats code.

This query is clearly doing something very complicated to do with statistics. Is it surprising, or particularly worrying, that it takes some time?

Anyway, to answer your question, Moodle should have all the indexes it needs to operate efficiently, and mostly it does. If you think a specific index is missing, file a bug in the tracker.
In reply to Breno Jacinto

Re: Databases: Non-Indexed JOIN in 1.9.6+

by M. N. -
Have you ran the 'Check Indexes' process included with Moodle?
It's at Site Administration Miscellaneous XMLDB editor. It will tell you if you have any missing indexes.

I also use mySQLTuner and it does report some joins without indexes but I haven't checked if they were from Moodle or not.
In reply to M. N.

Re: Databases: Non-Indexed JOIN in 1.9.6+

by Breno Jacinto -
Hello Tim, M.N.

I did this checking as soon as I found it existed. It reports that all my indexes are perfect - none is missing. Our migration was perfect - not a single error or warning. This is to be expected.

I'm beginning to think that those non-indexed JOINs are default in the Moodle actual SQL code - you just have to use buffers and increase RAM to handle it quickly. That's what I'm doing to alleviate our problem, but given the fact that we're using the MyISAM engine - we keep getting defragmented tables all the time, and if I dont repair and check the DB every day - data gets corrupted.

Well, thinking about it, our DB is relatively small - around 1.5 GB. Our number of users is relatively big - 5.000 for a double Pentium 4 core and 2GB RAM server. I believe we're just reaching limits of what is being used right now (specially at peak times).

My first option is to try InnoDB, but honestly, I'm much more inclined to go to PostgreSQL. My only question remains again to performance and capacity planning:

- Will PostgreSQL require more RAM to run?

- Will it perform as good as or better than MySQL? Is it tunable?

Is anyone aware of how to migrate safely from MySQL 5 to PostgreSQL 8.4?

If anyone can help me on this, I'd be very grateful.

best regards,

- Breno
In reply to Breno Jacinto

Re: Databases: Non-Indexed JOIN in 1.9.6+

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
I can't really answer your questions about requirements or migration; we do use Postgres with reasonable success on our large system, but we started off using it, so.

That said:

1) 'Number of users' or even 'simultaneous users' is not a good measure of load; I like using the number of log lines in a time period, which should be a decent approximation because Moodle adds a line to the log for most actions. (Although it is obviously biased a bit if you've added custom logging or whatever, or something in cron logs a lot of junk.) Last week our system added 3,494,243 lines in mdl_log. Peak load is more important than average load, so you might like to work out the peak of 'log lines per minute' or something.

If you want to find others with similar load to estimate required hardware, maybe this would be a useful way to identify them. (You could actually write that 'max log lines per minute, out of past week' query for people to run.)

I am not at all an expert on database performance but here's a couple other comments:

2) RAM: Our database server has 64GB, but our database is somewhat larger than 64GB, so doesn't entirely fit into that RAM. That sounds like a fairly similar proportion to yours (db mostly in memory) i.e it doesn't indicate to me that Postgres inherently requires more memory.

3) Performance: Intutively, I would expect Postgres to be slower (at least on write) than MySQL when MySQL is using the fundamentally broken corrupt-tables model; surely there has to be a cost associated with ensuring database integrity. At any rate I definitely don't think you should assume a performance improvement.

--sam






In reply to Breno Jacinto

Re: Databases: Non-Indexed JOIN in 1.9.6+

by Robert Brenstein -
> Our number of users is relatively big - 5.000 for a double Pentium 4 core and 2GB RAM server.

Have you thought of increasing RAM? 2GB for 5k active users does not seem like enough. Appache processes likely use a lion share of it. Double or tripple or quadrupole your RAM and see what happens. RAM is quite cheap and easy to upgrade smile
In reply to Robert Brenstein

Re: Databases: Non-Indexed JOIN in 1.9.6+

by Martín Langhoff -
+1. At least double the RAM. Note that past 4GB you'l want a PAE kernel or to migrate to a 64b OS.