2011-01-23

More Optimizer Bugs in MySQL

I'm proud to report of two bugs I've submitted to the MySQL bug tracker.
Both are seemingly simple and vaguely related, but have grave consequences when combined with large amounts of data.

The first one, Bug #50212 is a regression performance problem in 5.1, dealing with MySQL not using a primary key when it can, in order to avoid a file sort. The problem is made worse by the fact that even using a FORCE INDEX hint does not help things. The bug requires a join condition to be present to reproduce.
So if you have a large table, and you're expecting MySQL to select data from it ordered by the primary key and avoid a file sort, you're out of luck.
Of course, you can remove the ORDER BY and hope it's going to return in order, but it's not guaranteed.
There is no workaround, and this bug is in a verified state for a year now.

A more recent bug, Bug #59557 deals with another optimization issue, that takes place when MySQL is not willing to perform an index scan to complete a GROUP BY statement, when you reverse the order of the fields compared the their order in the index.

Since a GROUP BY statement does not care about the order of fields, it should not matter: the fields in the GROUP BY clause are essentially a set, and not a ordered list, as they are in ORDER BY.

An example to demonstrate the problem (taken from the bug report itself):
For this table:

CREATE TABLE `t` (
`a` INT(11) NOT NULL,
`b` INT(11) NOT NULL,
PRIMARY KEY (`a`,`b`)
) ENGINE=INNODB;
Run this:
mysql> EXPLAIN SELECT COUNT(*) FROM t GROUP BY b,a\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
type: index
possible_keys: NULL
key: PRIMARY
key_len: 8
ref: NULL
rows: 1
Extra: Using index; Using temporary; Using filesort

What you get is: Using index; Using temporary; Using filesort
If you're a MySQL veteran, you know that any GROUP BY is implicitly a ORDER BY as well - and you'd guess that causes the issue.
Unfortunately, even if we add a ORDER BY NULL clause, the optimizer would still choose a wrong plan, like so:
mysql> EXPLAIN SELECT COUNT(*) FROM t GROUP BY b,a ORDER BY NULL\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
type: index
possible_keys: NULL
key: PRIMARY
key_len: 8
ref: NULL
rows: 1
Extra: Using index; Using temporary
What you get now is: Using index; Using temporary
A bit less wrong, but still... wrong. If someone knows what the database tried to accomplish by using a temporary table here, I'd be happy to hear an explanation... :)

As I've said, I am proud to submit bugs and help the community, but I would be a bit more proud if bugs I've submitted a year ago would have been fixed by now. Just saying.