2011-08-06

How to Move a MongoDB Collection Between Databases

If you need to move a collection between two MongoDB databases, there is no need to dump and restore your data using mongodump / mongorestore.

In a way, it's pretty similar to what we can do with  MySQL's "RENAME TABLE" command:
db.runCommand({renameCollection:"sourcedb.mycol",to:"targetdb.mycol"})
In the background, MongoDB will dump and restore it automatically. There is no metadata "magic" for such a rename since databases reside in different files on disk. It just saves a bit of work, and it's worth knowing about.

How to Fix Erroneously Named MongoDB Collections

In MongoDB, collections are not supposed to include certain characters like, for example, a hyphen "-" or a slash "/". The manual has this to say on the matter:
Collection names should begin with letters or an underscore and may include numbers; $ is reserved. Collections can be organized in namespaces; these are named groups of collections defined using a dot notation.
The keyword here is "should".
It is possible to create collations with names that do not adhere to these rules. This is not enforced at all. When you try to access such a collection from the shell, things don't quite work as expected:
>db.test-col.find()
Sat Aug  6 14:54:06 ReferenceError: col is not defined (shell):1
This is because the javascript console tries to subtract "col" from "test". I've tried different escaping methods for the rogue hyphen, but couldn't find one that works.
Eventually, I've reverted to using the more "internal" renameCollection command, to rename the collection into something manageable:
>db.runCommand({renameCollection:"test.test-col",to:"test.test_col"})
Which solves the problem completely.

Now you might be wondering: how was this created in the first place? Because if you can't read from this collection, you can't "save" into it from the shell as well.
We have a nice list of commands here which I recommend to anyone who's working with MongoDB to be familiar with. One of these commands is the aptly named "create", which will obediently create a collection with such a name:
>db.runCommand({"create":"test-col"});
{ "ok" : 1 }
I can only assume drivers use this same command to create the collections, which was how it was created in the first place.

2011-01-23

More Optimizer Bugs in MySQL

I'm proud to report of two bugs I've submitted to the MySQL bug tracker.
Both are seemingly simple and vaguely related, but have grave consequences when combined with large amounts of data.

The first one, Bug #50212 is a regression performance problem in 5.1, dealing with MySQL not using a primary key when it can, in order to avoid a file sort. The problem is made worse by the fact that even using a FORCE INDEX hint does not help things. The bug requires a join condition to be present to reproduce.
So if you have a large table, and you're expecting MySQL to select data from it ordered by the primary key and avoid a file sort, you're out of luck.
Of course, you can remove the ORDER BY and hope it's going to return in order, but it's not guaranteed.
There is no workaround, and this bug is in a verified state for a year now.

A more recent bug, Bug #59557 deals with another optimization issue, that takes place when MySQL is not willing to perform an index scan to complete a GROUP BY statement, when you reverse the order of the fields compared the their order in the index.

Since a GROUP BY statement does not care about the order of fields, it should not matter: the fields in the GROUP BY clause are essentially a set, and not a ordered list, as they are in ORDER BY.

An example to demonstrate the problem (taken from the bug report itself):
For this table:

CREATE TABLE `t` (
`a` INT(11) NOT NULL,
`b` INT(11) NOT NULL,
PRIMARY KEY (`a`,`b`)
) ENGINE=INNODB;
Run this:
mysql> EXPLAIN SELECT COUNT(*) FROM t GROUP BY b,a\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
type: index
possible_keys: NULL
key: PRIMARY
key_len: 8
ref: NULL
rows: 1
Extra: Using index; Using temporary; Using filesort

What you get is: Using index; Using temporary; Using filesort
If you're a MySQL veteran, you know that any GROUP BY is implicitly a ORDER BY as well - and you'd guess that causes the issue.
Unfortunately, even if we add a ORDER BY NULL clause, the optimizer would still choose a wrong plan, like so:
mysql> EXPLAIN SELECT COUNT(*) FROM t GROUP BY b,a ORDER BY NULL\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
type: index
possible_keys: NULL
key: PRIMARY
key_len: 8
ref: NULL
rows: 1
Extra: Using index; Using temporary
What you get now is: Using index; Using temporary
A bit less wrong, but still... wrong. If someone knows what the database tried to accomplish by using a temporary table here, I'd be happy to hear an explanation... :)

As I've said, I am proud to submit bugs and help the community, but I would be a bit more proud if bugs I've submitted a year ago would have been fixed by now. Just saying.

2010-11-18

Which SQL Mode Should I Use with MySQL?

Dealing with sql_mode can be a tricky business, and many people have blogged about it before (here and here). My intent is not to explain the details of what each option means (since it's covered in the manual), but rather to answer this simple and common question:
Which SQL mode should I use with MySQL? / What is the recommended SQL mode for MySQL?

My general purpose recommendation is this one:
sql-mode="TRADITIONAL,NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY"
This "traditional" setting includes most of what you'd want your database to behave like, namely:
STRICT_TRANS_TABLES, STRICT_ALL_TABLES, NO_ZERO_IN_DATE,NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER
This has a nice benefit of returning an error instead of a warning on truncating a varchar column, not let you inserts completely broken date values, and give out division by zero errors. That's pretty traditional as I see it.

The other two settings make sure you don't create InnoDB tables as MyISAM by mistake, and that you include all the columns in the GROUP BY clause, like in other databases.

That's my choice for a reasonable all-around setting - for different applications you might need different ones (see excellent comment by Roland Bouman below).

2010-11-09

How to Install innotop 1.8.0 on CentOS 5

innotop is a fine tool that every MySQL DBA should be familiar with. Although it takes a while to get used to, it's worth it. It's less than intuitive for most people, unless you grew up in Linux.
The other problematic part except the learning curve is the installation. It took me a while last time I had to install it, and this time re-installing it I've decided to write down the steps needed.

Most of the trouble I went through the first time around were with CPAN, and finding out that those Perl modules were available in RPM form saved me lots of time. Avoid CPAN if you can for this purpose.
Also, these steps should work on RedHat Linux as well, of course, but I didn't test it.
  1. Download the compatibility version of the shared MySQL libraries. The compatibility package will save you a lot of trouble due to dependencies. The rest won't go smoothly if you don't get this one.
    It should be named something like:
    MySQL-shared-compat-5.1.xx.distribution.platform.rpm
  2. Install it:
    rpm -Uvh MySQL-shared-compat-5.1.52.rhel5.x86_64.rpm
  3. Install these packages:
    yum install perl-DBI perl-DBD-mysql perl-TermReadKey
  4. Download latest Innotop, and extract it:
    wget http://innotop.googlecode.com/files/innotop-1.8.0.tar.gz
    tar zfx innotop-1.8.0.tar.gz
  5. We're done with the prerequisites. Go into the directory and install innotop (from now on it's the same as in the INSTALL file):
    perl Makefile.pl
    make install
That's it. We're done.
Hope this saves someone some time.

Update:
On Debian, this is also simple due to this little trick (if you have the right sources defined):
sudo aptitude install libdbd-mysql-perl

2010-07-27

Recommended Google Chrome Extensions

Not all great Chrome Extensions get to the Featured or Popular lists on the Google Chrome Extensions website. These are 3 extensions I really like, and are not as widely known.

2010-07-26

UNHEX for Integers in MySQL

The following is a very, very specific subject matter, but I wish this blog post had existed when I was looking for the solution to a problem I was having. Most likely it has no interest to anyone who is not specifically looking for it, so feel free to stop reading...

MySQL has a few built-in functions for handling binary data. One of them is HEX which converts any data into hexadecimal representation. The function which you would expect to do the opposite (as the manual states) is UNHEX which takes a hexadecimal representation and turns it into characters.

So, if you try to do:
SELECT UNHEX(HEX('a'));
You get an "a" back. But if you try that on a integer:
SELECT UNHEX(HEX(1));
You get back the char corresponding to the ASCII value of 1, which is not what was intended.
The correct way to do it, and the real opposite of HEX is the CONV function which converts betweens bases:
SELECT CONV(HEX(1), 16, 10);
This time the result is the number "1" as expected.