2008-11-03

Emulating MySQL’s GROUP_CONCAT() Function in SQL Server 2005

Sometimes, the small things make all the difference. Last month, a friend of mine that works at Intel asked me how to do something seeming easy – concatenate the rows eliminated during a GROUP BY and present them in a column of their own. This is actually something typical when trying to do all sorts of reports, and just makes thing easier for everyone.
The answer? GROUP_CONCAT of course. This could have been easy. Alas, they were working with SQL Server 2005, which does not have such a function.
Now, all of the above isn’t exactly news – there are tons of articles and blog posts about it on the web. Some give you the code for special UDFs and some give you solutions that use cursors. A 3 year old post by Xaprb suggests using local variables.
I suggest a different approach based on a “hack” which utilizes the new xml functions in SQL Server 2005 to concatenate the values in a column.
It goes like this:

SELECT my_column AS [text()]
FROM   my_table
FOR XML PATH('')
This gives you the concatenated values of all of the values of this column in the table. From here on, it’s all just simple SQL. I have made two complete examples, in the order I’ve tried them. I’ve used the information_schema tables, since they exists everywhere.
First intuitive version:
SELECT table_name,
       LEFT(column_names,LEN(column_names) - 1)   AS column_names
FROM   (SELECT table_name,
               (SELECT column_name + ',' AS [text()]
                FROM   information_schema.columns AS internal
                WHERE  internal.table_name = table_names.table_name
                FOR xml PATH ('')
               ) AS column_names
        FROM   (SELECT   table_name
                FROM     information_schema.columns
                GROUP BY table_name) AS table_names) AS pre_trimmed;
Second version (admittedly inspired by this post, which I stumbled on after writing the first version):
SELECT table_name,
       LEFT(column_names,LEN(column_names) - 1)   AS column_names
FROM   information_schema.columns AS extern
       CROSS APPLY (SELECT column_name + ','
                    FROM   information_schema.columns AS intern
                    WHERE  extern.table_name = intern.table_name
                    FOR XML PATH('')
                   ) pre_trimmed (column_names)
GROUP
 BY table_name,column_names;
The second CROSS APPLY version is supposedly slower according to the execution plan, but I didn’t conduct any benchmarks at all.

I hope this saves someone several hours of work :)

2008-10-19

Problems Uninstalling MySQL Connector .NET

This is a short post that might save someone some valuable time, if Google decides to rank it high enough.

I've tried to install a newer version of the MySQL Connector .NET, namely 5.2.3 instead of the old 5.1.3 I had installed.
When trying to install 5.2.3, I got this error message:


Apparently the connector does not support upgrades from 5.1.x to 5.2.x. We should just remove the old one.

Here lies the problem: when I tried removing the old 5.1.3, I got a weird error of which I took no screenshot. It consisted of a blank error message showing a computer screen with a icon of a moon on it. Something resembling a "sleep mode" icon. Huh?


Anyway, to make a long story short, nothing I did to remove the 5.1.3 gracefully worked. I tried using the old .MSI file, tried reinstalling, changing the install configuration, editing registry keys... I also tried using msizap.exe. Nothing worked.

The solution was to use this tool from Microsoft - called "Windows Installer CleanUp Utility". It's supposed to be a wrapper for msizap, but it did the job, while msizap did not.

Regardless of MySQL, this is a good tool to be familiar with. It removed 3 other entries in my "Programs and Features" list that refused to leave by themselves.

Update: I've had to solve this problem a few more times since posting. As it seems, this "Windows Installer CleanUp Utility" is basically a wrapper for msizap, so it probably used some other parameters that made it work. In addition, I've encountered a time where none of the tools worked, but a registry "Seek & Destroy" for "MySQL Connector" did the job. Go figure...

2008-07-25

Recommendation: PuTTY Connection Manager

Linux desktop users: this post is probably not for you. You've got like a gazillion of these tools available.

Windows users: If you're like me, and use PuTTY all the time to manage your Linux MySQL servers, you'll appreciate this gem. It's called PuTTY Connection Manager and it, obviously, manages PuTTY connections. Not a very creative name, but a fine piece of software.

It's actively developed (latest alpha version out about two months ago), and boasts these features (taken from the developer website):

  • Tabs and dockable windows for PuTTY instances.
  • Fully compatible with PuTTY configuration (using registry).
  • Easily customizable to optimize workspace (fullscreen, minimze to tray, add/remove toolbar, etc...).
  • Automatic login feature regardless to protocol restrictions (user keyboard simulation).
  • Post-login commands (execute any shell command when logged).
  • Connection Manager : Manage a large number of connections with specific configuration (auto-login, specific PuTTY Session, post-command, etc...).
  • Quick connect toolbar to quickly launch a PuTTY connection.
  • Import/Export whole connections informations to XML format (generate your configuration automatically from another tool and import it, or export your configuration for backup purpose).
  • Encrypted configuration database option available to store connections informations safely (external library supporting AES algorithm used with key sizes of 128, 192 and 256 bits, please refer for the legal status of encryption software in your country).
  • Standalone executable, no setup required.
  • Completely free for commercial and personal use : PuTTY Connection Manager is freeware.
 Things that I personally liked about it:
  1. It's fast, since it's only a reasonably lightweight wrapper for the already fast PuTTY.
  2. It can save a portable connection database in a file, unlike PuTTY, which uses the registry for it's connections.
  3. It looks and feels like the Visual Studio window manager, so you can arrange and dock your PuTTY windows in all sorts of creative rectangular ways.
  4. It is relatively feature rich, but it doesn't get in your way, and just works most of the time.
  5. It's completely free.
Get it here.

Alternatives:

If you don't like it, there is another alternative that I personally like less for SSH-only work, which is called mRemote. It is much more advanced, has support for VNC and RDP, but I prefer the simpler and more flexible interface of PuTTY Connection Manager.

2008-06-17

How to Truncate All or Some of the Tables in a MySQL Database

Truncating all tables in a database is a common problem which arises a lot during testing or debugging.
One of the most common answers to this question is to drop & recreate the database, most likely utilizing the shell. For example, something like this:

mysqldump --add-drop-table --no-data [dbname] | mysql [dbname]

This dumps the entire schema structure to disk, without dumping any data, and with commands for dropping existing tables. When loading it back into mysql, it essentially truncates all the tables in the database. Basically, this is a decent solution for many uses.

We had a requirement for a solution that needed these additional features:

  1. Does not require shell (We work with both Linux and Windows)
  2. Resides inside the MySQL Server (To minimize outside dependencies - for example - the mysql command line client)
  3. Can truncate only specified tables using a some sort of filter

In other words, the main requirement here is encapsulation and simplicity of use for the developers.
Basically, this means I've got no other choice except writing a stored procedure for this purpose, ugly as it may be. Specifically, this will require using the pseudo-dynamic SQL in MySQL (more about this later) and a server-side cursor.

The code of the Stored Procedure:

DELIMITER $$
CREATE PROCEDURE TruncateTables()
BEGIN
DECLARE done BOOL DEFAULT FALSE;
DECLARE truncate_command VARCHAR(512);
DECLARE truncate_cur
CURSOR FOR /*This is the query which selects the tables we want to truncate*/
SELECT CONCAT('TRUNCATE TABLE ',table_name)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME LIKE 'prefix_%';
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = TRUE;

OPEN truncate_cur;

truncate_loop: LOOP
FETCH truncate_cur INTO truncate_command;
SET @truncate_command = truncate_command;

IF done THEN
CLOSE truncate_cur;
LEAVE truncate_loop;
END IF;

/*Main part - preparing and executing the statement*/
PREPARE truncate_command_stmt FROM @truncate_command;
EXECUTE truncate_command_stmt;

END LOOP;
END$$

DELIMITER ;

This is a generalized version, you can (and should) modify it to suit your needs.

Regarding dynamic SQL. Initially I've tried to make this query a lot simpler, and use something along the lines of the previous blog post. This would look something like:

DELIMITER $$
CREATE PROCEDURE TruncateTables()
BEGIN /*Build a long user variable containing the varchars*/
SELECT GROUP_CONCAT('TRUNCATE TABLE ',table_name SEPARATOR ';')
INTO @truncate_command
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME LIKE 'prefix_%';

SET @truncate_command = CONCAT(@truncate_command,';');

PREPARE truncate_command_stmt FROM @truncate_command;
EXECUTE truncate_command_stmt;

END$$

DELIMITER ;

The main (and unsolvable) problem with this code is that the PREPARE cannot prepare more than one statement. In other words, while in other databases you could have sent an entire batch (real dynamic SQL), in MySQL the only way to do this is to run statement by statement. On the other hand, there is no GROUP_CONCAT() in other databases (even though it can apparently be created with ease in PostgreSQL).

The second and less worrying problem is the fact that by default, GROUP_CONCAT() result tends to be limited to a certain length. This is easily solved by setting group_concat_max_len to something higher than default.

End Transmission.

2008-06-12

How To Perform a Connection Massacre

Occasionally, you find yourself in the need to kill problematic connections to the database.
Usually if it's only one or two connections, you can use the combination of "SHOW PROCESSLIST" command to identify the problematic connection ID, and run a "KILL ID" command.

What do you do if you need to kill 10 connections? Or 56? I wouldn't want to type in all those kill commands, it's just dirty work. What we need is a more neat manner to perform those kills. Mass kill, if you wish.

Alternative way: use the INFORMATION_SCHEMA's PROCESSLIST table, to construct the kill statements semi-automatically.

SELECT CONCAT('kill ',id,';') AS kill_list
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE command='Sleep';

This select will return something like this (when using the command line client):

+-----------+
| kill_list |
+-----------+
| kill 28; |
| kill 1; |
+-----------+
2 rows in set (0.04 sec)

You can also use GROUP_CONCAT() to get the kill commands in the same line, which may be more useful when you can't copy&paste easily:

SELECT GROUP_CONCAT('kill ',id SEPARATOR '; ') AS kill_list
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE command='Sleep';

Returns:

+------------------+
| kill_list |
+------------------+
| kill 28;,kill 1; |
+------------------+
1 row in set (0.03 sec)

Note, that you can use any filter in the WHERE clause, such as "WHERE db IN('dbname1','dbname2')", "WHERE user = 'appuser'" or "WHERE time>600". If you got any more clever uses, feel free to post them in the comments.

Now, all needed is to copy&paste those kill commands into the mysql command console (whichever you're using: GUI, command line, etc) and you're done.

Since most programs are kind enough to format the output in some way that prevents convenient copying/pasting, here's another tip. The mysql command line client can be asked to strip some of the output by using the "-s" optional parameter, which stands for "silent". You can use it once, and remove the ASCII art, timing and row count, and if you use it again (-s -s), you only get the actual rows.

An example doing just this (copy&paste friendly!):

mysql -s -s -e "SELECT CONCAT('kill ',id,';') AS kill_list FROM INFORMATION_SCHEMA.PROCESSLIST WHERE command='Sleep';"

Gets you this (with different numbers of course):

kill 28;
kill 1;

Another way to do this, is to use SQLyog or any other GUI for MySQL, which can output the results in text. Most of them have this as a configurable option (Ctrl-L keyboard shortcut in SQLyog).

Hope this is will save someone some precious time.

Update:
I didn't include a suggestion on doing this automatically, since I personally prefer to run kill commands manually most of the time. Plus, some of the comments gave me an idea. Why not just pipe the stripped kill_list into mysql?
mysql -s -s -e "SELECT GROUP_CONCAT('kill ',id SEPARATOR '; ') AS kill_list FROM INFORMATION_SCHEMA.PROCESSLIST WHERE command='Sleep';" | mysql

2008-05-31

Taking Cache Warm-up Time Into Consideration

Last week there there was a short scheduled downtime for the system, and I've used the chance to upgrade the MySQL servers to a newer version. We've tested it for more than a month before deploying to production, so I wasn't worried about any potential problems.

The upgrade itself went smoothly and was complete in about less than 10 minutes. Speaking of which, this is one of the things I like in MySQL upgrades as opposed to SQL Server.

  1. Copy new version
  2. Shut down the server
  3. Rename directories
  4. Start server
  5. Run mysql_upgrade
  6. Restart server

All of the above usually takes about 2 minutes, and can be reverted quite easily, if you have backup for the data files. This is compared to a SQL Server upgrade, or even a Service Pack install which includes running an installer for what can be 20 minutes or more. Uninstalling is equality lengthy, though both are accompanied by nice and useless progress bars.

After the upgrade was complete, we started the application and it wouldn't work - it was reporting timeouts. Why? Was there a problem with the upgrade? Some bug we haven't found out about? The circumstantial evidence pointed directly to the upgrade as the problem. After some digging by Pasha, the problematic query was found - it was a group by query updating some counter table. It ran in under 5 seconds before, but now it just wouldn't finish in the 30sec timeframe before timing out.

This immediately pointer to the real cause of the problem: the server restart. The buffer cache was flushed, and the index scanned by this query was not in memory. Fixing was easy, I ran the problematic query manually (it took almost 2 minutes to run). After this, the application worked again. Problem solved? Not really.

Queries of this sort should almost never exist in a system. This specific query was long since been noticed due to proactive slow-log monitoring. This problem was (ironically) fixed on the day of this incident, but the newer version was not yet deployed.

Conclusion: Think of how your application works with a cold database cache. If you need to warm it up, do so. A better path is to design the database more efficiently and avoid certain types of queries. In this case, the counter table was now updated on batches of inserts and deletes - instead of periodically.

2008-04-26

Using the MySQL Event Scheduler to Emulate Threads

The MySQL Event Schedule is one of the new features in MySQL 5.1. It is explained well in this article from the MySQL site, and in the manual itself. I'd like to demonstrate an interesting way to use it. What gave me this idea is actually a feature in SQL Server 2005, called Service Broker. It's a sort of queue manager / messaging platform, which can also be exploited in the same way.

Lets say you are writing a stored procedure, that loads data into the database. Benchmarks show that loading data in parallel from several threads increases overall throughput, as MySQL is unable to utilize several CPUs to process a single SELECT query, moreover a bulk load operation. To work in parallel, you'd probably need to write a script that uses threads, open several connections to the database, and perform those queries in parallel.

I propose an alternative.

The MySQL Event Scheduler is capable of performing several tasks at once, since it's highly probable that several events might need to run at the same time. Why not use it?
What we will create, is a "self-destructible" event, that performs a task once and disappears. The syntax that allows this is self explanatory:

CREATE EVENT IF NOT EXISTS event_name ON SCHEDULE AT NOW() ON COMPLETION NOT PRESERVE
DO statement_or_batch;

I don't think this is the place to descend into the implementation detail of using it for data loads, so I won't. Note, however, how flexible it is.

  1. You could use the event scheduler to perform several SELECT statements in parallel into some summary or union table, and collect the returned result afterwards.
  2. To see if an event is still running, you can poll the mysql.event system table for the event name, or names (when waiting for several events to complete).
  3. Combine it with the FEDERATED engine. That is one feature which is seriously in need of more development efforts. It is not even included in 5.1.24 due to severe problems. When it goes back to a stable status, you could use events to execute remote queries in your cluster in parallel - with a single command, from a single server.
  4. Use it for an unordered set of commands that your application shouldn't wait for at all, or it doesn't care when it gets done.
  5. Use it to delay execution, just replace NOW() with NOW() + INTERVAL 5 MINUTE, as demonstrated earlier.

The possibilities are endless. Well, not endless, but I'm sure that the event schedule can be tricked into doing a lot more cool things than these... If you can think of anything of that sort, I'd love to hear it.

Some points:

  1. As the mysql.events table is using the MyISAM engine, I'm not sure how transactional adding an event is. Most likely, it isn't.
  2. If you want to test the scheduler, make sure it's turned on (SET GLOBAL EVENT_SCHEDULER = ON).
  3. Each event run adds two lines to the error log (hostname.err) file reporting start and end of the event. This might be a problem similar to leaving the general log on, when using events extensively.