2008-11-03

Emulating MySQL’s GROUP_CONCAT() Function in SQL Server 2005

Sometimes, the small things make all the difference. Last month, a friend of mine that works at Intel asked me how to do something seeming easy – concatenate the rows eliminated during a GROUP BY and present them in a column of their own. This is actually something typical when trying to do all sorts of reports, and just makes thing easier for everyone.
The answer? GROUP_CONCAT of course. This could have been easy. Alas, they were working with SQL Server 2005, which does not have such a function.
Now, all of the above isn’t exactly news – there are tons of articles and blog posts about it on the web. Some give you the code for special UDFs and some give you solutions that use cursors. A 3 year old post by Xaprb suggests using local variables.
I suggest a different approach based on a “hack” which utilizes the new xml functions in SQL Server 2005 to concatenate the values in a column.
It goes like this:
SELECT my_column AS [text()]
FROM   my_table
FOR XML PATH('')
This gives you the concatenated values of all of the values of this column in the table. From here on, it’s all just simple SQL. I have made two complete examples, in the order I’ve tried them. I’ve used the information_schema tables, since they exists everywhere.
First intuitive version:
SELECT table_name,
       LEFT(column_names,LEN(column_names) - 1)   AS column_names
FROM   (SELECT table_name,
               (SELECT column_name + ',' AS [text()]
                FROM   information_schema.columns AS internal
                WHERE  internal.table_name = table_names.table_name
                FOR xml PATH ('')
               ) AS column_names
        FROM   (SELECT   table_name
                FROM     information_schema.columns
                GROUP BY table_name) AS table_names) AS pre_trimmed;
Second version (admittedly inspired by this post, which I stumbled on after writing the first version):
SELECT table_name,
       LEFT(column_names,LEN(column_names) - 1)   AS column_names
FROM   information_schema.columns AS extern
       CROSS APPLY (SELECT column_name + ','
                    FROM   information_schema.columns AS intern
                    WHERE  extern.table_name = intern.table_name
                    FOR XML PATH('')
                   ) pre_trimmed (column_names)
GROUP
 BY table_name,column_names;
The second CROSS APPLY version is supposedly slower according to the execution plan, but I didn’t conduct any benchmarks at all.

I hope this saves someone several hours of work :)