2008-11-03

Emulating MySQL’s GROUP_CONCAT() Function in SQL Server 2005

Sometimes, the small things make all the difference. Last month, a friend of mine that works at Intel asked me how to do something seeming easy – concatenate the rows eliminated during a GROUP BY and present them in a column of their own. This is actually something typical when trying to do all sorts of reports, and just makes thing easier for everyone.
The answer? GROUP_CONCAT of course. This could have been easy. Alas, they were working with SQL Server 2005, which does not have such a function.
Now, all of the above isn’t exactly news – there are tons of articles and blog posts about it on the web. Some give you the code for special UDFs and some give you solutions that use cursors. A 3 year old post by Xaprb suggests using local variables.
I suggest a different approach based on a “hack” which utilizes the new xml functions in SQL Server 2005 to concatenate the values in a column.
It goes like this:
SELECT my_column AS [text()]
FROM   my_table
FOR XML PATH('')
This gives you the concatenated values of all of the values of this column in the table. From here on, it’s all just simple SQL. I have made two complete examples, in the order I’ve tried them. I’ve used the information_schema tables, since they exists everywhere.
First intuitive version:
SELECT table_name,
       LEFT(column_names,LEN(column_names) - 1)   AS column_names
FROM   (SELECT table_name,
               (SELECT column_name + ',' AS [text()]
                FROM   information_schema.columns AS internal
                WHERE  internal.table_name = table_names.table_name
                FOR xml PATH ('')
               ) AS column_names
        FROM   (SELECT   table_name
                FROM     information_schema.columns
                GROUP BY table_name) AS table_names) AS pre_trimmed;
Second version (admittedly inspired by this post, which I stumbled on after writing the first version):
SELECT table_name,
       LEFT(column_names,LEN(column_names) - 1)   AS column_names
FROM   information_schema.columns AS extern
       CROSS APPLY (SELECT column_name + ','
                    FROM   information_schema.columns AS intern
                    WHERE  extern.table_name = intern.table_name
                    FOR XML PATH('')
                   ) pre_trimmed (column_names)
GROUP
 BY table_name,column_names;
The second CROSS APPLY version is supposedly slower according to the execution plan, but I didn’t conduct any benchmarks at all.

I hope this saves someone several hours of work :)

9 comments:

  1. Thanks for this, I just tried to integrate this into an SQL Server application I am working on. I will do some benchmarks next week to see if this actually improves performance.

    Briefly what I am doing is the following:
    I have a search interface that can return a few upto a couple thousand results. In practice I am only interested in displaying upto 50 (actually the user can also ask to get upto 1000 returned if he insists), however I am always expected to show a count of all possible results (total as well as grouped by one criteria).

    The solution we found gave us the best performance was to just fetch the id's and to throw that into an IN statement that fetches the actual data to display (and then again in a second query that fetches the "reason" why something was included since we are searching way more fields than are displayed in the end).

    Now my obvious worry is that I do not really want to fetch 1000 rows one at a time in the worst case. Then again it might not matter, since PHP will by default fetch the entire result set anyways and will just buffer them in the client. But its still an interesting excurs into some more advanced SQL Server constructs.

    ReplyDelete
  2. Thanks for the tip.

    ReplyDelete
  3. thank you very much, worked like a charm

    ReplyDelete
  4. you need to account for no values -



    SELECT table_name,
    case when LEN(column_names) > 0
    then LEFT(column_names,LEN(column_names) - 1)
    else '' end AS column_names
    FROM (SELECT table_name,
    (SELECT column_name + ',' AS [text()]
    FROM information_schema.columns AS internal
    WHERE internal.table_name = table_names.table_name
    FOR xml PATH ('')
    ) AS column_names
    FROM (SELECT table_name
    FROM information_schema.columns
    GROUP BY table_name) AS table_names) AS pre_trimmed;

    ReplyDelete
  5. How can i group_concat two columns using cross apply?

    For example,

    select group_concat(DATA_TYPE) as data_types, group_concat(column_name) as column_names from information_schema.columns group by table_name

    your help is much appreciated.

    ReplyDelete
  6. DECLARE @x varchar(8000)
    SELECT @x = COALESCE(@x+', ','')+ISNULL(GROUPNAME,'') FROM [whims2].[dbo].[UM_TGROUP]
    SELECT @x

    ReplyDelete
  7. It works but it causes some issues.
    The characters &, <, and > are always entitized to &, <, and > respectively.

    How do we avoid it?

    Please reply

    ReplyDelete
  8. I recently found this : http://groupconcat.codeplex.com/
    Is is a reimplementation of GROUP CONCAT, as simple as the original one (no need of subrequest, partition handling, etc). Works great for me, but needs SQLCLR support (it's a SQLCLR UDAF).

    ReplyDelete
  9. Without using Value with XML-PATH might lead to parsing errors for XML encoded values. Please find my blog covering scenarios for grouped concatenation in SQL server https://blog.vcillusion.co.in/understanding-the-grouped-concatenation-sql-server/

    ReplyDelete