Question

I'm stuck on an aggregation problem that I can't get to the bottom of.

I have some data which is best summarized as follows

id |phraseId|seqNum|word
=========================
1  |1       |1     |hello
2  |1       |2     |world
3  |2       |1     |black
4  |2       |2     |and
5  |2       |3     |white

I'd like a query that gives back the following data:

phraseId|completePhrase
========================
1       |hello world
2       |black and white

Anyone?

EDIT

I notice all the provided solutions use FOR XML PATH. What is this magic?

Was it helpful?

Solution

One solution is to create an UDF using an FOR XML PATH expression.

  • the UDF handles the concatenation of one PhraseID
  • it can be used in a normal select

SQL Statement

SELECT  PhraseID, dbo.UDF_ConcatWord(PhraseID)
FROM    Phrases
GROUP BY PhraseID

Creating the UDF

CREATE FUNCTION dbo.UDF_ConcatWord(@phraseID INT) RETURNS VARCHAR(8000) AS
BEGIN  
  DECLARE @r VARCHAR(8000)
  SELECT @r = (
    SELECT  word + ', '
    FROM    Phrases
    WHERE   phraseID = @phraseID
    FOR XML PATH('')
  )
  IF LEN(@r) > 0 SET @r = SUBSTRING(@r, 1, LEN(@r)-1)
  RETURN @r
END
GO

edit

After revising some of the links myself, an even shorter solution is

SQL Statement

SELECT  DISTINCT p1.PhraseID
        , STUFF(( SELECT  ' ' + p2.word 
                  FROM    Phrases AS p2 
                  WHERE   p2.PhraseID = p1.PhraseID 
                  FOR XML PATH('')), 1, 1, '') AS completePhrase
FROM      Phrases AS p1
ORDER BY  p1.PhraseID

OTHER TIPS

try this:

DECLARE @TableA  table (RowID int, phraseId varchar(5),seqNum int, word varchar(5))

INSERT INTO @TableA VALUES (1,1,1,'hello')
INSERT INTO @TableA VALUES (2,1,2,'world')
INSERT INTO @TableA VALUES (3,2,1,'black')
INSERT INTO @TableA VALUES (4,2,2,'and')
INSERT INTO @TableA VALUES (5,2,3,'white')

SELECT
    c1.phraseId
        ,STUFF(
                 (SELECT
                      ' ' + word
                      FROM @TableA  c2
                      WHERE c2.phraseId=c1.phraseId
                      ORDER BY c1.phraseId, seqNum
                      FOR XML PATH('') 
                 )
                 ,1,1, ''
              ) AS CombinedValue
    FROM @TableA c1
    GROUP BY c1.phraseId
    ORDER BY c1.phraseId

OUTPUT:

phraseId CombinedValue
-------- --------------------------
1        hello world
2        black and white

(2 row(s) affected)

I have cheated a bit by assuming that you have a table which holds the header record for each phrase. If this is missing, you could construct it by selecting a distinct list of phraseIDs from the table containing the words:

declare @words table
(id int
,phraseId int
,seqNum int
,word varchar(10)
)

insert @words
select 1,1,1,'hello'
union select 2,1,2,'world'
union select 3,2,1,'black'
union select 4,2,2,'and'
union select 5,2,4,'white'

declare @phrase table
(phraseId int)

insert @phrase
select 1
union select 2

select phraseID
       ,phraseText AS completePhrase
FROM @phrase AS p
CROSS APPLY (select word + ' ' as [text()]
             from @words AS w
             where w.phraseID = p.phraseID
             for xml path('')
            ) as phrases (phraseText)

In the end I used Lieven's second answer, but found that for certain string combinations, the FOR XML PATH('') trick causes problems to occur:

declare @phrases table
(
    id int
    ,phraseId int
    ,seqNum int
    ,word varchar(10)
)

insert 
    @phrases 
values
    (1,1,1,'hello'),
    (2,1,2,'world'),
    (3,2,1,'black'),
    (4,2,2,'and'),
    (5,2,3,'white')

SELECT  
    DISTINCT p1.PhraseID, 
    STUFF(
        ( 
            SELECT  
                ' ' + p2.word 
            FROM    
                @phrases AS p2 
            WHERE   
                p2.PhraseID = p1.PhraseID 
            FOR XML PATH('')
        ), 1, 1, '') AS completePhrase
FROM      
    @phrases AS p1
ORDER BY  
    p1.PhraseID

works fine, but if the example uses characters that would need escaping were they used in an XML, problems occur. For instance, running the following data through it:

insert 
    @words 
values
    (1,1,1,'hello>'), --notice the less than symbol
    (2,1,2,'world'),
    (3,2,1,'black')

Gives

hello> world

and also if the source table is declared out of order, an order by is required

A small mod to the original query fixes all:

SELECT  
    DISTINCT p1.PhraseID, 
    STUFF(
        ( 
            SELECT  
                ' ' + p2.word 
            FROM    
                @words AS p2 
            WHERE   
                p2.PhraseID = p1.PhraseID 
            ORDER BY
                p2.seqNum  --required
            FOR XML PATH(''),TYPE
        ).value('.','nvarchar(4000)'), 
        1, 
        1, 
        ''
    ) AS completePhrase
FROM      
    @words AS p1
ORDER BY  
    p1.PhraseID

(see FOR XML PATH(''): Escaping "special" characters)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top