Question

Just an exploratory question to see whether anyone has done this or whether it is possible at all.

We all know what a tag cloud is, and usually one is created by someone assigning tags. Is it possible, with the current features of SQL Server, to create one automatically (perhaps via a trigger that fires when a record is added to or updated in a table) by looking at the data in a certain column and picking out the most popular words?

It is similar to this question: How can I get the most popular words in a table via mysql? But that is for MySQL, not MSSQL.

Thanks in advance. James


Solution

Here are some good references on parsing a delimited string into rows:
http://anyrest.wordpress.com/2010/08/13/converting-parsing-delimited-string-column-in-sql-to-rows/

http://www.sqlteam.com/article/parsing-csv-values-into-multiple-rows

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648

T-SQL: Opposite to string concatenation - how to split string into multiple records

If you want to parse all words, you can use the space ' ' as your delimiter; then you get a row for each word.

Next, simply select from that result set, GROUPing by the word and aggregating with COUNT.

Order your results by the count, descending, and you're there.
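
The three steps above (split on spaces, group by word, order by count) can be sketched in one query. This is only an illustration under stated assumptions: it uses the built-in STRING_SPLIT function, which requires SQL Server 2016 or later (database compatibility level 130), instead of the splitters from the links above, and the table variable @Articles with its Body column is a made-up stand-in for your own table.

```sql
-- Hypothetical sample data; @Articles and Body are illustrative names only.
Declare @Articles Table ( Id int Not Null Primary Key, Body nvarchar(max) Not Null );

Insert Into @Articles ( Id, Body )
Values ( 1, N'sql server tag cloud' )
    , ( 2, N'sql tag' )
    , ( 3, N'sql' );

-- One row per word, then count how often each word occurs.
Select W.value As Word, Count(*) As WordCount
From @Articles As A
    Cross Apply STRING_SPLIT( A.Body, N' ' ) As W
Where W.value <> N''    -- consecutive spaces produce empty strings; skip them
Group By W.value
Order By Count(*) Desc, W.value;
-- With the sample rows above: sql 3, tag 2, cloud 1, server 1
```

For real text you would probably also want to strip punctuation and filter out common stop words before counting, which this sketch does not attempt.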

OTHER TIPS

IMO, the design approach is what makes this difficult. Just because you allow users to assign tags does not mean the tags must be stored as a single delimited list of words. You can normalize the structure into something like:

Create Table Posts ( Id ... not null primary key )
Create Table Tags( Id ... not null primary key, Name ... not null Unique )
Create Table PostTags
    ( PostId ... not null References Posts( Id )
    , TagId ... not null References Tags( Id ) )

Now your question becomes trivial:

Select T.Id, T.Name, Count(*) As TagCount
From PostTags As PT
    Join Tags As T
        On T.Id = PT.TagId
Group By T.Id, T.Name
Order By Count(*) Desc
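
To try the query above without creating permanent tables, here is a small self-contained sketch. The column types (int, nvarchar(50)) and the sample rows are assumptions made for illustration, since the schema above leaves the types open. Table variables cannot carry foreign key constraints, so the References clauses are left out here.

```sql
-- Assumed types and sample data, for illustration only.
Declare @Tags Table ( Id int Not Null Primary Key, Name nvarchar(50) Not Null Unique );
Declare @PostTags Table
    ( PostId int Not Null
    , TagId int Not Null
    , Primary Key ( PostId, TagId ) );

Insert Into @Tags ( Id, Name )
Values ( 1, N'sql' ), ( 2, N'tsql' ), ( 3, N'mysql' );

Insert Into @PostTags ( PostId, TagId )
Values ( 10, 1 ), ( 10, 2 ), ( 11, 1 ), ( 12, 1 ), ( 12, 3 ), ( 13, 2 );

-- Same shape as the query above, pointed at the table variables.
Select T.Id, T.Name, Count(*) As TagCount
From @PostTags As PT
    Join @Tags As T
        On T.Id = PT.TagId
Group By T.Id, T.Name
Order By Count(*) Desc, T.Name;
-- With the sample rows above: sql 3, tsql 2, mysql 1
```

The composite primary key on ( PostId, TagId ) is a design choice of this sketch: it stops the same tag from being counted twice for one post.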

If you insist on storing tags as delimited values, then you have to split the values on their delimiter, for example by writing a custom Split function, and then do your count. An example Split function is at the bottom. With it, your query would look something like this (using a comma delimiter):

Select Tag.Value, Count(*) As TagCount
From Posts As P
    Cross Apply dbo.Split( P.Tags, ',' ) As Tag
Group By Tag.Value
Order By Count(*) Desc

Split Function:

Create Function [dbo].[Split]
(   
    @DelimitedList nvarchar(max)
    , @Delimiter nvarchar(2) = ','
)
RETURNS TABLE 
AS
RETURN 
    (
    With CorrectedList As
        (
        Select Case When Left(@DelimitedList, DataLength(@Delimiter)/2) <> @Delimiter Then @Delimiter Else '' End
            + @DelimitedList
            + Case When Right(@DelimitedList, DataLength(@Delimiter)/2) <> @Delimiter Then @Delimiter Else '' End
            As List
            , DataLength(@Delimiter)/2 As DelimiterLen
        )
        , Numbers As 
        (
        Select TOP (Coalesce(Len(@DelimitedList),1)) Row_Number() Over ( Order By c1.object_id ) As Value
        From sys.objects As c1
            Cross Join sys.columns As c2
        )
    -- Each qualifying N.Value is the start of a delimiter; the value runs from
    -- just after it up to the next delimiter (or to the end of the list).
    Select CharIndex(@Delimiter, CL.List, N.Value) + CL.DelimiterLen As Position
        , Substring (
                    CL.List
                    , CharIndex(@Delimiter, CL.List, N.Value) + CL.DelimiterLen
                    , Case
                        When CharIndex(@Delimiter, CL.List, N.Value + 1)
                            - CharIndex(@Delimiter, CL.List, N.Value)
                            - CL.DelimiterLen < 0 Then Len(CL.List)
                        Else CharIndex(@Delimiter, CL.List, N.Value + 1)
                            - CharIndex(@Delimiter, CL.List, N.Value)
                            - CL.DelimiterLen
                        End
                    ) As Value
    From CorrectedList As CL
        Cross Join Numbers As N
    Where N.Value < Len(CL.List)
        And Substring(CL.List, N.Value, CL.DelimiterLen) = @Delimiter

)
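
As a quick check of the function above, the following sketch calls it on a literal list and then on a small table variable. It assumes dbo.Split has already been created in the current database exactly as shown; the @Posts table variable and its rows are invented for illustration. The LTrim/RTrim calls are an addition of this sketch, so that lists written as 'sql, tsql' do not count ' tsql' and 'tsql' as different tags.

```sql
-- Assumes dbo.Split from above exists in the current database.
Select S.Position, S.Value
From dbo.Split( N'sql,tsql,mysql', N',' ) As S
Order By S.Position;

-- Hypothetical sample data; @Posts and Tags are illustrative names only.
Declare @Posts Table ( Id int Not Null Primary Key, Tags nvarchar(max) Not Null );

Insert Into @Posts ( Id, Tags )
Values ( 1, N'sql, tsql' ), ( 2, N'sql,mysql' ), ( 3, N'tsql' );

-- Trim each piece before grouping so stray spaces do not split the counts.
Select LTrim(RTrim(Tag.Value)) As TagName, Count(*) As TagCount
From @Posts As P
    Cross Apply dbo.Split( P.Tags, N',' ) As Tag
Group By LTrim(RTrim(Tag.Value))
Order By Count(*) Desc, TagName;
```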
Licensed under: CC-BY-SA with attribution