TSQL - Querying a table column to pull out popular words for a tag cloud
-
07-02-2021 - |
Question
Just an exploratory question to see if anyone has done this or if, in fact it is at all possible.
We all know what a tag cloud is, and usually, a tag cloud is created by someone assigning tags. Is it possible, within the current features of SQL Server to create this automatically, maybe via trigger when a table has a record added or updated, by looking at the data within a certain column and getting popular words?
It is similar to this question: How can I get the most popular words in a table via mysql?. But, that is MySQL not MSSQL.
Thanks in advance. James
Solution
Here is a good bit on parsing delimited string into rows:
http://anyrest.wordpress.com/2010/08/13/converting-parsing-delimited-string-column-in-sql-to-rows/
http://www.sqlteam.com/article/parsing-csv-values-into-multiple-rows
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648
T-SQL: Opposite to string concatenation - how to split string into multiple records
If you want to parse all words, you can use the space ' ' as your delimiter, Then you get a row for each word.
Next you would simply select the result set GROUP
ing by the word and aggregating the COUNT
Order your results and you're there.
OTHER TIPS
IMO, the design approach is what makes this difficult. Just because you allow users to assign tags does not mean the tags must be stored as a single delimited list of words. You can normalize the structure into something like:
Create Table Posts ( Id ... not null primary key )
Create Table Tags( Id ... not null primary key, Name ... not null Unique )
Create Table PostTags
( PostId ... not null References Posts( Id )
, TagId ... not null References Tags( Id ) )
Now your question becomes trivial:
Select T.Id, T.Name, Count(*) As TagCount
From PostTags As PT
Join Tags As T
On T.Id = PT.TagId
Group By T.Id, T.Name
Order By Count(*) Desc
If you insist on storing tags as delimited values, then only solution is to split the values on their delimiter by writing a custom Split function and then do your count. At the bottom is an example of a Split function. With it your query would look something like (using a comma delimiter):
Select Tag.Value, Count(*) As TagCount
From Posts As P
Cross Apply dbo.Split( P.Tags, ',' ) As Tag
Group By Tag.Value
Order By Count(*) Desc
Split Function:
Create Function [dbo].[Split]
(
@DelimitedList nvarchar(max)
, @Delimiter nvarchar(2) = ','
)
RETURNS TABLE
AS
RETURN
(
With CorrectedList As
(
Select Case When Left(@DelimitedList, DataLength(@Delimiter)/2) <> @Delimiter Then @Delimiter Else '' End
+ @DelimitedList
+ Case When Right(@DelimitedList, DataLength(@Delimiter)/2) <> @Delimiter Then @Delimiter Else '' End
As List
, DataLength(@Delimiter)/2 As DelimiterLen
)
, Numbers As
(
Select TOP (Coalesce(Len(@DelimitedList),1)) Row_Number() Over ( Order By c1.object_id ) As Value
From sys.objects As c1
Cross Join sys.columns As c2
)
Select CharIndex(@Delimiter, CL.list, N.Value) + CL.DelimiterLen As Position
, Substring (
CL.List
, CharIndex(@Delimiter, CL.list, N.Value) + CL.DelimiterLen
, Case
When CharIndex(@Delimiter, CL.list, N.Value + 1)
- CharIndex(@Delimiter, CL.list, N.Value)
- CL.DelimiterLen < 0 Then Len(CL.List)
Else CharIndex(@Delimiter, CL.list, N.Value + 1)
- CharIndex(@Delimiter, CL.list, N.Value)
- CL.DelimiterLen
End
) As Value
From CorrectedList As CL
Cross Join Numbers As N
Where N.Value < Len(CL.List)
And Substring(CL.List, N.Value, CL.DelimiterLen) = @Delimiter
)