Question

I’m working on creating a report that has 3 columns:

Search Term, Count of Search Term, Is Search Term a Tag?

Here is sample output:

Search Phrase   Count   Is Tag
V7000           507     Yes
SPSS            370     No
unica           282     Yes

I can produce the first two columns without too much fuss. It’s comparing the search term to the tags that is causing me grief. A search term is a tag if the entire phrase is present in the tag name. I have a few examples below.

Examples for determining if a search term is a tag:

Search Phrase       Tag             Is Tag?
CLM Dashboard       CLM             No
Dashboards          Dashboard       Yes
Dashboard           CLM Dashboard   Yes
unica               Communication   Yes
XX                  XX Cool Name    Yes
ABC Tech Company    ABC             No 

My plan of attack is to do a LEFT JOIN where search_term LIKE tag_name.

There are a few issues though:

Issue1: More than 100 rows are returned, even though I have TOP (100) in my code.
Currently it displays two rows for the search term Japanese because there are two tags, Japan and Japanese. I don't care which tag the search term matches, only whether there is a match. I think this is directly related to the LIKE condition on my LEFT JOIN.

Issue2: “ABC Tech Company” should fail.
Currently ABC Tech Company displays "Yes" when it should fail, because the tag name “ABC” appears inside “ABC Tech Company”. I want this to return NULL, because the entire phrase “ABC Tech Company” is not in any tag.

Issue3: “unica” should match because that phrase is inside of the “Communication” tag. “XX” should match “XX Cool Name” tag.
Currently unica isn't matching (nor is XX). I have no idea why this isn’t working.

Overall, my issues appear to stem from not implementing the LIKE condition on my LEFT JOIN correctly.

Here’s what I have right now:

DECLARE @startDate DATETIME2 = '2013-01-01'
DECLARE @endDate DATETIME2 = '2013-08-22'

;WITH Top100Searches AS
(
    Select TOP (100)
        SearchDim.criteria_keywords as SearchTerm, COUNT(SearchDim.created_date) as SearchPhraseCount
    From
        search_dimension SearchDim
    Where
        SearchDim.created_date > @startDate
        and SearchDim.created_date < @endDate
    Group by
        SearchDim.criteria_keywords
    Order by
        SearchPhraseCount DESC
)

Select
    Top100Searches.SearchTerm, Top100Searches.SearchPhraseCount,CASE WHEN TagDim.name IS NOT NULL THEN 'Yes' ELSE 'No' END AS 'Is Tag?'
From
    Top100Searches
LEFT JOIN
    (
        Select TagDim.name
        From tag_dimension TagDim
    ) as TagDim on 
    --This is where I'm having trouble
    Top100Searches.SearchTerm like '%' + TagDim.name + '%'
Order by
    Top100Searches.SearchPhraseCount DESC

Let me know if I can clarify anything; I've done my best to explain everything clearly.


Solution 2

I actually figured out my own solution.

There were two things:
1.) First, I was using LIKE in my LEFT JOIN incorrectly. In layman's terms, I was asking "Is 'Communication' inside of 'unica'?" when I wanted the opposite. Swapping my join criteria resolved issues 2 and 3.

2.) Once I made that change, duplicate rows appeared whenever a search term matched more than one tag. I do my best to never use DISTINCT, because needing it usually means the query is written wrong, but I didn't see a way around it this time. Adding DISTINCT got me back to displaying 100 rows as intended, resolving issue 1.

To sum up, my LEFT JOIN conditions were reversed and I needed to add DISTINCT to get my query to work.
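
To make the reversed condition concrete, here is a small T-SQL sketch that evaluates both directions of the LIKE test on three pairs from the question's example table. The inline VALUES rows and the column names OriginalDirection and SwappedDirection are only for illustration and are not part of the real schema.

```sql
-- Illustrative only: three search term / tag pairs from the question's examples.
SELECT
    v.SearchTerm,
    v.TagName,
    -- Original join condition: is the tag inside the search term?
    CASE WHEN v.SearchTerm LIKE '%' + v.TagName + '%'
         THEN 'Yes' ELSE 'No' END AS OriginalDirection,
    -- Swapped join condition: is the search term inside the tag?
    CASE WHEN v.TagName LIKE '%' + v.SearchTerm + '%'
         THEN 'Yes' ELSE 'No' END AS SwappedDirection
FROM (VALUES
    ('unica',            'Communication'),
    ('XX',               'XX Cool Name'),
    ('ABC Tech Company', 'ABC')
) AS v (SearchTerm, TagName);
```

With the original direction, unica and XX come back as 'No' and ABC Tech Company as 'Yes'; with the swapped direction, the results flip to the ones the question asks for.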

I hope this helps someone else out there!

-- Same DECLAREs and Top100Searches CTE as in the question, followed by:
Select DISTINCT -- adding DISTINCT is change #2
    Top100Searches.SearchTerm, Top100Searches.SearchPhraseCount,CASE WHEN TagDim.name IS NOT NULL THEN 'Yes' ELSE 'No' END AS 'Is Tag?'
From
    Top100Searches
LEFT JOIN
    (
        Select TagDim.name
        From tag_dimension TagDim
    ) as TagDim on
    --This is change #1
    TagDim.name LIKE '%' + Top100Searches.SearchTerm + '%'
Order by
    Top100Searches.SearchPhraseCount DESC
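
As a possible alternative to DISTINCT (this is a different technique, not the query above), the tag check can be written as an EXISTS subquery. Because EXISTS only asks whether at least one matching tag exists, each search term should produce exactly one row no matter how many tags match. This is a sketch that assumes the same search_dimension and tag_dimension tables as the question; the alias t is introduced here for brevity.

```sql
DECLARE @startDate DATETIME2 = '2013-01-01';
DECLARE @endDate   DATETIME2 = '2013-08-22';

WITH Top100Searches AS
(
    SELECT TOP (100)
        SearchDim.criteria_keywords   AS SearchTerm,
        COUNT(SearchDim.created_date) AS SearchPhraseCount
    FROM search_dimension AS SearchDim
    WHERE SearchDim.created_date > @startDate
      AND SearchDim.created_date < @endDate
    GROUP BY SearchDim.criteria_keywords
    ORDER BY SearchPhraseCount DESC
)
SELECT
    t.SearchTerm,
    t.SearchPhraseCount,
    -- 'Yes' if at least one tag name contains the whole search term
    CASE WHEN EXISTS (
             SELECT 1
             FROM tag_dimension AS TagDim
             WHERE TagDim.name LIKE '%' + t.SearchTerm + '%'
         )
         THEN 'Yes' ELSE 'No' END AS [Is Tag?]
FROM Top100Searches AS t
ORDER BY t.SearchPhraseCount DESC;
```

One caveat that applies to either version: a search term containing LIKE wildcard characters such as % or _ will be treated as a pattern rather than as literal text.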

OTHER TIPS

Here is an answer to Issue 3, though it's a horrible way to search for terms. If possible, you should really consider doing this in C# and using RegEx.

-- Sample search phrases
DECLARE @JUNKYWORDS TABLE
(
    junkWord VARCHAR(400)
);

-- Sample tags
DECLARE @TAGS TABLE
(
    TAGNAME VARCHAR(30),
    YESORNO VARCHAR(1)
);

INSERT INTO @JUNKYWORDS VALUES ('I love buttons');
INSERT INTO @JUNKYWORDS VALUES ('I lost a button');
INSERT INTO @JUNKYWORDS VALUES ('pizza');

INSERT INTO @TAGS VALUES ('to', 'Y');
INSERT INTO @TAGS VALUES ('zz', 'N');

-- For each tag, count the phrases that contain the tag as a substring
SELECT t.TAGNAME, t.YESORNO, COUNT(*) AS MatchCount
FROM @JUNKYWORDS AS j
INNER JOIN @TAGS AS t
    ON j.junkWord LIKE '%' + t.TAGNAME + '%'
GROUP BY t.TAGNAME, t.YESORNO;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow