Question

I am preparing a data visualization in Tableau. I have some data that can be simplified like this:

Name, Score, Tag
Joe, 5, A;B
Phil, 7, D
Quinn, 9, A;C
Bill, 3, A;B;C

I would like to generate a word cloud on the Tag field that counts occurrences of each tag (A, B, C, D). So I need to generate this:

A,3
B,2
C,2
D,1

In other words, I need help working with a field that contains a list of delimited values. In the example data the delimiter is ";", but it could be anything. I would like the word cloud to update as the user applies filters, e.g. dragging a slider to set Score > 5, so the tag counts have to be computed on the fly.
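To make the target computation concrete, here is a minimal Python sketch (outside Tableau) of what the word cloud needs: split each Tag value on the delimiter, drop rows excluded by the filter, and count how many rows carry each tag. The names ROWS and count_tags, and the min_score parameter standing in for the Score slider, are hypothetical and only illustrate the logic.

```python
from collections import Counter

# Sample rows from the question: (name, score, semicolon-delimited tags)
ROWS = [
    ("Joe", 5, "A;B"),
    ("Phil", 7, "D"),
    ("Quinn", 9, "A;C"),
    ("Bill", 3, "A;B;C"),
]

def count_tags(rows, delimiter=";", min_score=None):
    """Count how many rows carry each tag.

    min_score is a hypothetical stand-in for the Score slider:
    when set, only rows with score > min_score are counted.
    """
    counts = Counter()
    for _name, score, tags in rows:
        if min_score is not None and score <= min_score:
            continue  # row excluded by the filter
        # A set, so a tag repeated within one row is counted once
        counts.update({t.strip() for t in tags.split(delimiter) if t.strip()})
    return counts

for tag, n in sorted(count_tags(ROWS).items()):
    print(f"{tag},{n}")
```

With no filter this prints the A,3 / B,2 / C,2 / D,1 table above; with min_score=5 only Phil and Quinn remain, so A, C and D are each counted once.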

I'm pretty sure I'll need to use calculated fields and table calculations? Possibly I'll also need a separate table tracking the tags?

I have no problem building the word cloud and other viz elements. What I'm looking for help with is parsing the delimited list field and calculating the tag counts.

I do have full control over the source data, so if there is an easier way to do this by reorganizing the schema, I'd be glad to do that. I thought of breaking the field up into separate tag1, tag2, ..., tagX fields and counting across them, but I'm not sure that is any simpler.

Thanks for any tips.


Solution

Another (probably better in your case) approach is to reshape the data before feeding it to Tableau. Tableau works best with normalized data.

Preprocess it to look like:

Name, Score, Tag
Joe, 5, A
Joe, 5, B
Phil, 7, D
Quinn, 9, A
Quinn, 9, C
Bill, 3, A
Bill, 3, B
Bill, 3, C

At that point, the standard Tableau word cloud charts should work well, and it will scale easily as you add more tags and data.

Reshaping data to normalize it prior to analysis with Tableau is a pretty standard step. Sometimes you can do it automatically, say with custom SQL, but often you'll have to run some sort of script first. If your data comes from Excel, Tableau has an Excel add-in that can help with reshaping data. Look for it in the Tableau knowledge base.
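As a minimal sketch of such a preprocessing script, assuming the source is a CSV file with a header row and a semicolon-delimited Tag column, Python's standard csv module can emit one output row per tag. The helper names normalize and reshape are hypothetical, and the sample input is the data from the question.

```python
import csv
import io

# Sample input from the question (spaces after commas are dropped by skipinitialspace)
SOURCE = """Name, Score, Tag
Joe, 5, A;B
Phil, 7, D
Quinn, 9, A;C
Bill, 3, A;B;C
"""

def normalize(reader, tag_field="Tag", delimiter=";"):
    """Yield one row per (record, tag) pair, copying the other fields unchanged."""
    for row in reader:
        for tag in row[tag_field].split(delimiter):
            tag = tag.strip()
            if tag:  # skip empty entries, e.g. from a trailing delimiter
                yield {**row, tag_field: tag}

def reshape(text, delimiter=";"):
    """Hypothetical helper: convert CSV text with a delimited Tag column to long format."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(normalize(reader, delimiter=delimiter))
    return out.getvalue()

print(reshape(SOURCE), end="")
```

For a real file, read its contents instead of SOURCE (or adapt reshape to take file handles) and point Tableau at the output; each word cloud size is then simply the number of records per Tag after filters are applied.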

OTHER TIPS

Here's an approach that would be tolerable if you had a fixed set of 3 or 4 tags. Since you have closer to 50K possible tags, it's not feasible for your problem as is, but it may give you an idea. Similar approaches can solve different kinds of problems in Tableau, so it's a useful trick to know.

For each tag, create a flag-style calculated field that returns 1 if the current row contains that particular tag and null otherwise (or whatever condition you want to detect).

For example, create a calculated field named Tag_A:

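// Tag_A: 1 if "A" is one of the ";"-separated entries in [Tag], otherwise null.
// Wrapping both sides in delimiters stops "A" from also matching tags like "AB".
IF CONTAINS(";" + [Tag] + ";", ";A;") THEN
  1
END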

Similarly, define calculated fields Tag_B, Tag_C, etc.

So far it's easy. You can then use those fields in other calculations: count the number of records that contain tag A (e.g. SUM([Tag_A]), since nulls are ignored), filter to only those records, or use the calculated field on the Condition tab when defining sets that are computed dynamically by a formula. Of course, the underlying calculated field can be more complex, say checking for the presence of at least 2 tags out of a list.
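To sanity-check this kind of per-row logic before building it in Tableau, here is a minimal Python mirror of it (not Tableau syntax). It assumes a ";" delimiter and exact tag matches; tag_indicator, has_at_least and count_rows are hypothetical names, with None playing the role of Tableau's null.

```python
# Sample rows from the question: (name, score, semicolon-delimited tags)
ROWS = [
    ("Joe", 5, "A;B"),
    ("Phil", 7, "D"),
    ("Quinn", 9, "A;C"),
    ("Bill", 3, "A;B;C"),
]

def _tags(tags, delimiter=";"):
    """Split a delimited tag string into a set of exact tag values."""
    return {t.strip() for t in tags.split(delimiter) if t.strip()}

def tag_indicator(tags, tag, delimiter=";"):
    """Like a Tag_X field: 1 if `tag` is one of the entries, else None (null)."""
    return 1 if tag in _tags(tags, delimiter) else None

def has_at_least(tags, wanted, k, delimiter=";"):
    """1 if at least k of the tags in `wanted` are present in the row, else None."""
    return 1 if len(_tags(tags, delimiter) & set(wanted)) >= k else None

def count_rows(rows, flag):
    """Like SUM([Tag_X]): add up the 1s and skip the nulls."""
    return sum(v for v in (flag(tags) for _, _, tags in rows) if v is not None)

print(count_rows(ROWS, lambda t: tag_indicator(t, "A")))
print(count_rows(ROWS, lambda t: has_at_least(t, ["A", "B", "C"], 2)))
```

Both counts come out to 3 for the sample data: three rows contain A, and Joe, Quinn and Bill each have at least two of A, B and C.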

If nothing else, this approach sometimes lets you break complex problems into bite sized pieces.

Unfortunately, hard coding calculated field names won't scale to 50K tags. For that, you probably want to reshape your data.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow