Question

We have a problem with a managed metadata field on our Site Collection (modern SharePoint Online): some term tags appear with a hash (#) character in front and some do not. Is there any way to remove these hashes? Currently it looks like this: [screenshot]

Some of the terms are reused, but they do not match the ones that have hashes. Also, our Hashtags and Keywords term sets are almost empty and do not contain similar values.


There may be other sites on the tenant that have the same term labels at the site (field) level, but they definitely do not use the same global term group. There are NO other terms in any other term group on the tenant level with matching names.

Any ideas on how to remove the hashes from the field, or at least add them to all records so that we have a consistent view?

Thanks, Alexandr


Added later:

The problem is more complex: the values with hashes are not indexed by search in the ows_taxid_ property (which holds the term IDs). I have 4 terms selected in the field, but only the 2 without hashes appear in the search results (as the property value). However, all values are present in the ows_ property (which holds the term labels). It looks like both parts of the question are linked. Can someone help with this as well?


Solution

I figured this out. The problem is related to TaxonomyHiddenList.

Those terms were initially copied from an Excel spreadsheet and had an invisible character at the end, which was visible only in the SP list interface, not in the Term Store editor. The character was removed the next day and everyone forgot about the issue. However, all of this had some unexpected consequences.
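Because the Term Store editor did not show the stray character, it can help to scan term labels before importing them. The following is a minimal Python sketch, not part of the original fix. We do not know which character Excel added, so it assumes that any Unicode format (Cf) or control (Cc) character, or any whitespace other than a plain space, is suspicious. The function names are hypothetical.

```python
import unicodedata
from typing import List, Tuple

# Assumption: treat Unicode "format" (Cf, e.g. zero-width space, BOM) and
# "control" (Cc) characters as invisible debris from the spreadsheet.
INVISIBLE_CATEGORIES = ("Cf", "Cc")


def find_suspicious(label: str) -> List[Tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for every suspicious character."""
    hits = []
    for i, ch in enumerate(label):
        odd_space = ch.isspace() and ch != " "  # e.g. NO-BREAK SPACE, tab
        if odd_space or unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


def clean_label(label: str) -> str:
    """Drop invisible characters, turn other whitespace into plain spaces, trim."""
    chars = []
    for ch in label:
        if ch.isspace():
            chars.append(" ")  # NBSP, tab, etc. become a plain space
        elif unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            continue  # zero-width and control characters are removed entirely
        else:
            chars.append(ch)
    # split()/join collapses runs of spaces and trims both ends
    return " ".join("".join(chars).split())


if __name__ == "__main__":
    for raw in ["Marketing\u200b", "New\u00a0York ", "Sales\ufeff"]:
        print(repr(raw), find_suspicious(raw), "->", repr(clean_label(raw)))
```

For example, `find_suspicious("Sales\ufeff")` reports the byte order mark at index 5 as `U+FEFF`, and `clean_label` returns `"Sales"`.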

First of all, when you use a term in any field, SharePoint creates a record in the TaxonomyHiddenList to cache the value. If something is wrong with the term (like the special Unicode character in our case), SharePoint creates a "broken" (or invalid) record, which sometimes has no Title and always has UNVALIDATED as the value of its CatchAllData field. Moreover, such a record has an empty CatchAllDataLabel field, and all of its Term<LANGUAGE_CODE_HERE> fields have a hash (#) character in front of their values.
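The symptoms above can be turned into a simple read-only check. This Python sketch assumes the TaxonomyHiddenList items have already been exported into plain dictionaries keyed by internal field name (for example Term1033 for English). How you export them is outside its scope, and the function names are illustrative.

```python
import re
from typing import Any, Dict, List

# Matches per-language label fields such as Term1033 or Term1031.
TERM_FIELD = re.compile(r"^Term\d+$")


def broken_symptoms(item: Dict[str, Any]) -> List[str]:
    """List the 'broken record' symptoms found on one exported hidden-list item."""
    symptoms = []
    if item.get("CatchAllData") == "UNVALIDATED":
        symptoms.append("CatchAllData is UNVALIDATED")
    if not item.get("CatchAllDataLabel"):
        symptoms.append("CatchAllDataLabel is empty")
    if not item.get("Title"):
        symptoms.append("Title is empty")  # only sometimes seen on broken records
    for name, value in item.items():
        if TERM_FIELD.match(name) and isinstance(value, str) and value.startswith("#"):
            symptoms.append(f"{name} starts with '#'")
    return symptoms


def is_broken(item: Dict[str, Any]) -> bool:
    """UNVALIDATED in CatchAllData is the symptom present on every broken record."""
    return item.get("CatchAllData") == "UNVALIDATED"


if __name__ == "__main__":
    # Illustrative values only, not real exports.
    exported = [
        {"Title": "", "CatchAllData": "UNVALIDATED", "CatchAllDataLabel": "", "Term1033": "#Marketing"},
        {"Title": "Sales", "CatchAllData": "abc|def", "CatchAllDataLabel": "Sales", "Term1033": "Sales"},
    ]
    for item in exported:
        print(is_broken(item), broken_symptoms(item))
```

`is_broken` keys on UNVALIDATED alone because that is the only symptom described as always present. The other checks are reported for context.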

Unfortunately, it looks like once a record is in the "broken" state, fixing the term in the Term Store does not help. For some reason, the synchronization timer job never synchronizes that term again (in our case, nothing changed over two weeks).

How does this affect the search index? It looks like search uses the CatchAllData value to build the ows_taxid_... property value. That field contains the full path to the term as a |-separated list of Base64-encoded IDs for the term, all of its parent terms, the term set and the term store. When CatchAllData is set to UNVALIDATED, the crawler cannot decode the path and the property stays blank.
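The following Python sketch illustrates why UNVALIDATED leaves the property empty. It decodes a CatchAllData value using the format described above. Several details are assumptions: each GUID uses the .NET `Guid.ToByteArray` byte layout (which Python exposes as `uuid.UUID(bytes_le=...)`), and empty segments are skipped. The order of the IDs inside the value is not interpreted.

```python
import base64
import binascii
import uuid
from typing import List, Optional


def decode_catch_all_data(value: str) -> Optional[List[uuid.UUID]]:
    """Decode a '|'-separated list of Base64-encoded GUIDs.

    Returns None if any segment is not a 16-byte Base64 value, which is the
    case for the UNVALIDATED placeholder on broken records.
    """
    segments = [s for s in (value or "").split("|") if s]  # assumption: ignore empty parts
    if not segments:
        return None
    ids = []
    for segment in segments:
        try:
            raw = base64.b64decode(segment, validate=True)
        except (binascii.Error, ValueError):  # invalid characters or padding
            return None
        if len(raw) != 16:  # a GUID is always 16 bytes
            return None
        # Assumption: bytes are in .NET Guid.ToByteArray() order (bytes_le).
        ids.append(uuid.UUID(bytes_le=raw))
    return ids
```

Here, `decode_catch_all_data("UNVALIDATED")` returns `None`, which matches the blank ows_taxid_ property.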

How to fix this? I have not found a simple way. At first, I tried removing a record from the hidden list. As a result, the value disappeared from all documents, so the data was lost for good. Fortunately, I ran this test on a Dev environment. In the end, the only way I found was to write a PowerShell script that rebuilds/recovers all the "broken" records.
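This is not the PowerShell script mentioned above. It is a hedged Python sketch of a safer first step, a dry run for a single hidden-list item. The sketch compares the Term<LCID> fields with labels you have looked up in the Term Store yourself, passed in as a hypothetical dictionary that maps LCID to label. It never deletes anything. It only flags CatchAllData for rebuilding and does not compute it, because the exact order of IDs in that field is not established here.

```python
import re
from typing import Dict, List, Tuple

# Captures the LCID from per-language label fields such as Term1033.
TERM_FIELD = re.compile(r"^Term(\d+)$")


def plan_repair(item: Dict[str, str], store_labels: Dict[int, str]) -> Tuple[Dict[str, str], List[str]]:
    """Dry run: return (proposed field updates, notes) for one hidden-list item.

    Nothing is written or deleted; the caller decides what to apply.
    """
    updates: Dict[str, str] = {}
    notes: List[str] = []
    for name, value in item.items():
        match = TERM_FIELD.match(name)
        if not match:
            continue
        lcid = int(match.group(1))
        wanted = store_labels.get(lcid)
        if wanted is None:
            # Do not guess (e.g. by stripping '#'): a real label may start with '#'.
            notes.append(f"{name}: no Term Store label for LCID {lcid}, left unchanged")
        elif value != wanted:
            updates[name] = wanted
    if item.get("CatchAllData") == "UNVALIDATED":
        notes.append("CatchAllData is UNVALIDATED and must be rebuilt from the term's IDs")
    if not item.get("CatchAllDataLabel"):
        notes.append("CatchAllDataLabel is empty")
    return updates, notes
```

Keeping this step read-only matters because, as described above, deleting a hidden-list record removed the value from every document.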

Overall, the problem looks quite rare, but if anyone needs help, I published the script here. Be careful with it, because it can break your Site Collection. Test it on a Dev or QA environment before applying it to any production Site Collection. DO NOT RUN IT IF YOU DO NOT COMPLETELY UNDERSTAND WHAT THE SCRIPT CAN CHANGE IN YOUR SYSTEM AND WHAT SIDE EFFECTS IT CAN CAUSE. You could potentially adapt the script into a real-time synchronization mechanism if you need to rename a term quickly without waiting for the OOTB timer job (I do not recommend this).

Unfortunately, I was not able to find anything in the Microsoft documentation saying that a hash character may indicate broken synchronization, so this answer is based only on my experience. Please comment if you know of any document that describes this behavior.

Licensed under: CC-BY-SA with attribution
Not affiliated with sharepoint.stackexchange