Managed Metadata Superpowers and SharePoint Premium Document Processing

[Update November 2023] Microsoft Syntex has been rebranded as SharePoint Premium. Document Processing models are one of the many features included with SharePoint Premium.

If you’re new to SharePoint information architecture and the managed metadata service, and you want to build a SharePoint Premium document processing model, this post is for you! If you’re already familiar with SharePoint information architecture, read on to see how SharePoint Premium can leverage managed metadata’s superpowers.

Note: let’s level-set on terminology. In a SharePoint Premium unstructured document processing model, an entity extractor is what extracts values from a document, and it equates to a column in SharePoint. An entity extractor can either reference a pre-existing site column or automatically create one for you. However, if you want to use a managed metadata column, you must pre-create it and reference it when creating the entity extractor. As of the time of this writing, only a SharePoint Premium unstructured document processing model can use a managed metadata column. A SharePoint Premium structured document processing model cannot.

Whenever I demonstrate an unstructured document processing model, I spend time talking about managed metadata and the tenant-level term store. I do this because a managed metadata column is fully supported in an unstructured document processing model and has some unique superpowers that other column data types don’t have when it comes to extracting values from a document.

This post shares some of the superpowers that managed metadata columns have with SharePoint Premium. Decide whether they will help you before you build your own model and select the data types of your entity extractors.

Superpower list

  1. Superpower 1: Control the values extracted
  2. Superpower 2: Allow for variations in the values extracted
  3. Superpower 3: Allow for new terms automatically
  4. Superpower 4: Use the taxonomy tagger with your term sets

Check out the details below…


Superpower 1: Control the values extracted

You can pre-create a managed metadata column pointing to a closed tenant-level term set and then reference it when creating a model’s entity extractor. This ensures that a value extracted from a document always matches one of the terms rather than being an arbitrary string. The result is consistent metadata, which helps in follow-on workflows, search, grouping, and filtering.

Use-case: A rental agreement unstructured model for a property management company where the rental categories reference a closed term set:

The terms are defined in advance, so every value extracted must match one of the three pre-defined terms in the term set.
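
To make the idea concrete, here is a minimal Python sketch of what "closed" means for an extracted value. It is only an illustration of the behavior, not how SharePoint Premium is implemented. The three category names and the function name `match_closed_term` are hypothetical and chosen purely for this example.

```python
from typing import Optional

# Hypothetical closed term set. In SharePoint the real terms live in the
# tenant-level term store; these three names are made up for illustration.
RENTAL_CATEGORIES = frozenset({"Residential", "Commercial", "Industrial"})


def match_closed_term(extracted: str, terms: frozenset) -> Optional[str]:
    """Return the canonical term that matches the extracted text, or None.

    Assumption: matching ignores case and surrounding whitespace.
    A closed term set never gains new terms here.
    """
    by_folded = {term.casefold(): term for term in terms}
    return by_folded.get(extracted.strip().casefold())
```

With these assumptions, `match_closed_term(" commercial ", RENTAL_CATEGORIES)` returns the canonical term `Commercial`, while a value that is not in the term set, such as `Vacation`, returns `None` and would leave the column empty.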


Superpower 2: Allow for variations in the values extracted

If you have term synonyms defined in the tenant-level term set, the SharePoint Premium model is smart enough to match any of the synonym values as well! As with superpower 1, this keeps your metadata consistent, which helps in follow-on workflows, search, grouping, and filtering.

Use-case: A statement of work model where the services provided are being extracted. If the term Training has synonyms Education, Guidance and Presentation defined, then the model would be able to match any of the synonyms in a document and store the value Training in the column instead.
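
The Training use-case can be sketched in a few lines of Python. This is a simplified model of the synonym behavior described above, not SharePoint Premium's actual matching logic. The helper names `build_lookup` and `resolve` are hypothetical, and case-insensitive matching is an assumption.

```python
from typing import Dict, List, Optional

# Term and synonyms taken from the use-case above.
TERM_SYNONYMS: Dict[str, List[str]] = {
    "Training": ["Education", "Guidance", "Presentation"],
}


def build_lookup(term_synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every label (the term itself and each synonym) to its term."""
    lookup: Dict[str, str] = {}
    for term, synonyms in term_synonyms.items():
        for label in (term, *synonyms):
            lookup[label.casefold()] = term
    return lookup


def resolve(extracted: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the term to store in the column, or None if nothing matches."""
    return lookup.get(extracted.strip().casefold())


LOOKUP = build_lookup(TERM_SYNONYMS)
```

Under these assumptions, `resolve("Education", LOOKUP)` and `resolve("Guidance", LOOKUP)` both return `Training`, so the column stores one consistent value no matter which synonym appears in the document.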


Superpower 3: Allow for new terms automatically

A SharePoint Premium unstructured document processing model understands open term sets. A term set is closed by default, which means an administrator controls every term added to it, and a term must be present in the term set before an entity extractor can use the value (superpower 1). Alternatively, a term set can be open. With an open term set, a value extracted from a document that doesn’t match an existing term is automatically added to the term set as a new term. The model therefore still populates the metadata column, even if the term didn’t previously exist.

Use-case: Client names extracted by an invoice SharePoint Premium model when not all client names are pre-created in the open term set. The Fake client term set is open and will therefore allow values other than the pre-created ones to be added to the term set when the model extracts them.
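
The difference between open and closed term sets can be modeled with a small Python class. This is a toy stand-in written for this post's explanation, not the SharePoint term store API. The `TermSet` class, its `resolve` method, and the client names `Contoso` and `Fabrikam` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TermSet:
    """Toy model of a term set (assumption: not the real SharePoint API)."""

    name: str
    is_open: bool = False  # closed by default, as described above
    terms: Set[str] = field(default_factory=set)

    def resolve(self, extracted: str) -> Optional[str]:
        """Return the term to store, adding it first if the set is open."""
        value = extracted.strip()
        for term in self.terms:
            if term.casefold() == value.casefold():
                return term  # existing term wins, whatever the casing
        if self.is_open and value:
            self.terms.add(value)  # open set: the new term is created
            return value
        return None  # closed set: unknown values are not stored
```

For example, calling `resolve("Fabrikam")` on `TermSet("Fake client", is_open=True, terms={"Contoso"})` returns `Fabrikam` and adds it to the set, whereas the same call on a closed set returns `None` and leaves the terms unchanged.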


Superpower 4: Taxonomy Tagging

This feature automatically tags documents in SharePoint libraries with terms configured in your term store using artificial intelligence.

You can add up to 3 managed metadata columns to your library and configure them for taxonomy tagging. If a term (tag) is found in the content, it is automatically stored in the column. Brilliant!
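
As a rough mental model, the tagging flow can be sketched in Python. Note the big assumption: the real feature uses AI to find terms, while this sketch substitutes a deliberately naive whole-word keyword scan. The function name `tag_document`, the column names, and the sample terms are hypothetical. The 3-column limit comes from the paragraph above.

```python
import re
from typing import Dict, List

MAX_TAGGING_COLUMNS = 3  # limit stated in the post


def tag_document(text: str, columns: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per column, the terms from its term set found in the text.

    Assumption: a case-insensitive whole-word search stands in for the
    AI-based matching that the real taxonomy tagging feature performs.
    """
    if len(columns) > MAX_TAGGING_COLUMNS:
        raise ValueError("at most 3 columns can be configured for tagging")
    tags: Dict[str, List[str]] = {}
    for column, terms in columns.items():
        tags[column] = [
            term
            for term in terms
            if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)
        ]
    return tags
```

Given a hypothetical document that mentions Training and Contoso, a `Services` column backed by the terms `Training` and `Consulting` would be tagged with `Training` only, and a `Client` column backed by `Contoso` would be tagged with `Contoso`.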

You can read more about taxonomy tagging at https://learn.microsoft.com/microsoft-365/syntex/taxonomy-tagging-overview and https://wbaer.net/2023/12/a-quick-trip-around-taxonomy-tagging/.


Managed Metadata Ideas

Are you curious about ideas for using managed metadata in SharePoint Premium models? Based on some common use-case scenarios, here are some ideas:

Refer to these reference links for supporting details:

Thanks for reading.

-JCK
