Structured Document Processing Model with Microsoft Syntex


As of March 2023, these are the types of custom Microsoft Syntex models you can build to classify documents in SharePoint (this post covers the Structured document processing model):

Note: Microsoft’s pre-built models all use the Unstructured document processing teaching method.

Each Syntex model type is suited to different kinds of formatted content and file types. This post doesn't cover those differences because Microsoft already documents them well: Understand model differences.

I recently spent some time working with the Structured document processing model, which can extract table structures found in documents. As a longtime SharePoint practitioner, I was curious how this would surface inside a SharePoint site. Here are the obvious questions I had:

  • How will the table entries be represented in the document library? Will they?
  • How will the table entries be stored on the SharePoint site?

To see how it worked, I created an order form with some fields in the body of the document and a table embedded within it. Below is one of the PDFs used to train the model, with the fields and table I wanted to extract circled in yellow:

I created 5 variations of this order form to train the model and added them to a single collection because they all share the same layout.

Microsoft has done a great job of explaining how to teach your AI Builder model to extract fields and table rows/columns from a document, so I won't repeat their instructions here. See: Tag documents.

What’s it look like in SharePoint?

During publishing, you are prompted to either create a new list or update an existing list to hold the table information extracted from each document. In my example, I created a new list. The list lives on the same SharePoint site as the library and links back to the library through the file ID property.
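To make the link-back concrete, here is a minimal in-memory sketch in Python of the one-to-many shape: each extracted table row becomes its own list item carrying the source document's file ID. The names `OrderRow`, `table_to_list_items`, `items_for_document`, the columns `product`, `quantity`, `unit_price`, and the sample values are all hypothetical; the real columns depend on what you tag when training the model, and this is not how Syntex itself is implemented.

```python
from dataclasses import dataclass


# Hypothetical list-item shape: real column names come from the table
# columns you label when training the model.
@dataclass
class OrderRow:
    file_id: int       # links the list item back to its source document
    product: str
    quantity: int
    unit_price: float


def table_to_list_items(file_id: int, rows: list[dict]) -> list[OrderRow]:
    """Turn each extracted table row into its own list item, stamped with the file ID."""
    return [OrderRow(file_id=file_id, **row) for row in rows]


def items_for_document(items: list[OrderRow], file_id: int) -> list[OrderRow]:
    """Return only the list items that belong to one document (the N side of 1:N)."""
    return [item for item in items if item.file_id == file_id]


# Made-up sample data: document 101 has two order rows, document 102 has one.
items = table_to_list_items(101, [
    {"product": "Widget", "quantity": 4, "unit_price": 2.5},
    {"product": "Gadget", "quantity": 1, "unit_price": 10.0},
]) + table_to_list_items(102, [
    {"product": "Widget", "quantity": 10, "unit_price": 2.5},
])

print(len(items_for_document(items, 101)))  # 2
```

Keeping the file ID on every row, rather than nesting rows inside the document, is what lets a flat SharePoint list hold any number of rows per document.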

Continuing with the order example above, what does this look like in SharePoint?

I uploaded 6 order forms to the library:

Here’s what happens:

  • The AI model automatically runs against each one and extracts the fields as metadata values
  • The AI model stores all of the rows from the order table as items in the associated SharePoint list
  • The AI model automatically adds a link column to the SharePoint library:
    • I named the table “Reseller Orders” when building the AI model
    • By default, the link text is View table info; I changed it to View orders
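The link column behaves like a pre-filtered list view. As a sketch of that idea, the Python below builds a URL to the list's AllItems.aspx view using SharePoint's FilterField1/FilterValue1 query-string parameters, paired with the link text. The internal column name `FileId`, the list URL name `ResellerOrders`, and the site URL are placeholder assumptions, and the link Syntex actually generates may be formed differently.

```python
from urllib.parse import quote, urlencode


def build_view_link(site_url: str, list_url_name: str, file_id: int,
                    link_text: str = "View table info",
                    filter_field: str = "FileId") -> tuple[str, str]:
    """Return (link text, URL) for a list view filtered to one document's rows.

    `filter_field` is a hypothetical internal column name for the file ID.
    """
    # Default view page of a list within a site.
    view_url = f"{site_url.rstrip('/')}/Lists/{quote(list_url_name)}/AllItems.aspx"
    # FilterField1/FilterValue1 ask SharePoint to filter the view on one column.
    query = urlencode({"FilterField1": filter_field, "FilterValue1": str(file_id)})
    return link_text, f"{view_url}?{query}"


text, url = build_view_link("https://contoso.sharepoint.com/sites/orders",
                            "ResellerOrders", 101, link_text="View orders")
print(text, url)
```

Passing `link_text` mirrors the rename from the default View table info to View orders; only the label changes, not the filter.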

When I click the View orders link on the first document, it takes me to a filtered view of the associated list showing me only the items relating to that order:

Ah! Now it all makes sense. A structured document processing model is a dynamic way of retrieving the 1:N (one-to-many) rows of a table and storing each one as a separate item in an associated SharePoint list.
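If you want those 1:N rows outside the browser, the SharePoint REST API can return the same subset of list items. The helper below only builds the request URL (it does not authenticate or send anything). It assumes the list title is Reseller Orders and that the file ID sits in a number column whose internal name is `FileId`; both are hypothetical, so check the list's actual title and internal column name first, and quote the value if the column turns out to be text.

```python
from urllib.parse import quote


def build_rest_query(site_url: str, list_title: str, file_id: int,
                     filter_field: str = "FileId") -> str:
    """Build a SharePoint REST URL returning the list items for one document."""
    # OData string literals escape a single quote by doubling it.
    title = list_title.replace("'", "''")
    # Numeric comparison; a text column would need the value in quotes.
    odata_filter = f"{filter_field} eq {file_id}"
    return (f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{quote(title)}')"
            f"/items?$filter={quote(odata_filter)}")


print(build_rest_query("https://contoso.sharepoint.com/sites/orders", "Reseller Orders", 101))
# prints https://contoso.sharepoint.com/sites/orders/_api/web/lists/getbytitle('Reseller%20Orders')/items?$filter=FileId%20eq%20101
```

Send the resulting URL as an authenticated GET request with an `Accept: application/json;odata=nometadata` header to get the matching items as JSON.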

Other observations:

  • You can create custom views on both the document library and the associated list and use custom formatting like any other library or list
  • You cannot currently use a managed metadata column data type when extracting information from a document using a structured document processing model. If you do this, a Single line of text column will be created for you instead.

Thanks for reading!

