Using Regex for Azure Information Protection Labels


When configuring Azure Information Protection (AIP) labels, one of the options is to automatically set or recommend a label based on content found within a document. To do this, you can use sensitive information types (Financial, Medical, and Privacy types), keywords within a document, or, for more advanced content matching, regular expressions.

Regular expressions, as cryptic as they can sometimes be, are a powerful mechanism for matching content. Although they can be used to pattern-match many things (health information, financial information, customer information, phone numbers, email addresses, etc.), this post will walk through a practical example of matching on words you would typically find in a legal document. If any of those words are found, AIP will recommend a label.

Shout out to Vincent Biret for helping me with the Regular Expression syntax! 🙂

Tip: Ask subject matter experts in your organization for common terms to search for. For example, the legal department will likely be able to come up with the terms you should include in a regular expression to classify a lot of the legal content in your organization automatically.
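Once you have a term list from a department, turning it into a regex alternation is mechanical. Here is a minimal Python sketch of that step. The helper name build_alternation is hypothetical, and it uses Python's re module only as a convenient stand-in; whether the AIP engine treats escaped characters exactly the same way is an assumption you should verify in the portal.

```python
import re

def build_alternation(terms):
    """Join terms into one regex group, escaping any regex metacharacters."""
    # Drop empty entries and surrounding whitespace from the supplied list.
    cleaned = [t.strip() for t in terms if t.strip()]
    if not cleaned:
        raise ValueError("at least one term is required")
    # re.escape keeps characters such as '.' or '(' from being read as regex syntax.
    return "(" + "|".join(re.escape(t) for t in cleaned) + ")"

# Terms taken from the example used later in this post.
legal_terms = ["RFP", "RFI", "WHEREAS", "Contract", "Notwithstanding"]
pattern = build_alternation(legal_terms)
print(pattern)  # prints (RFP|RFI|WHEREAS|Contract|Notwithstanding)
```

Keeping the terms in a plain list makes it easy to hand the list back to the subject matter experts for review before the pattern goes into a label condition.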

Here are the steps to set this up…


Add AIP label

Add a label in the Azure portal. For this post, I created a label called Joanne. 🙂


Add Label Condition

There are a number of label settings, but the one I’m focusing on in this post is the condition. Scroll down to the section where you add the condition to set your label and click Add a new condition. I’ve added a condition called Regex condition for label ‘Joanne’:

Add a condition

Configure a custom condition and enter the regular expression to match any of the words you’re searching for in the content of your document. For the example in this post, we want to look for any of these words: RFP, RFI, WHEREAS, Contract, Notwithstanding. Finding any one of them indicates the document is likely a legal document (or at least one we want to apply the Joanne AIP label to).

The regular expression to do this is:

.*(RFP|RFI|WHEREAS|Contract|Notwithstanding)+.*
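Before pasting a pattern into the portal, it can help to try it against a few sample sentences. The sketch below does that in Python. It is only an approximation: Python's re module stands in for the AIP matching engine, which may behave differently, and the function name looks_legal and the sample sentences are made up for illustration.

```python
import re

# The pattern from this post, copied verbatim.
PATTERN = r".*(RFP|RFI|WHEREAS|Contract|Notwithstanding)+.*"

def looks_legal(text: str) -> bool:
    """Return True if any of the trigger words appears somewhere in the text."""
    # re.search scans the whole string and is case sensitive by default.
    return re.search(PATTERN, text) is not None

# Hypothetical sample sentences to exercise the pattern.
samples = [
    "WHEREAS the parties wish to enter into an agreement",
    "Please respond to the RFP by Friday",
    "Lunch menu for Tuesday",
    "whereas in lower case",
]
for sample in samples:
    print(looks_legal(sample), "-", sample)
```

In this sketch the first two samples match and the last two do not. Note that the match is on substrings, so a word such as Contractor would also trigger it because it contains Contract.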

Two important settings:

  • Ensure you toggle the Match as a regular expression option to the On position so the AIP engine will interpret the pattern as a regular expression and not as an exact string phrase.
  • The pattern match should be case sensitive, so ‘whereas’ won’t match but ‘WHEREAS’ will, since the capitalized version is common language in a legal document. To do that, either enter the string (?-i) in front of the regular expression or (a simpler technique) toggle the Match with case sensitivity setting.
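To see what the case sensitivity setting changes, here is a small Python sketch comparing case-sensitive and case-insensitive matching. Treat it as an analogy rather than a description of the AIP engine: Python's re module is a stand-in, and Python does not accept a bare (?-i) prefix, so the sketch uses the scoped form (?-i:...) to switch case-insensitivity off inside a group. The variable names are hypothetical.

```python
import re

WORDS = r"(RFP|RFI|WHEREAS|Contract|Notwithstanding)"

# Case sensitive: Python's default behaviour.
case_sensitive = re.compile(WORDS)

# Case insensitive: roughly what happens when case sensitivity is not enforced.
case_insensitive = re.compile(WORDS, re.IGNORECASE)

# Scoped override: (?-i:...) turns case-insensitivity back off inside the group,
# even though the IGNORECASE flag is set for the pattern as a whole.
scoped = re.compile(r"(?-i:" + WORDS + r")", re.IGNORECASE)

for text in ["WHEREAS the parties agree", "whereas the parties agree"]:
    print(
        text,
        "| sensitive:", bool(case_sensitive.search(text)),
        "| insensitive:", bool(case_insensitive.search(text)),
        "| scoped:", bool(scoped.search(text)),
    )
```

The lowercase sentence only matches the case-insensitive pattern, which is exactly the kind of false positive the case sensitivity setting is meant to avoid here.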

Below are the settings for this condition:

Regex Condition


Automatic or Recommended?

Once you’ve added the condition, select how the label should be applied: automatic or recommended. I like to start with ‘recommended’ to test out the condition and switch to automatic at a later time. For this example, I’ve chosen Recommended.



End Result

While you work in the Office clients, the conditions are checked continuously in the background for a match on the regex pattern. If any of the words are found within the content, the client will recommend the Joanne label!



Summary

I’m the first to admit that regular expressions can be very confusing; however, they’re a powerful addition to your technology toolkit. Don’t hesitate to leverage them in your conditions to automatically set or recommend an AIP label.

Thanks for reading!

-JCK


Credit: Photo by Steinar Engeland on Unsplash
