Build and use custom Sensitive Information Types in Office 365

Office 365 comes with 87 built-in sensitive information types. They can be used to identify content and take action on it in several places: Data Loss Prevention (DLP), Azure Information Protection (AIP) Labels, and Retention Classification Labels. Examples of sensitive information types are:

Canada Bank Account Number
Australia Driver’s License Number
Credit Card Number
U.S. Social Security Number (SSN)

You can also create your own custom sensitive information types to detect organization-specific content for security and compliance reasons. This post walks through an example: creating a custom type for an internally formatted customer number, using it in a DLP policy and a Retention label, and applying a modified technique to detect it in an AIP label (see note below).

Note: Currently, AIP doesn’t support custom sensitive information types. You can use either the built-in sensitive information types or create a custom regular expression in the Azure Portal. Support for custom sensitive information types is on the backlog and will come as part of the unification of AIP labels and Security & Compliance Center Retention labels.

The customer number in this example is in the format AA-##### where AA is two uppercase letters indicating the customer type and ##### is the 5-digit customer number.

Examples: EF-12345, JK-97812, DG-93809

Build the Regular Expression

We’ll use a regular expression to detect the exact customer # format. If you’re new to regular expressions, they can be a bit intimidating. However, I’ve discovered a great website that helps not only with building expressions but also with validating them. Check it out: https://regex101.com.

The regular expression to detect a valid customer # in this example is this:

\s[A-Z]{2}-[0-9]{5}\s
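If you'd rather sanity-check the pattern locally before pasting it into the Security & Compliance Center, here is a minimal sketch. It uses Python's `re` module as a stand-in for the service's regex engine, which is an assumption: the two engines may differ in edge cases. The helper name `find_customer_numbers` and the sample sentences are made up for illustration.

```python
import re

# Pattern from the post: whitespace, 2 uppercase letters, hyphen, 5 digits, whitespace.
CUSTOMER_NUMBER = re.compile(r"\s[A-Z]{2}-[0-9]{5}\s")


def find_customer_numbers(text: str) -> list[str]:
    """Return each match with the surrounding whitespace stripped (hypothetical helper)."""
    return [match.group().strip() for match in CUSTOMER_NUMBER.finditer(text)]


if __name__ == "__main__":
    # Illustrative sample sentences, not the test files used in this post.
    samples = [
        "Order placed by customer JK-97812 on Monday.",  # valid format
        "Reference DG-9380 is too short.",               # only 4 digits
        "Reference dg-93809 uses lowercase letters.",    # wrong case
        "JK-97812 starts the line.",                     # no leading whitespace
    ]
    for sample in samples:
        print(find_customer_numbers(sample), "<-", sample)
```

One thing this local check makes visible: because the pattern requires whitespace on both sides, a customer number at the very start of the text, or one followed directly by punctuation, is not matched. Those cases are worth including in your test files.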

Note: In the real world, the first 2 characters of the customer number may be limited to specific combinations of letters; if that were the case, the restriction would need to be built into the regular expression.
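As a rough sketch of what a restricted version could look like, the snippet below builds the pattern from a list of allowed prefixes. The prefix list (`EF`, `JK`, `DG`) is purely hypothetical, borrowed from the example numbers above, and Python's `re` module is again only a stand-in for the service's regex engine.

```python
import re

# Hypothetical list of valid customer-type prefixes; replace with your organization's real codes.
ALLOWED_PREFIXES = ["EF", "JK", "DG"]

# Builds \s(?:EF|JK|DG)-[0-9]{5}\s from the list above.
prefix_group = "|".join(re.escape(prefix) for prefix in ALLOWED_PREFIXES)
RESTRICTED_PATTERN = rf"\s(?:{prefix_group})-[0-9]{{5}}\s"
RESTRICTED = re.compile(RESTRICTED_PATTERN)


def has_known_customer_number(text: str) -> bool:
    """True when the text contains a customer number with an allowed prefix."""
    return RESTRICTED.search(text) is not None


if __name__ == "__main__":
    print(RESTRICTED_PATTERN)
    print(has_known_customer_number("Customer JK-97812 renewed."))  # allowed prefix
    print(has_known_customer_number("Customer ZZ-97812 renewed."))  # prefix not in the list
```

The printed pattern string is what you would paste into the regular expression box in place of the generic `[A-Z]{2}` version.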

Build the Custom Sensitive Information Type

This is done in the Office 365 Security & Compliance Center, within the Classifications section, by selecting the Custom sensitive information types menu option. When you create one, give it a descriptive name and description:

On the definition details, I’ll select a confidence level of 80 on the pattern and then I’ll select Regular expression to define the primary element:

Two pattern elements will be added: the regular expression as the primary element, and the keyword “Customer” as the secondary element. This means if the keyword “Customer” is detected within 300 characters of where there was a match on the regular expression, it will be a positive match. The keyword is optional.
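To make the proximity idea concrete, here is a small sketch of the logic: find each regular expression match, then look for the keyword within 300 characters on either side. This only approximates the behaviour described above. How the service actually measures the window, and whether keyword matching is case-insensitive, are assumptions on my part, and the function name is made up.

```python
import re

CUSTOMER_NUMBER = re.compile(r"\s[A-Z]{2}-[0-9]{5}\s")
KEYWORD = re.compile(r"customer", re.IGNORECASE)  # case-insensitive matching is an assumption
PROXIMITY = 300  # characters, as configured in the post


def matches_with_keyword(text: str) -> list[str]:
    """Return customer numbers that have the keyword within PROXIMITY characters."""
    hits = []
    for match in CUSTOMER_NUMBER.finditer(text):
        # Assumed window: PROXIMITY characters before and after the match.
        window_start = max(0, match.start() - PROXIMITY)
        window_end = min(len(text), match.end() + PROXIMITY)
        if KEYWORD.search(text, window_start, window_end):
            hits.append(match.group().strip())
    return hits


if __name__ == "__main__":
    near = "Customer file: JK-97812 was updated."
    far = "Customer" + " " + "x" * 400 + " JK-97812 done"
    print(matches_with_keyword(near))  # keyword is close to the number
    print(matches_with_keyword(far))   # keyword is more than 300 characters away
```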

Test the Sensitive Information Type

When you save the custom type, you’ll be prompted to test it. Great idea!! Testing your pattern is done using text files – create a series of text files seeded with customer numbers to test your pattern elements. Make sure you also test ones that don’t have any customer #s in them!
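If you want to generate seeded text files rather than typing them by hand, something like the sketch below could work. The file names and contents are hypothetical (they are not the six files I uploaded), and the local regex check is only a pre-flight: the real test is still the upload in the Security & Compliance Center.

```python
import re
import tempfile
from pathlib import Path

CUSTOMER_NUMBER = re.compile(r"\s[A-Z]{2}-[0-9]{5}\s")

# Hypothetical test cases: file name -> (content, whether a match is expected).
TEST_CASES = {
    "sample1.txt": ("Customer record: JK-97812 is active.\n", True),
    "sample2.txt": ("This document has no customer numbers.\n", False),
    "sample3.txt": ("Lowercase prefix: jk-97812 should be ignored.\n", False),
    "sample4.txt": ("Too many digits: EF-123456 should be ignored.\n", False),
}


def write_test_files(folder: Path) -> list[Path]:
    """Write one text file per test case and return the created paths."""
    paths = []
    for name, (content, _expected) in TEST_CASES.items():
        path = folder / name
        path.write_text(content, encoding="utf-8")
        paths.append(path)
    return paths


def check_expectations(folder: Path) -> dict[str, bool]:
    """Map each file name to True when the local regex result equals the expectation."""
    results = {}
    for name, (_content, expected) in TEST_CASES.items():
        text = (folder / name).read_text(encoding="utf-8")
        found = CUSTOMER_NUMBER.search(text) is not None
        results[name] = found == expected
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        write_test_files(folder)
        for name, ok in check_expectations(folder).items():
            print(name, "as expected" if ok else "UNEXPECTED")
```

Swap the temporary directory for a real folder if you want to keep the files and upload them to the test page.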

Here are the sample documents I uploaded and my expectation for each one written as part of the text and shown with a check mark or x:

Here are the results for all 6 of the test files. Each one passed/failed as expected:

Once you’re satisfied with the test case results, you can now use it to auto-apply controls using different features:

Retention label
DLP policy
AIP label (with regular expression)

Let’s do it!

Create an Auto-applied Retention Label

I’ve created a new classification label called Customer Info to be retained for 2 years before being deleted. I select the auto-apply button and the custom sensitive information type created in the first part of this post to automatically detect content that matches.

Note: it can take up to 7 days for the label to be auto-applied. Mine took 2 days.

To test out the auto-apply feature, I’ve created a Modern Team site with 6 documents in it, each one with the same content as the text files tested in the first part of this post. This makes it easy to verify whether the auto-apply feature is correctly applying the retention label, since I expect documents 1 and 5 to have the label auto-applied.

Below are the “before and after” views of the document library showing which documents had the retention label applied: Sample documents #1 and #5! Awesome!

ProjectDocumentsBeforeAutoApply — Before the auto-apply

RetentionAutoApplied — After the auto-apply

Create a DLP Policy to auto-detect a Customer #

I created a DLP policy to detect any content matching the custom sensitive information type defined earlier in the post. Similar to the retention label, I selected the custom type and then set the policy to trigger when content containing at least 1 instance of it is shared outside the organization:

Let’s browse to the same test document library created for the Retention label test and try sharing 2 documents, 1 with a customer # and 1 without. As you can see in the image, Sample document 1 detects the sensitive customer information in the document and prevents it from being shared outside the organization. Sample document 2, on the other hand, can be freely shared.

Azure Information Protection Label

Even though using a custom sensitive information type is not currently supported in an AIP label condition, we can still use the same regular expression to either automatically set or recommend a label. For consistency’s sake, let’s do that!

I created a new AIP label called Customer in the Global AIP policy. I added some visual markings (header, watermark), applied protection so the content couldn’t be copied or printed, and added a condition called AIP Corp Customer Number using the same regular expression as in the custom sensitive information type. If the condition is met, the Customer label is recommended.

To test, I created a Word document and entered a customer number. Here’s a quick video to show how the AIP engine auto-detects the customer # and then recommends the Customer label based on the condition we set:

Sweet!

My thoughts

This is good functionality for detecting any custom information that needs security and protection controls applied to it. The biggest challenge is defining custom patterns that accurately auto-detect the content that needs those controls.

I’m anticipating the unification of AIP and Classification labels, where we’ll also be able to leverage custom sensitive information types with AIP controls.

Thanks for reading.

-JCK

11 comments

Russ Herald says:

August 31, 2018 at 5:06 AM

Very Nice! I’m just putting together a presentation for our Compliance department, and your post lays out the steps very succinctly. I’ve been dealing with Microsoft’s solutions for a while now, but still find that their use of the same language again and again in various products makes understanding the products more difficult. But this is perhaps the inevitable result of different teams working in parallel on different aspects (retention or protection, for example).

Idea for a follow-up: Monitoring defined labels (or link to other posts that handle this)

1. Joanne Klein says:
  
  August 31, 2018 at 7:50 PM
  
  Thanks for the feedback Russ! That’s a great idea for a follow-up post.
  -JCK
  
Longwing says:

October 5, 2018 at 7:22 AM

Thank you for writing this guide, it’s the clearest I’ve found while trying to create a custom sensitive information type. However, I’m running into a roadblock. In my O365 Security and Compliance center, there isn’t an option for creating a new sensitive information type. Under Classifications, I have Labels, Label Policies, and Sensitive Information Types.

Selecting Sensitive Information Types lets me see all of Microsoft’s published types, but doesn’t let me create a new one.

All the resources I’m reading online say I have to build a custom XML file myself, then upload it through powershell. I’m learning the formatting requirements now, but I was really hoping I could use the GUI you show here, and just drop my Regex into a box.

1. Joanne Klein says:
  
  October 19, 2018 at 2:28 AM
  
  Hi Longwing,
  I definitely used the GUI to do this. When I go into the Security & Compliance Center now, I no longer see the ‘Custom Sensitive Information Type’ on the menu, however, when I click the ‘Sensitive Information Types’ menu option, I have a Create button at the top of the page that will allow me to add a custom one.
  
  Do you see that? If not, I would log a ticket with Microsoft.
  
  -JCK
  
sedgley68 (Mark Simpson) says:

June 18, 2019 at 7:11 AM

Good Afternoon JK. I am probably way off track here, but I have a question. I am fairly new to AIP and I am busy rolling this out to my Enterprise. I have DLP running smoothly in the background, but when I apply DNF (Do Not Forward), DLP will no longer scan my documents. Are you able to have both selected? I tried, but as soon as I select Protect on the blade, DNF is enabled but DLP won’t scan. Any ideas?

Regards, Mark Simpson

1. Joanne Klein says:
  
  June 18, 2019 at 11:40 PM
  
  Hi Mark, you selected ‘Protect’ to get to the DNF setting, correct? I know that currently encrypted documents can’t be searched, which also means DLP can’t scan them. This is likely why.
  Joanne
  
bigpix2000 (@bigpix2000) says:

October 22, 2019 at 3:01 PM

Hi Joanne! This thread is still very timely and a thought came from this from a client who is wondering if they can “build” on existing types. To do that, they feel like they need to see the original REGEX statements that Microsoft is probably using under the covers. The docs show what they are doing but in a less technical way. Do you know where we might find the REGEX pieces behind that? (I am suspecting that there is a place on the web for the various consortiums that have them but that would be spread out e.g. for GDPR stuff or PCI in the US etc.) Thanks in advance!

1. Joanne Klein says:
  
  October 22, 2019 at 6:25 PM
  
  Hi! I don’t know where the regex would be (or if it’s hidden by design – not sure) but you can download the current XML definitions and that may help to some degree: https://docs.microsoft.com/en-us/microsoft-365/compliance/customize-a-built-in-sensitive-information-type
  
Sergio Londono says:

October 17, 2024 at 2:41 PM

Hello Joanne,

I noticed the OOB SITs for Canada are good for English. Have you had the opportunity to create custom SITs for Canadian French?

e.g. RAMQ, Quebec driver’s license, Ontario driver’s license, etc.

I am in the process of creating those custom SITs; however, the idea is not to reinvent the wheel.
