Office 365 comes with 87 built-in sensitive information types. They can be used to identify content and take action on it in several places: Data Loss Prevention (DLP), Azure Information Protection (AIP) Labels, and Retention Classification Labels. Examples of sensitive information types are:
- Canada Bank Account Number
- Australia Driver’s License Number
- Credit Card Number
- U.S. Social Security Number (SSN)
You can also create your own custom sensitive information types to detect organization-specific content for security and compliance reasons. This post will walk through an example: creating a custom type for an internally formatted customer number, using it in a DLP policy and a Retention label, and using a modified technique to detect it in an AIP label (see the note below).
Note: Currently, AIP doesn’t support custom sensitive information types. You can either use the built-in sensitive information types or create a custom regular expression in the Azure Portal. Support for custom sensitive information types is on the backlog and is expected to come as part of the unification of AIP labels and Security & Compliance Center Retention labels.
The customer number in this example is in the format AA-#####, where AA is any two uppercase letters indicating the customer type and ##### is the 5-digit customer number.
Examples: EF-12345, JK-97812, DG-93809
Build the Regular Expression
We’ll use a regular expression to detect the exact customer # format. If you’re new to regular expressions, they can be a bit intimidating, but I’ve discovered a great website that helps with both building and validating them. Check it out: https://regex101.com.
The regular expression to detect a valid customer # in this example is this:
Note: In the real world, the first 2 characters of the customer number may be limited to specific combinations of letters; if so, that restriction would need to be built into the regular expression.
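The expression itself appeared as an image in the original post and is not reproduced here. As a sketch only, assuming the format described above (two uppercase letters, a hyphen, then exactly five digits), a pattern along these lines would fit. The name CUSTOMER_NUMBER, the helper function, and the word boundaries are assumptions for illustration, not necessarily what was entered in the Security & Compliance Center.

```python
import re

# Hypothetical pattern for the AA-##### format: two uppercase letters,
# a hyphen, then exactly five digits. The \b word boundaries (an assumption)
# stop it from matching inside a longer run of letters or digits.
CUSTOMER_NUMBER = re.compile(r"\b[A-Z]{2}-\d{5}\b")


def find_customer_numbers(text: str) -> list[str]:
    """Return every customer number found in the text, in order."""
    return CUSTOMER_NUMBER.findall(text)


# Quick check against a few valid and invalid candidates.
samples = ["JK-97812", "DG-93809", "jk-97812", "J-97812", "JK-9781", "JK-978123"]
for sample in samples:
    print(sample, bool(find_customer_numbers(sample)))
```

Trying the same candidates in regex101 is a quick way to confirm that the lowercase, one-letter, four-digit, and six-digit variants are rejected before the pattern goes into a sensitive information type.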
Build the Custom Sensitive Information Type
This is done in the Office 365 Security & Compliance Center within the Classifications section by selecting the Custom sensitive information types menu option. When you create one, give it a meaningful name and description:
On the definition details, I’ll select a confidence level of 80 on the pattern and then I’ll select Regular expression to define the primary element:
Two pattern elements will be added: the regular expression as the primary element, and the keyword “Customer” as the secondary element. This means if the keyword “Customer” is detected within 300 characters of where there was a match on the regular expression, it will be a positive match. The keyword is optional.
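To make the primary/secondary element idea concrete, here is a rough approximation in Python of a regular expression match that is only accepted when the keyword appears within 300 characters of it. This is a sketch of the concept, not how the service computes proximity internally. The pattern, the case-insensitive keyword match, and the function name are all assumptions.

```python
import re

CUSTOMER_NUMBER = re.compile(r"\b[A-Z]{2}-\d{5}\b")  # assumed primary element
KEYWORD = re.compile(r"customer", re.IGNORECASE)      # secondary element; ignoring case is an assumption
PROXIMITY = 300                                       # characters, as set on the pattern


def matches_with_keyword(text: str, proximity: int = PROXIMITY) -> list[str]:
    """Return customer numbers that have the keyword within `proximity` characters."""
    hits = []
    for match in CUSTOMER_NUMBER.finditer(text):
        # Look at a window extending `proximity` characters on either side of the match.
        window_start = max(0, match.start() - proximity)
        window_end = min(len(text), match.end() + proximity)
        if KEYWORD.search(text, window_start, window_end):
            hits.append(match.group())
    return hits


print(matches_with_keyword("Customer: JK-97812"))  # keyword nearby
print(matches_with_keyword("Order JK-97812"))      # no keyword
```

A document that contains a correctly formatted number but no nearby keyword would not count as a match under this approximation, which is the behaviour the test files in the next section are meant to exercise.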
Test the Sensitive Information Type
When you save the custom type, you’ll be prompted to test it. Great idea!! Testing your pattern is done using text files – create a series of text files seeded with customer numbers to test your pattern elements. Make sure you also test ones that don’t have any customer #s in them!
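If you would rather script the test files than type them by hand, something like the following could seed a folder with them. The file names and contents here are hypothetical and are not the sample documents used in this post; adjust them to cover the cases you care about.

```python
import tempfile
from pathlib import Path

# Hypothetical sample contents covering both matching and non-matching cases.
SAMPLES = {
    "sample1.txt": "Customer JK-97812 placed an order on Monday.",
    "sample2.txt": "Meeting notes with no customer numbers at all.",
    "sample3.txt": "Customer jk-97812 uses lowercase letters, so it should not match.",
    "sample4.txt": "Customer JK-9781 has only four digits, so it should not match.",
}


def write_samples(folder: Path) -> list[Path]:
    """Write each sample to its own UTF-8 text file and return the paths."""
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, content in SAMPLES.items():
        path = folder / name
        path.write_text(content, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Write into a fresh temporary folder and list what was created.
    out_dir = Path(tempfile.mkdtemp(prefix="customer-number-tests-"))
    for path in write_samples(out_dir):
        print(path)
```

Each file can then be uploaded on the test page, with the expected result noted in the text itself so it is easy to compare against what the service reports.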
Here are the sample documents I uploaded and my expectation for each one written as part of the text and shown with a check mark or x:
Here are the results for all 6 of the test files. Each one passed/failed as expected:
Once you’re satisfied with the test case results, you can now use it to auto-apply controls using different features:
- Retention label
- DLP policy
- AIP label (with regular expression)
Let’s do it!
Create an Auto-applied Retention Label
I’ve created a new classification label called Customer Info to be retained for 2 years before being deleted. I select the auto-apply button and the custom sensitive information type created in the first part of this post to automatically detect content that matches.
Note: it can take up to 7 days for the label to be auto-applied. Mine took 2 days.
To test out the auto-apply feature, I’ve created a Modern Team site with 6 documents in it, each one with the same content as was tested in the text files in the first part of this post. This makes it easy to verify whether the auto-apply feature is correctly applying the retention label, since I expect documents 1 and 5 to have the label auto-applied.
Below are the “before and after” views of the document library showing which documents had the retention label applied: Sample document #1 and #5! Awesome!
Create a DLP Policy to auto-detect a Customer #
I created a DLP policy to detect any content matching the custom sensitive information type defined earlier in the post. As with the retention label, I selected the custom type and then set the policy to trigger when content containing at least 1 instance of it is shared outside the organization:
Let’s browse to the same test document library created for the Retention label test and try sharing 2 documents, 1 with a customer # and 1 without. As you can see in the image, Sample document 1 detects the sensitive customer information in the document and prevents it from being shared outside the organization. Sample document 2, on the other hand, can be freely shared.
Azure Information Protection Label
Even though using the custom sensitive information type is not currently supported in an AIP label condition, we can still use the same regular expression to either automatically set or recommend a label. For consistency’s sake, let’s do that!
I created a new AIP label called Customer in the Global AIP policy. I added some visual markings (header, watermark), applied protection so it couldn’t be copied or printed and added a condition called AIP Corp Customer Number using the same regular expression as in the custom sensitive information type. If the condition was met, I recommend the Customer label be applied.
To test, I created a Word document and entered a customer number. Here’s a quick video to show how the AIP engine auto-detects the customer # and then recommends the Customer label based on the condition we set:
This is good functionality to detect any custom information you need to apply security and protection controls on. The biggest challenge is to define the correct custom patterns to be able to accurately auto-detect the content that needs it.
I’m anticipating the unification of AIP and Classification labels, where we’ll also be able to leverage custom sensitive information types with AIP controls.
Thanks for reading.
Very Nice! I’m just putting together a presentation for our Compliance department, and your post lays out the steps very succinctly. I’ve been dealing with Microsoft’s solutions for a while now, but still find that their use of the same terminology again and again across various products makes the products more difficult to understand. But this is perhaps the inevitable result of different teams working in parallel on different aspects (retention or protection, for example).
Idea for a follow-up: Monitoring defined labels (or link to other posts that handle this)
Thanks for the feedback Russ! That’s a great idea for a follow-up post.
Thank you for writing this guide, it’s the clearest I’ve found while trying to create a custom sensitive information type. However, I’m running into a roadblock. In my O365 Security and Compliance center, there isn’t an option for creating a new sensitive information type. Under Classifications, I have Labels, Label Policies, and Sensitive Information Types.
Selecting Sensitive Information Types lets me see all of Microsoft’s published types, but doesn’t let me create a new one.
All the resources I’m reading online say I have to build a custom XML file myself, then upload it through PowerShell. I’m learning the formatting requirements now, but I was really hoping I could use the GUI you show here, and just drop my Regex into a box.
I definitely used the GUI to do this. When I go into the Security & Compliance Center now, I no longer see the ‘Custom Sensitive Information Type’ on the menu, however, when I click the ‘Sensitive Information Types’ menu option, I have a Create button at the top of the page that will allow me to add a custom one.
Do you see that? If not, I would log a ticket with Microsoft.
Good Afternoon JK. I am probably way off track here, but I have a question. I am fairly new to AIP and I am busy rolling this out to my Enterprise. I have DLP running smoothly in the background, but when I apply DNF (Do Not Forward), DLP will no longer scan my documents. Is it possible to have both selected? I tried, but as soon as I select Protect on the blade, DNF is enabled but DLP won’t scan. Any ideas?
Regards, Mark Simpson
Hi Mark, you selected ‘Protect’ to get to the DNF setting, correct? I know that encrypted documents currently can’t be searched, which also means DLP can’t scan them. This is likely why.
Hi Joanne! This thread is still very timely. A question came up from a client who is wondering if they can “build” on existing types. To do that, they feel like they need to see the original REGEX statements that Microsoft is probably using under the covers. The docs show what they are doing but in a less technical way. Do you know where we might find the REGEX pieces behind that? (I suspect there is a place on the web for the various consortiums that have them, but that would be spread out, e.g. for GDPR stuff or PCI in the US etc.) Thanks in advance!
Hi! I don’t know where the regex would be (or if it’s hidden by design, I’m not sure), but you can download the current XML definitions and that may help to some degree: https://docs.microsoft.com/en-us/microsoft-365/compliance/customize-a-built-in-sensitive-information-type