Office 365 comes with 87 built-in sensitive information types. They can be used to identify sensitive content and take action on it in several places: Data Loss Prevention (DLP), Azure Information Protection (AIP) Labels, and Retention Classification Labels. Examples of sensitive information types are:
- Canada Bank Account Number
- Australia Driver’s License Number
- Credit Card Number
- U.S. Social Security Number (SSN)
You can also create your own custom sensitive information types to detect organization-specific content for security and compliance reasons. This post walks through an example: creating a custom type for an internally formatted customer number, using it in a DLP policy and a Retention label, and then detecting the same number in an AIP label with a modified technique (see note below).
Note: AIP doesn’t currently support custom sensitive information types. You can use either the built-in sensitive information types or a custom regular expression defined in the Azure Portal. Support for custom sensitive information types is on the backlog and is expected to arrive as part of the unification of AIP labels and Security & Compliance Center Retention labels.
The customer number in this example uses the format AA-#####, where AA is two uppercase letters indicating the customer type and ##### is the 5-digit customer number.
Examples: EF-12354, JK-97812, DG-93809
Build the Regular Expression
We’ll use a regular expression to detect the exact customer # format. If you’re new to regular expressions, they can be a bit intimidating; however, I’ve discovered a great website that helps not only with building expressions but also with validating them. Check it out: https://regex101.com.
The regular expression to detect a valid customer # in this example is this:
Note: In the real world, the first 2 characters of the customer number may be limited to specific combinations of letters; if that were the case, the restriction would need to be built into the regular expression.
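Since the exact expression isn’t reproduced in the text above, here is a minimal sketch of one pattern that fits the AA-##### format, tried out with Python’s `re` module. The `\b` word boundaries are my assumption: they stop the pattern from matching inside longer strings such as `XEF-12345` or `EF-123456`. Python’s regex dialect may differ in small ways from the engine Office 365 uses, so retest whatever you settle on at regex101.com and in the Security & Compliance Center.

```python
import re

# Assumed pattern for the AA-##### customer number format:
#   [A-Z]{2}   two uppercase letters (the customer type)
#   -          a literal hyphen
#   [0-9]{5}   exactly five ASCII digits (the customer number)
# \b on each side rejects matches embedded in longer tokens.
CUSTOMER_NUMBER = re.compile(r"\b[A-Z]{2}-[0-9]{5}\b")


def find_customer_numbers(text: str) -> list[str]:
    """Return every substring of text that looks like a customer number."""
    return CUSTOMER_NUMBER.findall(text)


if __name__ == "__main__":
    sample = "Orders for JK-97812 and DG-93809; ignore ef-12345 and EF-123456."
    print(find_customer_numbers(sample))  # ['JK-97812', 'DG-93809']
```

Dropping the `\b` anchors would make the pattern match the first five digits of a longer number, which is usually not what you want for an identifier with a fixed length.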
Build the Custom Sensitive Information Type
This is done in the Office 365 Security & Compliance Center, in the Classifications section, by selecting the Custom sensitive information types menu option. When you create one, give it a descriptive name and description:
In the definition details, I’ll select a confidence level of 80 for the pattern and then select Regular expression to define the primary element:
Two pattern elements will be added: the regular expression as the primary element and the keyword “Customer” as the secondary element. This means that if the keyword “Customer” is detected within 300 characters of a regular expression match, it counts as a positive match. The keyword is optional.
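To make the primary/secondary relationship concrete, here is a rough Python model of the pattern described above. It illustrates the idea only and is not how the Office 365 classification engine is implemented. Assumptions: the regex shown, a case-insensitive whole-word match on “Customer”, a proximity window that extends 300 characters on both sides of the primary match, and a hypothetical `require_keyword` switch that defaults to `False` because the keyword is optional.

```python
import re
from dataclasses import dataclass

# Assumed primary element (customer number) and secondary element (keyword).
CUSTOMER_NUMBER = re.compile(r"\b[A-Z]{2}-[0-9]{5}\b")
KEYWORD = re.compile(r"\bcustomer\b", re.IGNORECASE)
PROXIMITY = 300   # characters before and after a primary match (assumed window shape)
CONFIDENCE = 80   # confidence level chosen for the pattern in the post


@dataclass
class Detection:
    value: str
    keyword_nearby: bool
    confidence: int


def detect(text: str, require_keyword: bool = False) -> list[Detection]:
    """Report each primary match and whether the keyword occurs within PROXIMITY chars."""
    keyword_spans = [k.span() for k in KEYWORD.finditer(text)]
    results = []
    for m in CUSTOMER_NUMBER.finditer(text):
        lo, hi = m.start() - PROXIMITY, m.end() + PROXIMITY
        # The keyword counts if any part of it overlaps the window [lo, hi).
        nearby = any(ks < hi and ke > lo for ks, ke in keyword_spans)
        if nearby or not require_keyword:
            results.append(Detection(m.group(), nearby, CONFIDENCE))
    return results
```

With `require_keyword=False` a bare customer number is still reported, and `keyword_nearby` records the supporting evidence; flipping the switch shows what a mandatory keyword would do.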
Test the Sensitive Information Type
When you save the custom type, you’ll be prompted to test it. Great idea!! Testing your pattern is done using text files – create a series of text files seeded with customer numbers to test your pattern elements. Make sure you also test ones that don’t have any customer #s in them!
Here are the sample documents I uploaded. Each one states my expected result in its text, marked with a check mark or an x:
Here are the results for all 6 test files. Each one matched or didn’t match as expected:
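Before uploading test files, it can save a round trip to pre-check them locally. The sketch below writes a few hypothetical sample files (not the ones used in this post), each paired with the expected outcome, and checks an assumed pattern against them. A local check only validates the regex; it doesn’t replace the test in the Security & Compliance Center, which also exercises the keyword and confidence settings.

```python
import re
import tempfile
from pathlib import Path

CUSTOMER_NUMBER = re.compile(r"\b[A-Z]{2}-[0-9]{5}\b")  # assumed pattern

# Hypothetical test corpus: file name -> (file content, expected to match?)
SAMPLES = {
    "sample1.txt": ("Customer JK-97812 placed an order.", True),
    "sample2.txt": ("No customer numbers in this file.", False),
    "sample3.txt": ("Too many digits: EF-123456", False),
    "sample4.txt": ("Lowercase prefix: dg-93809", False),
    "sample5.txt": ("Reference DG-93809 in the middle of text.", True),
    "sample6.txt": ("Missing hyphen: AB12345", False),
}


def run_checks(folder: Path) -> dict[str, bool]:
    """Write each sample to folder and return name -> True if the result met expectations."""
    outcomes = {}
    for name, (content, expected) in SAMPLES.items():
        path = folder / name
        path.write_text(content, encoding="utf-8")
        found = CUSTOMER_NUMBER.search(path.read_text(encoding="utf-8")) is not None
        outcomes[name] = found == expected
    return outcomes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, ok in run_checks(Path(tmp)).items():
            print(f"{name}: {'as expected' if ok else 'UNEXPECTED'}")
```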
Once you’re satisfied with the test case results, you can now use it to auto-apply controls using different features:
- Retention label
- DLP policy
- AIP label (with regular expression)
Let’s do it!
Create an Auto-applied Retention Label
I created a new classification label called Customer Info that retains content for 2 years and then deletes it. I selected the auto-apply button and chose the custom sensitive information type created in the first part of this post, so that matching content is detected automatically.
Note: it can take up to 7 days for the label to be auto-applied. Mine took 2 days.
To test the auto-apply feature, I created a Modern Team site with 6 documents, each containing the same content as the text files tested in the first part of this post. This makes it easy to verify whether the retention label is being auto-applied correctly, since I expect documents 1 and 5 to receive it.
Below are the “before and after” views of the document library, showing which documents had the retention label applied: Sample documents #1 and #5! Awesome!
Create a DLP Policy to auto-detect a Customer #
I created a DLP policy to detect any content matching the custom sensitive information type defined earlier in the post. As with the retention label, I selected the custom type, and then I set the policy to trigger when content containing at least 1 customer number is shared outside the organization:
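The policy condition boils down to two tests: is the content being shared with someone outside the organization, and does it contain at least 1 instance of the custom type? Here is a minimal Python sketch of that decision. The `ORG_DOMAIN` value, the `is_external` helper and the domain comparison are hypothetical simplifications; the DLP service makes the internal/external determination itself.

```python
import re

CUSTOMER_NUMBER = re.compile(r"\b[A-Z]{2}-[0-9]{5}\b")  # assumed pattern
MIN_INSTANCES = 1           # "at least 1 of them was found"
ORG_DOMAIN = "contoso.com"  # hypothetical organization domain


def is_external(recipient: str) -> bool:
    """Simplified stand-in: treat any address outside ORG_DOMAIN as external."""
    return recipient.rsplit("@", 1)[-1].lower() != ORG_DOMAIN


def should_block_share(content: str, recipients: list[str]) -> bool:
    """Block when enough customer numbers are present and any recipient is external."""
    count = len(CUSTOMER_NUMBER.findall(content))
    return count >= MIN_INSTANCES and any(is_external(r) for r in recipients)
```

Raising `MIN_INSTANCES` is the lever for reducing noise: a document that mentions one customer is often fine to share, while one listing dozens probably isn’t.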
Let’s browse to the same test document library created for the Retention label test and try sharing 2 documents: 1 with a customer # and 1 without. As you can see in the image, the DLP policy detects the sensitive customer information in Sample document 1 and prevents it from being shared outside the organization. Sample document 2, on the other hand, can be shared freely.
Azure Information Protection Label
Even though AIP label conditions don’t currently support custom sensitive information types, we can still use the same regular expression to either automatically apply or recommend a label. For consistency’s sake, let’s do that!
I created a new AIP label called Customer in the Global AIP policy. I added some visual markings (header, watermark), applied protection so that labeled content can’t be copied or printed, and added a condition called AIP Corp Customer Number that uses the same regular expression as the custom sensitive information type. When the condition is met, the Customer label is recommended.
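The difference between automatically applying and recommending a label comes down to what happens after the condition matches. The sketch below models that choice in Python. The `Mode` enum, the `evaluate` function, the returned messages and the `min_occurrences` parameter are all hypothetical stand-ins for the label’s condition settings, not the AIP client’s actual behavior or API.

```python
import re
from enum import Enum
from typing import Optional


class Mode(Enum):
    AUTOMATIC = "automatic"      # label applied without asking the user
    RECOMMENDED = "recommended"  # user is prompted and can accept or dismiss


# Hypothetical stand-in for the "AIP Corp Customer Number" condition.
CONDITION = re.compile(r"\b[A-Z]{2}-[0-9]{5}\b")
LABEL = "Customer"


def evaluate(text: str, mode: Mode = Mode.RECOMMENDED, min_occurrences: int = 1) -> Optional[str]:
    """Return a description of the client's action, or None if the condition isn't met."""
    if len(CONDITION.findall(text)) < min_occurrences:
        return None
    if mode is Mode.AUTOMATIC:
        return f"Applied label '{LABEL}'"
    return f"Recommend label '{LABEL}'"
```

Recommending is the gentler default while you are still tuning a pattern: a false positive costs the user one click to dismiss instead of silently applying protection that blocks copying and printing.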
To test, I created a Word document and entered a customer number. Here’s a quick video showing how the AIP engine auto-detects the customer # and then recommends the Customer label based on the condition we set:
This is useful functionality for detecting any custom information that needs security and protection controls. The biggest challenge is defining custom patterns precise enough to auto-detect exactly the content that needs them.
I’m anticipating the unification of AIP and Classification labels, when we’ll also be able to use custom sensitive information types with AIP controls.
Thanks for reading.