Purview Retention and the Microsoft Copilot Blueprint

Disclaimer: This post was written by me, not AI. #OriginalThought 🙂

At Microsoft Ignite (November 2024), Microsoft introduced their deployment blueprint for Microsoft 365 Copilot with high-level guidance for both E3 customers and E5 customers.

It is time well spent reviewing this staged approach to enabling Copilot in your organization. It includes some things you can start doing immediately to reduce any oversharing concerns you may have, and they are all good things to be doing anyway! The blueprint provides a framework and starting point for building your own plan, with timelines and prerequisite activities, for implementing Copilot in your organization. Please know, though, that the outcome of all the work needed to get you there is not only Copilot readiness, but good old (boring) data governance.

Data governance. Easy to say, harder to implement.

My consulting engagements focus exclusively on the Microsoft Purview solution suite and how it can be used to reduce data governance (and data security) concerns across all Microsoft 365 workloads. What I’ve learned along the way is that good data governance principles and practices are paramount to success.


Where does retention fit into Microsoft’s blueprint?

There are many features in Microsoft Purview, SharePoint Administration, and SharePoint Advanced Management that are part of the blueprint guidance from Microsoft. The Purview retention controls are part of the blueprint’s Operate phase, under the goal of improving Copilot responses.

Why is retention part of the Operate phase and not any sooner?

I didn’t develop the blueprint, so this is conjecture on my part, but I’m guessing the retention controls aren’t introduced sooner to improve Copilot responses because many of the activities listed in the Pilot and Deploy stages can be done much more quickly and, in many cases, with less decision-making than a full-blown data retention (or data protection) project typically requires.

Rest assured, any efforts you’ve already made towards data retention (and protection) are valuable and will expedite your progress through Microsoft’s blueprint.


The guidance refers to two Purview retention controls for improving Copilot responses – retention policies and retention labels.

What is a retention policy? It is a retention control that can be configured to retain content, retain and then delete it, or just delete it after a defined period. Retention policies work at a “container” level and apply the retention/deletion settings defined in the policy to everything in the container.

How can retention policies improve Copilot responses? Retention policies can do this by deleting stale and redundant, obsolete, and trivial (ROT) information. You may be asking yourself “How will we know if information is stale or ROT?” Well… you have to decide what that means for your organization. Is content stale if it hasn’t been modified in a year? In 2 years? In 3 years? Is content stale if it was created more than 5 years ago? (Unfortunately, you can’t use “last accessed” or “last viewed” as a retention trigger, which would be very useful for both retention and Copilot. For a retention policy, the only options are the last modified date or the created date.)
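To make the “what counts as stale?” decision concrete, here is a minimal Python sketch of the rule described above. It is only an illustration for discussing thresholds with stakeholders, not Purview configuration: the function name `is_stale`, the `trigger` values, and the default 3-year period are all assumptions made for this example.

```python
from datetime import date, timedelta


def is_stale(created: date, last_modified: date, today: date,
             trigger: str = "modified", period_days: int = 3 * 365) -> bool:
    """Return True if an item is older than the chosen period.

    Hypothetical helper: mirrors the two date triggers discussed above
    (created date or last modified date). "Last viewed" is deliberately
    not an option here, because it is not available as a retention trigger.
    """
    if trigger not in ("modified", "created"):
        raise ValueError("trigger must be 'modified' or 'created'")
    start = last_modified if trigger == "modified" else created
    return today - start > timedelta(days=period_days)


# Example: a file created in 2018 but edited in mid-2024.
today = date(2025, 1, 1)
edited_recently = is_stale(date(2018, 1, 1), date(2024, 6, 1), today)
old_by_creation = is_stale(date(2018, 1, 1), date(2024, 6, 1), today,
                           trigger="created", period_days=5 * 365)
print(edited_recently, old_by_creation)
```

The same file can be “fresh” under one trigger and “stale” under the other, which is exactly why the trigger and period need to be an organizational decision rather than a default.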

Sidebar: Check out the Site Lifecycle Management inactive site policy that is part of the SharePoint Advanced Management feature. That feature DOES look at the “last viewed” activity on a file to determine if a site is active or inactive.

If you’re worried about deciding when files should be deleted by a retention policy configured to delete content, there is a backstop retention control you must also have in place to ensure you are retaining content you may want to keep longer, even if it meets one of the retention policy conditions above… that control is a retention label.

What is a retention label? It is an item-level retention control applied at a granular level (email, file, item). Retention labels are applied to content that must be retained for historical, regulatory, legal, or business reasons.

How can retention labels improve Copilot responses? If you have content you need to retain for longer than the retention policy deletion period, such as financial or contract documents, a retention label configured to retain for longer and applied to those documents would ensure they wouldn’t be deleted by the retention policy. Instead, they would be retained for the longer period defined in the retention label.

What’s the takeaway here? It’s the combination of BOTH Purview retention policies and retention labels that most organizations will need to ensure stale and ROT content is removed while all records of value are retained. The result? Better information going into Copilot, which will improve Copilot responses.
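The interplay between a delete policy and a longer retain label can be sketched in a few lines of Python. This is a deliberately simplified model of the behaviour described above (a label that retains for longer keeps the item past the policy’s deletion date). It assumes both periods count from the last modified date, and the function name `earliest_deletion_date` and its parameters are hypothetical. Purview’s actual conflict-resolution rules cover more cases than this.

```python
from datetime import date, timedelta
from typing import Optional


def earliest_deletion_date(last_modified: date,
                           policy_delete_days: int,
                           label_retain_days: Optional[int] = None) -> date:
    """Earliest date an item could be deleted under this simplified model.

    Assumption: both the container-level policy and the item-level label
    start counting from last_modified. A label that retains for longer
    than the policy's deletion period pushes the deletion date out.
    """
    policy_date = last_modified + timedelta(days=policy_delete_days)
    if label_retain_days is None:
        # Unlabeled content: only the retention policy applies.
        return policy_date
    label_date = last_modified + timedelta(days=label_retain_days)
    # Retention wins over deletion, so the later date governs.
    return max(policy_date, label_date)


modified = date(2020, 1, 1)
# Unlabeled draft under a 2-year delete policy.
print(earliest_deletion_date(modified, policy_delete_days=730))
# Contract with a 7-year retain label under the same policy.
print(earliest_deletion_date(modified, policy_delete_days=730,
                             label_retain_days=7 * 365))
```

The unlabeled draft becomes eligible for deletion when the policy period ends, while the labeled contract is held until its longer label period has passed.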


Ideas for Retention Policies and Retention Labels (to help Copilot)

At the risk of oversimplifying a full-blown retention program, here are some pragmatic ideas for (quickly) introducing retention controls in your organization. Will these work for all organizations? No, but it’s worth the time to assess each idea to know if it would work for yours:

Get business content out of OneDrive

Yep. I said it. Ideally, you should be reducing the amount of business content stored in OneDrive. This is for several key reasons:

  • Leavers: when a user leaves, unless you have a well-executed and consistent process for reviewing their content so any gold found there can be moved to a shared location (SharePoint), you either end up over-retaining the content (forever?) or just deleting everything. Neither is good, and neither does anything for the quality of Copilot responses. (Also, see the potential Microsoft Archive implications below.)
  • Visibility: unless OneDrive business content has been shared with others, no one else in the organization can see it or use it. This does nothing for the quality of Copilot responses.
  • Data risk: more data always means more risk in the event of a compromised, malicious, or negligent user.

High-level advice for reducing the amount of OneDrive content:

  1. Provide firm “where to store what” guidance to business users. This will, over time, reduce the amount of business content that may be stored in OneDrive in the first place. Guidance alone is not enough, however. You must give it “teeth” by also enforcing a retention policy to automatically delete content after a period (the next point).
  2. Apply a retention policy configured to automatically delete OneDrive content after a (relatively short?) period of time. It is up to you if you also want to retain the content during that period (see the message below about the Microsoft Archive implication for that), but the important part is deleting the content automatically. You should provide sufficient advance warning to users before enabling a retention policy like this. Users will soon learn that business content needs to be moved to a shared, more authoritative location such as SharePoint or else it will be deleted.
  3. Think very carefully before publishing retention labels to users’ OneDrives to allow them to apply labels to their important content. Although applying a retention label (set to retain) would certainly prevent their content from being automatically deleted with a retention policy, it comes at the cost of their OneDrive being archived through Microsoft Archive if their account is ever deleted. (See Microsoft Archive below)
    • A pragmatic alternative to publishing retention labels to OneDrive locations is to move any content requiring longer retention from users’ OneDrive sites to a shared collaboration space (SharePoint site) where targeted retention labels can be enforced.

Microsoft Archive. If a retention policy (set to retain), retention labels applied to items (set to retain), or an eDiscovery hold is in effect on a user’s OneDrive and either their license is removed or their user account is deleted (the Leaver scenario), the OneDrive site will be automatically archived after 93 days through Microsoft Archive. This will have a cost implication.

Microsoft link: Manage unlicensed OneDrive user accounts
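The archive condition above is easy to misread, so here is a small Python sketch of it. It encodes only what this post states (a retain-type control on the OneDrive plus a removed license or deleted account leads to archiving after 93 days). The function name `onedrive_archive_date` and its parameters are hypothetical, and the sketch does not model pricing or what happens when no retention or hold applies.

```python
from datetime import date, timedelta
from typing import Optional

ARCHIVE_DELAY_DAYS = 93  # figure quoted in this post


def onedrive_archive_date(unlicensed_since: Optional[date],
                          retain_policy: bool,
                          retain_labels: bool,
                          ediscovery_hold: bool) -> Optional[date]:
    """Date a leaver's OneDrive would be archived, or None if not applicable.

    unlicensed_since: the day the license was removed or the account was
    deleted; None means the user is still licensed.
    """
    if unlicensed_since is None:
        return None
    if not (retain_policy or retain_labels or ediscovery_hold):
        # No retain-type control in effect, so this scenario does not apply.
        return None
    return unlicensed_since + timedelta(days=ARCHIVE_DELAY_DAYS)


# A leaver whose OneDrive holds items with retain labels applied.
print(onedrive_archive_date(date(2025, 1, 1), retain_policy=False,
                            retain_labels=True, ediscovery_hold=False))
```

Running a check like this across your leavers list is one way to estimate how many OneDrive sites could end up archived (and billed) before you publish retain labels to OneDrive.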


Focus on the Crown Jewels in SharePoint

Applying retention policies and labels to SharePoint sites requires more strategic and holistic planning than OneDrive due to the wide breadth of content that may be stored across all the sites coupled with the complexity of an organization’s retention schedule. A significant amount of time can be spent on this.

Unless your retention schedule is very small and straightforward (yes, I’ve seen a few), you are likely going to have to pick select areas of your schedule to start with. If you don’t have the luxury of time to map out your entire retention schedule to SharePoint before starting to apply labels, consider starting with the crown jewels in the schedule instead. It’s a pragmatic approach.

Examples: contracts, board records, financial reports, budgets, customer files, vendor files, etc.

This type of content is usually more important to an organization than regular content, as it often holds legal, regulatory, historical, or business value. Look for where you may have these types of crown jewel business records stored across sites in your tenant and focus there. Odds are, some of them will be found in the very sites you have identified for Restricted Content Discoverability (RCD) or Restricted SharePoint Search (RSS), both optional, new features Microsoft has introduced for Copilot readiness.

How can retention controls help with RCD? RCD requires a list of sites to be excluded from results in org-wide searches and Microsoft 365 Copilot (a good use case is high-risk sites). You may think it is less important to have retention controls applied to sites like this because they are excluded; however, by their very nature, they likely contain content that is very important to an organization, and typically this type of content should have not only security and protection controls on it, but also retention controls. These sites are a great place to focus your retention efforts. Ideally, once all the appropriate controls are in place for the sites that are part of RCD (security, protection, retention), they can be removed from this list.

How can retention controls help with RSS? RSS requires a list of sites to be included in org-wide searches and Copilot experiences. It is considered an optional and temporary measure to provide organizations time to review and audit site permissions on all other sites in the tenant. If you have enabled RSS in your tenant by identifying (up to 100) sites to be included, it is also important to ensure you have the appropriate retention labels applied to any business records found there (crown jewels will always have retention labels and not retention policies applied). Once you’ve applied retention labels to content on those sites, you should feel comfortable also applying a retention policy to delete any unlabeled (stale) content from those same sites. This will improve the quality of information Copilot has to work with.


Closing thoughts

Microsoft’s blueprint bullet, “Improve Copilot responses,” has a lot of detail behind it. It extends far beyond “apply retention policies and labels”. I hope this post has sparked some ideas and highlighted areas where you can channel your efforts in data lifecycle and records management.

Ultimately, it’s about enhancing your data governance and elevating the quality of your data.

Thanks for reading.

-JCK
