Enterprise Content Management (ECM) is not a technology, but the processes and methodology, supported by technology, that create an effective way to store, secure, and consume content.
This post is drawn from my experience as the SharePoint technical lead on a document/record management ECM search-based solution built using on-premises SharePoint. Over 1 million documents were initially migrated into this solution over a span of 4 days, using a defined Information Architecture, and the store has since grown to over 2 million documents over the past 3 years. That is not nearly enough to fall into Microsoft’s “large” records management definition, but it is enough for me to have learned a thing or two about what works well and what the potential pain points are.
In this post, I will highlight 3 of these points from a SharePoint Admin perspective based on my experience over the past few years. Although this particular solution was built on SharePoint 2010, I believe they still apply to SharePoint 2013 and 2016.
Watch your database growth
Applies to: SharePoint 2010, 2013, 2016
Sounds obvious, but you need to be able to accurately project the growth of your site collections and, more importantly, your content databases (assuming you have 1 site collection defined per database). Based on the growth of your content, how long will it take your database to reach 200GB? If the content in your site collection meets certain requirements, you can extend that limit to 4TB or, in an archive scenario, to no explicit content database limit at all. (Software boundaries and limits) You need to understand the requirements of each of these scenarios and where your particular solution fits so you know your limit and how many site collections to create to stay within it.
You should be able to answer these questions about your content to help determine your site collection/content database layout:
- “How many documents will be migrated into SharePoint?” (If migrating in from a legacy location)
- “What is the average size of each document?”
- “How many new documents will be added each month?”
- “How long will the content be retained?” Often, there will be a risk and compliance area in your organization that can answer this question.
- “What makes a document a ‘record’?” Does it become a record automatically when added to SharePoint? Is there a business process that drives that action?
- “What is the nature of the content?” By this I mean, will documents be updated once they’re in SharePoint, or is it mostly a read-only archive scenario? Is version history enabled on these documents? Will workflows be run against any of these documents? Will you allow item-level permissions on documents (PLEASE NO!!!!)?
- “How should documents be grouped across your site collections?” By high-level category? By organizational unit? By company?
Retention time combined with the volume and nature of your content are key factors in determining the total amount of space required in your content database. (The calculation to determine this is beyond the scope of this blog) You also need to understand how the back-end storage mechanism works for the platform you’re on (SharePoint 2010/2013/2016) to know how the database will grow as changes are made to content.
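While the full calculation is out of scope here, a back-of-envelope growth model gives a feel for the exercise. Every figure below is an illustrative assumption, not a number from this solution — your document counts, average sizes, and overhead factor will differ:

```python
# Back-of-envelope content database growth projection.
# All figures below are illustrative assumptions, not numbers from this solution.

MIGRATED_DOCS = 1_000_000       # documents in the initial migration
AVG_DOC_MB = 0.1                # average document size in MB
NEW_DOCS_PER_MONTH = 25_000     # steady-state monthly growth
METADATA_OVERHEAD = 1.2         # rough factor for metadata, versions, indexes
LIMIT_GB = 200                  # default supported content database size

def db_size_gb(months: int) -> float:
    """Projected content database size after a number of months."""
    docs = MIGRATED_DOCS + NEW_DOCS_PER_MONTH * months
    return docs * AVG_DOC_MB * METADATA_OVERHEAD / 1024

def months_until_limit(limit_gb: float = LIMIT_GB) -> int:
    """Months until the projected size crosses the supported limit."""
    months = 0
    while db_size_gb(months) < limit_gb:
        months += 1
    return months
```

With these assumed inputs, the database starts at roughly 117GB and crosses the 200GB default limit in a little over two years — a signal that you would want more than one site collection/content database from day one.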
To illustrate this point, SharePoint 2010 and 2013/2016 handle storage differently for document changes. SharePoint 2010 stores an entire copy of the document for every change made to it, whereas SharePoint 2013/2016 introduced shredded storage, which stores a document in small pieces (or shreds) in the database; only the shreds that changed are updated. Due to this change, the storage footprint and file IO operations for document changes are reduced in SharePoint 2013/2016. Microsoft’s whitepaper on Shredded Storage for SharePoint 2013 is HERE.
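The difference is easy to see with some simple arithmetic. The numbers here are assumptions chosen purely for illustration — say 10 versioned edits to a 10MB document, where each edit only touches a couple of 64KB shreds:

```python
# Illustrative storage comparison for versioned edits; all numbers are assumed.
DOC_MB = 10           # document size
EDITS = 10            # edits (versions) after the initial upload
SHRED_KB = 64         # assumed shred size (the actual size is configurable)
CHANGED_SHREDS = 2    # shreds touched per edit (assumption)

# SharePoint 2010: each edit writes a complete new copy of the document.
storage_2010_mb = DOC_MB * (1 + EDITS)

# SharePoint 2013/2016 shredded storage: one full set of shreds,
# then only the changed shreds for each subsequent edit.
storage_2013_mb = DOC_MB + EDITS * CHANGED_SHREDS * SHRED_KB / 1024
```

Under these assumptions, 2010 consumes 110MB for the edit history while 2013/2016 consumes a little over 11MB for the same content.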
Recap: Design your logical architecture (site collections/content database) based on your unique requirements and continue to monitor space utilization to ensure you don’t exceed supported limits. This is building a solid foundation for everything that follows.
Remote Blob Storage – are you SURE you want this?
Applies to: SharePoint 2010, 2013, 2016
Not long after the project started, the DBAs on our team heard about something called “Remote BLOB Storage” (RBS). They were concerned about the amount of content we would be storing in SQL and insisted we look into using RBS to offload the million or so BLOBs (binary large objects) about to be migrated into SharePoint.
What is a BLOB? It is the unstructured part of your data. For example, a Microsoft Office Word document would be a BLOB (unstructured), and its associated metadata would be the structured data. By default, both BLOBs and structured data are stored together in the content database; however, the BLOBs can be moved off onto commodity (possibly cheaper) non-SQL storage using a Remote BLOB Storage (RBS) provider.
Microsoft does provide a Filestream provider for RBS, however this is NOT recommended for a production environment. We trialed several different 3rd party RBS providers and selected Dell’s Storage Maximizer solution. Although it has worked extremely well for us over the past 3 years, I still have reservations about using RBS in general, regardless of Vendor.
Why? 4 reasons:
- Once you have RBS installed and configured, it is yet another product to patch and support in your SharePoint environment.
- Backup/restore of your SharePoint environment is more complicated as you now need to rely on coordination of both SQL and FileServer backups.
- You now have to be concerned if/how it will impact your farm’s upgradeability to the next version of SharePoint.
- You’re tightly coupling your SharePoint solution with RBS. What is the long-term support path for the RBS product?
I would only recommend using an RBS solution if the amount of content in your environment warrants the move onto non-SQL storage, if you have relatively skilled Server (File/Database/SharePoint) support teams, AND if you regularly test out your DR plan. You should be able to justify the reason why you now have a more complicated SharePoint environment.
As a SharePoint Administrator, I’m not sure this was worth the extra time and effort it took to incorporate into this particular solution. If you ask our DBAs they would say it definitely was. From my perspective, it complicated our conversion process and has definitely complicated our SharePoint environment. Also, the content database size limits documented by SharePoint do not exclude the BLOBs stored outside of SQL. This is a point of confusion for many who believe RBS is a strategy for “getting around” the supported limits. It’s not.
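To make that last point concrete, here is a hypothetical example (the sizes are made up for illustration) showing why externalizing BLOBs does not create headroom against the supported limit:

```python
# Hypothetical sizes illustrating how RBS interacts with the supported limit.
sql_metadata_gb = 30          # structured data remaining in the content database
externalized_blobs_gb = 190   # BLOBs moved to file storage by the RBS provider
LIMIT_GB = 200                # default supported content database size

# The documented limit counts externalized BLOBs as part of the database size.
effective_db_size_gb = sql_metadata_gb + externalized_blobs_gb
within_supported_limit = effective_db_size_gb <= LIMIT_GB
```

The SQL database itself looks tiny (30GB), but for support purposes this content database is 220GB — already over the default limit.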
LATE ADDITION: As I was editing this post, I stumbled across a notification on Dell’s site stating they were discontinuing technical support for their Storage Maximizer product as of December 31, 2016. As it turns out, this is exactly the 4th risk I identified above with going with an RBS provider. We now have a significant piece of work to decouple our solution from this product.
Recap: If you choose to use RBS, go into this with both eyes wide open.
Define retention policies and act on them.
Applies to: SharePoint 2010, 2013, 2016
An important facet of any good governance strategy is defining when content has reached its end of life. When working within a large document management solution, having this strategy in place becomes critical. I strongly urge you to decide at the outset of your project when documents/records will be destroyed. In our case that decision was “pushed off”, and we were forced to deal with a challenging destruction process 3 years down the road, when a mass deletion of over 300,000 documents was required just to “catch up”. This, of course, introduced its own set of problems.
When you define an Information Management Policy for document retention, it is enforced by 2 timer jobs that complete the work:
- Information Management Policy Job – determines which documents require a destruction action.
- Expiration Policy Job – performs the actual delete.
With my estimate of more than 300,000 records to be destroyed, many questions came to mind: What kind of performance impact would there be if those jobs ran for hours while end-users were using the system? Had anybody ever done this scale of a mass delete before? Should I stop the crawls while the timer jobs are running? Likely. Should I do a full crawl when it is complete? Likely.
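A crude way to frame the performance question is to estimate the job’s runtime from an assumed deletion throughput. The throughput figure below is a guess purely for illustration; you would want to measure the real number in a test farm:

```python
# Back-of-envelope runtime estimate for the expiration timer job.
RECORDS_TO_DELETE = 300_000
DELETES_PER_SECOND = 20     # assumed throughput; measure this in a test farm

runtime_hours = RECORDS_TO_DELETE / DELETES_PER_SECOND / 3600
```

Even at 20 deletes per second, that is over 4 hours of sustained deletion — more than enough reason to schedule the job outside business hours and pause crawls while it runs.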
Another very negative side effect of neglecting to regularly destroy content that is legitimately targeted for destruction is that you run the risk of your content databases growing beyond supported limits. Remember the space calculations you painstakingly went through when designing your logical architecture? Well, they won’t be accurate if you don’t actually destroy the content when you originally planned to.
The most detrimental side effect of not destroying content when you are legally required to do so is the compliance risk this may place on your organization.
Setting up a retention policy in your organization is a governance task. However, if not addressed, it can soon become a technical risk for your SharePoint environment. I cannot stress enough how important it is to have Information Management retention policies defined and executed in your environment.
Recap: Define a retention policy and execute it FROM DAY ONE.
Whether you’re building your on-premises ECM solution in SharePoint 2010, 2013 or 2016 these are a few of the things you should be thinking about. It’s imperative you know the capabilities and limitations of the particular SharePoint platform you’re working on as well as the compliance requirements of your organization in order to set yourself up for the best chance of success.
To address the compliance risks facing many organizations today, new capabilities are being introduced into the on-premises versions of SharePoint (the Compliance Policy Center and In-Place Hold Policy Center in SharePoint 2016, for example). It is important to stay on top of these new capabilities in order to build the best solution possible for your organization.
Thanks for reading.