January 23, 2014
With LegalTech New York (#LTNY14) fast approaching, I find it a bit odd to see some of the same vendors at LegalTech as at ARMA and MER, with technology that, hmmm, looks pretty much the same at both. This raises some interesting questions about how well eDiscovery tools can address information governance (IG) objectives. Some use cases appear more plausible than others. One example is applying advanced analytics while migrating a legacy information repository, to gain visibility into its contents (e.g., what is duplicative, aged, or transitory). But pointing predictive eDiscovery tools at raw content sources in order to implement policies for information tracking and control is more daunting. This is especially true for organizations experiencing relentless data growth and an explosion of content in unmanaged locations, which describes most corporations today.
So, here are the top 5 reasons why eDiscovery tools may not be sufficient to address your short-term information governance objectives (noting that capabilities evolve over time: M&A happens, product portfolios expand, OEM deals are forged, etc.):
- Volume: Most analytically driven eDiscovery tools have been well designed to plow through, analyze, and accelerate review of clean, contextually specific data sets, say a matter involving 20 custodians and 100 GB. But applying the same technology to a billion items (which many corporations can easily accumulate) takes more than adding processing power to reach an acceptable indexing rate or spending extra time training the system. Data repositories tend to contain information that is highly duplicative and poorly indexed, and that IDC projects will grow 44-fold over the next several years. Analytically driven eDiscovery tools can enhance visibility (once properly resourced with processing power and $$), but they do little to address the high priority of gaining control over unchecked data growth.
- Context: eDiscovery tools operate best within the defined context of a matter or investigation. When a tool is pointed at an entire information repository, there is no easily discernible context for a term like 'windows' (the operating system? building materials? a time period?). In fact, separating the 'high value' from the 'digital ROT' in a typical IG initiative usually requires input from legal, regulatory, IT, and business unit representatives, each bringing their own definitions of information value and risk. IG is more than improving eDiscovery efficiency and reducing expense by looking at upstream data patterns. And applying analytics to content that has been stripped of its context makes it far harder to produce measurable results.
- Wild Data: Organizations today are struggling not only with the absolute growth of information, but also with the fact that material information is increasingly created (and often uniquely maintained) in unmanaged locations (e.g., social media, IM, networked fileshares, mobile devices, nomadic SharePoint sites, etc.). While eDiscovery today continues to be dominated by email, patterns of everyday business communication are changing dramatically, as reflected in actions by regulators including the SEC, FINRA, and the FFIEC. eDiscovery tools work well for processing centrally stored data, but collecting and moving information out of unmanaged locations is rarely practical and seldom without risk. Technologies that enable in-place management are emerging, but few have yet achieved significant market presence.
- Control: Many effective IG initiatives have focused not just on producing critical content when required, but also on understanding how information moves throughout its life cycle, so that organizations can manage information risks proactively. Enhanced visibility from analytics tools helps pinpoint where the eDiscovery needles sit in the data haystack. But it does little to show how the pins, needles, and other sharp objects move within and across haystacks, and that movement is what organizations need to understand to define the policies and procedures that manage information risk and enhance control.
- Cloud: Much of the interest in applying eDiscovery analytics to IG appears to stem from failed enterprise content management implementations. Information life cycle management was a good idea, but it ultimately failed because of poor user acceptance and on-premises technology designs that became too expensive and complex to manage as data grew. Hence the appeal of cloud-based information repositories, which take advantage of shared resources and scale-on-demand benefits that are not attainable behind the firewall. To date, it does not appear that any leading eDiscovery analytics tool has been designed for the cloud (which is significantly different from simply offering a hosted version of the same on-premises technology through a service provider). Consequently, companies must deploy more servers, with more storage and IT overhead, which looks like a repeat of the same failures of the 1990s. This will no doubt change, but evidence of leadership on this front is still scant.
eDiscovery and Information Governance will continue to become more tightly intertwined as more companies realize that the 'keep everything forever' strategy is not sustainable. Focus is beginning to shift from optimizing review efficiency to enhancing insight into data repositories, so that value can be separated from junk earlier. But take care to ensure that your short-term IG risk reduction goals can actually be delivered with the capabilities eDiscovery tool providers offer today.