Posted by Rick Dales, VP Product Management
Time and time again, we get calls from people who are looking for a new email archiving vendor because they are frustrated with the search performance of their current archiving solution. And it’s not surprising that they’re frustrated. With Google searching the whole internet in real time, it seems logical that searching data across a single company would be a fairly easy thing to do.
As far back as the early 1980s (when desktop document tools entered the workplace), people have talked about the importance of enterprise search as a key enabler for knowledge workers. Twenty-five years later, an abundance of "enterprise search" products exist, yet very few firms have implemented company-wide searching of business information. Which leaves the obvious question: why not?
In fact, "enterprise search" is really a misnomer for these products, which are invariably focused on narrow areas of information such as intranets or specific application data. The truth is that providing real-time searching of data across the enterprise (especially when you include the vast amount of email) involves significant challenges.
As the only company in our industry that offers a search time guarantee, Fortiva has a first-hand understanding of these challenges. Here is a quick breakdown of why searching through enterprise data is harder than it sounds:
Finding and quickly indexing distributed information is challenging
Business information comes in many forms, structured and unstructured, and lives on an ever-changing set of machines on the corporate network. Finding the machines that contain this information and scanning each file on each machine is both costly and challenging, particularly when dealing with laptops that come and go from the network. Once documents have been found, extracting their textual content in a meaningful form is also difficult and expensive. Given that people most frequently look for recent information, if the indexing process can't keep up with new documents, users will likely find that the search engine is useless.
For day-to-day use, search needs to be fast -- which can be extraordinarily expensive and require an enormous amount of infrastructure
Web search engines such as Google, MSN and Yahoo have set user expectations for search. If you have to wait more than a few seconds to get search results, you give up and move on to the next search engine. These sites get their performance by distributing each search request across hundreds (or thousands) of servers, then aggregating the results. Since the web is a single corpus of information that everyone shares common access to, and the search engines can profit from searches through advertising dollars, building out this large-scale infrastructure is cost-effective.
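To make the "distribute, then aggregate" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of how any particular search engine (or Fortiva) is built: the in-memory `SHARDS`, the `search_shard` and `scatter_gather` functions, and the word-count scoring are all hypothetical stand-ins for real index servers and real relevance ranking.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards": each one stands in for a separate
# index server and maps a document ID to that document's text.
SHARDS = [
    {"a1": "quarterly budget review", "a2": "lunch menu"},
    {"b1": "budget forecast and budget plan", "b2": "holiday schedule"},
    {"c1": "travel policy", "c2": "budget approval"},
]


def search_shard(shard, term):
    """Score each document in one shard by how often the term appears."""
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return hits


def scatter_gather(shards, term, top_k=3):
    """Send the query to every shard in parallel, then merge the best hits."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term), shards))
    merged = [hit for part in partials for hit in part]
    # Highest score first; ties are broken by document ID for a stable order.
    return sorted(merged, key=lambda h: (-h[0], h[1]))[:top_k]


results = scatter_gather(SHARDS, "budget")
print(results)
```

Each shard only has to search its own slice of the data, so response time depends on the slowest shard rather than on the total volume. The price is the number of machines needed to hold all of those slices.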
What a lot of people don’t realize is that a few years’ worth of corporate information can quickly accumulate to the size of all public information on the web. Yet implementing hundreds (or thousands) of servers in a single organization to support the same search performance can never be justified.
Information access tools need to understand changing security models
Most documents within an organization are designed for consumption by a very limited audience. Confidentiality of many types of information, including financial, business development and HR content, is critical within an organization. To provide a unified search infrastructure, each user must only be able to see search results for documents that they should have access to. Determining these security relationships is challenging, however, because most firms assume that documents that live on a user's machine or in their personal space on the network should only be accessible to that user. Making this assumption dramatically reduces the knowledge management value of enterprise-wide search.
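The requirement that users only see results for documents they may access can be sketched as a permission filter applied at query time. The example below is a simplified illustration, not any product's actual security model: the `Document` class, the group names, and the sample data are all hypothetical, and real systems have to pull these permissions from directories and file systems that change constantly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document


# Hypothetical sample data for illustration only.
DOCS = [
    Document("hr-1", "salary review schedule", frozenset({"hr"})),
    Document("fin-1", "salary budget for next year",
             frozenset({"finance", "executives"})),
    Document("all-1", "salary payment dates", frozenset({"everyone"})),
]


def search(docs, term, user_groups):
    """Return IDs of matching documents the user is permitted to see."""
    groups = set(user_groups)
    return [
        d.doc_id
        for d in docs
        # A hit counts only if the term matches AND the user shares at
        # least one group with the document's access list.
        if term.lower() in d.text.lower().split() and d.allowed_groups & groups
    ]


print(search(DOCS, "salary", {"everyone"}))
print(search(DOCS, "salary", {"hr", "everyone"}))
```

Checking permissions when the query runs, rather than when the document is indexed, means that a change in group membership takes effect on the next search. It also means that every query pays the cost of the check.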
Ultimately, when you consider these challenges, it’s not surprising that so many people are frustrated with the search performance provided by in-house email archiving solutions. An email archive has to deal with a massive amount of searchable information, often many times the size of the "active" information set found throughout the corporate network. Providing high-performance search across this data requires a large infrastructure and distributed search technology similar to that used by the web search engines. The truth is that for most companies, keeping up with the infrastructure requirements to support real-time search of the ever-growing volume of email data is – and will continue to be – cost-prohibitive.
And that’s where Fortiva’s software-as-a-service model starts to make real sense. By sharing the costs of the search infrastructure among different customers, Fortiva is able to guarantee search performance on an ongoing basis. Chris Tebo, our CTO, and I will get into more detail about how we do this in a future post in this search series.