Proofpoint: Security, Compliance and the Cloud

7 posts categorized "Search"

June 17, 2008

SaaS delivers SLA-backed solutions;
Software delivers... just software

Posted by Craig Rennick, Founder and VP Sales

As a service provider of email archiving to enterprise customers, we pay great attention to customer service.  To this end, we talk to our customers frequently and schedule face-to-face senior level meetings to review our scorecard and get direct input on customer satisfaction. 

During our most recent road trip, where I met some of our key customers, one of them summed it up perfectly when he said, “This is why we outsource to Fortiva, we want you stressing about this stuff – not us!” Clearly we are held to scrutiny and standards that are beyond any one company’s capabilities to deliver in-house – and we expect that. After all, why else outsource?

In many of the calls I’ve been on, I continually hear about failed installations of on-premise software, like Enterprise Vault. What’s worse is that I hear about them from prospects who have already invested time and money in software, built up a large archive, and are now having trouble keeping it running. And because change is difficult, IT administrators and users lower their expectations, apply band-aids and live with their situation.

Then I think about Fortiva's customers and how they don’t need to tolerate unacceptable performance. They don’t need to accept less than desirable application effectiveness. They also always have a provider working 24/7 to make things right and achieve industry-leading SLAs. For example, Fortiva is the only provider in the email archiving industry that offers an SLA around search response time. To date, I have yet to come across an in-house IT department that would commit to any SLAs for its own email archive.

No wonder we constantly hear extremely positive feedback from our customers. Coming back from that week of customer visits reinforced for me why SaaS really is the wave of the future. As a 15-year veteran VP Sales who now subscribes to SaaS for sales automation, I can see why IT will move in the same direction.

February 28, 2008

A Challenge to All Email Archiving Vendors (Part 6 in a Series on Search)

Posted by Chris Tebo, CTO

Over the past few weeks, Rick and I have spent a good deal of time discussing the benefits of real-time search as well as the challenges of delivering it. To end our "Series on Search", the most important point we want to make is a very simple one: an email archive must offer consistent, timely search results to be effective. Customers should demand this, and vendors, whether in-house or SaaS, should make hard commitments to deliver it.

Whether it's being used by the legal department to execute pre-discovery or discovery searches, or end-users to search for their own email, search performance expectations must be met and maintained. At Fortiva we take this very seriously.  We commit to a search performance SLA in our contracts with customers.  We monitor our infrastructure every day to ensure that we're living up to this guarantee, and we deploy infrastructure as needed to ensure we continue to live up to it.

So, to all the email archiving vendors out there - I challenge you to do the same, and provide a search time guarantee.  For managed and SaaS vendors, this is a pretty straightforward challenge. In-house vendors, on the other hand, will be hard-pressed to do so. Why?  Because it is as much about the hardware the solution is deployed on as it is about the software, and the software vendors don't have the ability to ensure that the hardware is appropriate. But they can provide open and clear guidelines on how much hardware is required to maintain search performance, particularly as the archive grows.

To those of you who are considering an email archive, or are creating an archiving RFP, make sure you have clear expectations for search response times before you begin. Ask potential vendors what they will commit to, and hold them to it. If they won't make a commitment, you probably want to ask yourself why not.

January 25, 2008

The e-Discovery Search Quandary – Justifying the Cost of Infrequent Searches (Part 5 in a Series on Search)

Posted by Rick Dales, VP Product Management

In our previous posts, both Chris and I discussed the significant investment in infrastructure that is necessary to provide fast, reliable search of corporate email. Even just a few years ago, this wasn’t a big issue for most businesses because they simply weren’t conducting searches across the entire email repository. However, in our increasingly litigious society, the growing costs that come from e-discovery are forcing more and more businesses to address the notion of "litigation readiness" – which inherently requires the ability to search email to isolate materials relevant to a given case.

For companies that live under the cloud of a perpetual cycle of lawsuits, a variety of new technologies and processes have emerged to help people manage, collect, review and produce information for litigation.  Unfortunately, these approaches are often very expensive and can't be justified by the majority of businesses that only periodically face litigation hold and/or e-discovery activities -  a point that was reinforced by a recent survey that showed 1 in 5 businesses have settled a case to avoid the cost of searching through and retrieving email. 

For a company with a relatively long standard retention period (something that is becoming the norm), legal must be able to mine through a constantly-growing set of emails. This is particularly problematic because the cost to provide relatively quick searches doesn't grow linearly with the data growth, but instead, in most systems it grows exponentially. As difficult as it often is to justify the costs of "preventative" technologies (such as email archiving for litigation readiness), a system with rapidly increasing costs is even harder to justify.

Software-as-a-Service (SaaS) is a perfect model for addressing these types of challenges. Here’s why. When an e-discovery request comes in, most companies need powerful e-discovery capabilities with very little advance notice; however, the rest of the time, they’re unlikely to need that search capability. Instead of building a system in-house that is underpowered when it's needed and wasteful the rest of the time, SaaS allows firms to readily access a pool of resources on-demand to meet their needs.

By spreading the cost of a large infrastructure over many customers, each of whom is unlikely to need the system at the same point in time, users get maximum capabilities at a far more justifiable, predictable cost.  To scale without bounds, SaaS companies like Fortiva are forced to build infrastructures whose cost does not grow exponentially (or it would be less and less profitable to take on new business).  This technology investment gets further passed along to the customer base so that costs per unit of data stored/processed go down over time.

Just like buying insurance, litigation readiness is about reducing risk and preventing significant, unexpected (and unplanned) costs.  There is the cost of enforcing a litigation hold, the cost of e-discovery activities, and the cost of increased litigation risk from not having (or not having access to) critical data – not to mention the costs of negative judgments. So it’s not surprising that litigation readiness – much like insurance again – can be a challenging thing to justify, especially when lawsuits aren't part of your firm's daily life. SaaS solutions can prove to be the best way to balance these needs.

Click here to read Part 6 in the Series on Search

January 15, 2008

What is Google hiding - Just how many servers do they have, anyway? (Part 4 in a Series on Search)

Posted by Chris Tebo, CTO

In my previous post, I started looking at what it takes to deliver real-time search for an email archive.  In that post I was focused on looking at the size of the dataset that an email archive encompasses, indicating that these datasets start approaching the size of Google's web-indexes.  At those scales, the hardware infrastructure deployed to support real-time search is just as important as the software that executes those searches.

Four years ago I started down the path of building what has become the Fortiva Archiving Suite.  Providing real-time search for a corporate email archive was obviously a requirement.  To get started down this path, I called a former colleague who had spent his time in the 80s and 90s leading the charge building market-leading search and knowledge management solutions.  He's kept up with the comings and goings in the search technology space since then, and my question for him was simple - whose search technology should we license to solve our email archive search needs...

His answer surprised me and his guidance was straightforward...

  1. All of the real-time search technology out there today is based on the concept of an inverted-index which allows you to very quickly identify documents that match search criteria.  From one vendor to the next there are  differences in their implementations, but at the core, the technology remains the same.
  2. The challenge with large search datasets is that to deliver real-time search, the search infrastructure needs to be able to execute searches in parallel over multiple servers.  At the time, none of the search vendors were providing this as part of their solution.
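
To make point 1 concrete, here is a minimal sketch of an inverted index in Python. It is an illustration only, not the technology Fortiva licensed or built: the function names and the sample messages are hypothetical, and a production index would also handle tokenization, stemming, ranking and on-disk storage.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Look up each term's posting set, then intersect them.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample messages, for illustration only.
emails = {
    1: "quarterly budget review meeting",
    2: "budget approval for new servers",
    3: "lunch meeting on friday",
}
index = build_inverted_index(emails)
print(sorted(search(index, "budget meeting")))
```

The lookup touches only the posting sets for the query terms rather than scanning every message, which is why this structure sits at the core of real-time search.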

As we were setting out to build our SaaS email archiving solution, developing a platform and infrastructure that allows one to distribute work over multiple servers (be it for search or any other function in the application) was fundamental.  So what we did is the obvious thing... We combined the SaaS platform we needed to develop anyway, with appropriate hardware, and open source search technology that provides the inverted-indexes that are at the core of search.  We built #2, and we leveraged others' work on #1.

This may all sound like talk about the plumbing, and it is...  But that was the point my colleague was making to me.  Delivering real time search on these large datasets is just as much about the plumbing as it is the search technology.  Take Google for example... They'll let you run their search application on your desktop.  That represents point 1 above.  But if you try to find out how many servers Google has deployed to support web-search, and what that infrastructure looks like, you are unlikely to have much luck. Google remains very secretive about that, for good reason.

We've made mention of one firm we've spoken to whose in-house email archiving solution has taken over 25 days to complete a search.  They, like many firms who tackle the challenge of enterprise search on large datasets, have learned the hard way that the software deployed to address this problem is only a small part of the puzzle.  Deploying and managing the infrastructure required to deliver on the promise of real-time enterprise search is the hard part.  For many firms, the cost of care and feeding for this infrastructure is excessive, and just doesn't make sense.

At Fortiva, we have deep knowledge about the infrastructure required to deliver real-time search to our customers.  We know how much index data any one of our servers can manage, and we work hard to improve our software and our processes to deliver better results.  I won't share those numbers here, for the same reason that Google doesn't tell you what their infrastructure looks like.  Instead, and more importantly to our customers, we deliver a guarantee on search performance.  That's what they really care about.  More on this next time...

Click here to read Part 5 in the Series on Search

January 09, 2008

How many servers do you need to deliver real time search? (Part 3 in a Series on Search)

Posted by Chris Tebo, CTO

Rick's previous two posts have introduced the real-world challenges that organisations face when trying to implement enterprise search, and specifically when searching through the full content of a company’s email archive.  This topic seems quite timely with the news breaking this week of Microsoft's offer to acquire Fast Search and Transfer for $1.2 billion.

I'd like to share some of my experience that comes from building an email archiving solution that provides real-time search with performance guarantees to our customers.  In discussing the challenges of enterprise search with others, I've often heard responses like "Why would that be so hard? I use Google/MSN/Yahoo Desktop search and it responds in real-time."  We're also accustomed to real-time search response when using our favourite web-search engines.

In order to deliver real-time search you need to provide two key elements: 1) software that can prepare search indexes and execute searches against those indexes, and 2) enough hardware to run the software.  In the case of desktop search, the data-volumes are actually quite small.  If an individual user is receiving 1GB of mail per year and is also storing an additional GB of documents locally, the index representing several years of that data can be searched quite effectively by the computing horsepower provided by a relatively recent laptop or desktop.

But imagine what happens when the dataset being exposed by an enterprise search application contains the data for 1,000s of your corporate users.  Robert Smallwood crunched some numbers and came up with this:

"But archiving all e-mails for seven years for an organization of 25,000 employees adds up to 4.5 billion documents. Just as a reference, as of a year ago (2004), Google handled only 4.3 billion documents."

So maybe your organization isn't that large or doesn't require seven years of retention.  If your company has 5,000 employees with a 3-year retention policy, you're going to have an archive of close to 500 million e-mails at the end of three years.  That's still more than 1/10th the size of Google's 4.3 billion documents in 2004.
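
For readers who want to plug in their own numbers, here is a back-of-the-envelope sketch in Python. It assumes a constant per-employee message rate derived from the Smallwood quote, and the function name is hypothetical. Straight-line scaling of the quoted figures lands in the hundreds of millions of messages, the same order of magnitude as the estimate above; the exact total is very sensitive to per-user volume, which is an assumption here.

```python
# Figures taken from the quote above.
QUOTED_DOCS = 4.5e9
QUOTED_EMPLOYEES = 25_000
QUOTED_YEARS = 7

# Implied messages per employee per year (an assumption: constant over time).
per_employee_year = QUOTED_DOCS / (QUOTED_EMPLOYEES * QUOTED_YEARS)


def archive_size(employees, years, rate=per_employee_year):
    """Estimate archived messages assuming a flat per-employee yearly rate."""
    return employees * years * rate


print(f"Implied rate: {per_employee_year:,.0f} messages per employee per year")
print(f"5,000 employees, 3 years: {archive_size(5_000, 3):,.0f} messages")
```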

I don't think anyone would suggest that their laptop running desktop search could address search needs against a corporate dataset consisting of 500 million documents.  So what's the difference between that Google Desktop search application that you're running on your desktop and the Google Web Search infrastructure that can handle real-time search against billions of documents?  Sure there are bound to be differences in the software itself, but the biggest factor in all of this is the scale of infrastructure that is deployed to deliver real-time search.  By scale, I mean, lots and lots of servers.  Sure there's tons of smarts built in to these solutions, but to deliver response times measured in seconds the "work" of executing a search needs to be distributed to a large number of servers in real-time.
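
The idea of distributing the "work" of a search can be sketched in a few lines of Python. This is a toy scatter-gather pattern, not a description of Google's or Fortiva's infrastructure: the shards are in-memory dictionaries standing in for separate servers, threads stand in for network calls, and all names and sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term):
    """Scan one shard (a dict of doc_id -> text) and return matching ids."""
    term = term.lower()
    return [doc_id for doc_id, text in shard.items() if term in text.lower().split()]


def scatter_gather(shards, term):
    """Send the query to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, term), shards)
        return sorted(doc_id for partial in partials for doc_id in partial)


# Hypothetical shards standing in for separate search servers.
shards = [
    {1: "contract renewal notice", 2: "team offsite agenda"},
    {3: "signed contract attached", 4: "holiday schedule"},
    {5: "draft contract for review"},
]
print(scatter_gather(shards, "contract"))
```

Because each shard holds only a slice of the data, response time depends on the slowest shard rather than on the total size of the archive, which is why adding servers keeps searches fast as the dataset grows.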

So when considering an on-premise email archive, or any other enterprise search application, the key question to answer is - how many servers will be needed to deliver real-time search?  The sad reality is that for most on-premise email archives, that question doesn't get answered until it is far too late.  The question gets raised when users start experiencing searches taking 24 hours, or even 24 days.  And the unfortunate reality is that for many organizations, it will be challenging to justify the purchase and management expense of the hardware needed to address this problem.  Remember that the dataset grows month by month... That means deploying new hardware every month.

In my next post I'll describe in more detail how real-time search can be delivered with sufficient hardware, and also why delivering real-time search through a SaaS solution is far more cost-effective than trying to build your own Google in-house.

Click here to read Part 4 in the Series on Search

January 04, 2008

Searching an Email Archive: Real-World Examples (Part 2 in a Series on Search)

Posted by Rick Dales, VP Product Management

In my previous post, I talked about the significant challenges of enterprise-wide search, and how those challenges directly translate to an email archive (in fact, they’re arguably greater for an email archive).

Today, most organizations archive email for legal discovery purposes. While they may have other goals, including compliance and storage management, searching through the entire repository is a fundamental requirement for any archive. The problem is that firms always underestimate the growth of the data and the infrastructure required to support the searching of that data (and the sales teams at most email archiving vendors have little or no reason to change that).

To further this point, I wanted to share some real world experiences from companies we recently talked to that have an in-house email archiving solution in place. The first is an international bank that was archiving for a division of about 10,000 users.  Within two years they had amassed several terabytes of information. At that point, every time their legal or compliance department requested data from the archive, it was taking the IT department in excess of 24 hours to run a search.  With an expectation of next day delivery of information, this left no room for error.

This is far from the worst example we’ve seen. Another firm took over 25 days to complete a single search. And these experiences are not uncommon.  Making it even worse, we frequently hear that IT staff must stay up all night monitoring these long-running activities, because turnaround times don't allow for processes that fail overnight to be restarted the next day. 

Almost without exception, the companies we talk to say that their email volume is growing faster than expected.  The end result is that any new investments in the archive go toward growing the data intake processing capacity, not the search or access capability. Companies simply don’t have the budget, staff or time to keep up with search optimization. Which takes me back to my first post of this series, where I explained how a few years’ worth of corporate information can quickly accumulate to the size of all public information on the web, making it unreasonable for a company to even try to achieve short windows for search in-house (it would require hundreds or thousands of dedicated servers).

The big challenge is that for most organizations archiving data for litigation readiness, the data remains largely untouched until a legal issue arises. At that point, critical (and time-sensitive) searches are required. Yet maintaining the infrastructure in-house to conduct those searches on an infrequent basis (even a couple times a week) makes no sense. Leveraging a shared (SaaS) infrastructure for search, on the other hand, is an ideal way to cost-effectively conduct time-sensitive searches on a periodic basis.

As the archiving industry begins to mature, and more companies have experience managing an archive for more than a year or so, this problem will continue to come to light, and the benefits of multi-tenancy for archiving will be better understood. In the meantime, if you’re considering an email archive, take the time to ask the vendors you’re evaluating if they track search performance. Furthermore, ask to speak to customer references that have been archiving email for a significant period of time (and that have comparable storage requirements to your own), and ask them about search times. You might be surprised at the answer.

Click here to read Part 3 in the Series on Search

December 19, 2007

Why is enterprise search so elusive? (Part 1 in a Series on Search)

Posted by Rick Dales, VP Product Management

Time and time again, we get calls from people who are looking for a new email archiving vendor because they are frustrated with the search performance of their current archiving solution. And it’s not surprising that they’re frustrated. With Google searching the whole internet in real-time, it seems logical that searching data across a single company would be a fairly easy thing to do.

As far back as the early 80's (when desktop document tools entered the workplace) people have talked about the importance of enterprise search as a key enabler for knowledge workers. Twenty-five years later, an abundance of "enterprise search" products exist, yet very few firms have implemented company-wide searching of business information. Which leaves the obvious question - why not?

In fact, "enterprise search" is really a misnomer for these products which invariably are focused on narrow areas of information such as intranets or specific application data. The truth is that providing real-time searching of data across the enterprise (especially when you include the vast amount of email) involves significant challenges.

As the only company in our industry that offers a search time guarantee, Fortiva has a first-hand understanding of these challenges. Here is a quick breakdown of why searching through enterprise data is harder than it sounds:

Finding and quickly indexing distributed information is challenging
Business information comes in many forms -- structured and unstructured on an ever-changing set of machines on the corporate network.  Finding the machines that contain this information and scanning each file on each machine is both costly and challenging, particularly when dealing with laptops that come and go from the network.  Once documents have been found, the processing cost of extracting textual content in a meaningful form is also difficult and expensive.  Given that people most frequently look for recent information, if the indexing process doesn't work very quickly, users will likely find that the search engine is useless.

For day to day use, search needs to be fast -- which can be extraordinarily expensive and require an enormous amount of infrastructure
Web search engines such as Google, MSN and Yahoo have set user expectations for search.  If you have to wait more than a few seconds to get search results, you give up and move onto the next search engine. These sites get performance by distributing search activities across hundreds (or thousands) of servers for each search request, then aggregating the results.  Since the web is a single corpus of information that everyone shares common access to, and the search engines can profit from searches through advertising dollars, building out this large scale infrastructure is cost-effective.

What a lot of people don’t realize is that a few years’ worth of corporate information can quickly accumulate to the size of all public information on the web.  Yet to support the same search performance, implementing hundreds (or thousands) of servers in a single organization can never be justified.

Information access tools need to understand changing security models
Most documents within an organization are designed for consumption by a very limited audience.  Confidentiality of many types of information, including financial, business development and HR content is critical within an organization.  To provide a unified search infrastructure, each user must only be able to see search results for documents that they should have access to.  Determining these security relationships is challenging, however, because most firms assume that documents that live on a user's machine or their personal space on the network should only be accessible to that user.  Making this assumption, however, dramatically reduces the knowledge management value of enterprise wide search.
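
As a rough illustration of security-aware results, the sketch below filters search hits against an access control list before showing them to a user. It is a simplified model with hypothetical names and data: real systems have to resolve group membership from a directory service and keep permissions in sync as they change, which is exactly the hard part described above.

```python
def visible_results(hits, acl, user_groups):
    """Keep only hits whose allowed groups overlap the user's groups.

    Documents missing from the ACL are hidden (deny by default).
    """
    return [doc_id for doc_id in hits if acl.get(doc_id, set()) & user_groups]


# Hypothetical access control list: document id -> groups allowed to read it.
acl = {
    "salary-review.doc": {"hr"},
    "q3-forecast.xls": {"finance", "executives"},
    "holiday-schedule.txt": {"all-staff"},
}
hits = ["salary-review.doc", "q3-forecast.xls", "holiday-schedule.txt", "unlisted.doc"]

print(visible_results(hits, acl, {"finance", "all-staff"}))
```

Denying access to anything without an explicit entry mirrors the cautious assumption most firms make, and it shows the trade-off: the safer the default, the less of the corpus a search can usefully expose.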

Ultimately, when you consider these challenges, it’s not surprising that so many people are frustrated with the search performance provided by in-house email archiving solutions. An email archive has to deal with a massive amount of searchable information, often many times the size of the "active" information set found throughout the corporate network.  Providing high-performance search across this data requires distributed search technology similar to that used by the web search engines and a large infrastructure. The truth is that for most companies, keeping up with the infrastructure requirements to support real-time search of the ever-growing volume of email data is – and will continue to be – cost-prohibitive.

And that’s where Fortiva’s software-as-a-service model starts to make real sense. By sharing the costs of the search infrastructure among different customers, Fortiva is able to guarantee search performance on an ongoing basis. Chris Tebo, our CTO, and I will get into more detail about how we do this in a future post in this search series.

Click here to read Part 2 in the Series on Search


©2012 Proofpoint, Inc.