In Search of Valid Data Searches

Posted 01-15-2013

By Thomas Barnett


“In too many cases . . . the way lawyers choose keywords is the equivalent of the child’s game of ‘Go Fish’ . . .”  

-Judge Andrew Peck


If you think searching for data in litigation has been a difficult and risky fishing expedition to date, just wait. The current tidal wave of data along with heightened scrutiny is making it even harder, and your company could find itself on the hook for serious penalties.

There’s a revolution occurring. A new world order is on its way. The old rules don’t apply and the familiar ways of doing things are becoming obsolete. Survival depends on adapting to the new challenges and understanding the tools necessary to survive. The revolution is called big data and the people responsible for managing the information and technological infrastructures of business are on the front lines. 

The primary tool for survival is the ability to search for and find the data you need, when you need it, and have the confidence that your results are documented, accurate and defensible—in other words, that you’re performing valid searches. This applies to every type of query, whether for security, business operations, regulatory compliance or litigation. If you don’t know and can’t show how well your searches worked, you can’t have confidence in the data you gathered and the accuracy of the conclusions based on analyzing the data. Needless to say, a lot is at stake in getting it right.

But unlike most technological efforts in a company, when it comes to searching for data in response to a litigation demand or regulatory enforcement subpoena, lawyers run the show. And as various cases have shown, no matter how good the lawyers are, most lack the expertise needed to create effective protocols for searching data and validating results. Nor do the individuals in possession of relevant data (i.e., data custodians) have the know-how required to defensibly search for and identify it.


“[M]ost custodians cannot be ‘trusted’ to run effective searches because designing legally sufficient electronic searches in the discovery … context[s] is not part of their daily responsibilities.”
-Judge Shira Scheindlin

National Day Laborer Organizing Network v. United States Immigration and Customs Enforcement Agency is the most recent significant legal opinion on this issue (and the body of these cases is growing). It was authored by U.S. federal judge Shira Scheindlin, who has written a number of frequently cited opinions on the adequacy of e-discovery efforts, including the best-known e-discovery case, Zubulake v. UBS Warburg. In the Day Laborer case, Scheindlin directly addressed the adequacy of searches for data and highlighted the need to be able to validate, explain and defend how data was identified and produced in response to a litigation discovery demand.

What’s a trillion or two gigabytes among friends?

The concern about accurate and valid searches is not new. For years the courts have been criticizing how searches are handled. What has changed is the volume, variety and complexity of the data that needs to be searched, as well as the knowledge and expertise required to search in a way that can be explained and defended. Even professionals who manage data for businesses are struggling with finding what they need when they need it and making sure it’s done right. 

The exponential rise in the amount of data and the corresponding increase in the percentage of unstructured or semi-structured data, as opposed to neat columns and rows of structured data, have led to innovation in how data is processed, stored, organized and searched. The tools and processes for searching structured data, such as a traditional relational database, have proven inadequate to the volumes and lack of structure of much of the tidal wave of information that we are experiencing. 

Nowhere in the realm of data management are the stakes higher and more starkly apparent than in responding to litigation discovery and regulatory enforcement demands for data. As the challenges posed by the volumes and unruly nature of the data mount, the quality of data searches is coming under increasing scrutiny. The well-performed search is the crucial first step in reducing massive data sets, whose manual review would be economically prohibitive, to a manageable size. In an adversarial context such as a lawsuit, it is likely your opponent will not simply trust that you acted in good faith or knew what you were doing. They may argue that you don’t in fact know what you are doing, that you are trying to hide something, that you are trying to avoid spending adequate resources to meet your legal obligations, or all of the above. In the warm and fuzzy world of litigation, challenging process can be an effective means of applying leverage regardless of the merits of the case. Put another way, if your adversary can successfully attack how data was searched and selected, he or she can score points or even prevail without ever having to prove his or her case. In cases involving particularly poor data handling, judges may impose sanctions that effectively allow a party to win the day without having to prove the case, on the presumption that the other side withheld or destroyed evidence supporting the claim.

For the most part, the methods that lawyers use to search for data are based on a limited understanding of how searches work, how they can go wrong and what tools are available to make their lives easier. The lack of expertise in this context can be a recipe for failure.

Everyone talks about the weather but nobody does anything about it.

Very few people would argue with the idea that you need to be able to document and explain the steps you took to search for data in response to a document demand. But as the line about the weather often attributed to Mark Twain goes, “Everyone talks about it but nobody does anything about it.” The essential elements of conducting valid and defensible searches can be summarized in five fundamental rules:

1.    What you don’t know can hurt you.  
Virtually all aspects of discovery involve searches. And you may be asked to explain and defend any or all of the searches you conduct. The first step in understanding what constitutes a valid search is defining what is meant by a search. Searching for data in litigation is often conceived of as identification of information within a set of data that has been collected and processed for review. But searching for data occurs at a number of different times in the course of a case. Some of the most common searches include:

a.    searching for potentially responsive data requiring preservation based on the scope of the legal action;

b.    deciding what data to collect for analysis;

c.    filtering data for processing to best target responsive data (e.g., by date, author, subject matter, etc.);

d.    searching data that has been processed for relevance to specific issues in response to a document or other discovery request;

e.    searching for data that meets specific criteria for production to the requesting party;

f.     searching data in preparing the case (i.e., supporting a claim or defense in the lawsuit);

g.    searching data in preparation for taking or defending a deposition of an expert or fact witness.

2.    Every Step You Take, Every Move You Make.
In any technological effort in data discovery you should work under the assumption that every step you take and every decision you make will be highly scrutinized and may be challenged. With that in mind, consider whether you will be able to explain and justify all actions taken in the course of managing the data in the case. This ability to document and defend is particularly critical regarding how data is searched. The search results can dramatically affect the economics of the case and, therefore, may be rigorously examined. If the search results are found inadequate, the result may be a high-profile failure, as the case law demonstrates. To be clear, scrutiny of the search process does not mean that every decision or process needs to be perfect; perfection, while admirable, is rarely if ever achieved when human efforts are involved. The legal standard is not perfection. What is required is to do what is reasonable under the circumstances. Reasonableness, however, is often in the eye of the beholder, and the beholders include, at a minimum, you, your adversary and the judge. So for the purposes of critically analyzing your efforts, you need to put yourself in the position of a neutral observer and give your process a cold, hard look. In cases with a lot on the line, you should consider engaging an outside expert to assist in creating valid and effective searches or, at a minimum, to review and assess how your searches were conducted.

3.    If It’s Not Documented, It Didn’t Happen

This rule is the corollary to the previous one and is essential to bear in mind if a challenge actually occurs. As anyone involved in litigation knows all too well, lawsuits can last a long time. Challenges to discovery processes in large cases can take place months or even years after the tasks were completed. For that reason, and because human memory is generally unreliable, it is essential to adopt in every case a well-defined and consistent approach to documenting both the decisions made and the implementation of a repeatable process. This is particularly important with regard to data searching because the process is often iterative, involving many attempts and revisions. Having documentation of the steps that were taken can go a long way toward reducing the temperature and avoiding severe sanctions, even when mistakes are made.
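As an illustration only, the record-keeping described above can be sketched as an append-only search log. The field names, the file layout (one JSON object per line) and the helper names `append_entry` and `read_log` are assumptions made for this example, not a prescribed or court-approved format.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class SearchLogEntry:
    """One search attempt. All field names are hypothetical."""
    matter: str          # case or matter identifier
    performed_by: str    # who ran the search
    data_source: str     # what was searched (mailbox, share, database)
    query: str           # the exact terms or criteria used
    hit_count: int       # how many items the search returned
    rationale: str       # why this search or revision was run
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_entry(path, entry):
    """Append one entry as a JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")


def read_log(path):
    """Return all logged entries, oldest first."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Logging every iteration, including the searches that were later revised or abandoned, is what lets the process be reconstructed months or years afterward.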


4.    What makes a sample ample?

An essential element of conducting a valid search is the well-executed testing of results. Since sets of potentially responsive data are typically too large to permit looking at each item (which is the whole reason for searching the data in the first place), using a statistically valid sample to test the results of a search is crucial to demonstrating its accuracy. In-depth discussion of different types of sampling methods is beyond the scope of this article, but the critical point is that decisions made about what type of sample to use (e.g., random, stratified, uniform, systematic and various combinations) and what sample size to use must be well informed and well documented. Specific knowledge and expertise are crucial for this purpose. Whether that expertise is found within the organization or outside it, the approach should be well reasoned, documented and explainable.
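To make the sample-size question concrete, here is a minimal sketch for one common case: a simple random sample used to estimate a proportion (for example, the share of responsive documents in a set). It uses the standard normal-approximation formula with a finite population correction. The function names, the default margin of error and the default confidence level are assumptions chosen for illustration; other sampling designs call for different calculations and, as noted above, for expert input.

```python
import math
import random
from statistics import NormalDist


def sample_size(population, margin=0.05, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion within +/- margin.

    p=0.5 is the most conservative guess (it gives the largest sample).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return math.ceil(n)


def draw_sample(doc_ids, n, seed):
    """Draw a simple random sample; recording the seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(sorted(doc_ids), n)


# Hypothetical usage: one million documents, default 5% margin at 95% confidence.
n = sample_size(1_000_000)
sample = draw_sample(range(1_000_000), n, seed=2013)
```

Under these assumptions the required sample is a few hundred documents regardless of how large the set grows, which is why sampling is practical. Recording the seed alongside the other parameters means the same sample can be drawn again if the process is challenged.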


5.    Learning from your mistakes.

Surprisingly, the failure of many search processes occurs after errors have been detected rather than resulting from the lack of detection in the first place. Suppose you sample the results of your search, including documents that hit on search terms and those that didn’t, and the sample turns up errors. Now what? Is it enough to simply correct the specific errors that were found in the sample? Very likely not, since the purpose of a sample is to get a representation of the quality of the overall results, not to identify every error that exists. Yet many practitioners make this mistake. Detecting errors should be a point of departure for improving the process, not the end of it. Refine the search criteria, rerun searches, then sample and test again until you reach what you believe is a reasonable and defensible (again, not necessarily perfect) error rate (see Rule 2).
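One way to turn sample results into an overall error estimate is sketched below. It assumes two simple random samples, one from the documents that hit on the search terms and one from those that did not, each reviewed by hand. The Wilson score interval is a standard method for putting a range around a sampled proportion; the function names and the recall calculation are illustrative assumptions, not a legal standard.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a sampled proportion."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimated_recall(hits_total, hits_sampled, hits_responsive,
                     misses_total, misses_sampled, misses_responsive):
    """Point estimate of the share of responsive documents the search found.

    Scales each sample's responsive rate up to the size of its full set.
    """
    found = hits_total * hits_responsive / hits_sampled
    missed = misses_total * misses_responsive / misses_sampled
    if found + missed == 0:
        return None  # no responsive documents seen in either sample
    return found / (found + missed)
```

After each revision of the search criteria, fresh samples can be drawn and these figures recomputed, so that the decision to stop iterating rests on a documented estimate rather than on the handful of errors that happened to appear in one sample.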


While these fundamental rules provide a foundation for conducting valid searches, the devil is in the details and the execution. The process of designing and executing valid searches needs to be undertaken with the seriousness and time commitment it deserves and with the required knowledge and expertise. Winging it in the world of high-stakes litigation can lead to a painful search for explanations of missteps and possibly new employment.

About the Author
Thomas Barnett is managing director and e-discovery practice leader at Stroz Friedberg LLC, which helps clients manage their digital risks in the areas of digital forensics, data breach and cybercrime response, electronic discovery, business intelligence and investigation, and security risk consulting.