A new age dawns
On 16 February 2016 the English High Court finally endorsed the use of predictive coding technology in standard disclosure (Pyrrho Investments Limited & Anor -v- MWB Property Limited & Ors). Ashurst welcomes this development. In our experience predictive coding can improve the accuracy and efficiency of reviews and therefore help to reduce the huge costs associated with large e-disclosure exercises. We expect that predictive coding will soon become commonplace in cases involving a large volume of electronically stored information (ESI). Indeed the Pyrrho case might just have sounded the death knell for front-end production-line linear review, with its inherent inefficiencies, inaccuracies and inconsistencies.
The predictive coding process
There are a number of predictive coding software products available for different purposes. Ashurst has recently completed a very large disclosure exercise using continuous learning predictive coding software, which broadly works as follows:
- A technology-assisted review using predictive coding starts much like any other e-disclosure exercise; electronic documents are collated from various sources and culled by applying traditional parameters such as date ranges, document type, custodian and keywords. Broader initial parameters may be applied on the basis that predictive coding will assist in whittling down the volume of documents that will need to be manually reviewed.
- The review population is uploaded onto the predictive coding platform and a randomly generated, statistically valid control set is reviewed and coded by relevance and issues. The control set indicates the volume of relevant documents within the review population, which will be helpful to benchmark the progress of the review.
- Drawing on the coding of the control set, the software suggests documents that are likely to be relevant. Document reviewers review these suggestions, confirming whether or not the documents are relevant. The software "learns" from the reviewers' coding decisions and, once these documents have been reviewed, suggests a further iteration of potentially relevant documents for review. This process is repeated until the number of relevant documents identified approaches the total indicated by the control set, and the number and quality of relevant documents suggested by the software indicate that it would be disproportionate to continue the review.
- The decision to stop the review is validated by the review of a statistically valid sample from the unreviewed "rump". If that confirms that the proportion of relevant documents remaining in the unreviewed population is very low (we have argued that below 5 per cent of the overall relevant population is sufficient), the review can be stopped.
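To give a feel for what a "statistically valid" control set can mean in practice, the sketch below uses Cochran's sample size formula with a finite population correction, followed by a simple projection of the number of relevant documents from the control set. This is an illustration only: the 95 per cent confidence level, the 2 per cent margin of error and the function names are our assumptions, not settings of any particular platform, and the appropriate parameters should be agreed with your technical advisers.

```python
import math


def control_set_size(population: int, margin: float = 0.02,
                     z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's formula with finite population correction.

    margin: acceptable margin of error (assumed 2 per cent here)
    z:      z-score for the confidence level (1.96 for 95 per cent)
    p:      expected proportion of relevant documents; 0.5 is the
            most conservative choice when nothing is known
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def estimated_relevant(population: int, sample_size: int,
                       sample_relevant: int) -> int:
    """Project the control set's relevance rate onto the population."""
    return round(population * sample_relevant / sample_size)


# Hypothetical usage for a review population of two million documents.
size = control_set_size(2_000_000)
print(f"Control set size: {size}")
```

On these assumptions a review population of two million documents needs a control set of a little under 2,400 documents. The projected number of relevant documents then serves as the benchmark against which the progress of the review is measured.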
Our experience
Ashurst recently represented a financial services company in a lawsuit against its former auditors alleging negligence. Even after custodian, keyword and date range filters had been applied, the number of documents to be reviewed was staggering - approximately two million. As a result of the use of predictive coding in this case:
- approximately 700,000 documents were eliminated from review, which amounted to roughly a one-third reduction in the overall review effort;
- all disclosure deadlines were honoured despite a pressing timetable (initial disclosure was completed in just four months);
- the large disclosure exercise was conducted in a cost-effective manner; and
- key documents were identified during the crucial first months of review for stronger case preparation and assessment of merits.
How do you make the process as robust and defensible as possible?
Engage your opposing party early
Before embarking on any disclosure exercise the Civil Procedure Rules (CPR) require the parties to discuss and if possible agree the approach to be taken to disclosure. This is particularly important if you intend to use predictive coding as it provides an opportunity to flush out any potential challenges. A sensible approach may be to meet with the parties' respective technical advisers to explain your predictive coding workflow and address any issues that may arise. This meeting should be used to draw up a protocol for the predictive coding exercise, establishing, for example, whether any classes of documents should be excluded from the process, the size of the control set, where "the line will be drawn" in terms of stopping the manual review, and how the decision to "draw the line" will be validated.
Plan ahead
Ensure that your team and counsel are fully engaged and understand both the principles behind, and the practicalities of, the predictive coding process so that the appropriate preparations and resources are put in place. Time spent at this stage will pay dividends later, especially when you are able to respond effectively and decisively to any challenges or queries made by your opposing party as to the process adopted.
You should also consider how you will tackle documents with minimal or no text, such as spreadsheets, audio files, small image files and poor-quality hard copy files (whose searchable text is likely to be unreliable). These files may need to be isolated and considered separately from the predictive coding exercise.
Establish the best possible control set at the beginning of the process
During the preparation stage ensure that the control set is:
- Representative of all relevant issues and subject matter. If not, consider whether supplemental "control sets" of relevant or "hot" documents relating to certain issues should be added to train the software in order to ensure that the full range of issues and subject matter is identified and promoted for review.
- Sampled from all relevant sources of electronic data. If not, you may be open to challenges that the software has not been appropriately trained to identify documents relevant to certain issues. If new sources of data are required to be uploaded onto the platform, best practice is to draw and review an updated control set.
- Reviewed by senior members of your team who are familiar with the issues in the proceedings. The initial control set gives the whole predictive coding process its first direction and momentum as the software learns from the coding of the control set to identify likely relevant documents within the unreviewed population. The more accurate the software is, the less time will be spent reviewing non-relevant documents.
You should also be prepared for the possibility that the original control set will be disclosed in order to satisfy your opposing party and/or the court that it has been correctly coded and covers the full range of issues.
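One way to check that a control set draws on every data source is to sample each source in proportion to its size, with a floor of one document per source. The sketch below illustrates that idea. It is a stratified variant that we have assumed for illustration, not the simple random sample described above and not a feature of any particular platform; the function name, the source labels and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_control_set(docs, total, seed=0):
    """Pick roughly `total` document ids, proportionally by source.

    docs:  list of (doc_id, source) pairs
    total: target size of the control set
    seed:  fixed seed so the selection can be reproduced and explained
    """
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for doc_id, source in docs:
        by_source[source].append(doc_id)

    chosen = []
    for source, ids in sorted(by_source.items()):
        # Proportional share, but never fewer than one document per source.
        k = max(1, round(total * len(ids) / len(docs)))
        chosen.extend(rng.sample(ids, min(k, len(ids))))
    return chosen


# Hypothetical population: three sources of very different sizes.
docs = ([(f"email-{i}", "email") for i in range(900)]
        + [(f"chat-{i}", "chat") for i in range(90)]
        + [(f"drive-{i}", "shared drive") for i in range(10)])
control = stratified_control_set(docs, total=100)
print(len(control))
```

Recording the seed and the per-source counts also makes it easier to explain how the control set was drawn if it is later disclosed.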
Rubbish in, rubbish out
With each iteration, the human reviewers' relevance and issues coding is used to train the software, improving its accuracy in identifying likely relevant documents in the "rump". Accordingly, the quality of the human review will determine the quality and effectiveness of the software. It is therefore important to ensure that those who are conducting the review are familiar with the issues in the proceedings and that a quality control process is put in place. For example, we would recommend that a senior lawyer within your team conducts a review of a sample of documents from each iteration. If you are using external document reviewers, we suggest conducting the quality control review in-house.
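The quality control step can be tracked with a simple measure: the proportion of sampled documents on which the senior lawyer changes the first-pass relevance decision. The sketch below is one possible way to draw the sample and compute that "overturn rate". The 5 per cent sampling fraction, the minimum of 25 documents and the function names are assumptions made for illustration only.

```python
import random


def qc_sample(batch_ids, fraction=0.05, minimum=25, seed=0):
    """Draw a random quality control sample from one review iteration."""
    rng = random.Random(seed)
    k = min(len(batch_ids), max(minimum, round(len(batch_ids) * fraction)))
    return rng.sample(batch_ids, k)


def overturn_rate(first_pass, second_pass):
    """Share of re-reviewed documents whose relevance call was changed.

    first_pass / second_pass: dicts mapping doc_id -> True if relevant.
    Only documents present in both dicts are compared.
    """
    checked = [doc_id for doc_id in second_pass if doc_id in first_pass]
    if not checked:
        return 0.0
    changed = sum(first_pass[d] != second_pass[d] for d in checked)
    return changed / len(checked)


# Hypothetical iteration of 1,000 documents.
sample = qc_sample(list(range(1000)))
print(len(sample))
```

A rising overturn rate from one iteration to the next suggests that the reviewers need further guidance before their coding is used to train the software.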
Validate your success
A statistically valid sample of unreviewed documents should be reviewed in order to confirm that the number of likely relevant documents within the unreviewed rump falls below the level at which further review would be reasonable or proportionate. In addition, you may also consider reviewing documents responsive to "core" keywords or conducting targeted keyword searches. The validation exercise should be carried out by a senior team member. As noted above, the size and scope of the validation exercise and the information about this exercise you are prepared to share should be agreed in advance with the opposing party.
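The arithmetic behind such a validation sample can be sketched as follows. The relevance rate found in the sample is projected onto the rump, and the result is expressed as a share of the overall relevant population, alongside a more cautious figure based on the upper bound of a Wilson score interval. The use of a Wilson interval, the 95 per cent confidence level, the function names and all of the figures in the example are our assumptions for illustration, not a prescribed method.

```python
import math


def wilson_upper(successes: int, n: int, z: float = 1.96) -> float:
    """Upper bound of the Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + spread) / denom


def share_left_in_rump(found_relevant, rump_size, sample_size,
                       sample_relevant, z=1.96):
    """Return (point, upper) estimates of the share of all relevant
    documents that remain in the unreviewed rump."""
    point_left = rump_size * sample_relevant / sample_size
    upper_left = rump_size * wilson_upper(sample_relevant, sample_size, z)
    point = point_left / (found_relevant + point_left)
    upper = upper_left / (found_relevant + upper_left)
    return point, upper


# Hypothetical figures: 300,000 relevant documents found so far, a rump
# of 700,000 documents, and 12 relevant documents in a sample of 2,400.
point, upper = share_left_in_rump(300_000, 700_000, 2_400, 12)
print(f"point estimate {point:.2%}, cautious estimate {upper:.2%}")
print("below 5 per cent threshold:", upper < 0.05)
```

Comparing the cautious figure, rather than the point estimate, with the agreed threshold gives some protection against an unrepresentative sample.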
The information provided is not intended to be a comprehensive review of all developments in the law and practice, or to cover all aspects of those referred to.
Readers should take legal advice before applying it to specific issues or transactions.