Complimentary eBook offered by ZyLAB
ARTIFICIAL INTELLIGENCE FOR YOUR DAILY BUSINESS
Understanding the Spectrum of AI Technologies for
Review, Investigation and Contract Analytics in eDiscovery
“ARTIFICIAL INTELLIGENCE IS NO MATCH FOR NATURAL STUPIDITY” (Anonymous)

Techniques from the world of Artificial Intelligence (AI) are rapidly finding their way into today’s
business practices, where they are used to increase the speed and efficiency of an organization’s
internal processes. The main reason for this success is that, after several decades of research,
AI techniques not only allow us to process enormous amounts of data 24/7 at staggering speed,
but also perform on par with humans, and often better and more consistently. This results in
revolutionary productivity gains.
Although it is clear that Artificial Intelligence has high value for review, investigation, contract
analytics, eDiscovery and other legal fact-finding missions, organizations still struggle to understand
the different techniques involved. In this eBook we explain the different techniques and illustrate
them with practical business-related examples.
AI - WHAT’S IN IT FOR ME?
Machine Learning, Natural Language Processing (NLP) and related techniques such as data mining,
text mining, big data analysis, predictive coding, Technology Assisted Review (TAR), concept
search, topic modeling, clustering, audio search and machine translation are all Artificial
Intelligence techniques. They can be used to identify specific document categories and to search
for relevant information in documents. These techniques are being used to increase the speed and
efficiency of eDiscovery practices and can also be used to accelerate other legal processes.
Machine Learning
Machine Learning is the process by which software recognizes patterns and relationships within
large datasets. A classification system first learns from “training data”. New pieces of data are
then classified based on the (latent) patterns learned from the training data. After sufficient
training, the behavior of new data can be predicted, and it is even possible to distill information
from previously unknown patterns and semantic relationships.
ZyLAB’s Machine Learning combines advanced machine learning algorithms with advanced statistical
and semantic methods to represent the content of a document.
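The train-then-classify cycle described above can be illustrated with a very small example. The sketch below is a toy Naive Bayes text classifier written only with the Python standard library. It is an illustration of the general idea, not ZyLAB's algorithm, and the function names, labels and sample texts are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_docs):
    """Learn word frequencies per label from (text, label) training pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the new text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing so unseen word/label pairs are not zero.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: two small categories.
training_data = [
    ("termination clause and notice period of the agreement", "contract"),
    ("the parties agree to the warranty terms", "contract"),
    ("lunch menu for the office party on friday", "other"),
    ("team outing schedule and lunch options", "other"),
]
model = train(training_data)
predicted = classify("notice period in the supply agreement", model)
```

With this toy training set, the new text shares several words with the "contract" examples and none that are specific to "other", so it is assigned to "contract". A real system would use far more training data and a richer document representation.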
Natural Language Processing
Natural Language Processing refers to the ability of a computer program to understand human
language, whether spoken or written. NLP also builds on Machine Learning. It uses text processing
techniques that treat text not merely as a sequence of symbols but also take into account the
hierarchical structure of language: words form a phrase, phrases make a sentence and sentences
convey a message.
Text Mining
Text Mining, also known as Text Analysis, refers to the
use of varied techniques to automatically enrich data
in large data volumes and then search for hidden patterns and relationships. Once identified, this data can
be filtered, sorted, and visualized; and discovered topics and categories can be prioritized. Text mining identifies and highlights information from patterns and semantic relationships which were previously unknown.
Technology Assisted Review
Technology Assisted Review (TAR), also known as Computer Assisted Review (CAR) or Predictive
Coding, uses a series of algorithms to search for and sort documents that are relevant to a data
investigation or eDiscovery matter. TAR also makes use of Machine Learning.
ZyLAB uses a variety of methods for automatic document classification to support Technology
Assisted Review (TAR). These patented methods range from straightforward search-based
approaches, regular expressions and gazetteers (dictionaries) to advanced methods using NLP and
Machine Learning.
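To make the rules-based end of this range concrete, the following sketch tags a document using a small gazetteer and two regular expressions. It is a minimal illustration only. The word list, the patterns and the tag names are hypothetical and are not taken from ZyLAB's product.

```python
import re

# Hypothetical gazetteer (dictionary) of contract-related terms.
GAZETTEER = {"confidentiality", "indemnification", "warranty", "termination"}

# Hypothetical patterns: a numeric date and a currency amount.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
AMOUNT_PATTERN = re.compile(r"(?:USD|EUR|\$|€)\s?\d[\d,]*(?:\.\d+)?")


def classify_by_rules(text):
    """Return the set of tags whose rule matches the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    tags = set()
    if words & GAZETTEER:
        tags.add("contract-terms")
    if DATE_PATTERN.search(text):
        tags.add("has-date")
    if AMOUNT_PATTERN.search(text):
        tags.add("has-amount")
    return tags


tags = classify_by_rules("Termination is effective 12/31/2024 with a fee of $1,500.00")
```

Each rule is transparent: if a document is tagged incorrectly, the word list or pattern responsible can be inspected and changed directly.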
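[Figure: search and classification techniques arranged along a RECALL axis running from 0% to 100%.
Listed from the 100% end to the 0% end: Machine Learning for Automatic Document Classification;
OCR on Bitmaps, Visual Classification, Text-Mining, Audio Search & Machine Translation; Search on
Extracted Metadata (document properties, file properties, forensics); Fuzzy, Wildcard, Quorum,
Proximity, Relevance Ranking; Traditional Boolean Search. The figure is labeled "ZyLAB Machine
Learning TAR" at the 100% end and "ZyLAB Rules-based TAR" at the 0% end.]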
Topic Modeling / Clustering
Topic Modeling and Cluster Analysis are two other approaches to text mining. A topic model is
used to statistically explore the abstract concepts (topics) that occur within a set of documents.
Cluster analysis uses perceived relationships between objects to group them into new sub-groups
(clusters). The resulting document groups are well suited for use with Machine Learning.
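As an illustration of cluster analysis, the sketch below groups documents by word overlap (Jaccard similarity) in a single pass. This is one simple clustering strategy chosen for brevity, not a description of how ZyLAB clusters documents. The threshold value and sample texts are assumptions.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.3):
    """Assign each document to the most similar cluster, or start a new one."""
    clusters = []  # each cluster: {"terms": set of words, "members": [doc indices]}
    for index, text in enumerate(docs):
        terms = set(text.lower().split())
        best = max(clusters, key=lambda c: jaccard(terms, c["terms"]), default=None)
        if best is not None and jaccard(terms, best["terms"]) >= threshold:
            best["members"].append(index)
            best["terms"] |= terms
        else:
            clusters.append({"terms": terms, "members": [index]})
    return [c["members"] for c in clusters]


# Hypothetical documents: two about invoices, two about contract termination.
docs = [
    "invoice payment overdue reminder",
    "supply agreement termination notice",
    "payment reminder for overdue invoice",
    "notice of termination of supply agreement",
]
groups = cluster(docs)
```

No labels are supplied in advance: the two groups emerge purely from the vocabulary the documents share, which is what distinguishes clustering from the supervised classification described earlier.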
COMBINING DIFFERENT APPROACHES
The advantage of full-text search and text-mining techniques is that they are transparent, and
that every contract lawyer knows how to use full-text search and how to combine different search
techniques. An incorrectly classified document can be fixed by the lawyer simply changing the
query. Queries can be collected in libraries of full-text queries, which can be shared and
re-used. The queries can also easily be translated into other languages.
This is not always the case when using Machine Learning, which is more of a black box that either
works or does not and, in the latter case, is hard to fix. Furthermore, Machine Learning is not
transparent enough for users to directly understand why a document is classified into a specific
category. Because Machine Learning uses specific document sets as “training data”, the learned
patterns are not always relevant for documents that differ too much from those sets.
As each technique clearly has its own advantages and disadvantages, it is best to allow the user to
combine the different methods to achieve the highest possible recall and precision. This is exactly
what ZyLAB does: it starts with simple, straightforward and transparent techniques and expands
into more advanced methods when needed.
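Recall and precision can be computed directly once it is known which documents are truly relevant. The sketch below uses made-up document IDs to show how combining the hits of a rules-based method and a machine-learning method can raise recall, possibly at some cost in precision. All numbers are hypothetical.

```python
def precision_recall(predicted, relevant):
    """Precision: share of hits that are relevant. Recall: share of relevant documents found."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs.
relevant = {1, 2, 3, 4}    # documents a reviewer judged relevant
rule_hits = {1, 2, 9}      # found by full-text queries
ml_hits = {2, 3, 4, 8}     # found by a trained classifier
combined = rule_hits | ml_hits

rules_p, rules_r = precision_recall(rule_hits, relevant)
ml_p, ml_r = precision_recall(ml_hits, relevant)
combined_p, combined_r = precision_recall(combined, relevant)
```

In this made-up example the rules alone reach a recall of 0.5 and the classifier alone 0.75, while the combined set finds all four relevant documents (recall 1.0) with a precision of about 0.67.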
PRACTICAL USE CASE
AI IN LITIGATION & ARBITRATION (EDISCOVERY)
ZyLAB eDiscovery is a complete end-to-end solution for all your discovery and regulatory needs.
AI-techniques are used for:
• Automatic identification of relevant documents for litigation and arbitration (eDiscovery) using
sample documents;
• Automatic clustering and classification of documents into relevant groups and sub-groups;
• Searching the content of images and videos without the need to add textual descriptions;
• Automated machine translation to quickly translate all information up front, so that it can
be tagged and reviewed in ZyLAB’s highly intuitive review platform. This way relevant data
is quickly uncovered and critical information can be routed for specialized human translation if
needed.
PRACTICAL USE CASE
MERGERS & ACQUISITIONS (M&A) AND LARGE CORPORATE TRANSACTIONS: AI FOR CONTRACT DISCOVERY, REVIEW AND ANALYSIS
Many organizations keep track of their agreements and other relevant documents in a contract
management system. In addition to monitoring deadlines, notice periods, warranties and guarantees,
these systems are also used to generate the documents that fill a data room.
ZyLAB’s eDiscovery technology helps to identify contracts in live data locations such as email
boxes, SharePoint or file shares. During processing, all documents are analyzed for additional
metadata, specific content, email threads, duplicates, privileged information and much more. The
outcome of this process can likewise be used to fill a data room with the relevant documents.
Get better insight into your data without having to search and review the actual data itself. Text
analysis helps you find entities such as organizations, persons and more. Code words and other
patterns such as sentiments, requests and travel activities can be extracted and can guide you
straight to the relevant information.
PRACTICAL USE CASE
AI FOR LEGAL FACT FINDING, FRAUD AND INTERNAL INVESTIGATIONS
Legal fact finding is key in all data investigations, whether conducted in relation to a crime, an
internal fraud case or a request for disclosure of government documents.
ZyLAB’s own indexing engine can index terabytes of data per day and supports access to over 750
different file formats. ZyLAB has been a leader in legal and investigative full-text search since
1983, offering not only industry-standard search functionality, but also unique operations such
as our fast and world-famous fuzzy, quorum, wildcard, proximity, phrase and regular expression
searches.
In addition, ZyLAB allows users to search numeric ranges, dates and file names, and to use text
delimiters to define key fields and text ranges on the fly. These extensive search capabilities,
combined with our fast multi-threaded and distributed indexes, help in finding relevant information
faster than any other tool on the market. Hits from your search are highlighted on every document,
even if the document was originally image-based.
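For readers unfamiliar with the search operators named above, the sketch below shows what quorum, proximity, wildcard and fuzzy matching mean, using only the Python standard library. It scans one short text instead of querying an index, so it says nothing about how ZyLAB's engine works internally. The function names and sample sentence are assumptions made for this illustration.

```python
import difflib
import fnmatch


def tokens(text):
    return text.lower().split()


def quorum_match(text, terms, minimum):
    """True if at least `minimum` of the given terms occur in the text."""
    present = set(tokens(text))
    return sum(1 for term in terms if term in present) >= minimum


def proximity_match(text, first, second, max_distance):
    """True if the two words occur within `max_distance` words of each other."""
    words = tokens(text)
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_distance
               for i in first_positions for j in second_positions)


def wildcard_match(text, pattern):
    """Words matching a shell-style pattern such as 'termin*'."""
    return fnmatch.filter(tokens(text), pattern)


def fuzzy_match(text, term, cutoff=0.8):
    """Words that closely resemble the term, e.g. despite a typo or OCR error."""
    return difflib.get_close_matches(term, tokens(text), n=5, cutoff=cutoff)


# Hypothetical sentence containing a misspelling of "agreement".
sample = "the supplier shall terminate the agreemnt after written notice"
```

For the sample sentence, a quorum search for two of "supplier", "notice" and "penalty" succeeds, "terminate" and "notice" are five words apart, the pattern "termin*" finds "terminate", and a fuzzy search for "agreement" still finds the misspelled "agreemnt".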
PRACTICAL USE CASE - AI FOR REDACTION FOR DATA PROTECTION
Identification of any personal data which must be deleted, redacted or anonymized.
[Figure: The Automated Redaction Process, shown as three numbered steps.]
Unique pseudonyms
Identified names can also be replaced by a unique pseudonym. This way the Personally Identifiable
Information (PII) is redacted and protected, but the relationship between the persons or companies
is maintained by the pseudonyms. Reviewers can review or adjust the automatic redactions by
using sampling or manual review.
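The idea of replacing each name with a unique pseudonym can be sketched as follows. The example assumes that the names have already been identified by an earlier step such as entity extraction. The `PERSON_n` naming scheme and the sample text are hypothetical.

```python
import re


def pseudonymize(text, names):
    """Replace each known name with a stable pseudonym and return the mapping."""
    if not names:
        return text, {}
    mapping = {}
    # Longest names first, so a full name is preferred over a shorter name it contains.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b")

    def replace(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"PERSON_{len(mapping) + 1}"
        return mapping[name]

    return pattern.sub(replace, text), mapping


redacted, mapping = pseudonymize(
    "Jane Smith emailed John Doe. John Doe replied to Jane Smith.",
    ["Jane Smith", "John Doe"],
)
```

Because the same name always maps to the same pseudonym, a reader of the redacted text can still see who communicated with whom without learning anyone's identity. The mapping itself is sensitive and would need to be stored securely or discarded.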
PRACTICAL USE CASE
AI FOR FOIA AND PUBLIC RECORDS DISCLOSURES
As the number of information requests has increased exponentially over the past years, organizations worldwide can no longer process all information requests in time. When handling public
records requests, there are many possible levels of automation which can optimize the process,
making it possible to use resources more effectively and to deal with ever increasing data volumes.
ZyLAB implements automation for collection, processing, deduplication, data enrichment, translation, categorization, data visualization, disclosure cost reporting, keyword hit highlighting, search
and tagging, audio and video search, Vaughn Index Creation and bulk redaction.
ZyLAB is positioned as a “leader” in Gartner’s “2015 Magic Quadrant for eDiscovery
Software”, ranked #1 for complete EDRM eDiscovery in the analysts’ “Critical Capabilities for E-Discovery Software 2015” report and has received numerous other
industry accolades over the last 3 decades.
For over 30 years, ZyLAB has worked with professionals in the litigation, auditing,
security and intelligence communities to develop the most advanced solutions
for investigating and managing large sets of information. Our solution is used by
Fortune 1000 companies, government agencies, courts and law firms.
