Document clustering automatically groups similar documents together. Clustering can help you understand the facts of your case, find the needle in the haystack, eliminate irrelevant data, and speed up your review by having key staff review the most important documents first.
What is Clustering?
Catalyst’s clustering technology examines a set of documents and identifies word pairs that are common across the text of the set. It groups the documents that share those word pairs into clusters, then names each cluster using the most important terms contained in that group. The clusters are shown in folders on CR. They look something like this:

When clustering documents, there are bound to be outliers that do not fit cleanly into any cluster. These documents are grouped together in a cluster labeled “unmatched documents.”
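The grouping, naming, and "unmatched" steps described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not Catalyst's implementation: the function names, the `min_docs` cutoff, and the rule of labeling a cluster by its most widely shared adjacent word pair are all hypothetical choices made for the example.

```python
import re
from collections import Counter


def word_pairs(text):
    """Return the set of adjacent word pairs found in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return set(zip(words, words[1:]))


def cluster_by_shared_pairs(docs, min_docs=2):
    """Group documents by the most widely shared word pair they contain.

    docs: dict mapping a document id to its text.
    min_docs: hypothetical cutoff; a pair counts as "common" only if it
        appears in at least this many documents.
    Returns a dict mapping a cluster label to a sorted list of document ids.
    """
    pairs = {doc_id: word_pairs(text) for doc_id, text in docs.items()}
    # Count the number of documents in which each pair occurs.
    doc_freq = Counter(p for ps in pairs.values() for p in ps)

    clusters = {}
    for doc_id, ps in pairs.items():
        common = [p for p in ps if doc_freq[p] >= min_docs]
        if common:
            # Pick the most widely shared pair; break ties alphabetically.
            best = min(common, key=lambda p: (-doc_freq[p], p))
            label = " ".join(best)
        else:
            # Outliers that share no common pair go into one bucket.
            label = "unmatched documents"
        clusters.setdefault(label, []).append(doc_id)
    return {label: sorted(ids) for label, ids in clusters.items()}
```

With five short sample texts, two about a "merger agreement", two about a "sales forecast", and one unrelated note, this sketch produces two named clusters plus an "unmatched documents" cluster holding the unrelated note.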
Catalyst provides several options for document clustering, including Key Documents, Static, and Anti-Document (Spam) Clustering.
Key Documents
One approach to clustering starts from key documents known to be of interest, such as hot documents, and uses them to find other similar documents. This is particularly useful for legal teams that are familiar enough with a case to know what they are looking for, or that have a group of documents they think are interesting. We extract the concepts from those documents, introduce them into the whole population of documents, and build the clusters around them.
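As a rough illustration of building clusters around key documents, the sketch below assigns each document in the population to the key document it most resembles. It assumes that plain term counts stand in for "concepts" and that cosine similarity measures resemblance; the source does not say how Catalyst extracts or compares concepts, and the function names and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count the lowercase words in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def cluster_around_key_documents(key_docs, population, threshold=0.3):
    """Assign each document to its most similar key document.

    key_docs, population: dicts mapping an id to text.
    threshold: hypothetical minimum similarity; documents that fall below
        it for every key document are placed in "unmatched documents".
    """
    key_vectors = {key: term_vector(text) for key, text in key_docs.items()}
    clusters = {key: [] for key in key_docs}
    clusters["unmatched documents"] = []
    for doc_id, text in population.items():
        vec = term_vector(text)
        best_key, best_score = max(
            ((key, cosine(vec, kv)) for key, kv in key_vectors.items()),
            key=lambda item: item[1],
            default=(None, 0.0),
        )
        if best_key is not None and best_score >= threshold:
            clusters[best_key].append(doc_id)
        else:
            clusters["unmatched documents"].append(doc_id)
    return clusters
```

Raising `threshold` makes the clusters tighter and sends more documents to the unmatched bucket; lowering it does the opposite.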
Static Clustering
Static clustering is often used for first-pass analysis, when you do not yet know enough about the case to identify key documents. Catalyst can work with you to build the specialized searches that will serve as the basis for the clustering. There are several ways to do this, including:
- Taking key terms from pieces of other documents and creating a “synthetic” key document.
- Using interesting terms from a case ontology and building a weighted search based on the outline.
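Both options in the list above can be pictured as a weighted term search. The following sketch is an assumption-laden illustration rather than Catalyst's method: it joins text snippets into a "synthetic" key document, and it scores each document as the sum of term weight times term frequency. The helper names and the scoring rule are hypothetical.

```python
import re
from collections import Counter


def terms_of(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def synthetic_key_document(snippets):
    """Join pieces taken from other documents into one synthetic text."""
    return " ".join(snippets)


def weighted_search(weights, docs):
    """Rank documents by the sum of (term weight * term frequency).

    weights: dict mapping a term to its weight, for example taken from a
        case ontology or from the term counts of a synthetic key document.
    docs: dict mapping a document id to its text.
    Returns (doc_id, score) pairs with a positive score, best first.
    """
    results = []
    for doc_id, text in docs.items():
        counts = Counter(terms_of(text))
        score = sum(weight * counts[term] for term, weight in weights.items())
        if score > 0:
            results.append((doc_id, score))
    return sorted(results, key=lambda item: (-item[1], item[0]))
```

To search with a synthetic key document, one simple option is to use its term counts as the weights, for example `Counter(terms_of(synthetic_key_document(snippets)))`. To search from an ontology, pass hand-assigned weights such as `{"rebate": 3.0, "invoice": 1.0}`.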
Anti-Document (Spam) Clustering
Clustering can also be used to pinpoint spam or junk documents. Instead of using key documents, Catalyst uses representative spam and other “noisy” documents as the basis of the search. The matching documents can be removed from the collection altogether or set aside for a lower level of review.
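The set-aside step can be sketched as a simple partition of the collection. This example assumes Jaccard overlap of word sets as the matching rule, which is a stand-in chosen for brevity and not something the source describes; the function names and the `threshold` value are hypothetical.

```python
import re


def word_set(text):
    """Return the set of lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_out_anti_documents(anti_docs, collection, threshold=0.5):
    """Separate documents that resemble known spam or noise.

    anti_docs: list of representative spam or noisy texts.
    collection: dict mapping a document id to its text.
    threshold: hypothetical minimum overlap for a document to count
        as a match.
    Returns (keep, set_aside) lists of document ids.
    """
    anti_sets = [word_set(text) for text in anti_docs]
    keep, set_aside = [], []
    for doc_id, text in collection.items():
        words = word_set(text)
        score = max((jaccard(words, s) for s in anti_sets), default=0.0)
        if score >= threshold:
            set_aside.append(doc_id)
        else:
            keep.append(doc_id)
    return keep, set_aside
```

The `set_aside` list corresponds to the matching documents, which can then be dropped or routed to a lower level of review.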
For more information on using Catalyst Clustering for your matter, please contact Catalyst Consulting.