Reports and Charts

After each ranking operation, the Predict Dashboard and its associated reports and charts are updated automatically and are available for your review. Once coding decisions are made, the coded documents are processed by the ranking algorithm, which identifies the words (terms) within them and orders those terms from highest-ranked to lowest-ranked. The rest of the document population is then ordered according to the terms each document contains.
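
This section does not describe Predict's actual ranking algorithm. The following is only a minimal illustrative sketch of the general idea (learn term weights from coded documents, order the terms, then order uncoded documents by the terms they contain), assuming a simple smoothed log-odds weighting that is swapped in for illustration. The function names `tokenize`, `term_weights`, `rank_terms`, and `rank_documents` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and return the set of distinct word terms it contains."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def term_weights(coded_docs):
    """Assign each term a smoothed log-odds weight from coded documents.

    Hypothetical weighting, not Predict's. coded_docs is a list of
    (text, is_relevant) pairs. A positive weight means the term occurs in a
    larger share of relevant documents than of non-relevant ones.
    """
    rel_counts, non_counts = Counter(), Counter()
    n_rel = n_non = 0
    for text, is_relevant in coded_docs:
        terms = tokenize(text)
        if is_relevant:
            rel_counts.update(terms)
            n_rel += 1
        else:
            non_counts.update(terms)
            n_non += 1
    weights = {}
    for term in rel_counts.keys() | non_counts.keys():
        # Add-one smoothing avoids log(0) for terms seen in only one class.
        p_rel = (rel_counts[term] + 1) / (n_rel + 2)
        p_non = (non_counts[term] + 1) / (n_non + 2)
        weights[term] = math.log(p_rel / p_non)
    return weights


def rank_terms(weights):
    """Order terms from highest-ranked to lowest-ranked."""
    return sorted(weights, key=weights.get, reverse=True)


def rank_documents(docs, weights):
    """Order document IDs by the summed weights of their terms, highest first.

    docs maps document ID -> text. Terms never seen in coded documents count as 0.
    """
    def score(doc_id):
        return sum(weights.get(term, 0.0) for term in tokenize(docs[doc_id]))
    return sorted(docs, key=score, reverse=True)


coded = [("merger agreement draft", True), ("merger pricing memo", True),
         ("friday lunch menu", False), ("friday parking notice", False)]
weights = term_weights(coded)
print(rank_terms(weights)[0])  # merger
print(rank_documents({"doc1": "friday lunch", "doc2": "merger terms"}, weights))  # ['doc2', 'doc1']
```

In this toy example, "merger" appears only in relevant documents, so it ranks highest, and the uncoded document that contains it is ranked first.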

Because Predict automatically finds reviewed documents in Insight, a project can go through hundreds of ranking operations. Ranking is fast: Predict can rank more than one million documents in less than five minutes.
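
For a sense of scale, the stated figure implies a throughput of more than roughly 3,300 documents per second:

```latex
\frac{1{,}000{,}000 \text{ documents}}{5 \times 60 \text{ s}} = \frac{1{,}000{,}000}{300} \approx 3{,}333 \text{ documents per second}
```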

Open your project and click Dashboard to view overall project statistics. Information is available on each of the tabs:

Use the information on these tabs to determine when you can cut off your review. Refer to the following:

Note: The number of documents in the collection appears at the top of the page, next to the project name. If this number is highlighted in yellow, the count has changed and the graph must be rebuilt. Point to the number to see the new information.

Review Statistics Tab

The information on the Review Statistics tab includes:

[Figure: Estimated Richness]

In this example, the system estimates that:

[Figure: QC]

Daily Richness

This report helps review managers manage their review teams and decide when to stop the review based on observed richness. Richness is calculated daily, so the manager can compare the current richness against the historical high-water mark.

Because Predict feeds the documents most likely to be relevant to the review team first, the richness percentage starts out high: reviewers are seeing mostly relevant documents. As the project progresses, the manager will notice the richness percentage begin to decrease because fewer relevant documents remain to be reviewed. This chart helps managers follow that pattern and make informed decisions about when to stop the review.

For each day of the review, the calculation table shows the number of documents reviewed, how many of them are relevant, and the calculated richness.

Below the table, a bar chart illustrates how the richness percentage rises and falls from day to day.
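
The arithmetic behind the daily table can be sketched as follows, assuming richness is the share of the day's reviewed documents that were coded relevant. The `DayStats` and `richness_report` names, the "points below the mark" column, and the sample counts are hypothetical illustrations, not Predict output.

```python
from dataclasses import dataclass


@dataclass
class DayStats:
    """One row of the daily calculation table (hypothetical structure)."""
    day: str
    reviewed: int
    relevant: int

    @property
    def richness(self) -> float:
        """Percentage of the day's reviewed documents that were coded relevant."""
        return 100.0 * self.relevant / self.reviewed if self.reviewed else 0.0


def richness_report(days):
    """Return (day, richness %, high-water mark %, points below the mark) per day."""
    rows = []
    high_water = 0.0
    for d in days:
        high_water = max(high_water, d.richness)
        rows.append((d.day, round(d.richness, 1), round(high_water, 1),
                     round(high_water - d.richness, 1)))
    return rows


days = [DayStats("Day 1", 200, 150), DayStats("Day 2", 250, 175), DayStats("Day 3", 240, 60)]
for row in richness_report(days):
    print(row)
```

With these made-up counts, richness falls from 75.0% to 70.0% to 25.0% while the high-water mark stays at 75.0%, so Day 3 sits 50 points below it. How large a drop justifies stopping is a judgment for the review manager; this section does not define a threshold.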

Progress Tab

Click the Progress tab to display:

Flux Chart

The Flux chart shows how stable the ranking is throughout the project. It can help you determine when the ranking fluctuation stabilizes, which indicates that the Predict engine's learning has reached its maximum. The chart plots the number of changes in ranking (fluctuation, on the y-axis) against the number of document seeds ranked over time (on the x-axis).

To see the average number of ranking changes in the positive and negative directions, point to a ranking operation (shown as a circle). The width of the gray area behind the line indicates how much the average document ranking changed; a wide gray area means significant change. If the gray area is narrow or absent, there is little fluctuation in the ranking, which means the ranking is stabilizing.

Note: This chart is available only after at least two rankings have occurred.

[Figure: Flux chart]

In this example, the amount of ranking fluctuation drops significantly after five ranking rounds, with about 80 seeds submitted for ranking. After 250 seeds, it levels off. Then, at about 400 seeds, it spikes and quickly stabilizes again. These changes could be due to a new, diverse set of documents being added to the population or to a change in reviewers.

To zoom in on a portion of the chart, use the size controls that appear below the chart. Drag the left and right sides in either direction to change the view.
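
This section does not say exactly how Predict counts ranking changes. As a rough illustration of why the Flux chart needs at least two rankings, the sketch below compares two consecutive rankings and reports how many documents moved and the average move up and down. The `ranking_flux` name and the measure itself are assumptions, not Predict's formula.

```python
def ranking_flux(previous, current):
    """Summarize how documents moved between two consecutive ranking operations.

    Hypothetical measure. previous and current are lists of document IDs ordered
    from highest to lowest rank. Only documents present in both rankings are
    compared. A positive move means the document climbed toward the top; a
    negative move means it fell.
    """
    prev_pos = {doc: i for i, doc in enumerate(previous)}
    moves = [prev_pos[doc] - i for i, doc in enumerate(current) if doc in prev_pos]
    ups = [m for m in moves if m > 0]
    downs = [m for m in moves if m < 0]
    return {
        "changed": len(ups) + len(downs),
        "avg_up": sum(ups) / len(ups) if ups else 0.0,
        "avg_down": sum(downs) / len(downs) if downs else 0.0,
    }


print(ranking_flux(["a", "b", "c", "d"], ["b", "a", "d", "c"]))
# {'changed': 4, 'avg_up': 1.0, 'avg_down': -1.0}
print(ranking_flux(["a", "b", "c"], ["a", "b", "c"]))
# {'changed': 0, 'avg_up': 0.0, 'avg_down': 0.0}
```

A stabilizing ranking would show `changed` and the average moves shrinking toward zero from one ranking operation to the next, which corresponds to the narrowing gray area described above.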

Gains Chart

The Gains chart (also called a Cumulative Yield chart) appears beneath the Flux chart. The y-axis displays the cumulative total of relevant documents (positive seeds), and the x-axis spans the total number of documents in the collection.

[Figure: Gains chart]

In this example, the number of relevant documents begins to taper off after about 9,700 seeds have been reviewed. There are fewer relevant documents in the range between 9,700 and 12,000 documents. If the trend continues, we can estimate that most of the relevant documents have been reviewed.

Click the Include Related Docs button to update the chart so that it includes the related documents. Click it again so that the chart reflects only the documents the Predict engine assigned to the previously created Review Project. To zoom in on a portion of the chart, use the size controls that appear below the chart.
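
A cumulative yield curve like the Gains chart can be sketched as a running total of relevant coding decisions, assuming the points are plotted in the order documents were reviewed. The helper `relevant_between` counts how many relevant documents turned up within a range of the x-axis, which is the kind of reading behind the taper between 9,700 and 12,000 documents described above. Both function names and the sample coding are hypothetical.

```python
from itertools import accumulate


def gains_curve(coded_relevant):
    """Build (documents reviewed, cumulative relevant) points for a gains chart.

    coded_relevant holds one boolean per document, True if it was coded
    relevant, listed in the order the documents were reviewed.
    """
    cumulative = accumulate(1 if relevant else 0 for relevant in coded_relevant)
    return list(enumerate(cumulative, start=1))


def relevant_between(points, start, end):
    """Relevant documents found among reviewed documents start+1 through end."""
    totals = dict(points)
    return totals[end] - totals.get(start, 0)


points = gains_curve([True, True, False, True, False, False])
print(points)                          # [(1, 1), (2, 2), (3, 2), (4, 3), (5, 3), (6, 3)]
print(relevant_between(points, 3, 6))  # 1
```

A flattening tail, where `relevant_between` over a late range is small compared with earlier ranges of the same width, is what "tapering off" looks like numerically.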

High Ranked Positive Terms Tab

Click the High Ranked Positive Terms tab to review a list of the 10,000 highest-ranked terms in the document collection. This list can be useful when you are looking for key words in your data. Note that the terms can change dramatically between earlier and later rankings, especially if new types of documents are introduced into the collection and ranked.

[Figure: High Ranking Terms]
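
To see how much the term list has shifted between an earlier and a later ranking, you could compare the two top-term lists directly. This is a hypothetical sketch: it assumes you have each ranking's terms as a list ordered highest first, and the `compare_top_terms` name and its overlap measure are illustrative, not a Predict feature. The default `n=10_000` mirrors the size of the list on this tab.

```python
def compare_top_terms(earlier, later, n=10_000):
    """Compare the top-n terms of an earlier and a later ranking.

    earlier and later are term lists ordered from highest to lowest rank.
    Returns the terms that entered or dropped out of the top n, and the share
    of the later top-n list that was already in the earlier one.
    """
    top_earlier, top_later = set(earlier[:n]), set(later[:n])
    return {
        "entered": [t for t in later[:n] if t not in top_earlier],
        "dropped": [t for t in earlier[:n] if t not in top_later],
        "overlap": len(top_earlier & top_later) / max(len(top_later), 1),
    }


result = compare_top_terms(["merger", "pricing", "board", "memo"],
                           ["merger", "lawsuit", "pricing", "settlement"], n=3)
print(result["entered"], result["dropped"])  # ['lawsuit'] ['board']
```

A low overlap after new types of documents are added would reflect the dramatic change in terms mentioned above.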

Top Ranked Docs by Custodian Tab

Click the Top Ranked Docs By Custodian tab to display a chart of the custodians with the most highly ranked documents.

[Figure: Top Ranked Docs by Custodian]
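
The tally behind a chart like this can be sketched as a count of custodians among the top-ranked documents. This section does not say how many documents count as "top ranked", so the `top_n` cutoff, the `top_custodians` name, and the custodian names below are hypothetical.

```python
from collections import Counter


def top_custodians(ranked_docs, top_n, k=5):
    """Count which custodians hold the most documents among the top_n ranked ones.

    ranked_docs is a list of (doc_id, custodian) pairs ordered from highest to
    lowest rank. Returns up to k (custodian, count) pairs, largest count first;
    ties keep the order in which custodians were first seen.
    """
    counts = Counter(custodian for _, custodian in ranked_docs[:top_n])
    return counts.most_common(k)


ranked = [("d1", "Smith"), ("d2", "Jones"), ("d3", "Smith"),
          ("d4", "Lee"), ("d5", "Jones"), ("d6", "Smith")]
print(top_custodians(ranked, top_n=4))  # [('Smith', 2), ('Jones', 1), ('Lee', 1)]
```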