Performance Report

This article applies to these versions of LandingLens: LandingLens and LandingLens on Snowflake.

Clicking an evaluation set score on the Models page opens the Performance Report tab (you can also click a model on the Models page and then select this tab). 

This report shows how the model performed on the selected evaluation set (and not for the entire dataset). You can select different sets from the Evaluation Set drop-down menu.

Performance Report

Analyze Model Performance

Watch the following video to learn how to use the Performance Report and related tools to analyze and improve model performance.

Adjust Threshold

If you have an Object Detection or Segmentation project, you can see how the model performs on the evaluation set with different Confidence Thresholds. To do this:

  1. Open the Performance Report.
  2. Click Adjust.
    Click "Adjust"
  3. Change the Confidence Threshold by using the slider or entering a value in the text box. 
  4. If you want to see a full performance report for the selected threshold, click Generate a New Report.
    See How Different Confidence Thresholds Impact Performance
  5. LandingLens creates a new performance report for the selected threshold. (This report is temporary; if you close and later reopen the report, the data reverts to the original Confidence Threshold.)
    See the Full Performance Report for the Selected Confidence Threshold
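
To see the effect concretely, here is a minimal sketch (not the LandingLens API) of how raising the Confidence Threshold filters out lower-confidence Object Detection predictions before metrics are recalculated; the prediction format and values are hypothetical.

```python
# Minimal sketch (not the LandingLens API): a higher Confidence Threshold
# keeps only predictions whose confidence meets or exceeds the threshold.
# The prediction format and values below are hypothetical.

predictions = [
    {"label": "Scratch", "confidence": 0.92},
    {"label": "Scratch", "confidence": 0.61},
    {"label": "Dent", "confidence": 0.34},
]

def apply_threshold(preds, threshold):
    """Keep only predictions at or above the confidence threshold."""
    return [p for p in preds if p["confidence"] >= threshold]

for t in (0.3, 0.5, 0.7):
    kept = apply_threshold(predictions, t)
    print(f"threshold={t}: {len(kept)} prediction(s) kept")
```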

Overall Score for the Evaluation Set

The Performance Report includes a score for the evaluation set (and not for the entire dataset). The type of score depends on the project type:

Object Detection and Classification: F1 Score

The Performance Report includes the F1 score for Object Detection and Classification projects.

F1 Score for the Evaluation Set in an Object Detection Project

For Object Detection, the F1 score combines precision and recall into a single score, creating a unified measure that assesses the model’s effectiveness in minimizing false positives and false negatives. A higher F1 score indicates the model is balancing the two factors well. LandingLens uses micro-averaging to calculate the F1 score.
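
As a rough illustration of micro-averaging (not LandingLens code), the sketch below pools True Positive, False Positive, and False Negative counts across classes before computing Precision, Recall, and F1; the per-class counts are made up.

```python
# Sketch of micro-averaged F1: pool TP/FP/FN counts across classes first,
# then compute precision, recall, and F1 from the pooled counts.
# The per-class counts below are made-up examples.

counts = {
    "Scratch": {"tp": 40, "fp": 5, "fn": 10},
    "Dent": {"tp": 25, "fp": 8, "fn": 4},
}

tp = sum(c["tp"] for c in counts.values())
fp = sum(c["fp"] for c in counts.values())
fn = sum(c["fn"] for c in counts.values())

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```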

For Classification, the F1, Precision, and Recall scores are identical. This is because Classification models have only two prediction outcomes: "Correct" and "Misclassified". Therefore, the F1, Precision, and Recall scores for Classification models are all calculated using this formula:

Score = Correct Predictions / (Correct Predictions + Misclassified Predictions)

Segmentation: Intersection Over Union (IoU)

The Performance Report includes the Intersection over Union (IoU) score for Segmentation projects.

IoU Score for the Evaluation Set in a Segmentation Project

Intersection over Union (IoU) measures the accuracy of the model by quantifying the overlap between the predicted and actual masks in an image. A higher IoU indicates better agreement between the ground truth and predicted mask. When calculating IoU, LandingLens does not include the implicit background class and does not use micro-averaging.
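
For intuition, here is a toy sketch (not LandingLens code) of IoU for a single class, computed from boolean masks; the mask sizes and shapes are arbitrary.

```python
import numpy as np

# Toy sketch of IoU for a single class: the overlap between a predicted mask
# and a ground truth mask, divided by the area of their union.
# The 8x8 masks below are arbitrary examples.

ground_truth = np.zeros((8, 8), dtype=bool)
ground_truth[2:6, 2:6] = True   # 4x4 labeled region

prediction = np.zeros((8, 8), dtype=bool)
prediction[3:7, 3:7] = True     # 4x4 predicted region, shifted by one pixel

intersection = np.logical_and(ground_truth, prediction).sum()
union = np.logical_or(ground_truth, prediction).sum()
iou = intersection / union

print(f"IoU = {iou:.3f}")  # 9 / 23 ≈ 0.391
```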

Precision Score for Evaluation Set

The Performance Report includes the Precision score for the evaluation set (and not for the entire dataset).

Precision Score for the Evaluation Set in an Object Detection Project

Precision is the model’s ability to be correct when it makes a prediction. Precision answers the natural language question, “When the model makes a prediction, how often is it correct?” The higher the Precision score, the more accurate the model’s predictions are.

For Object Detection and Segmentation, Precision is calculated using this formula:

Precision = True Positives / (True Positives + False Positives)

For Classification, the F1, Precision, and Recall scores are identical. This is because Classification models have only two prediction outcomes: "Correct" and "Misclassified". Therefore, the F1, Precision, and Recall scores for Classification models are all calculated using this formula:

Score = Correct Predictions / (Correct Predictions + Misclassified Predictions)

Recall Score for Evaluation Set

The Performance Report includes the Recall score for the evaluation set (and not for the entire dataset).

Recall Score for the Evaluation Set in an Object Detection Project

Recall is the model’s ability to find all objects of interest. Recall answers the natural language question, “Of all the labels (ground truths) in the dataset, what percent of them are found by the model?” It conveys how well the model identifies all the actual positive instances in the dataset. The higher the Recall score, the lower the chance the model will have a false negative.

For Object Detection and Segmentation, Recall is calculated using this formula:

Recall = True Positives / (True Positives + False Negatives)

For Classification, the F1, Precision, and Recall scores are identical. This is because Classification models have only two prediction outcomes: "Correct" and "Misclassified". Therefore, the F1, Precision, and Recall scores for Classification models are all calculated using this formula:

Score = Correct Predictions / (Correct Predictions + Misclassified Predictions)

Download CSV of Evaluation Set

For Object Detection and Classification projects, click Download CSV to download a CSV of information about the images in the evaluation set. The CSV includes several data points for each image, including the labels ("ground truth") and the model's predictions.

Download a CSV for the Evaluation Set

CSV Data for Evaluation Set

The CSV includes the following columns:

  • Image ID: Unique ID assigned to the image. Example: 30243316
  • Image Name: The file name of the image uploaded to LandingLens. Example: sample_003.jpg
  • Image Path: The URL of where the image is stored. Example: s3://path/123/abc.jpg
  • Model ID: Unique ID assigned to the model. Example: a3c5e461-0786-4b17-b0a8-9a4bfb8c1460
  • Model Name: The name of the model in LandingLens. Example: Model-06-04-2024_5
  • GT_Class: The classes you assigned to the image (ground truth or “GT”). For Object Detection, this also includes the number of objects you labeled. Example: {"Scratch":3}
  • PRED_Class: The classes the model predicted. For Object Detection, this also includes the number of objects predicted. If the model didn't predict any objects, the value is {"null":1}. Example: {"Scratch":2}
  • Model_Correct: If the model's prediction matched the original label (ground truth or “GT”), the value is TRUE. If the prediction didn't match, the value is FALSE. Only applicable to Classification projects. Example: TRUE
  • PRED_Confidence: The model's confidence score for its prediction. Only applicable to Classification projects. Example: 0.9987245
  • GT-PRED JSON: The JSON output comparing the original labels (ground truth or "GT") to the model's predictions. For more information, go to JSON Output. Example: {"gtDefectName":"No Fire","predDefectName":"No Fire","predConfidence":0.9684047}
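
If you want to analyze the downloaded CSV programmatically, the sketch below shows one way to load it and summarize the results. The file name is hypothetical, and the column names follow the table above; verify them against your actual download.

```python
import json

import pandas as pd

# Sketch of working with the downloaded evaluation set CSV.
# "evaluation_set.csv" is a hypothetical file name; verify the column
# headers against your actual download.
df = pd.read_csv("evaluation_set.csv")

# For Classification projects: share of images the model classified correctly.
if "Model_Correct" in df.columns:
    accuracy = df["Model_Correct"].astype(str).str.upper().eq("TRUE").mean()
    print(f"Correctly classified: {accuracy:.1%}")

# For Object Detection projects: compare labeled vs. predicted object counts.
for _, row in df.iterrows():
    gt = json.loads(row["GT_Class"])      # e.g. {"Scratch": 3}
    pred = json.loads(row["PRED_Class"])  # e.g. {"Scratch": 2} or {"null": 1}
    print(row["Image Name"], gt, pred)
```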

Confusion Matrix

The Performance Report includes a Confusion Matrix that tallies ground truth labels against model predictions. The confusion matrix shown here is for the selected evaluation set.

The y-axis represents each ground truth label. The x-axis represents each possible model prediction. 

Each cell shows the count of instances that correspond to a particular ground truth class / predicted class pair. For example, in the image below, the model correctly predicted the class "Wheat" 6 times and misclassified it 2 times.

Confusion Matrix
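
To make the tallying concrete, here is a small sketch (not LandingLens code) that builds a confusion matrix from lists of ground truth labels and predictions; the label lists are made up.

```python
from collections import Counter

# Sketch: tally (ground truth, prediction) pairs into a confusion matrix.
# Rows are ground truth classes, columns are predicted classes.
# The label lists below are made-up examples.
ground_truth = ["Wheat", "Wheat", "Wheat", "Rice", "Corn", "Wheat"]
predictions = ["Wheat", "Corn", "Wheat", "Rice", "Corn", "Wheat"]

classes = sorted(set(ground_truth) | set(predictions))
matrix = Counter(zip(ground_truth, predictions))

print("GT \\ Pred".ljust(12) + "".join(c.ljust(8) for c in classes))
for gt in classes:
    row = "".join(str(matrix[(gt, pred)]).ljust(8) for pred in classes)
    print(gt.ljust(12) + row)
```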

Precision Score for Class

The Precision score for each class is listed along the x-axis. Precision answers the natural language question, “When the model predicts Class A, how often is it correct?” 

The Precision score for a class is the percentage of the model's predictions of that class that are correct, and is calculated using this equation:

Precision = True Positives / (True Positives + False Positives)

For example, let’s calculate the Precision score for the Wheat class in the image below. The model predicts Wheat 7 times. Of those, 6 are correct (True Positives) and 1 is incorrect (False Positive). When we plug those numbers into the Precision equation, we see that the Precision for this class is 85.7%.

Precision = 6 / (6 + 1) = 6 / 7 ≈ 85.7%

Precision for Wheat

Recall Score for Class

The Recall score for each class is listed along the y-axis. Recall answers the natural language question, “Of all the Class As in the dataset, what percent of them are found by the model?”

The Recall score for a class is the percentage of actual instances of that class that the model correctly predicted, and is calculated using this equation:

Recall = True Positives / (True Positives + False Negatives)

For example, let’s calculate the Recall score for the Wheat class in the image below. The dataset has 8 instances of Wheat. The model correctly predicts 6 of them (True Positives) and misses 2 (False Negatives). When we plug those numbers into the Recall equation, we see that the Recall for this class is 75.0%.

Recall = 6 / (6 + 2) = 6 / 8 = 75.0%

Recall for Wheat
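
As a quick check of the two worked examples above (not LandingLens code), the snippet below recomputes both class scores from the Wheat counts: 6 True Positives, 1 False Positive, and 2 False Negatives.

```python
# Recompute the worked Wheat examples: 6 TP, 1 FP, 2 FN.
tp, fp, fn = 6, 1, 2

precision = tp / (tp + fp)  # 6 / 7
recall = tp / (tp + fn)     # 6 / 8

print(f"Precision for Wheat: {precision:.1%}")  # 85.7%
print(f"Recall for Wheat:    {recall:.1%}")     # 75.0%
```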

Use Colors to Help Interpret Performance

Each cell has a color that can quickly help you identify correct classifications and errors. Darker colors indicate a higher number, and lighter colors indicate a lower number. 

For example, if the model correctly predicts all instances, only the cells on the diagonal will be blue and have non-zero values. See the following image as an example.

The Model Predicted All Classes Correctly in this Evaluation Set

If any off-diagonal cells contain values, start by looking for the darker colors to understand where the model is making errors. For example, let’s say you want to evaluate performance for the Rice class using the confusion matrix below. The model correctly predicted 1 instance and misclassified 3 instances. Consider looking at the instances that were misclassified as Corn first, since that cell is darker (and has a higher number) than the Wheat cell.

When Evaluating Performance for the "Rice" Class, First Look at Instances Misclassified as "Corn"

Click a Cell for Detailed Information

Click a cell in the confusion matrix to see detailed information for that ground truth / prediction pairing. For example, click the Wheat/Corn cell in the image below to see the image that was labeled as Wheat but predicted as Corn.

The table on the left shows the ground truth / prediction pairings for all images in the evaluation set. The section on the right shows the images that represent the cell you clicked.

Click a Cell to see a Table of All Results (left) and the Images for that Ground Truth / Prediction Pairing (right)

Analyze All Images

Click Analyze All Images to see all images with their ground truth labels and predictions. Click an image to see a larger version.

For Object Detection and Segmentation, LandingLens shows a side-by-side comparison of the ground truth labels and predictions on each image in the dataset. 

Analyze All Images for Object Detection

For Classification, LandingLens shows each image, the ground truth class, and whether the model predicted the class correctly (green check mark) or not (red "x").

Analyze All Images for Classification


