Compare Models

This article applies to these versions of LandingLens:

LandingLens and LandingLens on Snowflake

The Models page gives you a high-level view of how different models performed across multiple datasets at once. If you want more detail about how two specific models compare, use the Compare Models tool. It's a convenient way to evaluate performance across iterations of a model, and it can help you identify whether you need to improve your labels, datasets, or hyperparameters.

When you run the Compare Models tool, you set a baseline model and a candidate model. LandingLens then shows you whether the candidate model performed better or worse for each prediction outcome (False Positive, False Negative, Correct, etc.). You can even see a side-by-side comparison of how the baseline and candidate models performed on each image in the dataset.

Model Comparison

Note:
Comparisons are generated on-demand, and are not saved.

Run the Compare Models Tool

To compare two models:

  1. Open the project and go to the Models tab.
  2. Hover over the cell for one of the models you want to compare and click the Compare icon that appears. This model will be the baseline in the comparison. In other words, the second model will be compared as either better or worse than this model. 
  3. Click the cell for the second model you want to compare. This model will be the candidate in the comparison. In other words, this model will be compared as either better or worse than the first model.
    Note:
    Want to switch the baseline and candidate models? Click the Switch icon.
  4. Click Compare.
    Select the Models to Compare
  5. The Compare Models window opens and shows the difference in performance between the two models.
    Model Comparison

Model Performance

The top of the Compare Models window shows scores for the baseline and candidate models. Click the link below a model to see the Training Information for that model.

The score type depends on the project type; a short sketch of these metrics follows the list below:

  • Object Detection: F1 score
  • Segmentation: IOU (Intersection over Union)
  • Classification: F1 score
Model Scores for an Object Detection Project
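
These are standard computer vision metrics. The following is a minimal sketch of how F1 and IoU are typically computed, assuming simple prediction counts and boolean masks; the function names and example numbers are illustrative and are not LandingLens code.

```python
import numpy as np

def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 is the harmonic mean of precision and recall (Object Detection and Classification)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

def iou(prediction: np.ndarray, ground_truth: np.ndarray) -> float:
    """IoU compares a predicted segmentation mask to its ground truth mask, pixel by pixel."""
    intersection = np.logical_and(prediction, ground_truth).sum()
    union = np.logical_or(prediction, ground_truth).sum()
    return float(intersection) / float(union)

# Example: a model with 90 true positives, 5 false positives, and 10 false negatives
print(round(f1_score(90, 5, 10), 3))  # 0.923
```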

The window also shows the difference in score between the two models. Click the link below the score difference to see the dataset snapshot that the models were evaluated on.

Difference Between Model Scores in an Object Detection Project

Compare Training Settings

In the Compare Models window, click Compare Training Settings.

Compare Training Settings

This opens a table with a side-by-side comparison of the settings used to train each model. Differences in settings are highlighted.

Compare the Training Settings for the Baseline and Candidate Models

Confusion Matrices

By default, the Compare Models window compares the two models using a confusion matrix for each prediction outcome. A confusion matrix is a table that visualizes the performance of an algorithm—in this case, the two computer vision models that you're comparing.

First, the data is grouped into tables (confusion matrices) based on prediction outcome. The prediction outcomes include the following (see the sketch after this list):

  • False Positive: The model predicted that an object of interest was present, but the model was incorrect. This is only applicable to Object Detection and Segmentation projects.
  • False Negative: The model predicted that an object of interest was not present, but the model was incorrect. This is only applicable to Object Detection and Segmentation projects. 
  • Misclassified: The model correctly predicted that an object of interest was present, but it predicted the wrong class.
  • Correct: The model’s prediction was correct. This includes True Positives and True Negatives. 
Example of Confusion Matrices for Object Detection Models
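
As a rough illustration of how a single prediction falls into one of these categories, here is a minimal sketch; the class names are hypothetical and this is not LandingLens code.

```python
def prediction_outcome(ground_truth_class, predicted_class):
    """Sort one (ground truth, prediction) pair into a prediction outcome.
    None means "no object of interest" (no label, or no prediction). Illustrative only."""
    if ground_truth_class is None and predicted_class is not None:
        return "False Positive"   # predicted an object that isn't there
    if ground_truth_class is not None and predicted_class is None:
        return "False Negative"   # missed an object that is there
    if ground_truth_class != predicted_class:
        return "Misclassified"    # found the object but assigned the wrong class
    return "Correct"              # includes True Positives and True Negatives

print(prediction_outcome("scratch", None))       # False Negative
print(prediction_outcome("scratch", "dent"))     # Misclassified
print(prediction_outcome("scratch", "scratch"))  # Correct
```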

Ground Truth and Predictions

Each confusion matrix focuses on a specific prediction outcome (False Positive, False Negative, etc.). Each row in a matrix represents a Ground Truth / Prediction pairing for that outcome, with instance counts for both the baseline and candidate models. The first column is the Ground truth, which is the labeled class on the image in the dataset. The second column is the Prediction, which is the class that the baseline or candidate model predicted for that outcome.

Ground Truth vs. Predictions
Note:
Each confusion matrix only has rows for the ground truth / prediction pairings that actually occurred. It doesn't have a row for every possible pairing, because for those unused pairings the instance count for both the baseline and candidate models would be 0.
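
To see why only observed pairings appear, consider this minimal sketch of counting (ground truth, prediction) pairs; the pair lists and class names are hypothetical, not LandingLens data.

```python
from collections import Counter

# Hypothetical (ground truth, prediction) pairs observed for one prediction outcome
baseline_pairs  = [("no object", "scratch"), ("no object", "scratch"), ("no object", "dent")]
candidate_pairs = [("no object", "scratch")]

baseline_counts = Counter(baseline_pairs)
candidate_counts = Counter(candidate_pairs)

# Only pairings seen in at least one model become rows; unseen pairings never appear
for pair in sorted(set(baseline_counts) | set(candidate_counts)):
    print(pair, "baseline:", baseline_counts[pair], "candidate:", candidate_counts[pair])
# ('no object', 'dent') baseline: 1 candidate: 0
# ('no object', 'scratch') baseline: 2 candidate: 1
```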

Baseline, Candidate, and Differences

For each Ground Truth / Prediction pairing in a confusion matrix, LandingLens shows how each model performed and how the candidate model either improved or got worse. This information is displayed in the Baseline, Candidate, and Differences columns.

The Baseline and Candidate columns depend on the project type:

  • Object Detection and Classification: The number of times that the model made that prediction for the specific Ground Truth / Prediction pairing.
    Number of Times Each Model Made the Prediction (Object Detection)
  • Segmentation: The number of pixels for which the model made that prediction for the specific Ground Truth / Prediction pairing.
    Number of Pixels for Which the Model Made the Prediction (Segmentation)

The Differences column shows whether the candidate model improved or got worse compared to the baseline model. The following list describes the possible outcomes in the Differences column; a short sketch of this logic follows the list.

  • Green: The candidate performed better than the baseline.
  • Red: The candidate performed worse than the baseline.
  • Fixed: The baseline made errors, but the candidate did not. In other words, the candidate "fixed" all of the issues that the baseline had.
  • New Error: The baseline did not make errors, but the candidate did. In other words, the candidate introduced a "new error type" that wasn't present in the baseline.
  • Percentage: Both the baseline and candidate made errors. In this case, the Difference is calculated as ((candidate - baseline) / baseline) * 100.
  • -- (dash): This is only applicable to the Correct category. Either the baseline or candidate made mistakes, but the other model did not.
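
Here is a minimal sketch of this logic for an error matrix (False Positive, False Negative, or Misclassified), assuming the Baseline and Candidate columns hold error counts; it is illustrative only, not how LandingLens computes the column internally.

```python
def difference(baseline_errors: int, candidate_errors: int) -> str:
    """Summarize one row of an error matrix. For the Correct matrix the interpretation
    flips, because there a higher count is better. Illustrative only."""
    if baseline_errors > 0 and candidate_errors == 0:
        return "Fixed"
    if baseline_errors == 0 and candidate_errors > 0:
        return "New Error"
    # Both models made errors: percentage change relative to the baseline.
    # (A row with 0 errors in both models would not appear in the matrix at all.)
    change = (candidate_errors - baseline_errors) / baseline_errors * 100
    return f"{change:+.0f}%"  # negative = fewer errors than the baseline, positive = more

print(difference(4, 0))   # Fixed
print(difference(0, 2))   # New Error
print(difference(10, 6))  # -40%
```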

For example, this is how the Differences column looks in an Object Detection project:

The Differences in an Object Detection Project

View Images for a Confusion Matrix

Click View next to the Differences column of a confusion matrix (or simply click the row) to see the images that are included in that Ground Truth / Prediction pairing. Click an image to see a larger version of it.

View Images from a Confusion Matrix

View Images - Overlays

Images have overlays that show the relevant predictions for the confusion matrix. If the model missed an object of interest, the overlay is white. Otherwise, the overlay colors correlate to the class colors.

The overlay formatting is different for each project type, as described in the following sections.

View Images - Object Detection

Each relevant prediction displays as an overlay. The number of predictions is displayed in the bottom-right corner of the image. Some confusion matrices have additional overlays, as described below.

  • False Positive: The overlay includes the confidence score of the prediction.
  • False Negative: The overlay includes "Missed", because the model predicted that an object of interest was not present, but the model was incorrect.
  • Correct: The overlay includes the confidence score of the prediction.

View Images - Segmentation

There are overlays over the regions that the model predicted incorrectly. It is important to note that the overlay does not show the full prediction, but only the part that was wrong for this specific Ground Truth / Prediction pairing.

The overlay format is slightly different for each confusion matrix, as described below; a pixel-mask sketch follows the list.

  • False Positive: There is a purple striped overlay over the regions that the model predicted incorrectly.
  • False Negative: There is a white striped overlay over the regions that the model "missed".
  • Correct: There is a purple striped overlay over the regions that the model predicted correctly.
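
These overlay regions correspond to pixel-level differences between the predicted and ground truth masks for a class. Here is a minimal sketch with NumPy, using tiny hypothetical masks; it is not how LandingLens renders overlays.

```python
import numpy as np

# Hypothetical boolean masks for one class: True where the class is labeled / predicted
ground_truth = np.array([[0, 1, 1, 0],
                         [0, 1, 1, 0]], dtype=bool)
prediction   = np.array([[0, 0, 1, 1],
                         [0, 0, 1, 1]], dtype=bool)

false_positive_region = prediction & ~ground_truth   # predicted, but not labeled (purple stripes)
false_negative_region = ~prediction & ground_truth   # labeled, but missed (white stripes)
correct_region        = prediction & ground_truth    # predicted and labeled (purple stripes)

print(false_positive_region.sum(), false_negative_region.sum(), correct_region.sum())  # 2 2 2
```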

View Images - Classification

Because classes are assigned to an entire image in Classification projects, it doesn't make sense to show the predictions as an overlay. Therefore, LandingLens shows the images and lists the ground truth and predictions next to the images. 

Images in a Misclassified Confusion Matrix for a Classification Project

Compare All Images

Click All Images to see a visual comparison of all images. This shows three versions of each image in the evaluation dataset:

  • Ground truth: The original image with the ground truth (labels) that you added.
  • Baseline model: The predictions of the baseline model.
  • Candidate model: The predictions of the candidate model. LandingLens highlights an image if the candidate model performed better or worse than the baseline for that specific image.
Compare All Images

Click a set of images to see more information about those images.

More Information for the Images
