- 16 Jan 2025
Compare Models
This article applies to these versions of LandingLens:
LandingLens | LandingLens on Snowflake |
---|---|
✓ | ✓ |
The Models page gives you a high-level view of how different models performed on multiple datasets at once. To see in more detail how two specific models compare, use the Compare Models tool. It helps you evaluate performance across iterations of a model and identify whether you need to improve your labels, datasets, or hyperparameters.
When you run the Compare Models tool, you set a baseline model and a candidate model. LandingLens then shows whether the candidate model performed better or worse for each prediction outcome (False Positive, False Negative, Correct, etc.). You can also see a side-by-side comparison of how the baseline and candidate models performed on each image in the dataset.
Run the Compare Models Tool
To compare two models:
- Open the project to the Models tab.
- Hover over the cell for one of the models you want to compare and click the Compare icon that appears. This model will be the baseline in the comparison. In other words, the second model will be compared as either better or worse than this model.
- Click the cell for the second model you want to compare. This model will be the candidate in the comparison. In other words, this model will be compared as either better or worse than the first model. Note: Want to switch the baseline and candidate models? Click the Switch icon.
- Click Compare.
- The Compare Models window opens and shows the difference in performance between the two models.
Model Performance
The top of the Compare Models window shows scores for the baseline and candidate models. Click the link below a model to see the Training Information for that model.
The score type depends on the project type:
- Object Detection: F1 score
- Segmentation: IoU (Intersection over Union)
- Classification: F1 score
The window also shows the difference in score between the two models. Click the link below the score difference to see the dataset snapshot that the models were evaluated on.
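As a rough illustration of the two score types, the sketch below uses the standard textbook definitions of F1 and IoU. It is not LandingLens code: the function names, the sample counts, and the choice to represent a segmentation mask as a set of pixel coordinates are all assumptions made for this example, and the article does not state how LandingLens aggregates scores across images or classes.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: 2*TP / (2*TP + FP + FN). Returns 0.0 if there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou(predicted: set, ground_truth: set) -> float:
    """Intersection over Union of two sets of (row, col) pixel coordinates."""
    union = predicted | ground_truth
    return len(predicted & ground_truth) / len(union) if union else 0.0


# Hypothetical counts for two models evaluated on the same dataset snapshot.
baseline_f1 = f1_score(tp=80, fp=10, fn=10)
candidate_f1 = f1_score(tp=90, fp=5, fn=5)

# A positive difference means the candidate scored higher than the baseline.
score_difference = candidate_f1 - baseline_f1
```

Because both models are scored on the same dataset snapshot, the difference is a like-for-like comparison.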
Compare Training Settings
In the Compare Models window, click Compare Training Settings.
This opens a table with a side-by-side comparison of the settings used to train each model. Differences in settings are highlighted.
Confusion Matrices
By default, the Compare Models window compares the two models using a confusion matrix for each prediction outcome. A confusion matrix is a table that visualizes the performance of an algorithm—in this case, the two computer vision models that you're comparing.
First, the data is grouped into tables (confusion matrices) based on prediction outcome. The prediction outcomes include:
- False Positive: The model predicted that an object of interest was present, but the model was incorrect. This is only applicable to Object Detection and Segmentation projects.
- False Negative: The model predicted that an object of interest was not present, but the model was incorrect. This is only applicable to Object Detection and Segmentation projects.
- Misclassified: The model correctly predicted that an object of interest was present, but it predicted the wrong class.
- Correct: The model’s prediction was correct. This includes True Positives and True Negatives.
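The four outcomes above can be sketched as a small decision function. This is a minimal illustration, not the LandingLens implementation: the function name is hypothetical, `None` is an assumed stand-in for "no object of interest", and the step that matches each prediction to a label (for example, by region overlap) is assumed to have happened already.

```python
from typing import Optional


def prediction_outcome(ground_truth: Optional[str], prediction: Optional[str]) -> str:
    """Classify one matched (ground truth, prediction) pair.

    None means "no object of interest" (an assumption for this sketch).
    """
    if ground_truth is None and prediction is not None:
        return "False Positive"   # predicted an object that is not labeled
    if ground_truth is not None and prediction is None:
        return "False Negative"   # missed a labeled object
    if ground_truth != prediction:
        return "Misclassified"    # object found, but wrong class
    return "Correct"              # true positive or true negative


outcomes = [
    prediction_outcome(None, "Scratch"),
    prediction_outcome("Scratch", None),
    prediction_outcome("Scratch", "Dent"),
    prediction_outcome("Scratch", "Scratch"),
]
```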
Ground Truth and Predictions
Each confusion matrix focuses on a specific prediction outcome (False Positive, False Negative, etc.). Each row in a matrix represents one Ground Truth / Prediction pairing for that outcome, as it occurred in the baseline and candidate models. The first column is the Ground Truth, which is the labeled class on the image in the dataset. The second column is the Prediction, which is a class that either the baseline or candidate model predicted incorrectly.
Baseline, Candidate, and Differences
For each Ground Truth / Prediction pairing in a confusion matrix, LandingLens shows how each model performed and how the candidate model either improved or got worse. This information is displayed in the Baseline, Candidate, and Differences columns.
The values in the Baseline and Candidate columns depend on the project type:
- Object Detection and Classification: The number of times that the model made that prediction for the specific Ground Truth / Prediction pairing.
- Segmentation: The number of pixels for which the model made that prediction for the specific Ground Truth / Prediction pairing.
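To make the counting concrete, here is a minimal sketch of how the Baseline and Candidate values for each pairing could be tallied. The class names, the sample data, and the `count_pairings` helper are hypothetical. Each tuple stands for one prediction in an Object Detection or Classification project; in a Segmentation project, each tuple would stand for one pixel instead.

```python
from collections import Counter


def count_pairings(pairs):
    """Count how often each (ground_truth, prediction) pairing occurs."""
    return Counter(pairs)


# Hypothetical misclassifications made by each model on the same dataset.
baseline_pairs = [("Scratch", "Dent"), ("Scratch", "Dent"), ("Dent", "Scratch")]
candidate_pairs = [("Scratch", "Dent")]

baseline_counts = count_pairings(baseline_pairs)
candidate_counts = count_pairings(candidate_pairs)

# One row per pairing seen in either model; a missing pairing counts as 0.
rows = [
    (gt, pred, baseline_counts[(gt, pred)], candidate_counts[(gt, pred)])
    for gt, pred in sorted(set(baseline_counts) | set(candidate_counts))
]
```

A `Counter` returns 0 for a pairing it has never seen, which is why a pairing that only one model produced still gets a complete row.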
The Differences column shows whether the candidate model improved or got worse compared to the baseline model. The following table describes the possible outcomes in the Differences column.
Outcome | Description |
---|---|
Green | The candidate performed better than the baseline. |
Red | The candidate performed worse than the baseline. |
Fixed | The baseline made errors, but the candidate did not. In other words, the candidate "fixed" all of the issues that the baseline had. |
New Error | The baseline did not make errors, but the candidate did. In other words, the candidate introduced a "new error type" that wasn't present in the baseline. |
Percentage | Both the baseline and candidate made errors. In this case, the value in the Differences column is calculated as: ((candidate - baseline) / baseline) * 100 |
-- | This is only applicable to the Correct category. Either the baseline or candidate made mistakes, but the other model did not. |
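The Fixed, New Error, and Percentage rows of the table can be sketched as follows for the error categories (False Positive, False Negative, Misclassified). The function name, the one-decimal formatting, and the handling of the case where both counts are zero are assumptions for this example; the "--" case for the Correct category is not modeled.

```python
from typing import Optional


def differences_label(baseline: int, candidate: int) -> Optional[str]:
    """Return the Differences value for one pairing in an error category."""
    if baseline == 0 and candidate == 0:
        return None  # assumption: neither model made this error, so there is no row
    if candidate == 0:
        return "Fixed"      # baseline made errors, candidate did not
    if baseline == 0:
        return "New Error"  # candidate introduced an error the baseline did not have
    # Both made errors: percentage change relative to the baseline.
    percent = (candidate - baseline) / baseline * 100
    return f"{percent:+.1f}%"


labels = [
    differences_label(5, 0),   # "Fixed"
    differences_label(0, 3),   # "New Error"
    differences_label(4, 6),   # "+50.0%"
    differences_label(10, 7),  # "-30.0%"
]
```

In an error category, a negative percentage means the candidate made fewer errors than the baseline.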
For example, this is how the Differences column looks in an Object Detection project:
View Images for a Confusion Matrix
Click View next to the Differences column of a confusion matrix (or simply click the row) to see the images that are included in this Ground Truth / Prediction pairing. Click an image to see a larger version.
View Images - Overlays
Images have overlays that show the relevant predictions for the confusion matrix. If the model missed an object of interest, the overlay is white. Otherwise, the overlay colors correlate to the class colors.
The overlay formatting is different for each project type, as described in the following sections.
View Images - Object Detection
Each relevant prediction displays as an overlay. The number of predictions displays in the bottom right corner of the image. Some confusion matrices have additional overlays, as described in the following table.
Confusion Matrix | Overlay Description | Example |
---|---|---|
False Positive | The overlay includes the confidence score of the prediction. | |
False Negative | The overlay includes "Missed", because the model predicted that an object of interest was not present, but the model was incorrect. | |
Correct | The overlay includes the confidence score of the prediction. | |
View Images - Segmentation
Overlays appear over the regions that are relevant to the confusion matrix. Note that the overlay does not show the full prediction, only the part that applies to this specific Ground Truth / Prediction pairing.
The overlay format is slightly different for each confusion matrix, as described in the following table.
Confusion Matrix | Overlay Description | Example |
---|---|---|
False Positive | There is a purple striped overlay over the regions that the model predicted incorrectly. | |
False Negative | There is a white striped overlay over the regions that the model "missed". | |
Correct | There is a purple striped overlay over the regions that the model predicted correctly. | |
View Images - Classification
Because classes are assigned to an entire image in Classification projects, it doesn't make sense to show the predictions as an overlay. Therefore, LandingLens shows the images and lists the ground truth and predictions next to the images.
Compare All Images
Click All Images to see a visual comparison of all images. This shows three versions of each image in the evaluation dataset:
- Ground truth: The original image with the ground truth (labels) that you added.
- Baseline model: The predictions of the baseline model.
- Candidate model: The predictions of the candidate model. LandingLens highlights an image if the candidate model performed better or worse than the baseline for that specific image.
Click a set of images to see more information about those images.