Compare Models


As of January 30, 2024, you can quickly and easily compare model performance across multiple datasets in a single project.

All trained models for a project now appear in a table on a new tab called Models. By default, you can see how each model performed on its Train, Dev, and Test sets. You can also add existing dataset snapshots—or filtered versions of them—to the table. This matrix empowers you to quickly compare model performance on a variety of datasets. From here, you can also deploy models.

Comparing model performance on the Models tab isn't a required step in LandingLens, but it's a useful technique for identifying your most accurate models.

Compare Model Performance Across Different Datasets on the Models Page

Note:
Due to the unique nature of Visual Prompting, the Models tab is not available for Visual Prompting projects.  

How do I use the Models table to see which model is best for my project?

You can use the Models table to quickly evaluate model performance across different datasets. You can also see how the same model performs on the same datasets at different confidence thresholds.

There is no one-size-fits-all solution, but quickly comparing model performance can help you identify 1) what model works best for your use case and 2) what models might need better images or labels.

Here are some considerations:

  • If two models have the same confidence threshold but different scores on the same datasets, view the predictions for the model with the lower score. Are the labels correct? Do you need more images of a specific class?
  • If a model has a higher score on a dataset that is most like your real-world scenario, that model might be the best one for your use case.

Models Table Overview

Here's a quick orientation to the Models table:

Key Parts of the Models Table

  1. Model: The model name and training method (customized or default).
  2. Evaluation Sets: These columns are your evaluation sets, which are sets of images used to evaluate model performance. The model's Train, Dev, and Test sets display by default, and you can add more datasets and run the models on those sets. Each cell shows the F1 score (for Object Detection and Classification projects) or the IoU score (for Segmentation projects).
  3. Confidence Threshold: The Confidence Threshold for the model. The Confidence Score indicates how confident the model is that its prediction is correct, and the Confidence Threshold is the minimum Confidence Score a prediction must have in order to be considered correct. When LandingLens creates a model, it selects the Confidence Threshold that yields the best F1 score across all labeled data.
  4. Deployment: Shows whether the model has been deployed via Cloud Deployment.
  5. More Actions: Favorite, deploy, or delete models; you can also copy the Model ID.

Model Information

The Model column displays the model name and its training method:

Models

Click the cell to see the model's Training Information and Performance Report.

A model can have multiple rows. For example, if you deploy a model and select a confidence threshold that is not the default one, then two rows for that model display in the table. The first row has the default confidence threshold, and the second has the custom confidence threshold.

For example, in the screenshot below, the default confidence threshold is 0.71, and the custom confidence threshold is 0.99.

Compare How the Same Model Performs with Different Confidence Thresholds

Training Information

Clicking a model on the Models page opens the Training Information tab. This tab shows basic information about the model and the dataset it was trained on. 

Highlights include:

Training Information 

Training Information, Continued

  1. Loss Chart: During model training, LandingLens calculates the error between the ground truth and the predictions, which is called loss. This chart shows the loss over time (in seconds). If the model improves during training, the line trends toward 0 over time (although loss never reaches 0). The loss is calculated on the Train set, because this is the set the model trains on. (For a generic illustration of how loss works, see the sketch after this list.)
  2. Trained From: The name of the dataset snapshot that the model was trained on.
  3. Split: Shows how many images are in each split.
  4. View Images: Click View Images to see the dataset snapshot that the model was trained on.
  5. Hyperparameter / Transform / Augmentation: The configurations used to train the model. For Fast Training (the default configuration), this includes Hyperparameters, which are the number of epochs and the model size. For Custom Training (a customized configuration), this also includes any Transforms and Augmentations. For more information about these configurations, go to Custom Training.
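
The loss function itself is internal to LandingLens and depends on the project type, so the following is only a generic, hypothetical illustration of the idea: a cross-entropy-style loss that shrinks as the model's predictions get closer to the ground truth.

```python
import math

# Generic illustration only (not LandingLens's actual loss function):
# a cross-entropy-style loss for a single labeled image. As training
# progresses, the predicted probability of the true class rises, so the
# loss trends toward 0 without ever reaching it.
def loss(prob_of_true_class: float) -> float:
    return -math.log(prob_of_true_class)

for prob in (0.30, 0.60, 0.90, 0.99):
    print(f"p(true class) = {prob:.2f} -> loss = {loss(prob):.3f}")
# p = 0.30 -> 1.204, p = 0.60 -> 0.511, p = 0.90 -> 0.105, p = 0.99 -> 0.010
```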

Performance Report

Clicking an evaluation set score on the Models page opens the Performance Report tab (you can also click a model on the Models page and then select this tab). 

This report shows how the model performed on the selected evaluation set (and not for the entire dataset). You can select different sets from the Evaluation Set drop-down menu.

The bottom part of the report compares the ground truth (your labels) to the model's predictions. You can filter by prediction type (False Positive, False Negative, Mis-Classification, and Correct Predictions) and sort by model performance.

Performance Report for the Selected Evaluation Set.

The Performance Report and Build Tab May Have Different Results

The results in the Performance Report might be different than the results in the Build tab. This is because the Performance Report is based on a specific version of a dataset—the images and labels never change. 

However, the results on the Build tab are “live” and might change based on any updates to images or labels. 

For example, let’s say that you train a model and create an evaluation set based on the dataset currently in the Build tab. You then add images and labels. This leads to the performance and results being different, as shown in the screenshots below.

The Performance Report Is Based on a Static Dataset
The Performance in the Build Tab Changes Based on Changes to Images and Labels

Adjust Threshold

To see how the model performs on the evaluation set with a different Confidence Threshold, click Adjust Threshold and select a different score.

Adjust the Confidence Threshold

Overall Score for the Evaluation Set

The Performance Report includes a score for the evaluation set (and not for the entire dataset). The type of score depends on the project type:

Object Detection and Classification: F1 Score

The Performance Report includes the F1 score for Object Detection and Classification projects.

The F1 score combines precision and recall into a single score, creating a unified measure that assesses the model’s effectiveness in minimizing false positives and false negatives. A higher F1 score indicates the model is balancing the two factors well. LandingLens uses micro-averaging to calculate the F1 score.

Object Detection and Classification Projects Show the F1 Score
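
If you want to see how micro-averaging works, the following is a minimal sketch (illustration only, not LandingLens code) that pools hypothetical per-class true positive, false positive, and false negative counts before computing precision, recall, and the F1 score.

```python
# Illustration only: micro-averaged F1 from per-class TP/FP/FN counts.
# The class names and counts below are hypothetical.
counts = {
    "scratch": {"tp": 40, "fp": 5, "fn": 10},
    "dent":    {"tp": 25, "fp": 8, "fn": 2},
}

# Micro-averaging pools the counts across all classes before computing the score.
tp = sum(c["tp"] for c in counts.values())
fp = sum(c["fp"] for c in counts.values())
fn = sum(c["fn"] for c in counts.values())

precision = tp / (tp + fp)  # how many predictions were correct
recall = tp / (tp + fn)     # how many ground-truth objects were found
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```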

Segmentation: Intersection Over Union (IoU)

The Performance Report includes the Intersection over Union (IoU) score for Segmentation projects.

Intersection over Union (IoU) measures the accuracy of the model by quantifying the overlap between the predicted and actual masks in an image. A higher IoU indicates better agreement between the ground truth and the predicted mask. LandingLens does not include the implicit background class or use micro-averaging when calculating the IoU.

Segmentation Projects Show the Intersection Over Union (IoU) Score
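
As an illustration only (not LandingLens code), the following sketch computes IoU for a pair of tiny, made-up masks: the number of pixels where the prediction and the ground truth overlap, divided by the number of pixels covered by either mask.

```python
import numpy as np

# Illustration only: IoU between a hypothetical ground-truth mask and a
# predicted mask. True = pixel belongs to the class, False = background.
ground_truth = np.array([[1, 1, 0],
                         [1, 1, 0],
                         [0, 0, 0]], dtype=bool)
prediction   = np.array([[1, 1, 1],
                         [0, 1, 1],
                         [0, 0, 0]], dtype=bool)

intersection = np.logical_and(ground_truth, prediction).sum()  # pixels in both masks
union = np.logical_or(ground_truth, prediction).sum()          # pixels in either mask
iou = intersection / union

print(f"IoU = {iou:.2f}")  # 3 overlapping pixels / 6 combined pixels = 0.50
```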

Evaluation Sets

The Models table shows how each model performs on different sets of images. These image sets are called evaluation sets, because they're used to evaluate model performance. 

The default evaluation sets are the Train, Dev, and Test splits for the models. You can add evaluation sets.

Click a cell to see the Performance Report for that evaluation set.

Evaluation Sets

Evaluation Set Scores

A good indication that a model performs well is that its Train and Dev set scores are high and similar to each other.

The score for the Train set might be higher than the scores for the other splits, because these are the images that the model trains on. It is normal for the Train set score to be less than 100% because models usually make mistakes during the training process. 

In fact, a score of 100% on the Train set might indicate overfitting, especially if the Dev set score is much lower. If the two scores are very different, try adding more images to these sets.

Similarly, the score for the Test set might be lower than the scores for the other splits, because the model is not trained on these images.

The following image and table explain the evaluation set scores.

Evaluation Set Scores

  1. Percentage: Shows the F1 score (for Object Detection and Classification projects) or the IoU score (for Segmentation projects). Learn more about these scores in Overall Score for the Evaluation Set.
  2. "--": The subset doesn't have any images. If you don't assign splits to a dataset before you train a model, LandingLens automatically assigns images to the Train and Dev splits, but not the Test split. In that case, you will see "--" for the Test split.
  3. Blank: The model hasn't run on the set yet. To run the model, hover over the cell and click Evaluate. For more information, go to Run the Model on a "Blank" Set.

Run the Model on a "Blank" Set

If an evaluation set cell is blank, hover over the cell and click Evaluate. The model runs inference on the images in that evaluation set and displays the score.

Run the Model on a "Blank" Evaluation Set

Add Evaluation Sets and Run Models on Them

Note:
Evaluation sets cannot be deleted.

By default, each model's scores for its Train, Dev, and Test sets display in the Models table. You can add more datasets. These are called evaluation sets, because they're used to evaluate model performance.

To add an evaluation set: 

  1. Open the project to the Models tab.
  2. Click Add Evaluation Set. If you've already dismissed this message, click + in the table header.
    Add an Evaluation Set
    Add an Evaluation Set (If You've Already Dismissed the Message)
  3. Select a snapshot.
  4. If you want to run the model only on one of the splits, click that split.
  5. Click Add to the Table.
    Select a Snapshot to Use as an Evaluation Set
  6. LandingLens adds a column for that dataset. To run a model on the dataset, hover over the cell and click Evaluate. (To prevent slowing down the system, LandingLens doesn't automatically run each model on the evaluation sets. Click Evaluate for each model / evaluation set combination that you want to run.)
    Run the Model on a Specific Evaluation Set
  7. The model runs inference on the images in that evaluation set and displays the F1 or IoU score.
    The Score Displays
  8. Click the percentage to open the Performance Report.
    View the Performance Report for the Evaluation Set

Archive Evaluation Sets

You can archive evaluation sets. This removes the evaluation set column from the Models table. You can later add the evaluation set to the table again.

To archive an evaluation set:

  1. Open the project to the Models tab.
  2. Hover over the area to the left of the evaluation set name.
  3. Click the Archive icon that appears.
    Hover to See the Archive Icon and Click It
  4. Click Yes on the pop-up window to confirm the action.

Confidence Threshold 

The Confidence Threshold column shows the Confidence Threshold for that model. 

The Confidence Score indicates how confident the model is that its prediction is correct. The Confidence Threshold is the minimum Confidence Score a prediction must have in order to be considered correct.

When LandingLens creates a model, it selects the Confidence Threshold that yields the best F1 score across all labeled data.

Confidence Threshold
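
As an illustration only (not LandingLens code), the sketch below shows the basic idea of applying a confidence threshold: predictions whose Confidence Score falls below the threshold are filtered out. The predictions and threshold value are hypothetical, and the exact comparison LandingLens uses internally may differ.

```python
# Illustration only: hypothetical predictions with confidence scores.
predictions = [
    {"label": "scratch", "score": 0.93},
    {"label": "scratch", "score": 0.71},
    {"label": "dent",    "score": 0.42},
]

confidence_threshold = 0.71  # e.g., a default threshold shown in the Models table

# Keep only predictions at or above the threshold (the exact comparison is
# internal to LandingLens; this is just the general idea).
kept = [p for p in predictions if p["score"] >= confidence_threshold]
print(kept)  # the 0.42 "dent" prediction is filtered out
```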

Cloud Deployment

The Deployment column shows if the model has been deployed via Cloud Deployment. A Cloud icon displays for each deployment (LandingLens cycles through seven colors for the Cloud icon). Click an icon to see the deployment details for the model.

If the model hasn't been deployed via Cloud Deployment, the column is blank.

Cloud Deployment Icons
Note:
Icons don't display for LandingEdge or Docker deployments.
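
If you want to send images to a cloud-deployed model programmatically, the sketch below shows one way to do it with the landingai Python SDK. The SDK is not covered in this article, so treat this as a rough sketch; the endpoint ID, API key, and image path are placeholders that you would replace with your own values.

```python
# Rough sketch: call a Cloud Deployment endpoint with the landingai Python SDK.
# The endpoint ID, API key, and image path below are placeholders.
import numpy as np
from PIL import Image
from landingai.predict import Predictor

predictor = Predictor(
    "your-cloud-deployment-endpoint-id",
    api_key="your-landinglens-api-key",
)

image = np.asarray(Image.open("example.jpg"))
predictions = predictor.predict(image)  # run inference against the deployed model

for prediction in predictions:
    print(prediction)
```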

More Actions

In the last column (the More Actions column), you can do the following:

More Actions for Models

Favorite Models

To mark a model as a "favorite", click the Favorite (star) icon. This fills in the star so that you can easily see which models in the table you've marked as favorites. You can favorite multiple models. To unfavorite a model, click the Favorite icon again.

Click the Star to Favorite and Unfavorite Models

To filter by favorites, select the Only show favorite models checkbox.

Filter by Favorites

Deploy Models

You can deploy a model from several pages in LandingLens, including the Models tab. To deploy a model from this tab, click the Rocket icon in the Deploy Models column. Follow the on-screen prompts to finish the deployment. To learn more, go to Deployment Options.

Copy Model ID

If you're deploying a model via Docker, the Model ID is included in the deployment command. The Model ID tells the application which model to download from LandingLens. To locate the Model ID on the Models page, click the Actions (...) icon and select Copy Model ID.

Copy the Model ID
Note:
You can also copy the Model ID from the Deploy page.

Delete Models

You can delete a model from the table. This action removes the model only from the table; you can still deploy it and access it from other areas in LandingLens, like Dataset Snapshots.

To delete a model, click the Actions (...) icon and select Delete. A model can't be re-added to this table after it's been deleted.

Delete a Model
