Datasets and Splits
  • 25 Apr 2023

Datasets are collections of information needed to train your Model, like images, labels, and any pre-processing augmentation you may have added to your images. 

Note:
The information in this section is not applicable to Visual Prompting. For more information, go to Visual Prompting.

Splits

LandingLens will automatically split your images into different datasets when you train a Model. Each dataset serves a different purpose:

  • Train Set: LandingLens uses this dataset to train the Model. The Model learns its parameters from these images.
  • Dev Set: The Model refers to this dataset to validate its Predictions. (This dataset is also called a "development set" or "validation set.")
  • Test Set: The Model refers to this dataset to evaluate its performance. This is done after the training process. 
Images Are Split into Different Sets for Model Training 
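
To illustrate the general idea of a split outside of LandingLens, here is a minimal Python sketch that shuffles a list of image filenames and divides it into train, dev, and test sets. The function name split_images, the 70/20/10 ratio, and the filenames are assumptions made for this example; they are not LandingLens defaults and not part of its API.

```python
import random


def split_images(filenames, train_frac=0.7, dev_frac=0.2, seed=0):
    """Shuffle filenames and divide them into train, dev, and test lists.

    The fractions are illustrative assumptions, not LandingLens defaults.
    Whatever remains after the train and dev sets goes to the test set.
    """
    shuffled = list(filenames)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)
    return {
        "train": shuffled[:n_train],
        "dev": shuffled[n_train:n_train + n_dev],
        "test": shuffled[n_train + n_dev:],
    }


# Hypothetical filenames, used only to demonstrate the function.
images = [f"image_{i:03d}.png" for i in range(100)]
splits = split_images(images)
print({name: len(files) for name, files in splits.items()})
```

Each image lands in exactly one set, so the Model is never evaluated on an image it was trained on.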

Why Does LandingLens Split Data?

In machine learning, you can overtrain your Model. The idea is similar to overstudying for a test. Suppose Jane studies with flashcards. If she studies too much this way, she may simply memorize the answers without understanding why they are correct or how to arrive at them. An overtrained Model behaves the same way: it memorizes its training images instead of learning patterns that apply to new images. This concept is called Overfitting. Here's how it works:

Suppose you have a Model trained to detect cats. What if every cat in the training images is wearing a collar?

Cat with a Collar

If you overtrain your Model, the Model may learn that an important feature of cats is their collars. In this case, if you deploy your Model and show it images of wild cats, the Model may not detect these cats because they aren't wearing collars.

Detecting whether your Model is Overfitting can be tricky, which is why LandingLens splits your images into different datasets during Model training. A Model that performs well on the Train Set but poorly on the Dev Set or Test Set is likely Overfitting.
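
One common way to spot Overfitting is to compare a Model's accuracy on the Train Set with its accuracy on the Dev Set. The Python sketch below shows that comparison. The labels, predictions, helper names, and the 0.10 gap threshold are all hypothetical values chosen for illustration; LandingLens does not necessarily compute or report results this way.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def looks_overfit(train_acc, dev_acc, max_gap=0.10):
    """Flag a Model whose train accuracy exceeds dev accuracy by more than max_gap.

    max_gap is an arbitrary illustrative threshold, not a LandingLens setting.
    """
    return (train_acc - dev_acc) > max_gap


# Hypothetical results: every training cat wears a collar, so the Model
# gets all training images right.
train_labels = ["cat", "cat", "no_cat", "cat", "no_cat"]
train_preds = ["cat", "cat", "no_cat", "cat", "no_cat"]

# Hypothetical dev images include cats without collars, which the Model misses.
dev_labels = ["cat", "cat", "no_cat", "cat"]
dev_preds = ["no_cat", "cat", "no_cat", "no_cat"]

train_acc = accuracy(train_preds, train_labels)
dev_acc = accuracy(dev_preds, dev_labels)
print(train_acc, dev_acc, looks_overfit(train_acc, dev_acc))
```

A large gap between the two scores suggests the Model memorized details of the training images, such as collars, rather than learning what a cat looks like.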

Split Images

"Choose dev and test sets to reflect data you expect to get in the future and want to do well on." —Andrew Ng 

LandingLens automatically splits your images and assigns them to different datasets for you during Model Training. However, you can manually split images if you'd like. This section describes the different options.

Location       Are the images already uploaded?   Can you apply splits to multiple images at once?
Image View     Yes                                No
Data Browser   Yes                                Yes
Upload Page    No                                 Yes

Set Splits in Image View 

After you've uploaded images to your Project, you can manually split images into different datasets, one at a time, when viewing an image. To do this:

  1. Open the Project.
  2. Click the image you want to add to a specific dataset.
  3. Select the dataset you want to add the image to from the Split drop-down menu.
    Select a Split
  4. LandingLens automatically saves this setting.

Set Splits for Multiple Images 

After you've uploaded your images to your Project, you can manually split multiple images into different datasets. To do this:

  1. Open the Project. 
  2. Select the images you want to add to a specific dataset.
  3. Click Options in the action bar near the bottom of the screen and select Set Split. (You may need to scroll to see this option.) 
    Set Split
  4. Select the type of dataset you want to assign the images to.
  5. Click Apply. The images are assigned to the selected dataset.
    Apply a Split to the Selected Images
  6. To deselect the images, click Deselect All.

Splits on Upload to Segmentation Projects

If you're using a Segmentation Project, you can assign splits at upload. To do this:

  1. Open the Segmentation Project you want to upload images to or create a new Project.
  2. For a Project that already has images, click the Upload icon. For a Project that doesn't have images yet, click Click Here at the bottom of the page.
    Upload
  3. Click Assign Split.
    Assign Split
  4. Select the type of dataset you want to assign images to.
  5. Click Apply.
    Apply a Split
  6. The selected Split displays. Upload the images you want to assign to the selected dataset.
    View the Selected Split

Export Datasets

Note:
The Export Dataset feature is only available to legacy "classic" workflow users.

LandingLens lets you export datasets. Exporting works like insurance for your datasets: it gives you a backup in case you need to refer to them later. Exporting also lets you version your datasets as you update them.

Exported datasets can be viewed and downloaded in LandingLens.

When to Export Datasets

Exporting a dataset allows you to keep a copy of that dataset, saving you time if you need to reference it in the future. Here are a couple of examples:

  • You have a deployed Model with great performance, and you want to add more Classes to it. You can export your dataset in case the Model's performance declines after those iterations.
  • You trained a Model with mediocre performance, and you want to relabel the images to see if the Model can achieve better results. You can export your dataset in case the Model's performance declines after those iterations.

When you export a dataset, you can download Pascal VOC files of your images, which lets you re-upload those images to a new Project. Referring to the examples above, if your Model's performance declines after those iterations, you can upload the Pascal VOC files from the exported dataset and continue working from the data that produced the better Model.

Notes:
  • Pascal VOC files are only available for Object Detection and Segmentation Models. 
  • You can export datasets by Class for Classification Models. The Class names will display as filters on the Data Browser, allowing you to find images faster.
  • If you upload Pascal VOC files that include "Nothing to Label" images, these images will display as "Unlabeled". This means you will need to relabel them as "Nothing to Label".
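
If you want to inspect a downloaded annotation outside of LandingLens, the Python sketch below reads bounding boxes from a Pascal VOC XML file using only the standard library. The sample XML follows the standard Pascal VOC object-detection layout, but its filename, Class name, and coordinates are made up for illustration, and the exact contents of a LandingLens export may differ. The helper name read_voc_boxes is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the standard Pascal VOC object-detection layout.
SAMPLE_VOC_XML = """
<annotation>
  <filename>cat_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>320</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return the image filename and a list of labeled bounding boxes."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        boxes.append({
            "class": obj.findtext("name"),
            "xmin": int(bndbox.findtext("xmin")),
            "ymin": int(bndbox.findtext("ymin")),
            "xmax": int(bndbox.findtext("xmax")),
            "ymax": int(bndbox.findtext("ymax")),
        })
    return root.findtext("filename"), boxes


filename, boxes = read_voc_boxes(SAMPLE_VOC_XML)
print(filename, boxes)
```

An annotation with no object elements returns an empty list, which is one way to find images that may need to be relabeled as "Nothing to Label" after re-upload.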

Export Datasets

Note:
The Export Dataset feature is only available to legacy "classic" workflow users.

  1. Open the Project.
  2. Select the images that you want to include in your dataset.
  3. Click Options in the action bar near the bottom of the screen and select Export Dataset. (You may need to scroll to see this option.)
    Export Selected Images
  4. A preview of your data displays. Enter a short, descriptive name for your dataset in the Name Your Exported Dataset field.
  5. If you want to download Pascal VOC files of your images, select the checkbox.
  6. Click Export Media. The dataset may take a few minutes to export.
    Configure Settings and Export

View Exported Datasets

Note:
The Export Dataset and View Exported Dataset features are only available to legacy "classic" workflow users. 

Datasets may take a few minutes to export. To view an exported dataset:

  1. Open the Project.
  2. Click the Actions button (ellipsis) and select View Exported Dataset.
    View Exported Dataset
  3. To download the Pascal VOC archive of the dataset, click the vertical ellipsis button and select Download Pascal Voc file.
    View and Download Exported Dataset
