- 15 Mar 2023
- 6 Minutes to read
Datasets and Splits
Datasets are collections of information needed to train your Model, like images, labels, and any pre-processing augmentation you may have added to your images.
LandingLens will automatically split your images into different datasets when you train a Model. Each dataset serves a different purpose:
- Train Set: The Model learns its parameters from this dataset during training.
- Dev Set: The Model refers to this dataset to validate its Predictions. (This dataset is also called a "development set" or "validation set.")
- Test Set: The Model refers to this dataset to evaluate its performance. This is done after the training process.
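To make the three roles concrete, here is a minimal sketch of a random split. It is an illustration only: the function name `split_dataset`, the 70/20/10 ratios, and the file names are assumptions made for this example, and the source does not state how LandingLens chooses its splits.

```python
import random


def split_dataset(items, train=0.7, dev=0.2, seed=0):
    """Shuffle items and cut them into train, dev, and test lists.

    The default ratios are illustrative, not LandingLens's actual values.
    Whatever is left after train and dev goes to test.
    """
    if not 0 < train < 1 or not 0 <= dev < 1 or train + dev >= 1:
        raise ValueError("train and dev must be fractions that sum to less than 1")
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_dev = round(len(shuffled) * dev)
    return {
        "train": shuffled[:n_train],
        "dev": shuffled[n_train:n_train + n_dev],
        "test": shuffled[n_train + n_dev:],
    }


# Hypothetical file names standing in for uploaded images.
image_names = [f"image_{i:03d}.png" for i in range(100)]
splits = split_dataset(image_names)
print({name: len(members) for name, members in splits.items()})
```

Each image lands in exactly one list, so the Model is never evaluated on an image it trained on.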
Why Does LandingLens Split Data?
In the Machine Learning world, you can overtrain your Model. This concept is similar to overstudying for, say, a test. For example, what if Jane uses flashcards to study? If Jane studies too much this way, she'll simply memorize the answers to the questions without understanding why these are the answers or how to arrive at those answers. Models can have similar results if you overtrain them. This concept is called Overfitting. Here's how it works:
Suppose you have a Model trained to detect cats. What if the cats in all the images used for training all have collars on?
If you overtrain your Model, the Model may learn that an important feature of cats is their collars. In this case, if you deploy your Model and show it images of wild cats, the Model may not detect these cats because they aren't wearing collars.
Detecting if you are Overfitting your Model can be tricky, which is why LandingLens offers the solution of splitting your images into different datasets during Model training.
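The collar example can be sketched in a few lines of Python. This is a toy illustration rather than how LandingLens evaluates Models: the data, the `collar_model` rule, and the 0.1 gap threshold are all made up for the example. The idea it shows is general, though: a large gap between accuracy on the Train Set and accuracy on the Dev Set is a common warning sign of Overfitting.

```python
def accuracy(predict, examples):
    """Fraction of (features, label) pairs that predict() labels correctly."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def collar_model(features):
    """An overfit rule: anything wearing a collar is called a cat."""
    return "cat" if "collar" in features else "not cat"


# Toy Train Set: every cat happens to wear a collar.
train_set = [
    ({"whiskers", "collar"}, "cat"),
    ({"whiskers", "tail", "collar"}, "cat"),
    ({"wheels"}, "not cat"),
    ({"leaves"}, "not cat"),
]

# Toy Dev Set: includes a wild cat without a collar and a dog with one.
dev_set = [
    ({"whiskers", "tail"}, "cat"),
    ({"whiskers", "collar"}, "cat"),
    ({"tail", "collar"}, "not cat"),
    ({"leaves"}, "not cat"),
]

train_acc = accuracy(collar_model, train_set)
dev_acc = accuracy(collar_model, dev_set)

GAP_THRESHOLD = 0.1  # arbitrary value chosen for this illustration
possibly_overfit = (train_acc - dev_acc) > GAP_THRESHOLD

print(f"train accuracy: {train_acc:.2f}, dev accuracy: {dev_acc:.2f}")
print("possible overfitting" if possibly_overfit else "no large gap")
```

The rule looks perfect on the images it was built from and fails on images it has not seen, which is only visible because some images were held out.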
"Choose dev and test sets to reflect data you expect to get in the future and want to do well on." —Andrew Ng
LandingLens automatically splits your images and assigns them to different datasets for you during Model Training. However, you can manually split images if you'd like. This section describes the different options.
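|Location|Are the images already uploaded?|Can you apply splits to multiple images at once?|
|---|---|---|
|Set Splits in Image View|Yes|No|
|Set Splits for Multiple Images|Yes|Yes|
|Splits on Upload to Segmentation Projects|No|Yes|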
Set Splits in Image View
After you've uploaded images to your Project, you can manually split images into different datasets, one at a time, when viewing an image. To do this:
- Open the Project.
- Click the image you want to add to a specific dataset.
- Select the dataset you want to add the image to from the Split drop-down menu.
- LandingLens automatically saves this setting.
Set Splits for Multiple Images
After you've uploaded your images to your Project, you can manually split multiple images into different datasets. To do this:
- Open the Project.
- Select the images you want to add to a specific dataset.
- Click Options in the action bar near the bottom of the screen and select Set Split. (You may need to scroll to see this option.)
- Select the type of dataset you want to assign the images to.
- Click Apply. The images are assigned to the selected dataset.
- To deselect the images, click Deselect All.
Splits on Upload to Segmentation Projects
If you're using a Segmentation Project, you can assign splits at upload. To do this:
- Open the Segmentation Project you want to upload images to or create a new Project.
- For a Project that already has images, click the Upload icon. For a Project that doesn't have images yet, click Click Here at the bottom of the page.
- Click Assign Split.
- Select the type of dataset you want to assign images to.
- Click Apply.
- The selected Split displays. Upload the images you want to assign to the selected dataset.
LandingLens offers an export dataset feature. This feature is like having insurance for your datasets: it gives you a backup of your datasets in case you need to refer to them later. Exporting also lets you keep versions of a dataset as you update it.
Exported datasets can be viewed and downloaded in LandingLens.
When to Export Datasets
Exporting a dataset allows you to keep a copy of that dataset, saving you time if you need to reference it in the future. Here are a couple of examples:
- You have a deployed Model with great performance, and you want to add more Classes to it. You can export your dataset in case the Model's performance declines after those iterations.
- You trained a Model with mediocre performance, and you want to relabel the images to see if the Model can achieve better results. You can export your dataset in case the Model's performance declines after those iterations.
When you export a dataset, you can download Pascal VOC files of your images. This allows you to re-upload those images to a new Project. Referring to the examples above, if your Model's performance declines after iterations, you can upload the Pascal VOC files of an exported dataset and continue working from the version of the dataset that produced the better Model.
- Pascal VOC files are only available for Object Detection and Segmentation Models.
- You can export datasets by Class for Classification Models. The Class names will display as filters on the Data Browser, allowing you to find images faster.
- If you upload Pascal VOC files that include "Nothing to Label" images, these images will display as "Unlabeled". This means you will need to relabel them as "Nothing to Label".
- Open the Project.
- Select the images that you want to include in your dataset.
- Click Options in the action bar near the bottom of the screen and select Export Dataset. (You may need to scroll to see this option.)
- A preview of your data displays. Enter a short, descriptive name for your dataset in the Name Your Exported Dataset field.
- If you want to download Pascal VOC files of your images, select the checkbox.
- Click Export Media. The dataset may take a few minutes to export.
View Exported Datasets
Datasets may take a few minutes to export. To view an exported dataset:
- Open the Project.
- Click the Actions button (ellipses) and select View Exported Dataset.
- To download the Pascal VOC archive of the dataset, click the vertical ellipses button and select Download Pascal Voc file.
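Once you have downloaded the files, you may want to inspect the annotations outside of LandingLens. The sketch below reads bounding boxes from one annotation using only the Python standard library. It assumes the file follows the conventional Pascal VOC object-detection layout (`annotation`, `filename`, `object`, `name`, `bndbox`); the exact contents of a LandingLens export are not described here, so treat the sample XML, the label names, and the helper `read_voc_boxes` as hypothetical. Segmentation exports in particular may store their labels differently.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the conventional Pascal VOC detection layout.
# It is not taken from a real LandingLens export.
SAMPLE_VOC = """<annotation>
  <filename>image_001.png</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>scratch</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>195</xmax><ymax>180</ymax></bndbox>
  </object>
  <object>
    <name>dent</name>
    <bndbox><xmin>300</xmin><ymin>200</ymin><xmax>360</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return (filename, boxes) parsed from one Pascal VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects that carry no bounding box
        box = {"label": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # Coordinates are sometimes written as floats, so parse leniently.
            box[key] = int(float(bndbox.findtext(key)))
        boxes.append(box)
    return root.findtext("filename"), boxes


filename, boxes = read_voc_boxes(SAMPLE_VOC)
print(filename)
for box in boxes:
    print(box)
```

To read a file from the downloaded archive instead of the inline sample, pass the file's text to `read_voc_boxes`, for example `read_voc_boxes(open(path, encoding="utf-8").read())`.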