Landing AI Docker

You can deploy LandingLens models using Docker, which enables DevOps users to integrate model inference programmatically.

The Docker approach is headless, which means that it has no user interface. This allows you to manage inference programmatically and scale deployments quickly.

After you deploy your model to an endpoint using our Docker approach, use the LandingLens Python library to run inference on the model.

Here’s a quick summary of the benefits that the Docker deployment offers:

  • Deployable in a private cloud environment or on any operating system
  • Total API control (including remote access) for inference
  • Can manage deployments programmatically
  • No limit on the number of inferences per minute
  • Can be used with or without a GPU
  • All communication with the Landing AI server is via an HTTPS connection
Note:
Deploying LandingLens models with Docker is intended for developers who are familiar with Docker and programming languages. If that's not you, that's okay! In that case, we recommend that you explore our other deployment options.
Note:
Landing AI Docker v2.8.8 and later does not require a deployment license. If you have an earlier version, please upgrade to the latest version.

Deploy the Model Online or Offline

With our Docker approach, you can deploy the model online or offline.

If you deploy the model online, the model is downloaded directly from LandingLens, which requires an internet connection. 

If you deploy the model offline, you first manually download the model from LandingLens (which does require an internet connection), and then programmatically upload it to the container (which doesn't require an internet connection). 

Regardless of whether you run the model online or offline, inference is run locally.

Requirements

Ensure that you meet all the requirements before installing and running the Landing AI Deploy Docker image:

Required Applications

Install the following applications in order to install and run the Landing AI Deploy Docker image:

  • Docker (Docker Desktop or Docker Engine)

System Support and Requirements

Supported Operating Systems: Linux, macOS, Windows

Memory:
  • Minimum: 8 GB
  • Recommended: 16 GB or more

Supported Processors and Architecture: x86-64, ARM64

x86-64 includes, but is not limited to:
  • Intel Core i5 (2013 Haswell or later)
  • AMD Ryzen (2017 Zen or later)

ARM64 includes, but is not limited to:
  • AWS Graviton2
  • Apple silicon (M1, M2)¹

Internet Connection: Required to download the Docker image and run inference
Note:
¹ We have not yet tested the Apple M3 line of chips, but based on what we know about this architecture, we presume it is supported.

NVIDIA Jetson Support

NVIDIA Jetson devices are supported but not required. If you choose to use an NVIDIA Jetson device:

  • Minimum: NVIDIA Jetson Orin Nano (or better/similar)
  • Recommended: NVIDIA Jetson Orin AGX (or better/similar)
Notes:
  • The Landing AI Deploy Docker image for devices on NVIDIA Jetpack 4.x does not support most Object Detection models. It only supports Object Detection models created with the RtmDet-[9M] Model Size in Custom Training.
  • Docker might require you to run commands or download libraries specific to your NVIDIA device and driver version. We recommend reading the NVIDIA documentation to understand requirements and compatibility.

GPU Support

Note:
  • For information about NVIDIA Jetson devices, see the NVIDIA Jetson Support section above.
  • Docker might require you to run commands or download libraries specific to your NVIDIA device and driver version. We recommend reading the Docker and NVIDIA documentation to understand requirements and compatibility.

Install the Landing AI Image

To install the Landing AI Docker image, you will pull the image from Amazon Elastic Container Registry (Amazon ECR), a library of containerized applications. The image size is about 6 GB. You can see all of the Landing AI "Deploy" images here.

You only need to install (pull) the image once. After you've downloaded the image, you're ready to launch an instance of it.

To pull the image, run this command:

docker pull public.ecr.aws/landing-ai/deploy:latest

Install the Image on Jetson Devices

We provide separate images for users on NVIDIA Jetson devices. To pull the correct image, run the command that matches your JetPack version.

For NVIDIA Jetson devices with JetPack 4.x.x:

Go to the Landing AI repository in Amazon ECR and download the most recent version with the -jp4 suffix.

docker pull public.ecr.aws/landing-ai/deploy:VERSION-jp4

For NVIDIA Jetson devices with JetPack 5.x.x:

Go to the Landing AI repository in Amazon ECR and download the most recent version with the -jp5 suffix.

docker pull public.ecr.aws/landing-ai/deploy:VERSION-jp5

Docker Deployment Quick-Start Guide

To run inference on a model after installing the Landing AI Docker image, go through this checklist:

  1. Ensure you have a model in LandingLens that you want to run inference on. (New to LandingLens? Learn how to train a model here.)
  2. Activate the project that has the model you want to run inference with. You must activate a project in LandingLens first, regardless of whether you will run the model online or offline.
  3. If you want to run the model offline, download the model.
  4. Locate the Model ID for the model you want to run inference on.
  5. Locate your LandingLens API key.
  6. If you want to run the model online, run the run-model-id command.
  7. If you want to run the model offline, run the run-local-model command.
  8. Run inference with the LandingLens Python library.

Locate Your Model ID

The Model ID is included in the command to deploy the model via Docker. The Model ID tells the application which model to download from LandingLens.

To locate your Model ID, follow the instructions here.


Locate Your LandingLens API Key

LandingLens uses API keys to authenticate access to the system. You will use an API Key in the command to download your computer vision model from LandingLens.

LandingLens API keys are managed on the API Keys page. To learn how to locate your LandingLens API key, go to Retrieve API Key.

"New" API Credentials Only Use the API Key

Deploy the Model Online

If you want to deploy the model online, run the run-model-id command to launch a container with the model.

By default, the inference endpoint port is 8000. In the code snippet below, the port flag (-p) sets the ports for the host and the container. The port on the left is the host port, and the port on the right is the container port. For example, -p 9000:8000 would expose the container's port 8000 on host port 9000.

To run the run-model-id command:

docker run -p 8000:8000 --rm public.ecr.aws/landing-ai/deploy:latest run-model-id --apikey API_KEY --model MODEL_ID

When you run the command, the Deploy application downloads your model from LandingLens. When it’s done, this message displays:

 [INF] Model loading status: [Ready]
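
If you are scripting the startup, you can also detect this state programmatically by polling the /ready endpoint described later in this article, which returns 200 once the model is loaded. The following is a minimal sketch using the third-party requests package; the localhost host, port 8000, and timeout values are assumptions for illustration.

import time
import requests

def wait_until_ready(host="localhost", port=8000, timeout_s=300):
    """Poll the deployment's /ready endpoint until the model is loaded."""
    url = f"http://{host}:{port}/ready"
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            # /ready returns 200 once the model is loaded; 503 while it is still loading
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.exceptions.ConnectionError:
            pass  # the container may still be starting up
        time.sleep(2)
    return False

if wait_until_ready():
    print("Model is ready for inference")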

Examples

Deploy the model when you have only an API Key (and not an API Key and API Secret): 

docker run -p 8000:8000 --rm public.ecr.aws/landing-ai/deploy:latest run-model-id --apikey API_KEY --model MODEL_ID

Deploy the model when you have an API Key and an API Secret:

docker run -p 8000:8000 --rm public.ecr.aws/landing-ai/deploy:latest run-model-id --apikey API_KEY --apisecret API_SECRET --model MODEL_ID

Deploy the model. When you send images for inference, save those images and the predictions to the corresponding project in LandingLens: 

docker run -p 8000:8000 --rm public.ecr.aws/landing-ai/deploy:latest run-model-id --apikey API_KEY --model MODEL_ID --upload

Deploy the model. When you deploy a model, the name of the device displays on the Deploy page in LandingLens. Set a device name:

docker run -p 8000:8000 --rm public.ecr.aws/landing-ai/deploy:latest run-model-id --apikey API_KEY --model MODEL_ID --name "my edge device" --upload

Flags

-k, --apikey
Required. Set the API Key for your LandingLens organization. Can also be set through the 'LANDING_API_KEY' environment variable.
Note: Up until June 30, 2023, LandingLens required both an API Key and an API Secret to run inference. As of that date, LandingLens only generates API Keys (and not API Key and API Secret pairs). For more information, go to API Keys.

-s, --apisecret
If you're using a "legacy" API Key and API Secret pair, set the API Secret. Can also be set through the 'LANDING_API_SECRET' environment variable.

-m, --model
Required. Set the Model ID of the model you want to load. To locate your Model ID, go to Locate Your Model ID. Can also be set through the 'MODEL_ID' environment variable.

-p, --port
The port number to use for communicating with the deployed model via API. Can also be set through the 'PORT' environment variable. Default: 8000.

-e, --external
Allow external hosts to access the API. Can also be set through the 'ALLOW_EXTERNAL' environment variable. If running in a container, the default is true; otherwise, the default is false.

-u, --upload
When you send images for inference, save those images and the predictions to the corresponding project in LandingLens. Can also be set through the 'UPLOAD_RESULTS' environment variable. Default: false.

-g, --gpus
Select the GPUs you want to use to run inference. Include a space-separated list of the GPU indices. If you select multiple GPUs, the system balances the load between the processors. Can also be set through the 'GPUS' environment variable. Default: use all available GPUs.

-n, --name
When you deploy a model, the name of the device displays on the Deploy page in LandingLens. Use this flag to set the device name. Can also be set through the 'DEVICE_NAME' environment variable. If unspecified, the default is 'LE-{hostname}'.

--help
Display more information about the command.

--version
Display version information.

Deploy the Model Offline

If you want to deploy the model offline, run the run-local-model command. Running the model offline is helpful if your system doesn't have internet access or has limited bandwidth (which could make it difficult to download the model at startup). Also, running the model offline reduces startup time, because you don't have to wait for the model to download.

When running this command, include Docker's volume mount flag (-v) so that the container can access the model bundle file you downloaded.

Examples

Deploy the model:

docker run -v /local/dir:/dir/in/docker/container public.ecr.aws/landing-ai/deploy:latest run-local-model --model /dir/in/docker/container/model-bundle.zip

Deploy the model. When you send images for inference, save those images and the predictions to the corresponding project in LandingLens:

docker run -v /local/dir:/dir/in/docker/container public.ecr.aws/landing-ai/deploy:latest run-local-model --model /dir/in/docker/container/model-bundle.zip --upload

Deploy the model. When you deploy a model, the name of the device displays on the Deploy page in LandingLens. Set a device name:

docker run -v /local/dir:/dir/in/docker/container public.ecr.aws/landing-ai/deploy:latest run-local-model --model /dir/in/docker/container/model-bundle.zip --name "my edge device" --upload

Flags

-m, --model
Required. The model bundle to load. This is a zip file. Can also be set through the 'MODEL_PATH' environment variable.

-p, --port
The port number to use for communicating with the deployed model via API. Can also be set through the 'PORT' environment variable. Default: 8000.

-e, --external
Allow external hosts to access the API. Can also be set through the 'ALLOW_EXTERNAL' environment variable. If running in a container, the default is true; otherwise, the default is false.

-u, --upload
When you send images for inference, save those images and the predictions to the corresponding project in LandingLens. Can also be set through the 'UPLOAD_RESULTS' environment variable. Default: false.

-g, --gpus
Select the GPUs you want to use to run inference. Include a space-separated list of the GPU indices. If you select multiple GPUs, the system balances the load between the processors. Can also be set through the 'GPUS' environment variable. Default: use all available GPUs.

-n, --name
When you deploy a model, the name of the device displays on the Deploy page in LandingLens. Use this flag to set the device name. Can also be set through the 'DEVICE_NAME' environment variable. If unspecified, the default is 'LE-{hostname}'.

--help
Display more information about the command.

--version
Display version information.

Run Inference with the LandingLens Python Library

Once the container with the model is ready, you can run inference by using the LandingLens Python library.

If your client code (in other words, your Python code) is on the same machine you’re running inference on, you can also use “localhost” instead of the IP address. 

The following example shows how to run inference against an instance hosted at 192.168.1.12 on port 8000. For more examples, see our Python library.

from landingai.predict import EdgePredictor
import PIL.Image

# Connect to the model deployed in the Docker container
predictor = EdgePredictor(host="192.168.1.12", port=8000)

# Load an image and run inference
img = PIL.Image.open("/Users/username/Downloads/test.png")
predictions = predictor.predict(img)
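
If the container is running on the same machine as your client code, you can point the predictor at "localhost", as mentioned above. The following is a small variation of the example above under that assumption; the image path is a placeholder.

from landingai.predict import EdgePredictor
import PIL.Image

# The container is running locally, so connect via localhost
predictor = EdgePredictor(host="localhost", port=8000)

img = PIL.Image.open("test.png")  # placeholder path
predictions = predictor.predict(img)

# Inspect the prediction objects returned by the library
for prediction in predictions:
    print(prediction)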

Power Inference with GPUs

By default, the Docker deployment doesn’t use any GPUs in your system. However, you can configure the deployment to use some or all of your GPUs. See the list of supported GPUs here.

In this section, we’ll give you the basics about how to enable GPUs to power inference. However, Docker might require you to run commands or download libraries specific to your NVIDIA device and driver version. We recommend reading the Docker and NVIDIA documentation to understand requirements and compatibility.

To use all GPUs in your system to power inference, include the --gpus all flag as part of the docker command. For example:

docker run --gpus all -p 8000:8000 --rm public.ecr.aws/landing-ai/deploy:latest run-model-id --apikey API_KEY --model MODEL_ID

To use only specific GPUs in your system to power inference, include the --gpus flag and enter the indices of the GPUs you want to use as a comma-separated list. For example, in the snippet below, the deployment will use the GPUs with index 0 and index 2:

docker run --rm --gpus '"device=0,2"' -p 8000:8000 public.ecr.aws/landing-ai/deploy:latest run-model-id --apikey API_KEY --model MODEL_ID

Deploy with Kubernetes

You can deploy the Landing AI Deploy image with Kubernetes and monitor the system with our status APIs.

Here is an example Kubernetes Pod YAML deployment file for the Landing AI Deploy image:

apiVersion: v1
kind: Pod
metadata:
  name: landingai-deploy
  namespace: default
  labels:
    app.kubernetes.io/name: landingai-deploy
    app: landingai-deploy
spec:
  containers:
  - name: landingai-deploy
    image: public.ecr.aws/landing-ai/deploy:latest
    args: ["run-model-id", "-m", "YOUR_MODEL_ID"]
    imagePullPolicy: IfNotPresent
    env:
      - name: LANDING_API_KEY
        value: "YOUR_API_KEY"
    ports:
      - containerPort: 8000
    resources:
      requests:
        memory: 2G
        cpu: 2
      limits:
        memory: 4G
        cpu: 5
    startupProbe:
      httpGet:
        port: 8000
        path: /status
    livenessProbe:
      httpGet:
        port: 8000
        path: /live
      initialDelaySeconds: 5
      periodSeconds: 5
    readinessProbe:
      httpGet:
        port: 8000
        path: /ready
      initialDelaySeconds: 5
      periodSeconds: 5

Web APIs in Swagger

To access the web APIs for the Landing AI Docker deployment solution in Swagger, go to http://localhost:[port], where [port] is the port you’re using to communicate with the Dockerized application.

Use the /status, /ready, and /live endpoints to monitor the status when using Kubernetes or another orchestration system.

Use the /images endpoint to run inference.

You can use these web APIs to programmatically start or monitor inference.

APIs in Swagger

/status

The /status endpoint always returns 200.

/ready

The /ready endpoint indicates if the model is ready to receive inference calls. It returns 200 if the model is loaded. Otherwise, it returns 503, which indicates that either the model hasn't loaded yet or the model failed to load.

/live

The /live endpoint returns 200 if the model is loading or loaded. Otherwise, it returns 503, which indicates that the container is dead and should be killed by the orchestrator (such as Kubernetes).
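
If you are writing your own watchdog or orchestration script, the status codes of /live and /ready map directly onto the behavior described above. The following is a minimal, non-authoritative sketch using the third-party requests package; the localhost host and port 8000 are assumptions.

import requests

def deployment_state(host="localhost", port=8000):
    """Classify the deployment based on the /live and /ready endpoints."""
    base = f"http://{host}:{port}"
    if requests.get(f"{base}/live", timeout=5).status_code != 200:
        return "dead"     # the orchestrator should kill and restart the container
    if requests.get(f"{base}/ready", timeout=5).status_code == 200:
        return "ready"    # the model is loaded and can receive inference calls
    return "loading"      # the container is alive but the model is not loaded yet

print(deployment_state())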

/images

Use the /images endpoint to run inference. Results are returned in JSON.

Note:
The /api/v1/images endpoint is provided for compatibility with an older version of the Landing AI Docker image. It is supported, but the JSON results are formatted differently.

Troubleshooting: User Declined Directory Sharing

Scenario

You receive the following error message when running the run-local-model command:

Error response from daemon: user declined directory sharing [directory path].

Solution

This error indicates that the Landing AI Deployment application does not have permission to access the directory you included in the command. Add the directory to the File Sharing list in Docker Desktop. The way to do this depends on your operating system. See the instructions for your system: Mac, Windows, and Linux.

