Object Model Reference

Your inference service is described and controlled via two primary objects: Packaged Model and Deployment. Additionally, when referencing a Packaged Model via an external hosting method, such as an S3 bucket or the huggingface.co registry, you may need to configure a Registry to enable that access.

Deployment

The Deployment object controls when a Packaged Model is deployed. A sketch of creating a deployment via the REST interface follows the attribute lists below. The Deployment object has the following attributes:

Input Attributes

  • name The name of the deployment, used to access it via the REST interface or CLI. This is also the name of the associated KServe inference service that will be created.
  • namespace The Kubernetes namespace into which the service is deployed. It must already exist.
  • model The name (or id) of the packaged model to be deployed.
  • security Encapsulates the security option (authenticationRequired) for the deployed service.
  • autoScaling Controls the scaling limits (minReplicas/maxReplicas), the metric that drives scaling, and the metric's target value.
  • canaryTrafficPercent The percentage of traffic to route to this particular model version. The default is 100.
  • goalStatus Specifies the intended status to be achieved by the deployment. The default is Ready.
    • Ready
      The inference service will be deployed to enable inference calls.
    • Paused
      The inference service will be stopped and will no longer accept calls.
  • environment Environment variables to be provided to the container image when started.
  • arguments Arguments to be passed to the container image when started. These are in addition to any configured on the packaged model.

Managed Attributes

  • id A unique identifier for this deployed service.
  • status Summary status of the deployed service.
    • Deploying
      Service configuration is in progress.
    • Failed
      The service configuration failed.
    • Ready
      The service has been successfully configured and is serving.
    • Updating
      A new service revision is being rolled out.
    • UpdateFailed
      The current service revision failed to roll out due to an error. The prior version is still serving requests.
    • Deleting
      The deployed service is being removed.
    • Paused
      The deployed service has been stopped by the user or an external action.
    • Unknown
      Unable to determine the status.
    • Canceled
      The deployed service has been canceled.
  • state State details of the current service configuration requested. See the DeploymentStateDetails component for details.
  • secondaryState State details of a prior service configuration until the currently requested configuration has been fully rolled out.
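
The following minimal sketch shows how the input attributes above might be assembled into a deployment request. The REST host, endpoint path (/api/v1/deployments), authentication header, and metric name are illustrative assumptions, not a documented API; the field names mirror the attribute list above.

    import requests

    # Assumed host, path, and credentials for illustration only;
    # substitute the REST interface of your installation.
    BASE_URL = "https://mlis.example.com"
    HEADERS = {"Authorization": "Bearer <token>"}

    deployment = {
        "name": "sentiment-svc",          # also the KServe inference service name
        "namespace": "models",            # must already exist
        "model": "sentiment-model",       # packaged model name or id
        "security": {"authenticationRequired": True},
        "autoScaling": {
            "minReplicas": 1,
            "maxReplicas": 4,
            "metric": "concurrency",      # assumed metric name
            "target": 10,
        },
        "canaryTrafficPercent": 100,      # default
        "goalStatus": "Ready",            # or "Paused"
        "environment": {"LOG_LEVEL": "info"},
        "arguments": ["--workers=2"],     # appended to the packaged model's arguments
    }

    resp = requests.post(f"{BASE_URL}/api/v1/deployments",
                         json=deployment, headers=HEADERS)
    resp.raise_for_status()
    print(resp.json()["id"])              # the managed id assigned by the server

Once created, the managed attributes (id, status, state, secondaryState) are populated by the server and can be read back but not set.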

PackagedModel

The Packaged Model object identifies the model and code that make up your inference service. The code may be provided via a container image or via an external hosting method (an S3 bucket or the huggingface.co registry). A sketch of defining a packaged model follows the attribute lists below. The Packaged Model object has the following attributes:

Input Attributes

  • name The name of the model.
  • description A text description of the model.
  • modelFormat The model format for downloaded models (e.g. from S3, HTTP, etc.).
    • custom
      The packaged model is provided in a container image.
    • bento-archive
      The packaged model is a bento archive file (.bento). It will be downloaded, expanded, and then served using the bentoml serve command in a provided bentoml serving container.
    • openllm
      The packaged model will be served using the openllm serve command in a provided openllm serving container.
    • nim
      The packaged model will be served using the specified NVIDIA NIM microservices container image.
  • registry The name or id of a registry object. If the model data is not provided via a container image, this must be specified.
  • url Reference to the Bento or model to be served.
    • openllm
      An OpenLLM model name from huggingface.co, dynamically loaded and executed with a vLLM backend.
    • s3
      An OpenLLM model path which will be dynamically downloaded from an associated S3 registry bucket.
    • ngc
      An NVIDIA NGC model that will be dynamically downloaded from the associated NGC registry bucket and executed with the specified NVIDIA NIM microservices container image.
  • resources The resource requirements for running the service (requests/limits) for cpu/memory/gpu.
  • image The containerized bento to be deployed as the inference service.
  • environment Environment variables to be provided to the container image when started. See Packaged Model Environment Variables for a list of default options.
  • arguments Arguments to be passed to the container image when started.

Managed Attributes

  • id A unique identifier for this particular model version.
  • version An integer version number that increments automatically as you make changes to the model.
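
As a sketch under the same assumptions as the deployment example above (hypothetical host, path, and credentials), a packaged model in the openllm format might be defined like this; the field names mirror the attribute list above:

    import requests

    BASE_URL = "https://mlis.example.com"          # assumed host
    HEADERS = {"Authorization": "Bearer <token>"}  # assumed credentials

    packaged_model = {
        "name": "sentiment-model",
        "description": "OpenLLM-served sentiment model",
        "modelFormat": "openllm",
        "registry": "hf-registry",        # required: model data is not in an image
        "url": "facebook/opt-125m",       # an OpenLLM model name on huggingface.co
        "resources": {
            "requests": {"cpu": "2", "memory": "8Gi", "gpu": "1"},
            "limits": {"cpu": "4", "memory": "16Gi", "gpu": "1"},
        },
        "environment": {},                # see Packaged Model Environment Variables
        "arguments": [],
    }

    resp = requests.post(f"{BASE_URL}/api/v1/models",
                         json=packaged_model, headers=HEADERS)
    resp.raise_for_status()
    created = resp.json()
    print(created["id"], created["version"])       # managed attributes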

Registry

The Registry object provides the metadata that describes how to download a Packaged Model for deployment. A sketch of defining a registry follows the attribute lists below.

Input Attributes

  • name The name of the registry, used to access it via the REST interface or CLI.
  • description A text description of the registry.
  • type The type of this model registry.
    • s3
      Configuration to enable access to an S3 bucket.
    • openllm
      Configuration to enable direct download of openllm models from huggingface.co. Provide your access token in the secretKey field.
    • ngc
      Configuration to enable direct download from the NGC: AI Development Catalog.
  • endpointUrl The registry endpoint (host).
    • s3
      The S3 registry endpoint for the associated S3 region. Required.
    • openllm
      The huggingface.co-compatible endpoint (default https://huggingface.co).
    • ngc
      The NVIDIA NGC-compatible API endpoint (default https://api.ngc.nvidia.com).
  • bucket The bucket or organization name, depending on the selected model registry type.
    • s3
      The required S3 bucket name.
    • ngc
      The required NGC org name.
  • accessKey The access key, username, or team name for the registry.
    • s3
      The access key/username.
    • ngc
      The optional NGC team name.
  • secretKey The password, secret key, or access token for the registry.
    • s3
      The secret key for the S3 bucket.
    • openllm
      The secretKey is the access token for huggingface.co and is supplied to the launched container via the HF_TOKEN environment variable.
    • ngc
      The NVIDIA NGC API key.
  • insecureHttps If set, the server certificate for HTTPS endpoints is accepted without validation.

Managed Attributes

  • id A unique identifier for this registry.
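
The sketch below defines an s3-type registry under the same illustrative assumptions (hypothetical host, path, and credentials); the field names mirror the attribute list above:

    import requests

    BASE_URL = "https://mlis.example.com"          # assumed host
    HEADERS = {"Authorization": "Bearer <token>"}  # assumed credentials

    registry = {
        "name": "team-model-bucket",
        "description": "S3 bucket holding packaged models",
        "type": "s3",
        "endpointUrl": "https://s3.us-east-1.amazonaws.com",  # required for s3
        "bucket": "team-models",          # the required S3 bucket name
        "accessKey": "<aws-access-key-id>",
        "secretKey": "<aws-secret-access-key>",
        "insecureHttps": False,
    }

    resp = requests.post(f"{BASE_URL}/api/v1/registries",
                         json=registry, headers=HEADERS)
    resp.raise_for_status()
    print(resp.json()["id"])              # the managed id

For an openllm registry, typically only name, type, and secretKey (the huggingface.co access token) are needed, since endpointUrl defaults to https://huggingface.co.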

DeploymentStateDetails

The state details of an inference service Deployment are described by the following attributes; a sketch of polling these details follows the list below:

Attributes

  • endpoint The endpoint URI used to access the inference service.
  • nativeAppName The name of the Kubernetes application for the specific service version. Use this name to match the app value in Grafana/Prometheus to obtain logs and metrics for this deployed service version.
  • status The status of a particular inference service revision.
    • Deploying
      The service configuration is in progress.
    • Failed
      The service configuration failed.
    • Ready
      The service has been successfully configured and is serving.
    • Updating
      A new service revision is being rolled out.
    • UpdateFailed
      The current service revision failed to roll out due to an error. The prior version is still serving requests.
    • Deleting
      The service is being removed.
    • Paused
      The service has been stopped by the user or an external action.
    • Unknown
      Unable to determine the service’s status.
    • Canceled
      The specified model version of the deployment was canceled by the user.
  • trafficPercentage Percent of traffic being processed by this service/model version.
  • failureInfo A list of any failures associated with the deployment of this service/model version.
  • modelId The id of the deployed packaged model associated with this state.
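
These attributes make it possible to poll a deployment until it is serving. The sketch below assumes the same hypothetical host, path, and credentials as the earlier examples, and that the GET response carries the deployment's state field as described above:

    import time
    import requests

    BASE_URL = "https://mlis.example.com"          # assumed host
    HEADERS = {"Authorization": "Bearer <token>"}  # assumed credentials

    def wait_until_ready(deployment_id, timeout=600, interval=10):
        """Poll the deployment's state until it is Ready, fails, or times out."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            resp = requests.get(f"{BASE_URL}/api/v1/deployments/{deployment_id}",
                                headers=HEADERS)
            resp.raise_for_status()
            state = resp.json()["state"]   # DeploymentStateDetails
            if state["status"] == "Ready":
                return state["endpoint"]   # the URI for inference calls
            if state["status"] in ("Failed", "UpdateFailed"):
                raise RuntimeError(state.get("failureInfo"))
            time.sleep(interval)
        raise TimeoutError("deployment did not reach Ready before the timeout")

During a rollout, secondaryState continues to describe the prior revision while state tracks the new one, so the trafficPercentage of each shows how traffic is split between the two model versions.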