Object Model Reference
Your inference service is described and controlled via two primary objects: Packaged Model and Deployment.
Additionally, when referencing a Packaged Model via an external hosting method, such as an S3 bucket or the huggingface.co registry, you may need to configure a Registry to enable that access.
Deployment
The Deployment object controls when a Packaged Model is deployed.
The Deployment object has the following attributes:
Input Attributes
- name The name of the deployment to enable access to it via the REST interface or CLI. This is the name used in the associated KServe inference service that will be created.
- namespace The Kubernetes namespace into which the service is deployed. It must already exist.
- model The name (or `id`) of the packaged model to be deployed.
- security Encapsulates the security option (authenticationRequired) for the deployed service.
- autoScaling Controls the scaling limits minReplicas/maxReplicas, metric to control scaling, and the target value.
- canaryTrafficPercent The percentage of traffic to route to this particular model version. The default is `100`.
- goalStatus Specifies the intended status to be achieved by the deployment. The default is `Ready`.
- environment Environment variables to be provided to the container image when started.
- arguments Arguments to be passed to the container image when started. These are in addition to any configured on the packaged model.
Managed Attributes
- id A unique identifier for this service.
- status Summary status of the deployed service.
- state State details of the current service configuration requested. See the DeploymentStateDetails component for details.
- secondaryState State details of a prior service configuration until the currently requested configuration has been fully rolled out.
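As an illustration, the input attributes above could be assembled into a Deployment specification along the following lines. This is a sketch only: the concrete names, namespace, metric, and values are hypothetical, and the exact request shape may differ in your installation.

```python
# Illustrative Deployment spec built from the input attributes listed above.
# All concrete values (names, namespace, metric, targets) are hypothetical.
deployment = {
    "name": "sentiment-svc",          # exposed via the REST interface/CLI; used for the KServe inference service
    "namespace": "ml-prod",           # Kubernetes namespace; must already exist
    "model": "sentiment-classifier",  # name (or id) of the packaged model to deploy
    "security": {"authenticationRequired": True},
    "autoScaling": {
        "minReplicas": 1,
        "maxReplicas": 4,
        "metric": "concurrency",      # metric that drives scaling
        "target": 10,                 # target value for that metric
    },
    "canaryTrafficPercent": 100,      # default: all traffic to this model version
    "goalStatus": "Ready",            # default intended status
    "environment": {"LOG_LEVEL": "info"},
    "arguments": ["--workers", "2"],  # in addition to arguments on the packaged model
}
```

Note that `canaryTrafficPercent` and `goalStatus` are shown at their documented defaults, so omitting them would yield the same behavior.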
PackagedModel
The Packaged Model object identifies the model and code that make up your inference service.
The code may be provided via a container image, or via an external hosting method (S3 or huggingface.co registry).
The Packaged Model object has the following attributes:
Input Attributes
- name The name of the model.
- description A text description of the model.
- modelFormat Model format for downloaded models (e.g. from `S3`, `http`, etc.).
- registry The name or `id` of a registry object. If the model data is not provided via a container image, this must be specified.
- url Reference to the Bento or model to be served.
- resources The resource requirements for running the service (requests/limits) for cpu/memory/gpu.
- image The containerized Bento image from which the inference service is deployed.
- environment Environment variables to be provided to the container image when started. See Packaged Model Environment Variables for a list of default options.
- arguments Arguments to be passed to the container image when started.
Managed Attributes
- id A unique identifier for this particular model version.
- version An automatically incrementing integer version of the model as you make changes.
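A Packaged Model that is fetched from an external registry (rather than supplied as a container image) might look like the following sketch. The format label, registry name, URL, and resource figures are hypothetical; when a container `image` is provided instead, the `registry` and `url` fields would not be needed.

```python
# Illustrative PackagedModel spec using the external-registry path
# (registry + url) rather than a container image. Values are hypothetical.
packaged_model = {
    "name": "sentiment-classifier",
    "description": "BERT-based sentiment classifier",
    "modelFormat": "bento-archive",   # hypothetical format label for a downloaded model
    "registry": "s3-models",          # name (or id) of a Registry object; required when no image is given
    "url": "s3://model-artifacts/sentiment/bento.tar",
    "resources": {
        "requests": {"cpu": "500m", "memory": "1Gi"},
        "limits": {"cpu": "2", "memory": "4Gi", "gpu": "1"},
    },
    "environment": {"BENTOML_PORT": "3000"},  # hypothetical variable name
    "arguments": ["--timeout", "60"],
}
```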
Registry
The Registry object provides the metadata that describes how to download a Packaged Model for deployment.
Input Attributes
- name The name of the service to enable access to it via the REST interface or CLI.
- description A text description of the service.
- type The type of this model registry.
- endpointUrl The registry endpoint (host).
- bucket The bucket or organization name, depending on the selected model registry type.
- accessKey The access key, username or team name for the registry.
- secretKey The password, secret key, or access token for the registry.
- insecureHttps When enabled, the server certificate for `https` endpoints is accepted without validation.
Managed Attributes
- id A unique identifier for this service.
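For example, a Registry describing an S3-style endpoint might be specified as in this sketch; the type value, endpoint, bucket, and credentials are all hypothetical placeholders.

```python
# Illustrative Registry spec for an S3-style model store.
# The type string, endpoint, bucket, and credentials are hypothetical.
registry = {
    "name": "s3-models",
    "description": "Internal S3 bucket holding packaged models",
    "type": "s3",                      # model registry type (assumed value)
    "endpointUrl": "https://s3.internal.example.com",
    "bucket": "model-artifacts",       # bucket or organization name, per registry type
    "accessKey": "EXAMPLE-ACCESS-KEY", # access key, username, or team name
    "secretKey": "EXAMPLE-SECRET",     # password, secret key, or access token
    "insecureHttps": False,            # leave False unless the server certificate cannot be validated
}
```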
DeploymentStateDetails
The state details of an inference service Deployment are described with the following attributes:
Attributes
- endpoint The endpoint `uri` used to access the inference service.
- nativeAppName The name of the Kubernetes application for the specific service version. Use this name to match the app value in Grafana/Prometheus to obtain logs and metrics for this deployed service version.
- status The status of a particular inference service revision.
- trafficPercentage Percent of traffic being processed by this service/model version.
- failureInfo A list of any failures associated with the deployment of this service/model version.
- modelId The `id` of the deployed packaged model associated with this state.
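Putting the attributes together, the state details reported for a fully rolled-out revision might resemble the following sketch; every value shown (hostname, application name, model id) is hypothetical.

```python
# Illustrative DeploymentStateDetails for a successfully rolled-out revision.
# All concrete values are hypothetical.
state = {
    "endpoint": "https://sentiment-svc.ml-prod.example.com",
    "nativeAppName": "sentiment-svc-predictor-00002",  # match against the app label in Grafana/Prometheus
    "status": "Ready",
    "trafficPercentage": 100,  # all traffic on this service/model version
    "failureInfo": [],         # empty when the rollout succeeded
    "modelId": "4f1c9e2a",     # id of the deployed packaged model
}
```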