You are viewing documentation for Kubeflow 1.4

This is a static snapshot from the time of the Kubeflow 1.4 release.
For up-to-date information, see the latest version.

PyTorchJob

Reference documentation for PyTorchJob

Packages:

kubeflow.org

Package v1 is the v1 version of the API.

Resource Types:

PyTorchJob

Represents a PyTorchJob resource.

Field Description
apiVersion
string
kubeflow.org/v1
kind
string
PyTorchJob
metadata
Kubernetes meta/v1.ObjectMeta

Standard Kubernetes object’s metadata.

Refer to the Kubernetes API documentation for the fields of metadata.
spec
PyTorchJobSpec

Specification of the desired state of the PyTorchJob.



activeDeadlineSeconds
int64
(Optional)

Specifies the duration (in seconds) since startTime during which the job can remain active before it is terminated. Must be a positive integer. This setting applies only to pods where restartPolicy is OnFailure or Always.

backoffLimit
int32
(Optional)

Number of retries before marking this job as failed.

cleanPodPolicy
common/v1.CleanPodPolicy

Defines the policy for cleaning up pods after the PyTorchJob completes. Defaults to Running.

ttlSecondsAfterFinished
int32

Defines the TTL for cleaning up finished PyTorchJobs (a temporary mechanism until Kubernetes adds a cleanup controller). Cleanup may take an extra ReconcilePeriod seconds, because reconciliation runs periodically. Defaults to infinite.

pytorchReplicaSpecs
map[github.com/kubeflow/pytorch-operator/pkg/apis/pytorch/v1.PyTorchReplicaType]*github.com/kubeflow/tf-operator/pkg/apis/common/v1.ReplicaSpec

A map of PyTorchReplicaType (type) to ReplicaSpec (value). Specifies the PyTorch cluster configuration. For example, { "Master": PyTorchReplicaSpec, "Worker": PyTorchReplicaSpec }

status
common/v1.JobStatus

Most recently observed status of the PyTorchJob. Read-only (modified by the system).
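
The top-level fields above can be assembled into a complete object. The following is a minimal sketch, not an official example: it builds a PyTorchJob manifest as a Python dict and prints it as JSON. The helper name make_pytorch_job, the job name, and the image are hypothetical. The replicas, restartPolicy, and template keys are assumed from common/v1.ReplicaSpec, which this page references but does not document, and the container name "pytorch" is assumed to be what the operator expects. The status field is left out because it is read-only.

```python
import json


def make_pytorch_job(name, image, workers=1):
    """Build a PyTorchJob manifest as a plain dict (hypothetical helper)."""

    def replica(count):
        # Assumed shape of common/v1.ReplicaSpec; not documented on this page.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    # Container name "pytorch" is an assumption about the operator.
                    "containers": [{"name": "pytorch", "image": image}]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica(1),
                "Worker": replica(workers),
            }
        },
    }


# Placeholder name and image, for illustration only.
job = make_pytorch_job("example-job", "example.com/train:latest", workers=2)
print(json.dumps(job, indent=2))
```

The printed JSON could be saved to a file and submitted with a tool that accepts JSON manifests, after replacing the placeholder values.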

PyTorchJobSpec

(Appears on: PyTorchJob)

PyTorchJobSpec is a desired state description of the PyTorchJob.

Field Description
activeDeadlineSeconds
int64
(Optional)

Specifies the duration (in seconds) since startTime during which the job can remain active before it is terminated. Must be a positive integer. This setting applies only to pods where restartPolicy is OnFailure or Always.

backoffLimit
int32
(Optional)

Number of retries before marking this job as failed.

cleanPodPolicy
common/v1.CleanPodPolicy

Defines the policy for cleaning up pods after the PyTorchJob completes. Defaults to Running.

ttlSecondsAfterFinished
int32

Defines the TTL for cleaning up finished PyTorchJobs (a temporary mechanism until Kubernetes adds a cleanup controller). Cleanup may take an extra ReconcilePeriod seconds, because reconciliation runs periodically. Defaults to infinite.

pytorchReplicaSpecs
map[github.com/kubeflow/pytorch-operator/pkg/apis/pytorch/v1.PyTorchReplicaType]*github.com/kubeflow/tf-operator/pkg/apis/common/v1.ReplicaSpec

A map of PyTorchReplicaType (type) to ReplicaSpec (value). Specifies the PyTorch cluster configuration. For example, { "Master": PyTorchReplicaSpec, "Worker": PyTorchReplicaSpec }
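
To illustrate how the PyTorchJobSpec fields fit together, here is a rough sketch that builds a spec dict, leaves unset optional fields out so the defaults apply, and estimates when a finished job might be cleaned up. The function names build_job_spec and cleanup_window are hypothetical. The cleanPodPolicy values "All", "Running", and "None" are assumed from common/v1.CleanPodPolicy, which is not documented on this page. The 30-second reconcile period is an arbitrary illustration, and the sketch assumes cleanup is delayed by at most one reconcile period.

```python
from datetime import datetime, timedelta, timezone

# Assumed values of common/v1.CleanPodPolicy; that type is not documented here.
CLEAN_POD_POLICIES = ("All", "Running", "None")


def build_job_spec(replica_specs, active_deadline_seconds=None, backoff_limit=None,
                   clean_pod_policy=None, ttl_seconds_after_finished=None):
    """Return a PyTorchJobSpec-shaped dict, omitting any field left as None."""
    if active_deadline_seconds is not None and active_deadline_seconds <= 0:
        raise ValueError("activeDeadlineSeconds must be a positive integer")
    if clean_pod_policy is not None and clean_pod_policy not in CLEAN_POD_POLICIES:
        raise ValueError(f"unsupported cleanPodPolicy: {clean_pod_policy!r}")

    optional = {
        "activeDeadlineSeconds": active_deadline_seconds,
        "backoffLimit": backoff_limit,
        "cleanPodPolicy": clean_pod_policy,
        "ttlSecondsAfterFinished": ttl_seconds_after_finished,
    }
    # Keep only the fields the caller actually set.
    spec = {key: value for key, value in optional.items() if value is not None}
    spec["pytorchReplicaSpecs"] = replica_specs
    return spec


def cleanup_window(finished_at, ttl_seconds, reconcile_period_seconds):
    """Return (earliest, latest) cleanup times, assuming one extra period at most."""
    earliest = finished_at + timedelta(seconds=ttl_seconds)
    latest = earliest + timedelta(seconds=reconcile_period_seconds)
    return earliest, latest


spec = build_job_spec(
    {"Master": {"replicas": 1}, "Worker": {"replicas": 2}},  # pod templates omitted
    active_deadline_seconds=3600,
    backoff_limit=3,
    clean_pod_policy="Running",
    ttl_seconds_after_finished=600,
)
finished = datetime(2021, 4, 30, 12, 0, tzinfo=timezone.utc)
# 30 is a hypothetical reconcile period, in seconds.
earliest, latest = cleanup_window(finished, spec["ttlSecondsAfterFinished"], 30)
print(spec)
print(earliest.isoformat(), latest.isoformat())
```

Leaving ttl_seconds_after_finished unset omits the field from the spec, which corresponds to the documented default of never cleaning up the finished job.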

PyTorchReplicaType (string alias)

PyTorchReplicaType is the type for PyTorchReplica. It can be either "Master" or "Worker".
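
Because only two replica types are allowed, a client can check the keys of pytorchReplicaSpecs before submitting a job. The sketch below shows one way to do that. The function check_replica_types is hypothetical, and treating the names as case-sensitive is an assumption rather than documented operator behavior.

```python
from enum import Enum


class PyTorchReplicaType(str, Enum):
    """The two replica types named on this page."""
    MASTER = "Master"
    WORKER = "Worker"


def check_replica_types(replica_specs):
    """Raise ValueError if any key is not a known replica type (case-sensitive)."""
    allowed = {replica_type.value for replica_type in PyTorchReplicaType}
    unknown = sorted(set(replica_specs) - allowed)
    if unknown:
        raise ValueError(f"unknown replica types: {unknown}")
    return True


check_replica_types({"Master": {}, "Worker": {}})  # passes

try:
    check_replica_types({"Master": {}, "worker": {}})  # lowercase key is rejected
except ValueError as err:
    print(err)
```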


Generated with gen-crd-api-reference-docs on git commit e775742.


Last modified 30.04.2021: fix broken link (#2672) (c1aba76b)