Kubernetes – Job

A Kubernetes Job can run multiple instances of a Pod to complete a set of tasks. It can run them in sequence, in parallel, or in a mixed mode. Once all the instances complete successfully, the Job controller marks the Job as complete.

How to plan your execution?

A Job includes a single Pod template. It creates one or more instances of that Pod and lets you run them in sequence, in parallel, or in a mixed mode. Each instance acts as a worker that executes a part of the overall work. The tasks themselves may be held in a file, a database, or a queue. Here are the possible approaches we can take:

  • Single Pod instance per Job – A single Pod fetches and processes all the tasks in a queue, in a loop.
  • Parallel Pod instances per Job – Multiple instances fetch and share the tasks from a given queue and process them in parallel. Each instance exits once it finds no more tasks (a sketch of this queue-based worker follows this list).
  • Sequential Pod instances per Job – A known set of dependent steps that need to be executed one after the other.
    • ‘Process Applications’, ‘Archive Rejected’ and ‘Daily Report’ can be 3 steps in a Job, for which we can use 3 Pods in a sequence.
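
As an illustration of the queue-based approaches above, here is a minimal sketch of a Pod template whose worker keeps popping task IDs from a Redis list. The redis-service host, the tasks list name and the redis:alpine image are assumptions for this sketch, not part of the demo Job used later:

spec:
  template:
    spec:
      containers:
      - name: queue-worker
        image: redis:alpine
        command:
        - /bin/sh
        - -c
        - |
          # keep popping task IDs from the 'tasks' list until it is empty
          while task=$(redis-cli -h redis-service lpop tasks) && [ -n "$task" ]; do
            echo "Processing task $task"
          done
      restartPolicy: OnFailure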

A Kubernetes Job supports parallel and sequential execution of Pods through two simple configuration fields, as shown below:

spec:
  parallelism: 3    # how many Pods may run at the same time

spec:
  completions: 3    # how many Pods must complete successfully

Let us start by creating a Job with a single Pod.

Create a Job with a Single Pod

demo-job.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  template:
    metadata:
      name: worker-pod
    spec:
      containers:
      - name: worker-pod
        image: busybox
        args:
        - /bin/sh
        - -c
        - "for task in 1 2 3 4 5; do echo Task -${task} completed!; done; echo 'Done !'"
      restartPolicy: OnFailure

demo-job.yaml is a simple Job that mocks a few tasks and prints completion statements. Let's deploy the Job using the command below and explore the output:

kubectl apply -f demo-job.yaml

$ kubectl get job,pod
NAME                 COMPLETIONS   DURATION   AGE
job.batch/demo-job   1/1           3s         37s

NAME                 READY   STATUS      RESTARTS   AGE
pod/demo-job-5v8mx   0/1     Completed   0          37s

$ kubectl logs demo-job-5v8mx
Task -1 completed!
Task -2 completed!
Task -3 completed!
Task -4 completed!
Task -5 completed!
Done !

As we can see, the Job has created a Pod based on the template. Ideally, the Pod implements the logic to fetch and execute a set of tasks, as mentioned above; here we have simply printed some statements to mock those tasks. Once the worker Pod completes, the Job finishes successfully, showing that 1/1 worker Pods have completed.
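
If you want to block until the Job finishes, for example in a script, kubectl can wait on the Job's complete condition:

kubectl wait --for=condition=complete job/demo-job --timeout=60s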

How to automate the cleanup of the completed Jobs?

Even after completion, the Job controller retains the Job and its Pods so that the logs remain available for verification and error analysis. You may delete the Pods separately, or delete the Job to remove them all.

To automate the cleanup, we can use the TTL mechanism as shown below, which removes the Job and its Pods 600 seconds after the Job finishes. This relies on the TTL-after-finished controller; recent Kubernetes versions enable it by default, while older clusters need the TTLAfterFinished feature gate turned on.

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  ttlSecondsAfterFinished: 600
  template:
  . . .
  . . .
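
For manual cleanup, deleting the Job is enough; its Pods are removed along with it:

kubectl delete job demo-job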

How to stop a Job in progress?

A Job does not provide any option to stop or pause it midway. You can only delete the Job, which deletes its Pods too. If you are not storing the logs in external storage, you will lose them.

To handle failure conditions, a Job provides two parameters: backoffLimit, which sets the maximum number of retries, and activeDeadlineSeconds, which sets a deadline for the Job to complete.

The Job below says:

  • Stop the Job if the Pod fails (is restarted or recreated) 3 times.
    • The default value of backoffLimit is 6.
  • Stop the Job if it runs longer than 300 seconds.
    • The Job controller will terminate all the active Pods and mark the Job as failed.

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  completions: 2
  backoffLimit: 3
  activeDeadlineSeconds: 300
  template:
    metadata:
      name: worker-pod
    spec:
      containers:
      - name: worker-pod
        image: busybox
        args:
        - /bin/sh
        - -c
        - echo Started processing; sleep 10; echo Task completed
      restartPolicy: OnFailure
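
When either limit is hit, the Job controller terminates any running Pods and marks the Job as failed; the reason (BackoffLimitExceeded or DeadlineExceeded) appears in the Job's conditions:

kubectl describe job demo-job
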
Execution Patterns – Sequential, Parallel & Mixed

A Kubernetes Job allows you to run a set of Pods in parallel, in sequence, or in a mixed mode. Each of these patterns is useful for the different use cases mentioned above.

Run Workers in Parallel

To run workers in parallel, use:

spec:
  parallelism: 3

If we set this value to our demo-job, we can see 3 Pods running in parallel, as shown:

$ kubectl get job,pod
NAME                          COMPLETIONS   DURATION   AGE
job.batch/demo-job   0/1 of 3      11s        11s

NAME                          READY   STATUS    RESTARTS   AGE
pod/demo-job-5ddwc   1/1     Running   0          11s
pod/demo-job-7pd5g   1/1     Running   0          11s
pod/demo-job-vq2dk   1/1     Running   0          11s

A parallelism of N starts N Pods in parallel. If ‘completions’ is left unset, the Job behaves as a work queue: it is marked complete once at least one Pod succeeds and all Pods have terminated, which is why the output above shows ‘0/1 of 3’.

Run Workers in Sequence

To run worker Pods in sequence, use:

spec:
  completions: 5

The ‘completions’ value sets the number of Pods that must complete successfully. If we do not specify any value for ‘parallelism’, it keeps its default value of 1, and the Pods run in a sequence. As we can see below, the Job creates the 3rd Pod only after the 2nd one has completed.

$ kubectl get job,pod
NAME                 COMPLETIONS   DURATION   AGE
job.batch/demo-job   2/5           29s        29s

NAME                 READY   STATUS              RESTARTS   AGE
pod/demo-job-46d57   0/1     Completed           0          15s
pod/demo-job-b9h96   0/1     ContainerCreating   0          2s
pod/demo-job-wz52v   0/1     Completed           0          29s

Mix Parallel & Sequential

Kubernetes allows us to mix parallel and sequential execution. In this case, ‘completions’ sets the total number of Pods that must complete successfully, whereas ‘parallelism’ sets how many run at a time. A sample Job definition is given below.

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  completions: 10
  parallelism: 3
  template:
    metadata:
      name: worker-pod
    spec:
      containers:
      - name: worker-pod
        image: busybox
        args:
        - /bin/sh
        - -c
        - echo Started processing; sleep 10; echo Task completed
      restartPolicy: OnFailure
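
To watch the mixed run, we can follow the Pods as the Job creates them in waves of three (the Job labels its Pods with job-name):

kubectl get pods -w -l job-name=demo-job
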
Conclusion

As we have seen, a Job provides the core features needed to deploy task-specific workloads that terminate once their work is done.

To summarize, the key points about a Job are:

  • A Job can run its Pod instances in sequence, in parallel, or in a mixed mode.
  • The Pods have to terminate successfully for a Job to succeed.
    • As a result, a Job is not allowed to have restartPolicy: Always.
    • It can only use the restartPolicy values OnFailure or Never.
  • backoffLimit & activeDeadlineSeconds are the key attributes for stopping a failing Job.
  • Jobs are not deleted by default, so that the logs are preserved; we need to take measures to clean them up.
  • As a closely related resource, Kubernetes provides CronJob, which adds a scheduler.
    • It can be used to run Jobs automatically at desired intervals, as sketched below.
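
As a quick illustration, here is a minimal CronJob sketch (batch/v1 on Kubernetes 1.21+; the name and schedule are arbitrary, not taken from the examples above). A CronJob simply wraps a Job template with a cron schedule:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: demo-cronjob
spec:
  schedule: "*/5 * * * *"    # create a Job every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: worker-pod
            image: busybox
            args:
            - /bin/sh
            - -c
            - echo Task completed
          restartPolicy: OnFailure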