Quick Intro to Jobs and CronJobs and What to Do if They Fail

One of the finer controls in Kubernetes is over container lifecycle; consider
that if a Pod is your basic deployment unit, and controllers like Deployments
and DaemonSets are just reconciliation loops that keep the number of running
replicas matched to the desired number, why shouldn’t you be able to schedule a
single-use execution, or a recurring one-off run, of a script in a Pod as well?
Well, that’s exactly what the Job resource is for in Kubernetes!
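
As a point of reference, a bare Job is just a Pod template wrapped in a Job
spec; here’s a minimal sketch (the name, image, and command are placeholders
rather than anything from a real workload):

apiVersion: batch/v1
kind: Job
metadata:
  name: one-off-task
spec:
  backoffLimit: 2            # retry a couple of times before marking the Job failed
  template:
    spec:
      containers:
      - name: task
        image: busybox:1.36                                  # placeholder image
        command: ["sh", "-c", "echo running my one-off script"]
      restartPolicy: Never   # Jobs only allow Never or OnFailure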

The primary difference between Job and CronJob workloads is exactly what it
sounds like; a CronJob will run on a schedule, rather than as a one-off task,
and you can mix these workloads, as you’ll see later on. I’ll focus mostly on
CronJobs in this piece, but will provide some examples for Job workloads as
well.

It doesn’t make sense to run a task like this as a persistent service; I can’t
use restart policies, for example, to say I’d like it to happen every 6 hours,
and I don’t want to keep the resources it requires locked up when it’s not in
use, so I wrote a CronJob:
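
Here’s a representative sketch of that manifest; the name, image, paths, and
label values are placeholders, but the shape (the schedule, env, volumeMounts,
and the nodeSelector targeting the biggie-storage nodes) is what matters:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: storage-report                   # placeholder name
spec:
  schedule: "0 */6 * * *"                # standard cron syntax: every 6 hours
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          nodeSelector:
            role: biggie-storage         # placeholder label for the biggie-storage nodes
          containers:
          - name: report
            image: registry.example.com/storage-report:latest   # placeholder image
            env:
            - name: TARGET_DIR           # placeholder variable
              value: /data
            volumeMounts:
            - name: data
              mountPath: /data
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: storage-report-data                    # placeholder claim
          restartPolicy: OnFailure       # Jobs only allow Never or OnFailure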

As you can see above, the definition looks a lot like a normal Pod definition,
and that’s because it is; you’re defining things like volumeMounts for storage,
env for your variables, and even nodeSelector to tell the API where to schedule
these containers (in my case, biggie-storage nodes have appropriate space and
performance characteristics for this job!). The schedule key takes your normal
cron syntax [crontab.guru] as a value, and runs the job on that schedule.

When you run kubectl get pods you’ll see the pods as you would for normal
workloads, and you’ll also see that the API is cleaning up after these jobs once
they exit:
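
Illustrative output (the Pod names are hypothetical, but the columns are what
kubectl prints):

kubectl get pods
NAME                            READY   STATUS      RESTARTS   AGE
storage-report-27989760-t4wvn   0/1     Completed   0          12h
storage-report-27990120-x7k2p   0/1     Error       0          6h
storage-report-27990480-q9m4z   1/1     Running     0          2m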

and you’ll see statuses like Completed if the job ran and exited 0, Running
for jobs in progress, and Error (or another failure state) for jobs that did
not run successfully. Troubleshooting them is similar to any other workload:

kubectl describe pod $POD_NAME

to check the events for that Pod, or inspect the container’s logs directly:

kubectl logs $POD_NAME [-c $CONTAINER_NAME]
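
For example, against the hypothetical failed Pod from the listing above (the -c
flag is only needed when the Pod runs more than one container):

kubectl describe pod storage-report-27990120-x7k2p
kubectl logs storage-report-27990120-x7k2p -c report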

If you’d like to run the job manually, as a one-off task before the next
scheduled run (because the restart policy did not catch a failure, say, or
you’d like to make a change to the job at the spec level), kubectl also
provides a mechanism for this.

If you have a scheduled CronJob in Kubernetes that has failed, or has no Pods
available to run, you can complete the task by running it as a one-off Job
instead.

To create that Job from the existing CronJob’s manifest, use the --from flag
with kubectl to define the source CronJob to be used as the template for the
new Job:

kubectl create job my-job --from=cronjob/my-cron-job

which will proceed to schedule the Pods for this task. Because the new Job’s
spec is copied from the failed CronJob, you can modify it (with kubectl edit,
for example) and run the task outside the scope of the recurring schedule,
treating it as an at-will workload.
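
Keep in mind that much of a Job’s spec is immutable once the object exists, so
if the change you need touches the Pod template itself, one pattern (sketched
here with the same names as above) is to render the Job to a file, adjust it,
and create it from that file:

kubectl create job my-job --from=cronjob/my-cron-job --dry-run=client -o yaml > my-job.yaml
# tweak the spec in my-job.yaml (image, args, env, and so on), then:
kubectl apply -f my-job.yaml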