A first batch job
Now that you know how to submit an interactive job, submitting a batch job is straightforward.
The main difference is that you need to specify an Executable to run, and you should specify where to store output and logs.
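To make this concrete, here is a minimal sketch of a submit description file. It is written as a shell here-document so it can be created directly on the submit node. It is not the repository's CentOS7_simple.jdl; the file name example.jdl and the logs/ paths are hypothetical. executable, output, error, log and queue are standard HTCondor submit commands, and $(Cluster) and $(Process) are replaced with the job's IDs at submit time.

```shell
#!/bin/sh
# Write a minimal, hypothetical submit description file (example.jdl).
# The quoted 'EOF' stops the shell from expanding $(Cluster)/$(Process);
# HTCondor substitutes them itself when the job is submitted.
cat > example.jdl <<'EOF'
# Program to run inside the job (must be executable)
executable = environment-info.sh
# Where the job's stdout, stderr and HTCondor's event log are stored
output     = logs/out.$(Cluster).$(Process)
error      = logs/err.$(Cluster).$(Process)
log        = logs/log.$(Cluster).$(Process)
# Queue a single job
queue
EOF
```

With paths like these, the logs/ directory has to exist on the submit node for the output and log files to be writable.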
In this first simple example, we use a batch script that shows some information about the environment inside the job.
We need two files:
- The submit description file: save the following into a file of your choosing or use the file CentOS7_simple.jdl from the repository.
- The job script: save the following into a file of your choosing or use the file environment-info.sh from the repository.
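If you do not have the repository at hand, a stand-in for environment-info.sh could look like the hypothetical sketch below; the original script's contents may differ. It uses only standard tools (uname, id, pwd, date, env) and makes the new file executable right away.

```shell
#!/bin/sh
# Create a hypothetical stand-in for environment-info.sh and make it executable.
cat > environment-info.sh <<'EOF'
#!/bin/sh
# Print some information about the environment the job runs in.
echo "Host:        $(uname -n)"
echo "User:        $(id -un)"
echo "Working dir: $(pwd)"
echo "Date:        $(date)"
# Operating system, if the distribution provides /etc/os-release
if [ -r /etc/os-release ]; then
    . /etc/os-release
    echo "OS:          ${PRETTY_NAME:-unknown}"
fi
echo "Environment variables:"
env | sort
EOF
chmod +x environment-info.sh
```

Running it on the submit node and again inside a job makes differences between the two environments easy to spot.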
Please check that the shell script is executable; if not, run chmod +x environment-info.sh.
Usually, you should test your code before submitting it. If a special environment is needed, you can do this testing in an interactive job before firing off many jobs. In our case, simply test the script on the submit node by running ./environment-info.sh and check what happens.
Now, you can finally submit the job:
Please note that this fails!
Before submitting the job, HTCondor usually checks whether the log files and other output files can be written. You can turn this check off by adding -disable to the condor_submit call, which speeds up submission; however, the jobs will then go into the HOLD state if the files cannot be written on the submit node.
So you will want to fix the problem:
Now, please try again:
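One way to avoid this kind of failure is to create the directories named by the output, error and log commands before submitting. The helper below is a hypothetical sketch, not part of HTCondor; it assumes one command per line in "command = value" form and no submit macros such as $(Cluster) in the directory part of the paths (macros in the file name are fine).

```shell
#!/bin/sh
# Hypothetical helper: create the directories used by the output, error
# and log commands of a submit description file, so that the write check
# done by condor_submit can succeed.
ensure_output_dirs() {
    submit_file=$1
    # Submit commands are case-insensitive, hence grep -i.
    grep -iE '^[[:space:]]*(output|error|log)[[:space:]]*=' "$submit_file" |
        # Strip "command =" and trailing whitespace, leaving only the path.
        sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//' |
        while IFS= read -r path; do
            [ -n "$path" ] || continue
            mkdir -p "$(dirname "$path")"
        done
}

# Usage on the submit node:
#   ensure_output_dirs CentOS7_simple.jdl && condor_submit CentOS7_simple.jdl
```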
Now, you can investigate the job a bit. Some examples follow.
- Use condor_q.
- Check out some more details with condor_q -long clusterid.process.
- Check out the files inside the logs directory.
- Try to follow along the job output using condor_tail -f clusterid.process.
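The clusterid.process placeholders in the list above are the job IDs reported by condor_submit. To keep the cluster ID in a shell variable, a small hypothetical helper can extract it from the usual summary line printed by condor_submit, such as "1 job(s) submitted to cluster 4711."; the number 4711 is only an example, and the helper assumes exactly that message format.

```shell
#!/bin/sh
# Hypothetical helper: read condor_submit output on stdin and print the
# cluster ID from its summary line, e.g. "1 job(s) submitted to cluster 4711."
cluster_id() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Usage on the submit node (the first job of a cluster has process number 0):
#   id=$(condor_submit CentOS7_simple.jdl | cluster_id)
#   condor_q -long "$id.0"
#   condor_tail -f "$id.0"
```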
Especially at this point, you are invited to ask questions about what you find!
Removing jobs
Note that you can remove jobs, for example when you have finished your investigations, or when you have found a bug and want to resubmit!
To remove a full cluster of jobs:
To remove a single job:
To remove all your jobs:
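These three cases correspond to the arguments condor_rm accepts: a cluster ID, a cluster.process job ID, or a user name. The wrappers below are a hypothetical sketch; the function names and the DRY_RUN switch (which only prints the command instead of running it) are illustrations, not part of HTCondor.

```shell
#!/bin/sh
# Hypothetical wrappers around condor_rm.
# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove a full cluster of jobs, e.g.: remove_cluster 4711
remove_cluster() { run condor_rm "$1"; }

# Remove a single job, e.g.: remove_job 4711 0
remove_job() { run condor_rm "$1.$2"; }

# Remove all jobs belonging to the current user
remove_all_mine() { run condor_rm "$(id -un)"; }
```

Setting DRY_RUN=1 first is a cheap way to double-check which jobs a command would hit before actually removing anything.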
Submit another one of your test jobs and remove it. How long does the removal take? Check the status with condor_q while it is in progress.