Snakemake can submit each of your jobs to the slurm scheduler for you! To enable this, you need to provide the --cluster
option to snakemake on the command line, and include all of the sbatch
information you normally put at the top of your submission files.
snakemake --cluster "sbatch -A CLUSTER_ACCOUNT -t CLUSTER_TIME -p CLUSTER_PARTITION -N CLUSTER_NODES -J JOBNAME" --jobs NUM_JOBS_TO_SUBMIT
Notes:
- Most clusters would prefer that you use an interactive session (or sbatch) to run this, so that you’re not running anything on the login nodes. Since this process is only submitting jobs, you can run this command on tmux/screen on a login node, but only do it for a small number of jobs or you’ll slow everyone up and your job will probably be killed by admin.
- the
--jobs
parameter allows snakemake to submit up to NUM_JOBS_TO_SUBMIT number of jobs, but please be aware of submission limits on your cluster. By default, snakemake will only submit jobs that can be run (input files already exist). There is a parameter called--immediate-submit
that will submit all jobs at once, but this may be an issue if the input files for those jobs are not available when those jobs make it through the scheduling queue.
Using a cluster configuration file
To save time you can also make a yml file containing your sbatch information, and tell snakemake where to find it.
Here’s an example cluster configuration file:
# cluster_config.yml - cluster configuration
__default__:
account: ACCOUNT
partition: PARTITION
time: 01:00:00 # time limit for each job
nodes: 1
ntasks-per-node: 14 #Request n cores be allocated per node.
chdir: /directory/to/change/to
output: a_name_for_my_job-%j.out
error: a_name_for_my_job-%j.err
Even if you tell snakemake where to find this file, it’s not going to use all of these parameters to submit each job - it will only use the onse you specify in the sbatch
portion of your --cluster
statement.
ACCOUNT
should correspond to the buyin user; for DIB members, this will be ctbrowngrp
. The partitions are described here.
To submit with identical parameters as we used above, run snakemake like so:
snakemake --cluster "sbatch -A {cluster.account} -t {cluster.time} -p {cluster.partition} -N {cluster.nodes}" --cluster-config cluster_config.yml --jobs NUM_JOBS_TO_SUBMIT
The information within the {}
are the parameters that snakemake will read from the cluster_config.yml
file.
In this configuration file above, I only have information for a __default__
, which will be used as the default for each rule. If you want to set specific time limits for each rule (or some rules), you can add that info to the file.
For example, if I have a rule called trimmomatic_raw
, I could add the following to my cluster_config.yml
file to specify some different cluster parameters for that rule.
trimmomatic_raw:
time: 00:45:00 # time limit for this rule only
For non-slurm clusters, you can change the cluster
command to reflect the scheduling service your cluster uses. See snakemake’s documentation for examples.
Examples
Example to run within tmux
source ~/.bashrc
conda activate snakemake
cd /home/ntpierce/2019-burgers-shrooms/orthofinder_work
snakemake -s diamond_blast.snakefile --use-conda --cluster "sbatch -t 0:30:00 -N 1 -c 14 -J dmnd --mem=30gb " --jobs 5
Example to submit as a job
#!/bin/bash -login
#SBATCH -D /home/ntpierce/2019-burgers-shrooms/orthofinder_work
#SBATCH -J dmnd_snake
#SBATCH -t 3-0:00:00
#SBATCH -N 1
#SBATCH --output /home/ntpierce/2019-burgers-shrooms/orthofinder_work/dmnd_snake-%j.out
#SBATCH --error /home/ntpierce/2019-burgers-shrooms/orthofinder_work/dmnd-snake-%j.err
# activate conda in general
source /home/ntpierce/.bashrc # if you have the conda init setting
# activate a specific conda environment, if you so choose
conda activate snakemake
# go to a particular directory
cd /home/ntpierce/2019-burgers-shrooms/orthofinder_work
# make things fail on errors
set -o nounset
set -o errexit
set -x
### run your commands here!
snakemake -s diamond_blast.snakefile --use-conda --cluster "sbatch -t 0:30:00 -N 1 -c 14 -J dmnd --mem=30gb " --jobs 5
Additional Resources
A simple fully functioning example for the farm cluster is here.
Here’s a carpentries tutorial you might find helpful. Note that this tutorial has a json
-formatted cluster configuration file. json
and yaml
files are read identically by snakemake, but I find yaml
to be more human-friendly! You can use either.
Take a look at the snakemake documention for cluster execution here: https://snakemake.readthedocs.io/en/stable/executable.html#cluster-execution