Workflow configuration

This workflow offers several customization options to handle different types of input data. On this page, we explain the options that can be set to easily customize the workflow.

Default workflow

In order to explain what customizations are possible, we will first have a look at the default workflow.

Default:

  • Single-end fastq files

  • Differential expression analysis: on

  • DeepRibo predictions: off

For the default workflow, we expect the .fastq files to be in single-end format. Additionally, differential expression is activated by default. Differential expression requires multiple conditions as well as both RIBO and RNA samples. A possible samples.tsv would look as follows:

method   condition   replicate   fastqFile
RIBO     A           1           fastq/RIBO-A-1.fastq.gz
RIBO     A           2           fastq/RIBO-A-2.fastq.gz
RIBO     B           1           fastq/RIBO-B-1.fastq.gz
RIBO     B           2           fastq/RIBO-B-2.fastq.gz
RNA      A           1           fastq/RNA-A-1.fastq.gz
RNA      A           2           fastq/RNA-A-2.fastq.gz
RNA      B           1           fastq/RNA-B-1.fastq.gz
RNA      B           2           fastq/RNA-B-2.fastq.gz
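
If you are starting from scratch, it may be convenient to copy a samples template shipped with HRIBO and adapt it to your own files. The template path used below is an assumption and may differ between HRIBO versions:

# copy the samples template into place (path is an assumption; adjust if it differs)
cp HRIBO/templates/samples.tsv HRIBO/samples.tsv
# then edit the copy so that each row points to one of your fastq files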

Note

By default, only reparation predictions are used. The reason for this is that DeepRibo is currently not available on conda and therefore requires additional tweaks to run. The process is explained below.
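
With this default configuration, a local run can be started without any Singularity-related options. The following call is a minimal sketch derived from the cluster command shown further below, assuming HRIBO is cloned into your project directory; adjust the number of cores (-j) to your machine:

# default local run: conda environments only, no singularity needed
snakemake --use-conda -s HRIBO/Snakefile --configfile HRIBO/config.yaml --directory ${PWD} -j 10 --latency-wait 60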

No differential expression

If you do not have multiple conditions and differential expression is activated, you will receive an error message. To deactivate differential expression, you have to edit the config.yaml file.

adapter: ""
samples: "HRIBO/samples.tsv"
alternativestartcodons: "GTG,TTG"
# Differential expression: on / off
differentialexpression: "off"
# Deepribo predictions: on / off
deepribo: "off"

This allows you to use a samples.tsv like:

method   condition   replicate   fastqFile
RIBO     A           1           fastq/RIBO-A-1.fastq.gz
RIBO     A           2           fastq/RIBO-A-2.fastq.gz
RNA      A           1           fastq/RNA-A-1.fastq.gz
RNA      A           2           fastq/RNA-A-2.fastq.gz

Activating DeepRibo

Activating DeepRibo predictions will give you a different file with ORF predictions. In our experience, the top DeepRibo results tend to be better than those of reparation. For archaea, where reparation performs very poorly, DeepRibo is the preferred option.

Note

In order to use DeepRibo, the tool singularity is required. Please refer to the overview for details on the installation.
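
To verify that singularity is installed and available on your PATH, a quick check (the reported version will differ on your system):

# print the installed singularity version; fails if singularity is not on the PATH
singularity --version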

Once you have installed singularity, turn on DeepRibo in the config.yaml:

adapter: ""
samples: "HRIBO/samples.tsv"
alternativestartcodons: "GTG,TTG"
# Differential expression: on / off
differentialexpression: "on"
# Deepribo predictions: on / off
deepribo: "on"

When calling snakemake, you now require additional command line arguments:

  • --use-singularity: specifies that snakemake can download and run Docker containers via Singularity.

  • --singularity-args " -c ": passes the --contain option to ensure that only the Docker container's file system is used.

Warning

DeepRibo cannot cope with genomes containing special IUPAC symbols. Ensure that your genome file contains only A, G, C, T and N symbols.
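
A quick way to check your genome is shown below; it lists every character other than A, C, G, T and N found on the sequence lines. The file name genome.fa is a placeholder for your own genome file:

# list all sequence characters that are not A, C, G, T or N (case-insensitive)
grep -v '^>' genome.fa | grep -oi '[^ACGTN]' | sort | uniq -c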

If you run DeepRibo locally

When running the workflow with DeepRibo locally, it is advisable to additionally use the --greediness 0 option if you do not have many cores available. This will cause the workflow to submit fewer jobs at the same time. This is especially important for DeepRibo, as we observed that a single DeepRibo job can finish in less than an hour if it does not have to compete for cores with another DeepRibo job; otherwise, it can run for several hours.

snakemake --use-conda --use-singularity --singularity-args " -c " -s HRIBO/Snakefile --configfile HRIBO/config.yaml --directory ${PWD} -j 10 --latency-wait 60

If you run DeepRibo on a cluster system

When running the workflow with DeepRibo on a cluster system, you have to add the above command line arguments to your submission script.

#!/bin/bash
#PBS -N <ProjectName>
#PBS -S /bin/bash
#PBS -q "long"
#PBS -d <PATH/ProjectFolder>
#PBS -l nodes=1:ppn=1
#PBS -o <PATH/ProjectFolder>
#PBS -j oe
cd <PATH/ProjectFolder>
source activate HRIBO
snakemake --latency-wait 600 --use-conda --use-singularity --singularity-args " -c " -s HRIBO/Snakefile --configfile HRIBO/config.yaml --directory ${PWD} -j 20 --cluster-config HRIBO/templates/torque-cluster.yaml --cluster "qsub -N {cluster.jobname} -S /bin/bash -q {cluster.qname} -d <PATH/ProjectFolder> -l {cluster.resources} -o {cluster.logoutputdir} -j oe"

Note

If you cannot install singularity on your cluster, check whether there are modules available for your cluster system.
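
On clusters that use Environment Modules or Lmod, you can search the available modules as follows (a sketch; module names and the module system itself differ between clusters):

# search the list of available modules for singularity (module output may go to stderr)
module avail 2>&1 | grep -i singularity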

You can then create an additional jobscript that tells snakemake to load the module before running jobs. An example would look as follows:

jobscript.sh

#!/bin/bash
module load devel/singularity/3.4.2
# properties = {properties}
{exec_job}

Then add the jobscript to the snakemake call:

#!/bin/bash
#PBS -N <ProjectName>
#PBS -S /bin/bash
#PBS -q "long"
#PBS -d <PATH/ProjectFolder>
#PBS -l nodes=1:ppn=1
#PBS -o <PATH/ProjectFolder>
#PBS -j oe
cd <PATH/ProjectFolder>
source activate HRIBO
snakemake --latency-wait 600 --use-conda --use-singularity --singularity-args " -c " --jobscript jobscript.sh -s HRIBO/Snakefile --configfile HRIBO/config.yaml --directory ${PWD} -j 20 --cluster-config HRIBO/templates/torque-cluster.yaml --cluster "qsub -N {cluster.jobname} -S /bin/bash -q {cluster.qname} -d <PATH/ProjectFolder> -l {cluster.resources} -o {cluster.logoutputdir} -j oe"

This tells snakemake to execute module load devel/singularity/3.4.2 before running each submitted job.

Warning

This is a specific example for our TORQUE cluster system. The way of loading modules, as well as the available modules, can differ on each system.

Paired-end support

We allow paired-end data in our workflow. Unfortunately, many of the downstream tools, like the prediction tools, cannot handle paired-end data. Therefore, we use the tool flash2 to merge the paired-end reads into single-end reads.

In order to use paired-end data, simply replace the Snakefile with Snakefile_pairedend. This requires a special samples_pairedend.tsv, which is also available in the HRIBO templates folder.

method   condition   replicate   fastqFile1                   fastqFile2
RIBO     A           1           fastq/RIBO-A-1_R1.fastq.gz   fastq/RIBO-A-1_R2.fastq.gz
RIBO     A           2           fastq/RIBO-A-2_R1.fastq.gz   fastq/RIBO-A-2_R2.fastq.gz
RIBO     B           1           fastq/RIBO-B-1_R1.fastq.gz   fastq/RIBO-B-1_R2.fastq.gz
RIBO     B           2           fastq/RIBO-B-2_R1.fastq.gz   fastq/RIBO-B-2_R2.fastq.gz
RNA      A           1           fastq/RNA-A-1_R1.fastq.gz    fastq/RNA-A-1_R2.fastq.gz
RNA      A           2           fastq/RNA-A-2_R1.fastq.gz    fastq/RNA-A-2_R2.fastq.gz
RNA      B           1           fastq/RNA-B-1_R1.fastq.gz    fastq/RNA-B-1_R2.fastq.gz
RNA      B           2           fastq/RNA-B-2_R1.fastq.gz    fastq/RNA-B-2_R2.fastq.gz
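
A local call for the paired-end workflow could then look as follows. This is a sketch that mirrors the single-end command above and assumes the paired-end Snakefile is available as HRIBO/Snakefile_pairedend and that the samples entry in config.yaml points to your samples_pairedend.tsv; add --use-singularity and --singularity-args " -c " if DeepRibo is activated:

# paired-end run: same options as before, but using the paired-end Snakefile
snakemake --use-conda -s HRIBO/Snakefile_pairedend --configfile HRIBO/config.yaml --directory ${PWD} -j 10 --latency-wait 60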