Analysis result files

The important files in this workflow are listed and explained below.

ORF Predictions

The output files containing information about predicted Open Reading Frames, these also contain novel predictions.

predictions_reparation.xlsx

This file contains all reparation ORF predictions.

Column name

Description

Identifier

Unique identifier describing the entry.

Genome

The genome accession identifier.

Source

The source of the ORF. (here reparation)

Feature

The feature of the ORF (here CDS)

Start

The start position of the ORF.

Stop

The stop position of the ORF.

Strand

The strand of the ORF. (+/-)

Pred_probability

The probability of this ORF (0.5-1, 1 being the best value)

Locus_tag

If the detected ORF is already in the annotation, this gives its locus tag.

Old_locus_tag

The old locus tag of a gene (if available in the annotation)

Name

If the detected ORF is already in the annotation, this gives its name.

Length

The length of the ORF.

Codon_count

The number of codons in the ORF. (length/3)

<method>-<condition>-<replicate>_TE

The translational efficiency for the given sample.

<method>-<condition>-<replicate>_rpkm

The RPKM for the given sample.

Evidence

The <condition>-<replicate> sample in which this ORF was predicted.

Start_codon

The start codon of the ORF.

Stop_codon

The stop codon of the ORF.

15nt_upstream

The 15nt upstream of the start codon

Nucleotide_seq

The nucleotide sequence of the ORF.

Aminoacid_seq

The amino acid sequence of the ORF.

predictions_reparation.gff

An annotation file in .gff3 format containing all predictions of reparation for visualization in a genome browser.

predictions_deepribo.xlsx

Note

These files are only available when activating DeepRibo predictions in the config.yaml. (see workflow-configuration <workflow-configuration:workflow-configuration>)

This file contains all DeepRibo ORF predictions.

Column name

Description

Identifier

Unique identifier describing the entry.

Genome

The genome accession identifier.

Source

The source of the ORF. (here reparation)

Feature

The feature of the ORF (here CDS)

Start

The start position of the ORF.

Stop

The stop position of the ORF.

Strand

The strand of the ORF. (+/-)

Pred_value

The value DeepRibo attributes the given prediction.

Pred_rank

The rank calculated from the prediction value. (the best prediction has rank 1)

Novel_rank

A special ranking involving only novel ORFs that are not in the annotation.

Locus_tag

If the detected ORF is already in the annotation, this gives its locus tag.

Old_locus_tag

The old locus tag of a gene (if available in the annotation)

Name

If the detected ORF is already in the annotation, this gives its name.

Length

The length of the ORF.

Codon_count

The number of codons in the ORF. (length/3)

<method>-<condition>-<replicate>_TE

The translational efficiency for the given sample.

<method>-<condition>-<replicate>_rpkm

The RPKM for the given sample.

Evidence

The <condition>-<replicate> sample in which this ORF was predicted.

Start_codon

The start codon of the ORF.

Stop_codon

The stop codon of the ORF.

15nt_upstream

The 15nt upstream of the start codon

Nucleotide_seq

The nucleotide sequence of the ORF.

Aminoacid_seq

The amino acid sequence of the ORF.

predictions_deepribo.gff

Note

These files are only available when activating DeepRibo predictions in the config.yaml. (see workflow-configuration <workflow-configuration:workflow-configuration>)

An annotation file in .gff3 format containing all predictions of DeepRibo for visualization in a genome browser.

Quality control

This comprises all files that can help to perform quality control on all input samples.

multiqc_report.html

The multiQC report collects information from different tools, including fastQC and subread featurecounts. The general statistics give an overview over:

  • the number of duplicates

  • the GC content

  • the average read lengths

  • the number of reads (in millions)

These statistics are collected after each processing step of our pipeline.

  • raw: the unprocessed data

  • trimmed: the data after trimming the adapter sequences

  • mapped: the data after mapping with Segemehl

  • unique: the data after removing multi-mapping reads

  • norRNA: the data after filtering out the rRNA

Further, feature counts are provided for different features from the annotation file. (i.e. how many reads map to each feature) This includes, all(featurecount), rRNA, norRNA(after filtering), tRNA and ncRNA. Following is a fastQC report including sequence counts, sequence quality histograms, per sequence quality scores, per base sequence content, per sequence GC content, per base N content, sequence length distribution, sequence duplication levels, overrepresented features, adapter content and a status overview.

heatmap_SpearmanCorr_readCounts.pdf

Spearman correlation coefficients of read counts. The dendrogram indicates which samples read counts are most similar to each other. Since there should be always a higher correlation between experiments with the same condition and experiment type (e.g. replicates) and not others, this is a rapid way to quality-control the labeling/consistency of input data.

annotation_total.xlsx

This file contains detailed measures for every feature in the input annotation using read counts including multi-mapping reads.

Column name

Description

Identifier

Unique identifier describing the entry.

Genome

The genome accession identifier.

Source

The source of the annotated feature.

Feature

The feature of the annotated feature.

Start

The start position of the annotated feature.

Stop

The stop position of the annotated feature.

Strand

The strand of the annotated feature. (+/-)

Locus_tag

The locus tag of the annotated feature. (if available)

Old_locus_tag

The old locus tag of a gene (if available in the annotation)

Name

The name of the annotated feature. (if available)

Length

The length of the annotated feature.

Codon_count

The number of codons in the annotated feature. (length / 3)

<method>-<condition>-<replicate>_TE

The translational efficiency for the given sample.

<method>-<condition>-<replicate>_rpkm

The RPKM for the given sample. (ReadsPerKilobaseMillion)

Start_codon

The start codon of the annotated feature.

Stop_codon

The stop codon of the annotated feature.

15nt_upstream

The 15nt upstream of the start codon

Nucleotide_seq

The nucleotide sequence of the annotated feature.

Aminoacid_seq

The amino acid sequence of the annotated feature.

Product

The product of the annotated feature. (if available)

Note

The note of the annotated feature. (if available)

total_read_counts.xlsx

This file shows the overall read-counts for each feature annotated in the user-provided annotation, after mapping and before removal of multi-mapping reads.

annotation_unique.xlsx

This file contains detailed measures for every feature in the input annotation using read counts after removal of multi-mapping reads.

Column name

Description

Identifier

Unique identifier describing the entry.

Genome

The genome accession identifier.

Source

The source of the annotated feature.

Feature

The feature of the annotated feature.

Start

The start position of the annotated feature.

Stop

The stop position of the annotated feature.

Strand

The strand of the annotated feature. (+/-)

Locus_tag

The locus tag of the annotated feature. (if available)

Old_locus_tag

The old locus tag of a gene (if available in the annotation)

Name

The name of the annotated feature. (if available)

Length

The length of the annotated feature.

Codon_count

The number of codons in the annotated feature. (length / 3)

<method>-<condition>-<replicate>_TE

The translational efficiency for the given sample.

<method>-<condition>-<replicate>_rpkm

The RPKM for the given sample. (ReadsPerKilobaseMillion)

Start_codon

The start codon of the annotated feature.

Stop_codon

The stop codon of the annotated feature.

15nt_upstream

The 15nt upstream of the start codon

Nucleotide_seq

The nucleotide sequence of the annotated feature.

Aminoacid_seq

The amino acid sequence of the annotated feature.

Product

The product of the annotated feature. (if available)

Note

The note of the annotated feature. (if available)

unique_read_counts.xlsx

This file shows the overall read-counts for each feature annotated in the user-provided annotation, after mapping and after removal of multi-mapping reads.

genome-browser

The files that can be used for visualization in a genome browser.

updated_annotation.gff

A gff track containing both the original annotation together with the new predictions by reparation.

potentialStartCodons.gff

A genome browser track with all possible start codons.

potentialStopCodons.gff

A genome browser track with all possible stop codons.

potentialRibosomeBindingSite.gff

A genome browser track with possible ribosome binding sites.

potentialAlternativeStartCodons.gff

A genome browser track with alternative start codons.

BigWig coverage files

We offer many different single nucleotide mapping bigwig files for genome browser visualization. These files are available for different regions and performed with different methods.

  • global: full read is mapped

  • centered: region around the center.

  • threeprime: region around the three prime end.

  • fiveprime: region around the five prime end.

These are all available with the following normalization methods:

  • raw: raw, unprocessed files. This should only be used to check the coverage of a single file. It should not be used to compare to other files.

  • min: normalized by number of minimal total reads per sample (factor = min. number of reads / number of reads). This is the recommended normalization when comparing different samples from the same experiment.

  • mil: normalized by 1000000 (factor = 1000000 / number of reads). This is the recommended normalization when comparing different samples from the different experiments.

Differential Expression

Files related to the differential expression analysis.

riborex/<contrast>_sorted.xlsx

Table containing all differential expression results from riborex.

riborex/<contrast>_significant.xlsx

Table containing significant differential expression results from riborex (pvalue < 0.05).

xtail/<contrast>_sorted.xlsx

Table containing all differential expression results from xtail.

xtail/<contrast>_significant.xlsx

Table containing significant differential expression results from xtail (pvalue < 0.05).

xtail/r_<contrast>.pdf

This figure shows the RPF-to-mRNA ratios in two conditions, where the position of each gene is determined by its RPF-to-mRNA ratio (log2R) in two conditions, represented on the x-axis and y-axis respectively. The points will be color-coded with the pvalue final obtained with xtail (more significant p values having darker color)

  • blue: for genes with log2R larger in first condition than second condition.

  • red: for genes with log2R larger in second condition than the first condition.

  • green: for genes with log2R changing homodirectionally in two condition.

  • yellow: for genes with log2R changing antidirectionally in two condition.

xtail/fc_<contrast>.pdf

This figure shows the result of the differential expression at the two expression levels, where each gene is a dot whose position is determined by its log2 fold change (log2FC) of transcriptional level (mRNA log2FC), represented on the x-axis, and the log2FC of translational level (RPF log2FC), represented on the y-axis. The points will be color-coded with the pvalue final obtained with xtail (more significant p values having darker color)

  • blue: for genes whos mRNA log2FC larger than 1 (transcriptional level).

  • red: for genes whos RPF log2FC larger than 1 (translational level).

  • green: for genes changing homodirectionally at both level.

  • yellow: for genes changing antidirectionally at two levels.

Metagene Analysis

Meta gene profiling analyses the distribution of mapped reads around the start codon. Moreover for Ribo-seq it is expected that the ribosome protects a specific range of read lengths, often typical for the investigated group of organisms, from digestion by nuclease. These reads should show a typical peak around the start codon which corresponds to the high frequency that ribosomes are bound there. We output and plot the meta gene profiling for each individual fragment length as a quality control for the Ribo-seq protocol. If the distribution for all read lengths is untypical, arresting the ribosomes failed.

<accession>_Z.Y_profiling.xlsx/tsv

The table shows for a range of specific read lengths, how many reads on average over all start codons in the genome have been mapped per nucleotide. The nucleotides range from 100 nucleotides upstream of the start codon to 399 nucleotides downstream. The read counts are either raw or normalized by average read count per nucleotide, for the range around the start codon. Moreover different single nucleotide mapping variants are considered, where only the 5’, 3’ or centered region of the read is counted.

<accession>_Z.Y_profiling.pdf

Additional output

samples.xlsx

An excel representation of the input sample file.

manual.pdf

A PDF format file giving some explanations about the output files, contained in the final result report.

overview.xlsx

An overview table containing all information gathered from the prediction tools and differential expression analysis. The contents of this table change depending on which options are set. The overview table for the default workflow will contain annotation. reparation, deepribo and differential expression output.

Column name

Description

Identifier

Unique identifier describing the entry.

Genome

The genome accession identifier.

Start

The start position of the ORF.

Stop

The stop position of the ORF.

Strand

The strand of the ORF. (+/-)

Locus_tag

The locus tag of ORF. (if not novel)

Overlapping_genes

Genes that overlap with the predicted ORF

Old_locus_tag

The old locus tag of a gene (if available in the annotation)

Name

The name of the ORF. (if not novel)

Gene_name

The name of the ORFs associated gene feature. (if not novel)

Length

The length of the ORF.

Codon_count

The number of codons in the ORF. (length / 3)

Start_codon

The start codon of the annotated feature.

Stop_codon

The stop codon of the annotated feature.

15nt_upstream

The 15nt upstream of the start codon

Nucleotide_seq

The nucleotide sequence of the annotated feature.

Aminoacid_seq

The amino acid sequence of the annotated feature.

<method>-<condition>-<replicate>_TE

The translational efficiency for the given sample.

<method>-<condition>-<replicate>_rpkm

The RPKM for the given sample. (ReadsPerKilobaseMillion)

Evidence_reparation

The sample this ORF was predicted in (for reparation)

Reparation_probability

The probability of this ORF (0.5-1, 1 being the best value)

Evidence_deepribo

The sample this ORF was predicted in (for deepribo)

Deepribo_rank

The deepribo rank for this ORF. (1 being the best value, 999999 undefined)

Deepribo_score

The score the deepribo rank is based on.

riborex_pvalue

The pvalue (determined by riborex)

riborex_pvalue_adjusted

The adjusted pvalue (determined by riborex)

riborex_log2FC

The log2FC (determined by riborex)

xtail_pvalue

The pvalue (determined by xtail)

xtail_pvalue_adjusted

The adjusted pvalue (determined by xtail)

xtail_log2FC

The log2FC (determined by xtail)