Analysis result files¶

The most important output files of this workflow are listed and explained below.

ORF Predictions¶

These output files contain information about the predicted open reading frames (ORFs), including novel predictions.

predictions_reparation.xlsx¶

This file contains all reparation ORF predictions.

Column name	Description
Identifier	Unique identifier describing the entry.
Genome	The genome accession identifier.
Source	The source of the ORF. (here reparation)
Feature	The feature of the ORF (here CDS)
Start	The start position of the ORF.
Stop	The stop position of the ORF.
Strand	The strand of the ORF. (+/-)
Pred_probability	The probability of this ORF (0.5-1, 1 being the best value)
Locus_tag	If the detected ORF is already in the annotation, this gives its locus tag.
Old_locus_tag	The old locus tag of a gene (if available in the annotation)
Name	If the detected ORF is already in the annotation, this gives its name.
Length	The length of the ORF.
Codon_count	The number of codons in the ORF. (length/3)
<method>-<condition>-<replicate>_TE	The translational efficiency for the given sample.
<method>-<condition>-<replicate>_rpkm	The RPKM for the given sample.
Evidence	The <condition>-<replicate> sample in which this ORF was predicted.
Start_codon	The start codon of the ORF.
Stop_codon	The stop codon of the ORF.
15nt_upstream	The 15nt upstream of the start codon
Nucleotide_seq	The nucleotide sequence of the ORF.
Aminoacid_seq	The amino acid sequence of the ORF.
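
As an illustration of how the columns above can be used, the following minimal sketch selects candidate novel ORFs. It assumes that the rows of the table have already been loaded into dictionaries keyed by column name, that an empty Locus_tag marks an ORF that is not in the annotation, and that 0.9 is an arbitrary example cutoff. The helper select_novel_orfs and the example rows are hypothetical and not part of the workflow.

```python
def select_novel_orfs(rows, min_probability=0.9):
    """Return rows without a locus tag whose Pred_probability passes the cutoff."""
    selected = []
    for row in rows:
        # Assumption: an empty or missing Locus_tag means the ORF is not annotated.
        is_novel = not str(row.get("Locus_tag") or "").strip()
        if is_novel and float(row["Pred_probability"]) >= min_probability:
            selected.append(row)
    # Highest probability first.
    return sorted(selected, key=lambda r: float(r["Pred_probability"]), reverse=True)


# Made-up example rows using the column names of the table above.
example_rows = [
    {"Identifier": "orf_a", "Locus_tag": "b0001", "Pred_probability": 0.99},
    {"Identifier": "orf_b", "Locus_tag": "", "Pred_probability": 0.95},
    {"Identifier": "orf_c", "Locus_tag": "", "Pred_probability": 0.60},
]
novel = select_novel_orfs(example_rows)
print([row["Identifier"] for row in novel])
```

Only orf_b is kept here: orf_a is already annotated and orf_c falls below the example cutoff.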

predictions_reparation.gff¶

An annotation file in .gff3 format containing all predictions of reparation for visualization in a genome browser.
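
For readers who want to process the track outside a genome browser, the sketch below splits one GFF3 feature line into its nine standard tab-separated columns. The function name parse_gff3_line and the example line are hypothetical; the attribute keys actually written by the workflow are not documented here and may differ.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into a dictionary; return None for comments."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attributes = fields
    # The ninth column holds semicolon-separated key=value pairs.
    attrs = {}
    for pair in attributes.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attrs,
    }


# Made-up example line, not taken from a real output file.
example_line = "NC_000913.3\treparation\tCDS\t337\t2799\t.\t+\t0\tID=example_orf;Name=example"
feature = parse_gff3_line(example_line)
print(feature["type"], feature["start"], feature["end"], feature["attributes"]["ID"])
```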

predictions_deepribo.xlsx¶

Note

These files are only available when DeepRibo predictions are activated in the config.yaml (see the workflow configuration).

This file contains all DeepRibo ORF predictions.

Column name	Description
Identifier	Unique identifier describing the entry.
Genome	The genome accession identifier.
Source	The source of the ORF. (here DeepRibo)
Feature	The feature of the ORF (here CDS)
Start	The start position of the ORF.
Stop	The stop position of the ORF.
Strand	The strand of the ORF. (+/-)
Pred_value	The value DeepRibo attributes the given prediction.
Pred_rank	The rank calculated from the prediction value. (the best prediction has rank 1)
Novel_rank	A special ranking involving only novel ORFs that are not in the annotation.
Locus_tag	If the detected ORF is already in the annotation, this gives its locus tag.
Old_locus_tag	The old locus tag of a gene (if available in the annotation)
Name	If the detected ORF is already in the annotation, this gives its name.
Length	The length of the ORF.
Codon_count	The number of codons in the ORF. (length/3)
<method>-<condition>-<replicate>_TE	The translational efficiency for the given sample.
<method>-<condition>-<replicate>_rpkm	The RPKM for the given sample.
Evidence	The <condition>-<replicate> sample in which this ORF was predicted.
Start_codon	The start codon of the ORF.
Stop_codon	The stop codon of the ORF.
15nt_upstream	The 15nt upstream of the start codon
Nucleotide_seq	The nucleotide sequence of the ORF.
Aminoacid_seq	The amino acid sequence of the ORF.
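
The relationship between Pred_value, Pred_rank and Novel_rank can be sketched as follows. This is only an illustration, not the workflow's implementation: it assumes that a larger Pred_value means a better prediction and that an empty Locus_tag marks a novel ORF. The function add_ranks and the example rows are hypothetical.

```python
def add_ranks(rows):
    """Add Pred_rank and Novel_rank to each row, based on Pred_value."""
    # Assumption: a larger Pred_value means a better prediction.
    ordered = sorted(rows, key=lambda r: r["Pred_value"], reverse=True)
    novel_rank = 0
    for rank, row in enumerate(ordered, start=1):
        row["Pred_rank"] = rank  # the best prediction gets rank 1
        if not row.get("Locus_tag"):
            # Novel ORFs are ranked separately, among themselves only.
            novel_rank += 1
            row["Novel_rank"] = novel_rank
        else:
            row["Novel_rank"] = None
    return ordered


# Made-up example rows.
ranked = add_ranks([
    {"Identifier": "orf_x", "Locus_tag": "", "Pred_value": 1.0},
    {"Identifier": "orf_y", "Locus_tag": "b0002", "Pred_value": 2.5},
    {"Identifier": "orf_z", "Locus_tag": "", "Pred_value": -0.3},
])
for row in ranked:
    print(row["Identifier"], row["Pred_rank"], row["Novel_rank"])
```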

predictions_deepribo.gff¶

Note

These files are only available when DeepRibo predictions are activated in the config.yaml (see the workflow configuration).

An annotation file in .gff3 format containing all predictions of DeepRibo for visualization in a genome browser.

Quality control¶

This comprises all files that can help to perform quality control on all input samples.

multiqc_report.html¶

The MultiQC report collects information from different tools, including FastQC and Subread featureCounts. The general statistics give an overview of:

the number of duplicates
the GC content
the average read lengths
the number of reads (in millions)

These statistics are collected after each processing step of our pipeline.

raw: the unprocessed data
trimmed: the data after trimming the adapter sequences
mapped: the data after mapping with Segemehl
unique: the data after removing multi-mapping reads
norRNA: the data after filtering out the rRNA

Further, feature counts are provided for different features from the annotation file, i.e. how many reads map to each feature. This includes all (featurecount), rRNA, norRNA (after filtering), tRNA and ncRNA. This is followed by a FastQC report including sequence counts, sequence quality histograms, per sequence quality scores, per base sequence content, per sequence GC content, per base N content, sequence length distribution, sequence duplication levels, overrepresented sequences, adapter content and a status overview.

heatmap_SpearmanCorr_readCounts.pdf¶

Spearman correlation coefficients of read counts. The dendrogram indicates which samples' read counts are most similar to each other. Since experiments with the same condition and experiment type (e.g. replicates) should always correlate more strongly with each other than with other experiments, this is a rapid way to quality-control the labeling and consistency of the input data.
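
To make the measure concrete, here is a minimal sketch of the Spearman correlation between the read counts of two samples, computed as the Pearson correlation of their ranks. It is not the code that produces the heatmap; the function names and the read counts are made up for illustration, and the sketch does not handle samples whose counts are all identical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up read counts per feature for three samples.
replicate_1 = [10, 200, 35, 0]
replicate_2 = [12, 180, 40, 1]
other_condition = [150, 3, 20, 90]
print(spearman(replicate_1, replicate_2))
print(spearman(replicate_1, other_condition))
```

In this toy example the two replicates order the features identically, so they correlate more strongly with each other than with the third sample.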

annotation_total.xlsx¶

This file contains detailed measures for every feature in the input annotation using read counts including multi-mapping reads.

Column name	Description
Identifier	Unique identifier describing the entry.
Genome	The genome accession identifier.
Source	The source of the annotated feature.
Feature	The feature of the annotated feature.
Start	The start position of the annotated feature.
Stop	The stop position of the annotated feature.
Strand	The strand of the annotated feature. (+/-)
Locus_tag	The locus tag of the annotated feature. (if available)
Old_locus_tag	The old locus tag of a gene (if available in the annotation)
Name	The name of the annotated feature. (if available)
Length	The length of the annotated feature.
Codon_count	The number of codons in the annotated feature. (length / 3)
<method>-<condition>-<replicate>_TE	The translational efficiency for the given sample.
<method>-<condition>-<replicate>_rpkm	The RPKM for the given sample. (ReadsPerKilobaseMillion)
Start_codon	The start codon of the annotated feature.
Stop_codon	The stop codon of the annotated feature.
15nt_upstream	The 15nt upstream of the start codon
Nucleotide_seq	The nucleotide sequence of the annotated feature.
Aminoacid_seq	The amino acid sequence of the annotated feature.
Product	The product of the annotated feature. (if available)
Note	The note of the annotated feature. (if available)
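
The two per-sample measures can be illustrated with a short sketch. RPKM follows its standard definition (reads per kilobase of feature length per million mapped reads). For the translational efficiency, the sketch assumes the common definition as the ratio of Ribo-seq RPKM to RNA-seq RPKM; the exact calculation used by the workflow is not stated here. All numbers are made up.

```python
def rpkm(read_count, feature_length, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads."""
    return read_count * 1e9 / (feature_length * total_mapped_reads)


def translational_efficiency(ribo_rpkm, rna_rpkm):
    """Assumed definition: Ribo-seq RPKM divided by RNA-seq RPKM."""
    if rna_rpkm == 0:
        return float("nan")  # undefined without RNA-seq coverage
    return ribo_rpkm / rna_rpkm


# Made-up example: a 1000 nt feature in one Ribo-seq and one RNA-seq sample.
ribo = rpkm(read_count=500, feature_length=1000, total_mapped_reads=2_000_000)
rna = rpkm(read_count=125, feature_length=1000, total_mapped_reads=1_000_000)
te = translational_efficiency(ribo, rna)
print(ribo, rna, te)
```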

total_read_counts.xlsx¶

This file shows the overall read-counts for each feature annotated in the user-provided annotation, after mapping and before removal of multi-mapping reads.

annotation_unique.xlsx¶

This file contains detailed measures for every feature in the input annotation using read counts after removal of multi-mapping reads.

Column name	Description
Identifier	Unique identifier describing the entry.
Genome	The genome accession identifier.
Source	The source of the annotated feature.
Feature	The feature of the annotated feature.
Start	The start position of the annotated feature.
Stop	The stop position of the annotated feature.
Strand	The strand of the annotated feature. (+/-)
Locus_tag	The locus tag of the annotated feature. (if available)
Old_locus_tag	The old locus tag of a gene (if available in the annotation)
Name	The name of the annotated feature. (if available)
Length	The length of the annotated feature.
Codon_count	The number of codons in the annotated feature. (length / 3)
<method>-<condition>-<replicate>_TE	The translational efficiency for the given sample.
<method>-<condition>-<replicate>_rpkm	The RPKM for the given sample. (ReadsPerKilobaseMillion)
Start_codon	The start codon of the annotated feature.
Stop_codon	The stop codon of the annotated feature.
15nt_upstream	The 15nt upstream of the start codon
Nucleotide_seq	The nucleotide sequence of the annotated feature.
Aminoacid_seq	The amino acid sequence of the annotated feature.
Product	The product of the annotated feature. (if available)
Note	The note of the annotated feature. (if available)

unique_read_counts.xlsx¶

This file shows the overall read-counts for each feature annotated in the user-provided annotation, after mapping and after removal of multi-mapping reads.

genome-browser¶

The files that can be used for visualization in a genome browser.

updated_annotation.gff¶

A GFF track containing the original annotation together with the new predictions by reparation.

potentialStartCodons.gff¶

A genome browser track with all possible start codons.

potentialStopCodons.gff¶

A genome browser track with all possible stop codons.

potentialRibosomeBindingSite.gff¶

A genome browser track with possible ribosome binding sites.

potentialAlternativeStartCodons.gff¶

A genome browser track with alternative start codons.

BigWig coverage files¶

We offer many different single-nucleotide mapping bigwig files for genome browser visualization. These files are available for different read regions and with different normalization methods.

global: full read is mapped
centered: region around the center.
threeprime: region around the three prime end.
fiveprime: region around the five prime end.

These are all available with the following normalization methods:

raw: raw, unprocessed files. This should only be used to check the coverage of a single file. It should not be used to compare to other files.
min: normalized by the minimal number of total reads across samples (factor = min. number of reads / number of reads). This is the recommended normalization when comparing different samples from the same experiment.
mil: normalized to 1000000 reads (factor = 1000000 / number of reads). This is the recommended normalization when comparing samples from different experiments.
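
The scaling factors given above can be computed as in the following sketch. The function normalization_factors, the sample names and the read totals are hypothetical; the factor of 1.0 for raw simply expresses that no scaling is applied.

```python
def normalization_factors(total_reads):
    """Return the raw, min and mil scaling factor for every sample."""
    smallest = min(total_reads.values())
    return {
        sample: {
            "raw": 1.0,                  # no scaling
            "min": smallest / n,         # min. number of reads / number of reads
            "mil": 1_000_000 / n,        # 1000000 / number of reads
        }
        for sample, n in total_reads.items()
    }


# Made-up total read counts per sample.
factors = normalization_factors({"RIBO-A-1": 2_000_000, "RIBO-A-2": 4_000_000})
for sample, values in factors.items():
    print(sample, values)
```

Each per-nucleotide coverage value of a sample would then be multiplied by the chosen factor.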

Differential Expression¶

Files related to the differential expression analysis.

riborex/<contrast>_sorted.xlsx¶

Table containing all differential expression results from riborex.

riborex/<contrast>_significant.xlsx¶

Table containing significant differential expression results from riborex (pvalue < 0.05).

xtail/<contrast>_sorted.xlsx¶

Table containing all differential expression results from xtail.

xtail/<contrast>_significant.xlsx¶

Table containing significant differential expression results from xtail (pvalue < 0.05).

xtail/r_<contrast>.pdf¶

This figure shows the RPF-to-mRNA ratios in two conditions, where the position of each gene is determined by its RPF-to-mRNA ratio (log2R) in the two conditions, represented on the x-axis and y-axis respectively. The points are color-coded according to the final p-value obtained with xtail (more significant p-values have a darker color):

blue: for genes with log2R larger in the first condition than in the second condition.
red: for genes with log2R larger in the second condition than in the first condition.
green: for genes with log2R changing homodirectionally in the two conditions.
yellow: for genes with log2R changing antidirectionally in the two conditions.

xtail/fc_<contrast>.pdf¶

This figure shows the result of the differential expression at the two expression levels, where each gene is a dot whose position is determined by its log2 fold change (log2FC) at the transcriptional level (mRNA log2FC), represented on the x-axis, and its log2FC at the translational level (RPF log2FC), represented on the y-axis. The points are color-coded according to the final p-value obtained with xtail (more significant p-values have a darker color):

blue: for genes whose mRNA log2FC is larger than 1 (transcriptional level).
red: for genes whose RPF log2FC is larger than 1 (translational level).
green: for genes changing homodirectionally at both levels.
yellow: for genes changing antidirectionally at the two levels.
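
One possible reading of the four color classes is sketched below. This is an interpretation, not xtail's plotting code: it assumes that the cutoff of 1 is applied to the absolute log2FC, that green and yellow apply only when both levels pass the cutoff, and that all remaining genes are left unclassified. The function classify_fold_changes is hypothetical.

```python
def classify_fold_changes(mrna_log2fc, rpf_log2fc, cutoff=1.0):
    """Assign a color class from the mRNA and RPF log2 fold changes."""
    # Assumption: the cutoff applies to the absolute fold change.
    mrna_changed = abs(mrna_log2fc) > cutoff
    rpf_changed = abs(rpf_log2fc) > cutoff
    if mrna_changed and rpf_changed:
        same_direction = (mrna_log2fc > 0) == (rpf_log2fc > 0)
        return "green" if same_direction else "yellow"
    if mrna_changed:
        return "blue"    # change at the transcriptional level only
    if rpf_changed:
        return "red"     # change at the translational level only
    return "unclassified"


# Made-up fold changes.
for mrna, rpf in [(2.0, 0.1), (0.2, -1.5), (1.8, 2.2), (1.5, -1.7), (0.3, 0.4)]:
    print(mrna, rpf, classify_fold_changes(mrna, rpf))
```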

Metagene Analysis¶

Metagene profiling analyses the distribution of mapped reads around the start codon. For Ribo-seq, the ribosome is expected to protect a specific range of read lengths, often typical for the investigated group of organisms, from digestion by nuclease. These reads should show a typical peak around the start codon, which corresponds to the high frequency with which ribosomes are bound there. We output and plot the metagene profile for each individual fragment length as a quality control for the Ribo-seq protocol. If the distribution is atypical for all read lengths, arresting the ribosomes failed.

<accession>_Z.Y_profiling.xlsx/tsv¶

The table shows, for a range of specific read lengths, how many reads on average over all start codons in the genome have been mapped per nucleotide. The nucleotides range from 100 nucleotides upstream of the start codon to 399 nucleotides downstream. The read counts are either raw or normalized by the average read count per nucleotide for the range around the start codon. Moreover, different single-nucleotide mapping variants are considered, where only the 5’, 3’ or centered region of the read is counted.
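
The averaging described above can be sketched as follows. This is a simplified illustration, not the workflow's implementation: it assumes a single read length, forward-strand start codons only, and a coverage dictionary that maps a genome position to the number of reads assigned to that nucleotide. The function metagene_profile and the toy data are hypothetical.

```python
def metagene_profile(coverage, start_codons, upstream=100, downstream=399, normalize=False):
    """Average per-nucleotide read counts in a window around all start codons."""
    offsets = range(-upstream, downstream + 1)
    profile = []
    for offset in offsets:
        # Sum the coverage at this offset over all start codons, then average.
        total = sum(coverage.get(start + offset, 0) for start in start_codons)
        profile.append(total / len(start_codons))
    if normalize:
        # Divide by the average read count per nucleotide within the window.
        window_mean = sum(profile) / len(profile)
        if window_mean > 0:
            profile = [value / window_mean for value in profile]
    return dict(zip(offsets, profile))


# Toy data with a tiny window so the result is easy to follow.
toy_coverage = {100: 4, 101: 2, 200: 6}
toy_starts = [100, 200]
raw_profile = metagene_profile(toy_coverage, toy_starts, upstream=1, downstream=1)
norm_profile = metagene_profile(toy_coverage, toy_starts, upstream=1, downstream=1, normalize=True)
print(raw_profile)
print(norm_profile)
```

With the default window of 100 nucleotides upstream and 399 downstream, each profile covers 500 positions.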

<accession>_Z.Y_profiling.pdf¶

Additional output¶

samples.xlsx¶

An Excel representation of the input sample file.

manual.pdf¶

A PDF format file giving some explanations about the output files, contained in the final result report.

overview.xlsx¶

An overview table containing all information gathered from the prediction tools and the differential expression analysis. The contents of this table change depending on which options are set. The overview table for the default workflow will contain annotation, reparation, deepribo and differential expression output.

Column name	Description
Identifier	Unique identifier describing the entry.
Genome	The genome accession identifier.
Start	The start position of the ORF.
Stop	The stop position of the ORF.
Strand	The strand of the ORF. (+/-)
Locus_tag	The locus tag of the ORF. (if not novel)
Overlapping_genes	Genes that overlap with the predicted ORF
Old_locus_tag	The old locus tag of a gene (if available in the annotation)
Name	The name of the ORF. (if not novel)
Gene_name	The name of the ORF's associated gene feature. (if not novel)
Length	The length of the ORF.
Codon_count	The number of codons in the ORF. (length / 3)
Start_codon	The start codon of the annotated feature.
Stop_codon	The stop codon of the annotated feature.
15nt_upstream	The 15nt upstream of the start codon
Nucleotide_seq	The nucleotide sequence of the annotated feature.
Aminoacid_seq	The amino acid sequence of the annotated feature.
<method>-<condition>-<replicate>_TE	The translational efficiency for the given sample.
<method>-<condition>-<replicate>_rpkm	The RPKM for the given sample. (ReadsPerKilobaseMillion)
Evidence_reparation	The sample this ORF was predicted in (for reparation)
Reparation_probability	The probability of this ORF (0.5-1, 1 being the best value)
Evidence_deepribo	The sample this ORF was predicted in (for deepribo)
Deepribo_rank	The deepribo rank for this ORF. (1 being the best value, 999999 undefined)
Deepribo_score	The score the deepribo rank is based on.
riborex_pvalue	The pvalue (determined by riborex)
riborex_pvalue_adjusted	The adjusted pvalue (determined by riborex)
riborex_log2FC	The log2FC (determined by riborex)
xtail_pvalue	The pvalue (determined by xtail)
xtail_pvalue_adjusted	The adjusted pvalue (determined by xtail)
xtail_log2FC	The log2FC (determined by xtail)
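
When working with the overview table, the placeholder rank and the p-value columns need some care. The sketch below treats a Deepribo_rank of 999999 as undefined and flags entries with a p-value below 0.05, the threshold used for the significant tables. It assumes rows are available as dictionaries keyed by column name and that missing values are stored as None; the function summarize_entry and the example rows are hypothetical.

```python
UNDEFINED_DEEPRIBO_RANK = 999999  # placeholder value documented in the table above


def summarize_entry(row, alpha=0.05):
    """Condense one overview row into its rank and significance flags."""
    rank = row.get("Deepribo_rank")
    has_rank = rank is not None and int(rank) != UNDEFINED_DEEPRIBO_RANK

    def significant(column):
        value = row.get(column)
        # Assumption: a missing p-value is stored as None.
        return value is not None and float(value) < alpha

    return {
        "Identifier": row["Identifier"],
        "deepribo_rank": int(rank) if has_rank else None,
        "riborex_significant": significant("riborex_pvalue"),
        "xtail_significant": significant("xtail_pvalue"),
    }


# Made-up example rows.
summary_a = summarize_entry({
    "Identifier": "orf_a", "Deepribo_rank": 12,
    "riborex_pvalue": 0.01, "xtail_pvalue": 0.2,
})
summary_b = summarize_entry({
    "Identifier": "orf_b", "Deepribo_rank": 999999,
    "riborex_pvalue": None, "xtail_pvalue": 0.03,
})
print(summary_a)
print(summary_b)
```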