AtLASS

AtLASS (Attention-based bidirectional LSTM for Annotating Splice Sites) is fully automated learning and predicting tool for Splice Sites.
AtLASS requires Genome file in fasta format and RNA-seq files in fastq format or BAM file.

Hardware requirements

48 GB of RAM
16 GB of GPU memory

Installation

You need to install Singularity on your system.

$ singularity build --fakeroot Atlass.sif singularity.def

Usage

You can get usage information by the following command.

$ singularity run --nv Atlass.sif -h

Use of alias is recommended, for example, by adding the following to your ~/.bashrc.

alias Atlass='singularity run --nv Atlass.sif'

Select modes

AtLASS has three modes depending on combination of input files.

1. Leaning and Prediction mode

This mode is the standard.
RNA mapping is performed on the input genome by HISAT2, and models are trained using the annotated splice sites.
Finally, the splice sites of the input genome are predicted.
Command examples are as follows.

$ Atlass -g <genome> -1 <rna_1.fastq> -2 <rna_2.fastq> -c <cpu> -gpu
$ Atlass -g <genome> -b <bam> -gpu

2. Prediction only mode

This mode uses pre-trained models to predict the input genome.
Command example is as follows.

$ Atlass -g <genome> -m1 <model1> -m2 <model2> -gpu

3. Learning on the other species and Prediction mode

This mode trains models on the genome of other species and predicts the input genome.
Evidence for splice sites in other species genome can be provided by GenBank file, RNA-seq files, or BAM file.
Command examples are as follows.

$ Atlass -g <genome> -o-g <other_genome> -o-gbf <other_GenBank> -gpu
$ Atlass -g <genome> -o-g <other_genome> -o-1 <other_rna_1.fastq> -o-2 <other_rna_2.fastq> -c <cpu> -gpu
$ Atlass -g <genome> -o-g <other_genome> -o-b <other_bam> -gpu

Output

By default, Atlass_out_${year+month+day}/ directory is created under the current directory, and all output files are created in it.
The main output files to be created are as follows.

Atlass.tsv : Final prediction results in TSV format.
Each column shows:

Scaffold name
Position number of splice site
Intron start (forward) or Intron end (revcom)
Probability
The results of validation with RNA-seq mapping and one of four categories (mode 1 only): True Positive, False Negative, False Positive, and Atlass Prediction. AtLASS Prediction should be noted because it is a predicted result in genomic regions not covered by RNA-seq.

Scaffold	Position	Start/End	Probability	Category
NODE_1	124	forward	0.999	True Positive
NODE_1	211	revcom	0.998	True Positive
NODE_1	1288	revcom	0.922	False Positive
NODE_1	3286	revcom	0.998	AtLASS Prediction
NODE_1	22752	forward	N/A	False Negative

intron.tsv and exon.tsv : TSV files of intron and exon regions
model1/2_dir : The trained model state files are saved. These can be used in the mode 2 run.
filtered.bam : The BAM file after mapping and filtering, which can be used with the --BAM flag in the next run.
IntronTrain1/2.log : Logfile of each model training.
Std Error : The end time of each step is output. Example is follows.

  [2022-07-29 04:15:03]  AtLASS-0.0.1 mode:1 start.
  [2022-07-29 04:15:03]  Outdir is made at "/home/harada/AtLASS_test/Atlass_out_20220729".
  [2022-07-29 05:35:45]  IntronExtract Finish.
  [2022-07-29 05:35:46]  DataMake1 Finish.
  [2022-07-30 16:29:41]  IntronTrain Finish.
  [2022-07-31 22:30:44]  GenomePred Finish.
  [2022-07-31 23:13:58]  DataMake2 Finish.
  [2022-08-01 10:13:17]  IntronTrain Finish.
  [2022-08-02 16:15:28]  GenomePred Finish.
  [2022-08-02 22:38:21]  AtLASS-0.0.1 mode:1 Finish.

Flags

usage: Atlass [-h] [-o OUTDIR] [-g GENOME] [-b BAM] [-1 FASTQ_1] [-2 FASTQ_2] [-c CPU] [-gpu] [-f] [-m1 MODEL1]
              [-m2 MODEL2] [-o-g OTHER_GENOME] [-o-gbf OTHER_GENBANK] [-o-b OTHER_BAM] [-o-1 OTHER_FASTQ_1]
              [-o-2 OTHER_FASTQ_2] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -o OUTDIR, --outdir OUTDIR
                        Output directory 
                        Default is ./output_${year+month+day}
  -g GENOME, --genome GENOME
                        Genome fasta file to be predicted
  -b BAM, --bam BAM, --BAM BAM, --sam BAM, --SAM BAM
                        BAM or SAM file for learning 
                        HISAT2 is expected mapper
  -1 FASTQ_1, --fastq_1 FASTQ_1
                        Fastq or Fastq.gz file 1 for learning
  -2 FASTQ_2, --fastq_2 FASTQ_2
                        Fastq or Fastq.gz file 2 for learning
  -c CPU, --cpu CPU, --CPU CPU
                        Number of cpu in mapping phase 
                        Default is 1
  -gpu, --gpu, --GPU    Run with GPU device in learning and predicting phase 
                        Default is False, recommended is True 
                        CPU mode will use ALL CPUs
  -f, --fast, --FAST    Enable fast model 
                        Fast mode uses smaller models and may result in less accuracy
  -m1 MODEL1, --model1 MODEL1
                        Pre-trained model_1 file
  -m2 MODEL2, --model2 MODEL2
                        Pre-trained model_2 file
  -o-g OTHER_GENOME, --other_genome OTHER_GENOME
                        Genome fasta file of other species for learning
  -o-gbf OTHER_GENBANK, --other_GenBank OTHER_GENBANK
                        GenBank file of other species for learning
  -o-b OTHER_BAM, --other_bam OTHER_BAM
                        BAM or SAM file of other species for learning
  -o-1 OTHER_FASTQ_1, --other_fastq_1 OTHER_FASTQ_1
                        Fastq or Fastq.gz file 1 of other species for learning
  -o-2 OTHER_FASTQ_2, --other_fastq_2 OTHER_FASTQ_2
                        Fastq or Fastq.gz file 2 of other species for learning
  -v, --version         Show version and exit

License

See the LICENSE file for details.

Publication and Citation

If you find this tool useful, please cite our paper.

Ryo Harada, Keitaro Kume, Kazumasa Horie, Takuro Nakayama, Yuji Inagaki, Toshiyuki Amagasa, AtLASS: A Scheme for End-to-End Prediction of Splice Sites Using Attention-based Bi-LSTM, IPSJ Transactions on Bioinformatics, 2023, Volume 16, Pages 20-27, 10.2197/ipsjtbio.16.20

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
scripts		scripts
Atlass		Atlass
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
singularity.def		singularity.def
version.py		version.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AtLASS

Hardware requirements

Installation

Usage

Select modes

1. Leaning and Prediction mode

2. Prediction only mode

3. Learning on the other species and Prediction mode

Output

Flags

License

Publication and Citation

About

Uh oh!

Releases

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AtLASS

Hardware requirements

Installation

Usage

Select modes

1. Leaning and Prediction mode

2. Prediction only mode

3. Learning on the other species and Prediction mode

Output

Flags

License

Publication and Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Uh oh!

Contributors

Uh oh!

Languages