List of gene prediction software
This is a list of software tools and web portals used for gene prediction.
Name | Description | Species | References |
---|---|---|---|
FINDER | Automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences | Eukaryotes | [1] |
FragGeneScan | Predicts genes in complete genomes and sequencing reads | Prokaryotes, Metagenomes | [2] |
ATGpr | Identifies translational initiation sites in cDNA sequences | Human | [3] |
Prodigal | Its name stands for Prokaryotic Dynamic Programming Genefinding Algorithm. It is based on log-likelihood functions and does not use Hidden or Interpolated Markov Models. | Prokaryotes, Metagenomes (metaProdigal) | [4] |
AUGUSTUS | Eukaryote gene predictor | Eukaryotes | [5] |
BGF | Hidden Markov model (HMM) and dynamic programming based ab initio gene prediction program | | [6] |
DIOGENES | Fast detection of coding regions in short genome sequences | | |
Dragon Promoter Finder | Program to recognize vertebrate RNA polymerase II promoters | Vertebrates | [7] |
EasyGene | The gene finder is based on a hidden Markov model (HMM) that is automatically estimated for a new genome. | Prokaryotes | [8][9] |
EuGene | Integrative gene finding | Prokaryotes, Eukaryotes | [10][11] |
FGENESH | HMM-based gene structure prediction: multiple genes, both chains | Eukaryotes | [12] |
FrameD | Finds genes and frameshifts in G+C-rich prokaryote sequences | Prokaryotes, Eukaryotes | [13] |
GeMoMa | Homology-based gene prediction based on amino acid and intron position conservation as well as RNA-Seq data | | [14][15] |
GENIUS II | Links ORFs in complete genomes to protein 3D structures | Prokaryotes, Eukaryotes | [16] |
geneid | Program to predict genes, exons, splice sites, and other signals along DNA sequences | Eukaryotes | [17] |
GeneParser | Parses DNA sequences into introns and exons | Eukaryotes | [18] |
GeneMark | Family of self-training gene prediction programs | Prokaryotes, Eukaryotes, Metagenomes | [19][20][21][22] |
GeneTack | Predicts genes with frameshifts in prokaryote genomes | Prokaryotes | [23] |
GenomeScan | Predicts the locations and exon-intron structures of genes in genome sequences from a variety of organisms; GENSCAN is its predecessor | Vertebrate, Arabidopsis, Maize | [24] |
GENSCAN | Predicts the locations and exon-intron structures of genes in genome sequences from a variety of organisms | Vertebrate, Arabidopsis, Maize | [25][26][27] |
GLIMMER | Finds genes in microbial DNA | Prokaryotes | [28][29][30] |
GLIMMERHMM | Eukaryotic gene-finding system | Eukaryotes | [31] |
GrailEXP | Predicts exons, genes, promoters, poly(A) sites, CpG islands, EST similarities, and repeat elements in DNA sequence | Human, Mus musculus, Arabidopsis thaliana, Drosophila melanogaster | [32][33] |
mGene | Support-vector machine (SVM) based system to find genes | Eukaryotes | [34] |
mGene.ngs | SVM based system to find genes using heterogeneous information: RNA-seq, tiling arrays | Eukaryotes | [35] |
MORGAN | Decision tree system to find genes in vertebrate DNA | Eukaryotes | [36] |
BioNIX | Web tool to combine results from different programs: GRAIL, FEX, HEXON, MZEF, GENEMARK, GENEFINDER, FGENE, BLAST, POLYAH, REPEATMASKER, TRNASCAN | Prokaryotes, Eukaryotes | [37] |
NNPP | Neural network promoter prediction | Prokaryotes, Eukaryotes | [38] |
NNSPLICE | Neural network splice site prediction | Drosophila, Human | [39] |
ORFfinder | Graphical analysis tool to find all open reading frames | Prokaryotes, Eukaryotes | [40] |
Regulatory Sequence Analysis Tools | Series of modular computer programs to detect regulatory signals in non-coding sequences | Fungi, Prokaryotes, Metazoa, Protist, Plants | [41][42] |
PHANOTATE | A tool to annotate phage genomes. | Phages | [43] |
SplicePredictor | Method to identify potential splice sites in (plant) pre-mRNA by sequence inspection using Bayesian statistical models | Eukaryotes | [44] |
VEIL | Hidden Markov model to find genes in vertebrate DNA | Eukaryotes | [45] |
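
Many of the tools listed above build on, or compare against, simple open reading frame (ORF) detection of the kind ORFfinder performs. As a rough illustration only, the following Python sketch scans all six reading frames for ATG-to-stop ORFs. It is not the algorithm of any listed program: the function names, the `min_codons` parameter, and the use of the standard stop codons with ATG as the sole start codon are assumptions made for this example.

```python
# Illustrative only: a naive six-frame ORF scan, not the method of any listed tool.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
START_CODON = "ATG"                  # assumption: ATG is the only start codon
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Yield (strand, frame, start, end, orf) for each ATG...stop ORF.

    start and end are 0-based, end-exclusive offsets on the scanned strand.
    min_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # offset of the first ATG seen since the last stop
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3, strand_seq[start:i + 3]
                    start = None


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTGGGTAACC"):
        print(orf)  # ('+', 2, 2, 17, 'ATGAAATTTGGGTAA')
```

Real gene finders go well beyond this: the programs in the table add statistical models of coding potential, splice sites, frameshifts, or homology evidence, which a plain ORF scan does not capture.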
See also
- Gene prediction
- List of RNA structure prediction software
- Comparison of software for molecular mechanics modeling
References
- ^ Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM (Apr 2021). "FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences". BMC Bioinformatics. 22 (1): 205. doi:10.1186/s12859-021-04120-9. PMC 8056616. PMID 33879057.
- ^ Rho M, Tang H, Ye Y (November 2010). "FragGeneScan: predicting genes in short and error-prone reads". Nucleic Acids Research. 38 (20): e191. doi:10.1093/nar/gkq747. PMC 2978382. PMID 20805240.
- ^ Nishikawa, Tetsuo; Ota, Toshio; Isogai, Takao (2000-11-01). "Prediction whether a human cDNA sequence contains initiation codon by combining statistical information and similarity with protein sequences". Bioinformatics. 16 (11): 960–967. doi:10.1093/bioinformatics/16.11.960. ISSN 1367-4803. PMID 11159307.
- ^ Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (March 2010). "Prodigal: prokaryotic gene recognition and translation initiation site identification". BMC Bioinformatics. 11: 119. doi:10.1186/1471-2105-11-119. PMC 2848648. PMID 20211023.
- ^ Keller O, Kollmar M, Stanke M, Waack S (March 2011). "A novel hybrid gene prediction method employing protein multiple sequence alignments". Bioinformatics. 27 (6): 757–63. doi:10.1093/bioinformatics/btr010. hdl:11858/00-001M-0000-0011-F244-D. PMID 21216780.
- ^ Li, Heng; Liu, Jin-Song; Xu, Zhao; Jin, Jiao; Fang, Lin; Gao, Lei; Li, Yu-Dong; Xing, Zi-Xing; Gao, Shao-Gen; Liu, Tao; Li, Hai-Hong (2005-07-01). "Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome". Journal of Computer Science and Technology. 20 (4): 446–453. doi:10.1007/s11390-005-0446-x. ISSN 1860-4749. S2CID 13497894.
- ^ Bajic, Vladimir B.; Seah, Seng Hong; Chong, Allen; Zhang, Guanglan; Koh, Judice L. Y.; Brusic, Vladimir (2002-01-01). "Dragon Promoter Finder: recognition of vertebrate RNA polymerase II promoters". Bioinformatics. 18 (1): 198–199. doi:10.1093/bioinformatics/18.1.198. ISSN 1367-4803. PMID 11836231.
- ^ Nielsen, P.; Krogh, A. (2005-12-15). "Large-scale prokaryotic gene prediction and comparison to genome annotation". Bioinformatics. 21 (24): 4322–4329. doi:10.1093/bioinformatics/bti701. ISSN 1367-4803. PMID 16249266.
- ^ Larsen, Thomas Schou; Krogh, Anders (2003-06-03). "EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance". BMC Bioinformatics. 4 (1): 21. doi:10.1186/1471-2105-4-21. ISSN 1471-2105. PMC 521197. PMID 12783628.
- ^ Foissac S, Gouzy J, Rombauts S, Mathé C, Amselem J, Sterck L, de Peer YV, Rouzé P, Schiex T (May 2008). "Genome annotation in plants and fungi: EuGene as a model platform". Current Bioinformatics. 3 (2): 87–97. doi:10.2174/157489308784340702.
- ^ Sallet, Erika; Gouzy, Jérôme; Schiex, Thomas (2019), Kollmar, Martin (ed.), "EuGene: An Automated Integrative Gene Finder for Eukaryotes and Prokaryotes", Gene Prediction: Methods and Protocols, Methods in Molecular Biology, vol. 1962, New York, NY: Springer, pp. 97–120, doi:10.1007/978-1-4939-9173-0_6, ISBN 978-1-4939-9173-0, PMID 31020556, S2CID 131776381, retrieved 2021-11-24
- ^ Salamov AA, Solovyev VV (April 2000). "Ab initio gene finding in Drosophila genomic DNA". Genome Research. 10 (4): 516–22. doi:10.1101/gr.10.4.516. PMC 310882. PMID 10779491.
- ^ Schiex T, Gouzy J, Moisan A, de Oliveira Y (July 2003). "FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences". Nucleic Acids Research. 31 (13): 3738–41. doi:10.1093/nar/gkg610. PMC 169016. PMID 12824407.
- ^ Keilwagen J, Wenk M, Erickson JL, Schattat MH, Grau J, Hartung F (May 2016). "Using intron position conservation for homology-based gene prediction". Nucleic Acids Research. 44 (9): e89. doi:10.1093/nar/gkw092. PMC 4872089. PMID 26893356.
- ^ Keilwagen J, Hartung F, Paulini M, Twardziok SO, Grau J (May 2018). "Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi". BMC Bioinformatics. 19 (1): 189. doi:10.1186/s12859-018-2203-5. PMC 5975413. PMID 29843602.
- ^ Yabuki, Yukimitsu; Mukai, Yuri; Swindells, Mark B.; Suwa, Makiko (2004-03-01). "GENIUS II: a high-throughput database system for linking ORFs in complete genomes to known protein three-dimensional structures". Bioinformatics. 20 (4): 596–598. doi:10.1093/bioinformatics/btg478. ISSN 1367-4803. PMID 14751990.
- ^ Blanco, Enrique; Parra, Genís; Guigó, Roderic (June 2007), "Using geneid to Identify Genes", Current Protocols in Bioinformatics, Chapter 4, John Wiley & Sons, Inc.: 4.3.1–4.3.28, doi:10.1002/0471250953.bi0403s18, ISBN 978-0471250951, PMID 18428791
- ^ Snyder, Eric E.; Stormo, Gary D. (1995-04-21). "Identification of Protein Coding Regions In Genomic DNA". Journal of Molecular Biology. 248 (1): 1–18. doi:10.1006/jmbi.1995.0198. ISSN 0022-2836. PMID 7731036.
- ^ Lukashin AV, Borodovsky M (February 1998). "GeneMark.hmm: new solutions for gene finding". Nucleic Acids Research. 26 (4): 1107–15. doi:10.1093/nar/26.4.1107. PMC 147337. PMID 9461475.
- ^ Besemer J, Lomsadze A, Borodovsky M (June 2001). "GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions". Nucleic Acids Research. 29 (12): 2607–18. doi:10.1093/nar/29.12.2607. PMC 55746. PMID 11410670.
- ^ Lomsadze A, Burns PD, Borodovsky M (September 2014). "Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm". Nucleic Acids Research. 42 (15): e119. doi:10.1093/nar/gku557. PMC 4150757. PMID 24990371.
- ^ Zhu W, Lomsadze A, Borodovsky M (July 2010). "Ab initio gene identification in metagenomic sequences". Nucleic Acids Research. 38 (12): e132. doi:10.1093/nar/gkq275. PMC 2896542. PMID 20403810.
- ^ Antonov I, Borodovsky M (June 2010). "Genetack: frameshift identification in protein-coding sequences by the Viterbi algorithm". Journal of Bioinformatics and Computational Biology. 8 (3): 535–51. doi:10.1142/S0219720010004847. PMID 20556861.
- ^ Yeh, Ru-Fang; Lim, Lee P.; Burge, Christopher B. (2001-05-01). "Computational Inference of Homologous Gene Structures in the Human Genome". Genome Research. 11 (5): 803–816. doi:10.1101/gr.175701. ISSN 1088-9051. PMC 311055. PMID 11337476.
- ^ Burge, Chris; Karlin, Samuel (1997-04-25). "Prediction of complete gene structures in human genomic DNA". Journal of Molecular Biology. 268 (1): 78–94. doi:10.1006/jmbi.1997.0951. ISSN 0022-2836. PMID 9149143.
- ^ Burge, Christopher B. (1998-01-01), Salzberg, Steven L.; Searls, David B.; Kasif, Simon (eds.), "Chapter 8 - Modeling dependencies in pre-mRNA splicing signals", New Comprehensive Biochemistry, Computational Methods in Molecular Biology, vol. 32, Elsevier, pp. 129–164, doi:10.1016/S0167-7306(08)60465-2, ISBN 978-0-444-82875-0, retrieved 2021-11-24
- ^ Burge, Christopher B; Karlin, Samuel (1998-06-01). "Finding the genes in genomic DNA". Current Opinion in Structural Biology. 8 (3): 346–354. doi:10.1016/S0959-440X(98)80069-9. ISSN 0959-440X. PMID 9666331.
- ^ Delcher, Arthur L.; Bratke, Kirsten A.; Powers, Edwin C.; Salzberg, Steven L. (2007-01-19). "Identifying bacterial genes and endosymbiont DNA with Glimmer". Bioinformatics. 23 (6): 673–679. doi:10.1093/bioinformatics/btm009. ISSN 1460-2059. PMC 2387122. PMID 17237039.
- ^ Delcher, A. (1999-12-01). "Improved microbial gene identification with GLIMMER". Nucleic Acids Research. 27 (23): 4636–4641. doi:10.1093/nar/27.23.4636. ISSN 1362-4962. PMC 148753. PMID 10556321.
- ^ Salzberg, S. L.; Delcher, A. L.; Kasif, S.; White, O. (1998-01-01). "Microbial gene identification using interpolated Markov models". Nucleic Acids Research. 26 (2): 544–548. doi:10.1093/nar/26.2.544. ISSN 0305-1048. PMC 147303. PMID 9421513.
- ^ Majoros WH, Pertea M, Salzberg SL (November 2004). "TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders". Bioinformatics. 20 (16): 2878–9. doi:10.1093/bioinformatics/bth315. PMID 15145805.
- ^ Uberbacher, Edward C.; Hyatt, Doug; Shah, Manesh (2004). "GrailEXP and Genome Analysis Pipeline for Genome Annotation". Current Protocols in Bioinformatics. 8 (1): 4.9.1–4.9.15. doi:10.1002/0471250953.bi0409s04. ISSN 1934-340X. PMID 18428726.
- ^ Uberbacher, Edward C.; Hyatt, Doug; Shah, Manesh (2003). "GrailEXP and Genome Analysis Pipeline for Genome Annotation". Current Protocols in Human Genetics. 39 (1): 6.5.1–6.5.15. doi:10.1002/0471142905.hg0605s39. ISSN 1934-8258. PMID 18428363. S2CID 21431978.
- ^ Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, et al. (November 2009). "mGene: accurate SVM-based gene finding with an application to nematode genomes". Genome Research. 19 (11): 2133–43. doi:10.1101/gr.090597.108. PMC 2775605. PMID 19564452.
- ^ Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, et al. (August 2011). "Multiple reference genomes and transcriptomes for Arabidopsis thaliana". Nature. 477 (7365): 419–23. Bibcode:2011Natur.477..419G. doi:10.1038/nature10414. PMC 4856438. PMID 21874022.
- ^ "MORGAN". sites.stat.washington.edu. Retrieved 2021-11-24.
- ^ Bedő, Justin; Di Stefano, Leon; Papenfuss, Anthony T (November 2020). "Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix". GigaScience. 9 (11). doi:10.1093/gigascience/giaa121. ISSN 2047-217X. PMC 7672450. PMID 33205815.
- ^ Reese, Martin G (2001-12-01). "Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome". Computers & Chemistry. 26 (1): 51–56. doi:10.1016/S0097-8485(01)00099-7. ISSN 0097-8485. PMID 11765852.
- ^ Reese, Martin G.; Eeckman, Frank H.; Kulp, David; Haussler, David (1997-01-01). "Improved Splice Site Detection in Genie". Journal of Computational Biology. 4 (3): 311–323. doi:10.1089/cmb.1997.4.311. PMID 9278062.
- ^ "Home - ORFfinder - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2021-11-24.
- ^ Santana-Garcia, Walter; Rocha-Acevedo, Maria; Ramirez-Navarro, Lucia; Mbouamboua, Yvon; Thieffry, Denis; Thomas-Chollier, Morgane; Contreras-Moreira, Bruno; van Helden, Jacques; Medina-Rivera, Alejandra (2019-01-01). "RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding". Computational and Structural Biotechnology Journal. 17: 1415–1428. doi:10.1016/j.csbj.2019.09.009. ISSN 2001-0370. PMC 6906655. PMID 31871587.
- ^ Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques (2018-05-02). "RSAT 2018: regulatory sequence analysis tools 20th anniversary". Nucleic Acids Research. 46 (W1): W209–W214. doi:10.1093/nar/gky317. ISSN 0305-1048. PMC 6030903. PMID 29722874.
- ^ McNair, Katelyn; Zhou, Carol; Dinsdale, Elizabeth A.; Souza, Brian; Edwards, Robert A. (2019-11-01). "PHANOTATE: a novel approach to gene identification in phage genomes". Bioinformatics. 35 (22): 4537–4542. doi:10.1093/bioinformatics/btz265. ISSN 1367-4803. PMC 6853651. PMID 31329826.
- ^ Brendel, V.; Xing, L.; Zhu, W. (2004-02-05). "Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus". Bioinformatics. 20 (7): 1157–1169. doi:10.1093/bioinformatics/bth058. ISSN 1367-4803. PMID 14764557.
- ^ Henderson, John; Salzberg, Steven; Fasman, Kenneth H. (1997-01-01). "Finding Genes in DNA with a Hidden Markov Model". Journal of Computational Biology. 4 (2): 127–141. doi:10.1089/cmb.1997.4.127. hdl:1903/8004. PMID 9228612.