SnpEff

Original author(s): Pablo Cingolani
Initial release: 2012
Stable release: 5.2c / April 9, 2024
Repository: github.com/pcingola/SnpEff
Written in: Java
License: MIT
Website: pcingola.github.io/SnpEff/

SnpEff is an open-source tool that annotates genetic variants and predicts their effects on genes using an interval forest approach. The program takes predetermined variants listed in a data file that gives each nucleotide change and its position, and predicts whether the variants are deleterious. It was first developed to predict the effects of single nucleotide polymorphisms (SNPs) in Drosophila.[1] As of July 2024, the paper describing SnpEff had been cited 10,076 times. SnpEff has been used for various applications,[2][3][4] from personalized medicine[5] to profiling bacteria.[6] This annotation and prediction software can be compared to ANNOVAR and Variant Effect Predictor, but each uses a different nomenclature.[7][8]
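The interval forest idea mentioned above can be illustrated with a generic sketch: one interval tree per chromosome, queried by variant position. This is not SnpEff's Java implementation; the class names, the chromosome label and the gene coordinates are all hypothetical, and a centered interval tree is assumed as the per-chromosome structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # inclusive
    end: int    # inclusive
    name: str


class IntervalTree:
    """Centered interval tree: intervals containing the center stay at the node."""

    def __init__(self, intervals):
        self.center = None
        self.overlapping = []
        self.left = None
        self.right = None
        if not intervals:
            return
        # Use the median endpoint as the center so the tree stays balanced.
        points = sorted(p for iv in intervals for p in (iv.start, iv.end))
        self.center = points[len(points) // 2]
        left, right = [], []
        for iv in intervals:
            if iv.end < self.center:
                left.append(iv)
            elif iv.start > self.center:
                right.append(iv)
            else:
                self.overlapping.append(iv)
        if left:
            self.left = IntervalTree(left)
        if right:
            self.right = IntervalTree(right)

    def query(self, pos):
        """Return every interval that contains position `pos`."""
        if self.center is None:
            return []
        hits = [iv for iv in self.overlapping if iv.start <= pos <= iv.end]
        if pos < self.center and self.left is not None:
            hits += self.left.query(pos)
        elif pos > self.center and self.right is not None:
            hits += self.right.query(pos)
        return hits


def build_forest(features_by_chrom):
    """An 'interval forest' here is simply one tree per chromosome."""
    return {chrom: IntervalTree(ivs) for chrom, ivs in features_by_chrom.items()}


# Hypothetical features on a hypothetical chromosome label.
forest = build_forest({
    "chr2L": [
        Interval(100, 500, "geneA"),
        Interval(400, 900, "geneB"),
        Interval(2000, 3000, "geneC"),
    ]
})
names_at_450 = sorted(iv.name for iv in forest["chr2L"].query(450))
print(names_at_450)  # ['geneA', 'geneB']
```

Keeping a separate tree per chromosome means a lookup only searches features on the variant's own chromosome.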

[Figure: Usage pathway for SnpEff]

Usage

SnpEff runs on Windows, Unix and Mac systems, although the installation steps differ. On all systems, SnpEff is first downloaded as a ZIP file and decompressed;[9] the extracted files are then copied into the desired location (Windows) or installed with an additional command-line step (Unix and Mac). Once the software is installed, the user supplies an input file in VCF or TXT format containing the tab-separated columns: chromosome name, position, variant ID, reference, alternative, quality score, quality filter and information.

[Figure: SnpEff input file example]

The chromosome name and position columns describe where the variant is located: the chromosome and the nucleotide position. If the variant has a previously assigned name (example: rs34567), it goes in the ID column. The Reference column gives the nucleotide found in the reference genome; differences from the reference are recorded in the Alternative column. The Quality column records how reliable the variant call is, and the outcome of any quality filters is recorded in the Filter column. Any other genomic information is put in the INFO column, which SnpEff modifies to hold its output.
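As an illustration of the eight tab-separated columns described above, the following minimal Python sketch splits one input line into named fields. The record type, the function name and the example values (other than the ID rs34567) are made up for this sketch and are not part of SnpEff.

```python
from typing import NamedTuple


class VariantRecord(NamedTuple):
    chrom: str       # chromosome name
    pos: int         # nucleotide position
    variant_id: str  # previously assigned name, e.g. an rs number
    ref: str         # nucleotide(s) in the reference genome
    alt: str         # alternative nucleotide(s)
    qual: str        # quality score (kept as text; may be ".")
    filt: str        # outcome of quality filters
    info: str        # any other information


def parse_variant_line(line: str) -> VariantRecord:
    """Split one tab-separated data line into its eight leading columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]
    return VariantRecord(chrom, int(pos), variant_id, ref, alt, qual, filt, info)


# Made-up example line; only the ID rs34567 comes from the text above.
example_line = "1\t889455\trs34567\tG\tA\t100.0\tPASS\tAF=0.0005"
record = parse_variant_line(example_line)
print(record.chrom, record.pos, record.ref, record.alt)  # 1 889455 G A
```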

[Figure: SnpEff output example]

The output in the INFO section includes: the effect of the variant (stop loss, stop gain, etc.), the effect impact on the gene (High, Moderate, Low or Modifier), the functional class of the variant (nonsense, missense, frameshift, etc.), codon change, amino acid change, amino acid length, gene name, gene biotype (protein coding, pseudogene, rRNA, etc.[10]), coding information, transcript information, exon information and any errors or warnings detected. The effect impact is how SnpEff expresses how deleterious it predicts the variant to be. For example, a High impact means that SnpEff predicts that the variant has a deleterious effect on the gene.
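The fields listed above can be pulled out of an annotated INFO string with a short parser. The sketch below assumes the older `EFF=Effect(field|field|...)` layout, with sub-fields in the order given in the paragraph; this syntax is an assumption for illustration, and newer SnpEff releases write a differently structured `ANN` field, so the manual for the version in use should be consulted. The gene name, transcript name and other values in the example string are made up.

```python
import re

# Sub-field order assumed to follow the list in the text above.
EFF_SUBFIELDS = [
    "impact", "functional_class", "codon_change", "aa_change", "aa_length",
    "gene_name", "biotype", "coding", "transcript", "exon",
]

# One annotation looks like EFFECT_NAME(sub|fields|...).
EFF_ENTRY = re.compile(r"(?P<effect>\w+)\((?P<body>[^)]*)\)")


def parse_eff(info: str) -> list:
    """Return one dict per annotation found in an assumed EFF= entry."""
    for entry in info.split(";"):
        if not entry.startswith("EFF="):
            continue
        annotations = []
        for item in entry[len("EFF="):].split(","):
            match = EFF_ENTRY.fullmatch(item)
            if match is None:
                continue  # skip anything that does not fit the assumed layout
            annotation = {"effect": match.group("effect")}
            annotation.update(zip(EFF_SUBFIELDS, match.group("body").split("|")))
            annotations.append(annotation)
        return annotations
    return []


# Made-up INFO value in the assumed layout.
info = ("AF=0.0005;EFF=STOP_GAINED(HIGH|NONSENSE|Cag/Tag|Q236*|749|GENE1|"
        "protein_coding|CODING|TR1|7)")
annotations = parse_eff(info)
high_impact = [a for a in annotations if a.get("impact") == "HIGH"]
print(high_impact[0]["effect"], high_impact[0]["gene_name"])  # STOP_GAINED GENE1
```

Filtering on the impact sub-field, as in the last lines, is one simple way to keep only the variants predicted to be most deleterious.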

SnpEff is typically used for research and academic purposes at institutions and companies and, in some instances, for personalized medicine.[citation needed] However, Pablo Cingolani now recommends that ClinEff (a combination of SnpEff and SnpSift) be used for medical purposes.[citation needed]

Advantages and limitations

SnpEff has several advantages and limitations. It can analyze all variants from the 1000 Genomes Project in less than 15 minutes and can be integrated into other tools such as Galaxy, GATK and GKNO. It can be combined with other toolkits to narrow variant prediction parameters (example: a whitelist[11]).

SnpEff Advantages:

  • Lists how the variant was classified
  • The 5 kb upstream/downstream window allows a more thorough analysis of upstream/downstream regions (the 1 kb window in ANNOVAR could miss important regulatory regions)

SnpEff Limitations:

  • False positives
  • Results vary from one prediction tool to another
  • Does not always report the most specific effect (example: sometimes lists frameshift instead of stop loss)
  • The 5 kb upstream/downstream window may cause noncoding regions to be mistaken for regulatory sites
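The trade-off between a 5 kb and a 1 kb upstream/downstream window can be shown with a small sketch. The function below is a hypothetical illustration rather than SnpEff's or ANNOVAR's actual logic, and the gene coordinates and variant position are made up.

```python
def classify_position(pos, gene_start, gene_end, window=5000, strand="+"):
    """Label a position relative to one gene, given a flanking window in bases."""
    if gene_start <= pos <= gene_end:
        return "gene"
    if gene_start - window <= pos < gene_start:
        # Before the gene on the forward strand means upstream; reversed for "-".
        return "upstream" if strand == "+" else "downstream"
    if gene_end < pos <= gene_end + window:
        return "downstream" if strand == "+" else "upstream"
    return "intergenic"


# Made-up gene spanning positions 10,000 to 12,000; variant 2.5 kb before it.
gene_start, gene_end = 10_000, 12_000
variant_pos = 7_500

wide = classify_position(variant_pos, gene_start, gene_end, window=5000)
narrow = classify_position(variant_pos, gene_start, gene_end, window=1000)
print(wide)    # upstream
print(narrow)  # intergenic
```

The same variant is reported as upstream with the wider window and as intergenic with the narrower one, which is why a larger window catches more potential regulatory regions but also flags more positions that may have no regulatory role.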

References

  1. ^ Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. "A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3." Fly (Austin). 2012 Apr-Jun;6(2):80-92. PMID 22728672.
  2. ^ Medina, Ignacio, et al. "VARIANT: Command Line, Web service and Web interface for fast and accurate functional characterization of variants found by Next-Generation Sequencing." Nucleic Acids Research 40.W1 (2012): W54-W58.
  3. ^ Kim, Yun Joong, et al. "Neuroimaging studies and whole exome sequencing of PLA2G6-associated neurodegeneration in a family with intrafamilial phenotypic heterogeneity." Parkinsonism & Related Disorders 21.4 (2015): 402-406.
  4. ^ Reddy, Mettu M., and Kandasamy Ulaganathan. "Draft genome sequence of Oryza sativa elite indica cultivar RP Bio-226." Frontiers in Plant Science 6 (2015).
  5. ^ Dewey, Frederick E., et al. "Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study." Science 354.6319 (2016): aaf6814.
  6. ^ Medvedeva, E. S., et al. "Genomic and proteomic profiles of Acholeplasma laidlawii strains differing in sensitivity to ciprofloxacin." Doklady Biochemistry and Biophysics. Vol. 466. No. 1. Pleiades Publishing, 2016.
  7. ^ Wang K, Li M, Hakonarson H. "ANNOVAR: Functional annotation of genetic variants from next-generation sequencing data." Nucleic Acids Research 38 (2010): e164.
  8. ^ "Variant Effect Predictor." Variant Effect Predictor. EMBL-EBI, Dec. 2016. Web. 28 Feb. 2017. <http://uswest.ensembl.org/info/docs/tools/vep/index.html>.
  9. ^ "SnpEff." SnpEff. N.p., n.d. Web. 28 Feb. 2017. <http://snpeff.sourceforge.net/SnpEff_manual.html>.
  10. ^ "Help - Frequently Asked Questions - Homo sapiens - Ensembl genome browser 87." N.p., n.d. Web. 28 Feb. 2017.
  11. ^ Dewey, Frederick E., et al. "Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study." Science 354.6319 (2016): aaf6814.