ISSN: 2536-7099
Model: Open Access/Peer Reviewed
DOI: 10.31248/JASVM
Start Year: 2016
Email: jasvm@integrityresjournals.org
https://doi.org/10.31248/JASVM2018.092 | Article Number: 49FBD3E01 | Vol.3 (4) - August 2018
Received Date: 20 February 2018 | Accepted Date: 30 April 2018 | Published Date: 30 August 2018
Author: Ngeno K.
Keywords: SNP, Next-generation sequencing, Annotation
Next-generation sequencing (NGS) is of great significance for genetic improvement. One of the most common applications of NGS is the identification of genomic variants, genes and sequence mutations. Mining genomic variants such as single nucleotide polymorphisms (SNPs) from raw sequences involves several steps and the systematic use of numerous bioinformatics tools. This paper reviews the components of a pipeline that calls SNPs from NGS data. The SNP calling pipeline includes base calling, quality checks, read trimming, alignment of the quality reads to the reference genome, quality score recalibration, visualization and SNP identification. The final step of the pipeline is making biological sense of the SNP data, which involves filtering and annotating the candidate SNPs.
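To make two of the pipeline stages concrete, the following is a minimal, self-contained Python sketch of quality-based read trimming and a naive SNP call from already-aligned reads. It is a toy illustration and not the method of any tool reviewed in the paper. The function names, the Phred+33 quality encoding, the ungapped `(start, sequence)` alignment format and the thresholds (`min_q`, `min_depth`, `min_fraction`) are all assumptions chosen for the example. Real pipelines work on FASTQ, SAM/BAM and VCF files and use statistical genotype models rather than a simple majority rule.

```python
from collections import Counter


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, qual, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q (hypothetical threshold)."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Naive SNP identification from ungapped alignments.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position of the first base of the read. A site is
    reported when coverage is at least min_depth and the most frequent base
    differs from the reference in at least min_fraction of the reads.
    """
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for i, base in enumerate(seq):
            pos = start + i
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # filtering step: skip poorly covered sites
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


if __name__ == "__main__":
    # Trimming: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    print(trim_3prime("ACGTAC", "IIII##"))

    # SNP identification: three overlapping reads all carry A at position 3.
    reference = "ACGTACGT"
    reads = [(0, "ACGAAC"), (1, "CGAACG"), (2, "GAACGT")]
    print(call_snps(reference, reads))
```

In this sketch the depth and allele-fraction thresholds stand in for the filtering stage mentioned in the abstract. Annotation, that is, assigning each candidate SNP to a gene or predicted functional effect, would be a separate step applied to the returned list.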