JOURNAL OF ANIMAL SCIENCE AND VETERINARY MEDICINE
Integrity Research Journals

ISSN: 2536-7099
Model: Open Access/Peer Reviewed
DOI: 10.31248/JASVM
Start Year: 2016
Email: jasvm@integrityresjournals.org


Variant Calling pipeline for Next Generation Sequence Data – A review

https://doi.org/10.31248/JASVM2018.092   |   Article Number: 49FBD3E01   |   Vol.3 (4) - August 2018

Received Date: 20 February 2018   |   Accepted Date: 30 April 2018  |   Published Date: 30 August 2018

Author:  Ngeno K.

Keywords: SNP, Next-generation sequencing, Annotation

Next generation sequencing (NGS) is of great significance for genetic improvement. One of the most common applications of NGS is the identification of genomic variants, genes and sequence mutations. Mining genomic variants such as single nucleotide polymorphisms (SNPs) from raw sequences involves several steps and the systematic use of numerous bioinformatics tools. This paper reviews the components of a pipeline that calls SNPs from NGS data. The SNP calling pipeline includes base calling, quality checks, read trimming, alignment of the high-quality reads to the reference genome, quality score recalibration, visualization and SNP identification. The final step of the pipeline is making biological sense of the SNP data, which involves filtering and annotating the candidate SNPs.
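To make the stages listed above more concrete, the following is a minimal, self-contained Python sketch of two of them: quality-based read trimming and a naive pileup-style SNP identification. It is an illustration only and not the method of any tool covered by the review. The function names (`phred_scores`, `trim_read`, `call_snps`), the Phred+33 quality encoding, the gap-free alignment input, and the thresholds (`min_q`, `min_depth`, `min_fraction`) are all assumptions made for this example.

```python
from collections import Counter


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq, quality, min_q=20):
    """Trim low-quality bases from the 3' end of a read.

    Bases are removed from the end until one with a score >= min_q is found.
    """
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Naive pileup caller (hypothetical, for illustration).

    aligned_reads is a list of (start, seq) pairs, where start is the
    0-based reference position of the first base and the alignment is
    assumed to contain no gaps. A position is reported when coverage is
    at least min_depth and the most common base differs from the
    reference in at least min_fraction of the covering reads.
    """
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for i, base in enumerate(seq):
            pos = start + i
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to make a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


if __name__ == "__main__":
    # Quality 'I' decodes to 40 and '#' to 2, so the last two bases are trimmed.
    print(trim_read("ACGTAC", "IIII##"))
    # Three overlapping reads all carry an A where the reference has a T.
    reads = [(0, "ACGAAC"), (1, "CGAACG"), (2, "GAACGT")]
    print(call_snps("ACGTACGT", reads))
```

Real pipelines replace each of these steps with dedicated tools that also model base and mapping quality, indels and diploid genotypes; the fixed depth and allele-fraction thresholds here are only a stand-in for the filtering step described in the abstract.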
