Variant Calling pipeline for Next Generation Sequence Data – A review

Ngeno K.

doi:10.31248/JASVM2018.092

Journal of Animal Science and Veterinary Medicine

Integrity Research Journals

ISSN: 2536-7099
Model: Open Access/Peer Reviewed
DOI: 10.31248/JASVM
Start Year: 2016
Email: jasvm@integrityresjournals.org


ABSTRACT


Next generation sequencing (NGS) is of great significance for genetic improvement. One of the most common applications of NGS is the identification of genomic variants, genes and sequence mutations. Mining genomic variants such as single nucleotide polymorphisms (SNPs) from raw sequences involves several steps and the systematic use of numerous bioinformatics tools. This paper reviews the components of a pipeline that calls SNPs from NGS data. The SNP calling pipeline includes base calling, quality checks, read trimming, alignment of the high-quality reads to the reference genome, quality score recalibration, visualization and SNP identification. The final step of the pipeline is making biological sense of the SNP data, which involves filtering and annotating the candidate SNPs.
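As a toy illustration of three of the stages named in the abstract (read trimming, SNP identification and a simple filter), the following Python sketch may help make the flow concrete. It is not the method of any tool covered by the review. The function names, the thresholds (`min_q`, `min_depth`, `min_alt_fraction`) and the input layout are assumptions made for illustration, and the sketch assumes Phred+33 quality encoding and ungapped reads whose alignment positions are already known.

```python
from collections import Counter

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ FASTQ quality encoding


def phred_scores(quality_string):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def trim_read(seq, qual, min_q=20):
    """Trim low-quality bases from the 3' end of a read.

    Bases are removed from the end until one with quality >= min_q is found.
    """
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def call_snps(reference, aligned_reads, min_depth=3, min_alt_fraction=0.8):
    """Naive SNP identification from ungapped alignments.

    aligned_reads is a list of (start, seq) pairs, where start is the
    0-based position of the read's first base on the reference.
    A site is reported when depth >= min_depth and the most frequent
    base differs from the reference in >= min_alt_fraction of reads.
    """
    # Build a per-position count of observed bases (a minimal pileup).
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # filter: too few reads to support a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_alt_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    reads = [(0, "ACGTAGGT"), (2, "GTAGGTAC"), (1, "CGTAGG")]
    for pos, ref_base, alt_base, depth in call_snps(reference, reads):
        print(pos, ref_base, alt_base, depth)
```

Real pipelines replace each of these functions with dedicated tools and statistical genotype models. The fixed-threshold call here only shows where trimming, pileup construction and filtering sit relative to one another.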

