University of Veterinary Medicine Vienna - Research portal


Selected Publication:


Type of publication: Diploma Thesis
Type of document:

Year: 2009

Authors: Mann, Evelyne

Title: Validation of ‘Massively Parallel Sequencing’ data from Drosophila mauritiana using classic Sanger sequencing.

Source: Diplomarbeit, Vet. Med. Univ. Wien, pp. 61.


Authors Vetmeduni Vienna:

Selberherr Evelyne

Advisor(s):
Schlötterer Christian

Reviewer(s):
Müller Mathias

Vetmed Research Units:
Institute of Population Genetics


Abstract:
Massively parallel sequencing technology (MPST) has a wide range of applications. It is already widely used to detect single nucleotide polymorphisms (SNPs) and promises highly accurate results. The aim of this thesis is to validate data generated with the Genome Analyser II (GAII). I compare SNPs detected with different mapping and SNP estimation parameters in the bioinformatics tool CLC with SNPs detected by conventional Sanger sequencing. For this study I used lines of D. mauritiana. Because no D. mauritiana genome is available to date, I had to use the genome of D. simulans as the reference in CLC.

The critical step is mapping the raw GAII data (reads) against the reference genome. The available D. simulans genome assembly is not of high quality, and the reads used in this study are short (~50 bp), so reliable mapping cannot be expected. With a stringent parameter set (the short parameter set), which allows few mismatches when reads are mapped against the reference genome, only a few regions that diverge from the reference are recovered, and 60% to 81% of the SNPs detected with this parameter set are false positives when compared with Sanger sequencing. A less stringent parameter set (the long parameter set, which allows more mismatches) yields a very high number of false-positive SNPs: it recovers twice as many SNPs, of which 70% to 80% are false positives.

Based on these results, I conclude that the best approach to SNP detection is a first run that permits a high number of mismatches. This proved necessary to obtain a more appropriate consensus sequence, because the reference genome I used did not come from the species I had sequenced. A second run with more stringent values should then reduce the high number of false-positive SNPs.
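The two-pass strategy recommended above can be illustrated with a toy, ungapped mapper. This is a minimal sketch, not the algorithm implemented in CLC: the reference, the read, and the mismatch thresholds are hypothetical, mapping is a brute-force Hamming-distance scan, and every mismatch of a single placed read is treated as a SNP (real SNP calling would also require coverage, quality, and frequency thresholds). A read carrying three species-specific differences is lost under a stringent two-mismatch threshold against the heterospecific reference. A permissive pass still places it, and after its differences are written into a consensus, the stringent pass maps it without mismatches.

```python
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_hit(read: str, reference: str, max_mismatches: int) -> Optional[Tuple[int, int]]:
    """Return (offset, mismatches) of the best ungapped placement within the threshold, or None."""
    best: Optional[Tuple[int, int]] = None
    for offset in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[offset:offset + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (offset, mm)
    return best


def mismatches_as_snps(read: str, reference: str, offset: int) -> Dict[int, str]:
    """Toy SNP call: each mismatch of the placed read (0-based reference position -> read base)."""
    return {offset + i: b for i, b in enumerate(read) if b != reference[offset + i]}


def apply_snps(reference: str, snps: Dict[int, str]) -> str:
    """Build a consensus by writing the called bases into the reference."""
    seq = list(reference)
    for pos, base in snps.items():
        seq[pos] = base
    return "".join(seq)


# Hypothetical data: a heterospecific reference and a read that differs at 3 sites.
reference = "AAGCTTGACCTAGGCATCGA"
read = "TCGAACTGGG"

stringent = best_hit(read, reference, max_mismatches=2)    # None: read cannot be placed
permissive = best_hit(read, reference, max_mismatches=3)   # (4, 3)
snps = mismatches_as_snps(read, reference, permissive[0])  # {5: 'C', 8: 'A', 11: 'G'}
consensus = apply_snps(reference, snps)                    # first-pass consensus
second_pass = best_hit(read, consensus, max_mismatches=2)  # (4, 0)
print(stringent, permissive, snps, second_pass)
```

The toy only shows why a permissive first pass helps when the reference comes from another species. It says nothing about the cost of that pass, which in the thesis was a large number of false-positive SNPs.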
The final result is that the ~50 bp GAII reads used in this study are too short for accurate SNP estimation, regardless of which parameter set and SNP detection parameters are used.
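The false-positive percentages in the abstract express how many of the SNPs called from GAII data are not confirmed by Sanger sequencing. The sketch below shows one way to compute that proportion. It is a hypothetical illustration, not the thesis's own procedure. It assumes that calls are compared only inside the regions covered by Sanger sequencing and that a call counts as confirmed only if Sanger reports the same alternative base at the same position. The `Snp` type, the contig names, and the positions are invented for the example.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Snp(NamedTuple):
    contig: str  # reference contig name (hypothetical)
    pos: int     # 1-based reference position
    alt: str     # alternative base


Region = Tuple[str, int, int]  # (contig, start, end), inclusive, covered by Sanger reads


def in_regions(snp: Snp, regions: List[Region]) -> bool:
    """True if the SNP lies inside any Sanger-covered region."""
    return any(snp.contig == c and start <= snp.pos <= end for c, start, end in regions)


def validate(ngs_calls: Iterable[Snp], sanger_calls: Iterable[Snp], regions: List[Region]) -> dict:
    """Compare NGS SNP calls with Sanger calls, restricted to the covered regions.

    A call with the right position but a different base counts as a false
    positive, and the corresponding Sanger SNP counts as missed.
    """
    ngs = {s for s in ngs_calls if in_regions(s, regions)}
    sanger = {s for s in sanger_calls if in_regions(s, regions)}
    false_pos = ngs - sanger
    return {
        "called": len(ngs),
        "confirmed": len(ngs & sanger),
        "false_positives": len(false_pos),
        "missed": len(sanger - ngs),
        # Share of called SNPs not confirmed by Sanger sequencing.
        "false_positive_fraction": len(false_pos) / len(ngs) if ngs else 0.0,
    }


# Hypothetical calls; the 3R call lies outside the Sanger-covered region and is ignored.
sanger = [Snp("2L", 1200, "A"), Snp("2L", 1350, "T")]
ngs = [Snp("2L", 1200, "A"), Snp("2L", 1410, "G"), Snp("2L", 1500, "C"), Snp("3R", 50, "T")]
report = validate(ngs, sanger, regions=[("2L", 1000, 1600)])
print(report)
```

Restricting both call sets to the Sanger-covered regions keeps SNPs that Sanger could never have confirmed from being counted as false positives.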


© University of Veterinary Medicine Vienna