Authors:
Sunhee Kim (1), Young-Suk Lee (2) and Chang-Yong Lee (1)
Affiliations:
(1) The Department of Industrial and Systems Engineering, Kongju National University, Cheonan 330-717, South Korea
(2) Center for RNA Research, Institute for Basic Science, Seoul 151-742, South Korea
Keyword(s):
Base Quality Score Recalibration, Single Nucleotide Polymorphism, Variant Calling, Database.
Abstract:
Base quality score recalibration (BQSR) is an important step in variant calling from high-throughput sequencing data. Motivated by the fact that BQSR requires a database of known variants such as dbSNP, we present an extensive analysis of BQSR results for the human and rice genomes. We show that the recalibration results depend on the size of the database: the more variants the database contains, the larger the average recalibrated base quality score. This implies that the recalibrated quality score is lower than it should be when the number of variants in the database is not large enough. Based on the finding that the size of the database plays a crucial role in BQSR, we propose a method to create a database when the existing one is not large enough for BQSR results to be reliable. We demonstrate that, in the case of human, the database constructed by the proposed method generates almost the same results as the human dbSNP. In the case of rice, however, we show that the proposed database is more reasonable than the rice dbSNP.
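The dependence on database size can be illustrated with a toy model. The sketch below is not the authors' method and not the recalibration model of any particular tool; it assumes only that bases at known-variant sites are skipped, that every other mismatch against the reference is counted as a sequencing error, and that the empirical quality is a Phred-scaled, lightly smoothed error rate. The function names, the `db_coverage` parameter, and the smoothing constants are all hypothetical.

```python
import math


def empirical_quality(mismatches, observations):
    """Phred-scaled empirical quality with simple +1/+2 smoothing (an assumption)."""
    error_rate = (mismatches + 1) / (observations + 2)
    return -10 * math.log10(error_rate)


def recalibrated_quality(n_bases, n_true_errors, n_variant_bases, db_coverage):
    """Toy recalibrated quality for one group of bases.

    n_bases          total bases observed in the group
    n_true_errors    bases that are genuine sequencing errors
    n_variant_bases  bases that differ from the reference because of real variants
    db_coverage      fraction of those variant bases at sites listed in the database
    """
    # Variant bases at known sites are masked, i.e. excluded from the counts.
    masked = round(n_variant_bases * db_coverage)
    # Variant bases at unknown sites are wrongly counted as sequencing errors.
    unmasked_variants = n_variant_bases - masked
    observations = n_bases - masked
    mismatches = n_true_errors + unmasked_variants
    return empirical_quality(mismatches, observations)


# Same reads, same true error rate; only the database coverage changes.
q_small_db = recalibrated_quality(1_000_000, 1_000, 1_000, db_coverage=0.1)
q_large_db = recalibrated_quality(1_000_000, 1_000, 1_000, db_coverage=0.9)
print(f"small database: Q = {q_small_db:.1f}")
print(f"large database: Q = {q_large_db:.1f}")
```

Under these assumptions, a database that misses many true variants inflates the mismatch count and pulls the recalibrated score down, which matches the direction of the effect described in the abstract.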