<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Hi Paige,</div><div><br></div><div>I have now heard back from the developer of Condel (Abel Gonzalez Perez), and he confirms that the odd prediction you found is a shortcoming of the way the consensus score is currently calculated for mutations with low SIFT and PolyPhen scores (please see his email below for details; it is forwarded with his permission). We will work with Abel to address this issue in future Ensembl releases.</div><div><br></div><div>Thanks for bringing this to our attention.</div><div><br></div><div>Best regards,</div><div><br></div><div>Graham </div><div><br></div><div>Ensembl Variation</div><div><br></div><div><br><blockquote type="cite"><div bgcolor="#ffffff" text="#000000">

Hi Graham,<br>

<br>

I checked, and you're both absolutely right: this case is a Condel false
positive. The cause lies in the way we compute the denominator of the
weighted average score. If the scores of the methods are low, the
probability of the SNP being a false negative of each method is very
low, and so is their sum (the denominator of the weighted average
score). As a result, the Condel score of low-scored SNPs is
artificially high. This effect is mitigated when more methods are taken
into account, because their scores tend to be contradictory and the
result is therefore more balanced. The bottom line is that we need to
rethink how we obtain the denominator of the weighted average score when
only two methods are used.<br>
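For illustration only, here is a minimal Python sketch of why a probability-based denominator inflates the score of low-scored SNPs. It is not the was_SP.pl code: the function name and all numbers are hypothetical, and it assumes only the structure described above, i.e. a weighted summation divided by the sum of the per-method false-negative probabilities.

```python
# Hypothetical illustration only: this is NOT the Condel / was_SP.pl code.
# Assumption: each method contributes one weighted numerator term, and the
# denominator is the sum of each method's false-negative probability.


def probability_denominator_score(weighted_terms, fn_probs):
    """Weighted summation divided by the summed per-method probabilities."""
    if len(weighted_terms) != len(fn_probs):
        raise ValueError("need one probability per method")
    denominator = sum(fn_probs)
    if denominator == 0:
        raise ZeroDivisionError("all probabilities are zero")
    return sum(weighted_terms) / denominator


# Made-up numbers for a SNP with low SIFT and PolyPhen scores: both numerator
# terms are small, but the false-negative probabilities are small too.
sift_term, polyphen_term = 0.02, 0.03
sift_fn, polyphen_fn = 0.05, 0.05
score = probability_denominator_score([sift_term, polyphen_term], [sift_fn, polyphen_fn])
print(round(score, 3))  # 0.5
```

With both the numerator terms and the probabilities small, the quotient (0.5 in this made-up case) ends up far larger than either numerator term, which is the inflation described above.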

<br>

One possible solution is to use a simple counter as the denominator in
this case, instead of the probabilities used to weight the scores. The
weighted summation would then simply be divided by the number of methods
that contribute to the integrated score, that is, 1 or 2.<br>
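A minimal Python sketch of the counter-based denominator described above follows. It is a hypothetical illustration rather than the modified was_SP.pl: the function name and input values are made up, each argument is assumed to be an already-weighted per-method term, and a missing method is represented by None.

```python
# Hypothetical illustration of the proposed fix, NOT the actual was_SP.pl code:
# divide the weighted summation by the number of contributing methods (1 or 2)
# instead of by the summed probabilities.
from typing import Optional


def counter_denominator_score(sift_term: Optional[float], polyphen_term: Optional[float]) -> float:
    """Average the available weighted terms; None marks a missing method."""
    terms = [t for t in (sift_term, polyphen_term) if t is not None]
    if not terms:
        raise ValueError("at least one method must contribute a score")
    # len(terms) is the simple counter: 1 or 2 contributing methods.
    return sum(terms) / len(terms)


print(round(counter_denominator_score(0.02, 0.03), 3))  # 0.025
print(round(counter_denominator_score(None, 0.03), 3))  # 0.03
```

Because the denominator is just the number of contributing methods, small weighted terms can no longer be divided by an arbitrarily small value, so low-scored SNPs keep low integrated scores.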

<br>

I have just implemented a new version of condel_SP that computes the
denominator in this way and tested it on HumVar. The area under the ROC
curve is very similar to that of the original condel_SP, but the new
version produces more accurate results for SNPs with low SIFT and
PolyPhen scores. I have attached the modified was_SP.pl script.<br>

<br>

To summarize, we have found that our calculation of the weighted average
score, which was designed for use with five methods, tends to
overestimate the deleteriousness of SNPs with low SIFT and PolyPhen
scores when only SIFT and PolyPhen are used, because it produces
erroneously small denominators. To solve this, we propose a new version
of the weighted average score that corrects the problem for low-scored
SNPs, possibly at the expense of misclassifying SNPs whose scores are
close to the cutoffs of the individual methods.<br>

<br>

Best regards,<br>

<br>

Abel<br>

<br>

<pre class="moz-signature" cols="72">-- 

Abel González Pérez, PhD

Bioinformatician

<a class="moz-txt-link-freetext" href="http://abeldavidgp.synthasite.com/">http://abeldavidgp.synthasite.com/</a>

<a class="moz-txt-link-freetext" href="http://publicationslist.org/abeldavidgp">http://publicationslist.org/abeldavidgp</a>

Research Unit on Biomedical Informatics - GRIB

<a class="moz-txt-link-freetext" href="http://bg.imim.es/">http://bg.imim.es/</a>

Parc de Recerca Biomèdica de Barcelona (PRBB)

c/ Dr. Aiguader, 88

E-08003 Barcelona</pre></div></blockquote></div><br></body></html>