<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Andrew,<div class=""><br class=""></div><div class="">We have updated our 1000 Genomes Project files since release 90.</div><div class=""><br class=""></div><div class="">For release/90 we used <a href="ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr17.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz" class="">ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr17.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz</a></div><div class=""><br class=""></div><div class="">In those files, rs1406071 was known under an older variant ID, rs566458560.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">We updated the files to <a href="ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr17_GRCh38.genotypes.20170504.vcf.gz" class="">ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr17_GRCh38.genotypes.20170504.vcf.gz</a></div><div class=""><br class=""></div><div class="">The new files correct problems that were present in the first version.</div><div class=""><br class=""></div><div class="">You can find a detailed report about the new files here: <a href="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/README_GRCh38_liftover_20170504.txt" class="">ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/README_GRCh38_liftover_20170504.txt</a></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">For release/92 we used <a href="ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr17_GRCh38.genotypes.20170504.vcf.gz" 
class="">ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr17_GRCh38.genotypes.20170504.vcf.gz</a></div><div class=""><br class=""></div><div class="">And in those files rs1406071 is used.</div><div class=""><br class=""></div><div class="">tabix <a href="ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr17_GRCh38.genotypes.20170504.vcf.gz" class="">ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/ALL.chr17_GRCh38.genotypes.20170504.vcf.gz</a> 17:46148119-46148119</div><div class=""><br class=""></div><div class="">We are aware that mapping by ID alone is not always reliable, so we have added mapping by location to our LD code for release/93 (due at the end of June 2018).</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">For human we only have genotype data from the 1000 Genomes Project. We added an evidence attribute to each variant in our database that has 1000 Genomes data. You can retrieve the attribute by calling <a href="http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1VariationFeature.html#a698bae1ff1f3999af3f204f820a4c8b7" class="">get_all_evidence_values</a> on a variation feature object and then filtering for the attribute called 1000Genomes.</div><div class=""><br class=""></div><div class="">I don’t know how feasible it is for you to switch to the most recent API version. 
If you need to stay with release/90 code, you could point to the new VCF files by updating your vcf_config file to the release/92 version:</div><div class=""><a href="https://github.com/Ensembl/ensembl-variation/blob/release/92/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json" class="">https://github.com/Ensembl/ensembl-variation/blob/release/92/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json</a></div><div class=""><br class=""></div><div class="">Most importantly, you need to use the highlighted line in your vcf_config file: <a href="https://github.com/Ensembl/ensembl-variation/blob/release/92/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json#L22" class="">https://github.com/Ensembl/ensembl-variation/blob/release/92/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json#L22</a></div><div class=""><br class=""></div><div class="">Please let me know if you have any further questions. Thank you also for the script you provided for debugging the problem.</div><div class=""><br class=""></div><div class="">Anja</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 2 Jun 2018, at 16:10, <a href="mailto:andrew126@mac.com" class="">andrew126@mac.com</a> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Hi,<br class=""><br class="">rs1406071 shows identical mapping information, 1KG variant-set participation, and 1KG allele frequencies between Ensembl API 90 and 92; however, API 90 returns no high-r2 LD results, while API 92 returns high-r2 LD results. A script to reproduce this, and its output, are shown below. I'm not sure why API 90 returns no results.<br class=""><br class="">In general, when $ldFeatureContainerAdaptor->fetch_by_VariationFeature returns no r2 results, it means either that the variation feature has no high-LD pairs or that the variation feature is not in the LD dataset and so no r2 pairs can be calculated. 
It seems very useful (and important) to distinguish those two possibilities when interpreting r2 results.<br class=""><br class="">Based on the API 90 result, pre-screening for query variant features that have 1KG frequencies and 1KG variant sets is insufficient for determining whether a variation feature is capable of returning 1KG high-LD results. Is there a method to answer "can this variant feature generate any LD results?" Presumably it would also cover tri-allelics, which I don't think report LD results either.<br class=""><br class="">Thanks for any guidance.<br class=""><br class="">Best regards,<br class=""><br class="">Andrew<br class=""><br class="">API 90 output:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>The API version used is 90<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>variation-feature chromosomes:<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>        17<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>        CHR_HSCHR17_1_CTG5<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>        CHR_HSCHR17_2_CTG5<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>allele frequencies:<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>        Allele T has frequency: 0.759443339960239 in population 1000GENOMES:phase_3:EUR.<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>        Allele C has frequency: 0.240556660039761 in population 1000GENOMES:phase_3:EUR.<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>variant sets:<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>        1000 Genomes 3 - EUR<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>        1000 Genomes 3 - EUR - common<br class=""><span class="Apple-tab-span" style="white-space:pre"> 
      </span>LD results >= r2 0.99:<br class=""><br class="">API 92 output:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>The API version used is 92<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>variation-feature chromosomes:<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span>17<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>CHR_HSCHR17_1_CTG5<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>CHR_HSCHR17_2_CTG5<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>allele frequencies:<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>Allele C has frequency: 0.240556660039761 in population 1000GENOMES:phase_3:EUR.<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>Allele T has frequency: 0.759443339960239 in population 1000GENOMES:phase_3:EUR.<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>variant sets:<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>1000 Genomes 3 - EUR - common<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>1000 Genomes 3 - EUR<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span>LD results >= r2 0.99:<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>rs1406071 rs74348235 1.000000<br 
class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>rs1406071 rs17660251 1.000000<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>rs1406071 rs2696707 1.000000<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>rs1406071 rs2696590 0.994581<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>rs1406071 rs2696703 1.000000<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>rs1406071 rs62061766 1.000000<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>rs1406071 rs2732596 0.994566<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>rs1406071 rs62061821 1.000000<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>Etc.<br class=""><br class="">Script:<br class=""><br class="">use strict;<br class="">$|=1;<br class="">use Bio::EnsEMBL::Registry;<br class="">use Bio::EnsEMBL::ApiVersion;<br class="">printf( "The API version used is %s\n", software_version() );<br class=""><br class="">my $registry = 'Bio::EnsEMBL::Registry';<br class="">$registry->load_all();<br class="">$registry->load_registry_from_db(<br class="">    -host => '<a href="http://ensembldb.ensembl.org" class="">ensembldb.ensembl.org</a>', # alternatively '<a href="http://useastdb.ensembl.org" class="">useastdb.ensembl.org</a>'<br class="">    -user => 'anonymous',<br class="">    );<br class=""><br class="">$registry->set_reconnect_when_lost(1);<br class="">my $v_adaptor = 
$registry->get_adaptor('homo_sapiens', 'variation', 'variation');<br class="">$v_adaptor->db->use_vcf(1);<br class="">my $pop_adaptor = $registry->get_adaptor("human","variation","population");<br class="">my $ldFeatureContainerAdaptor = $registry->get_adaptor('human', 'variation', 'ldfeaturecontainer');<br class="">$ldFeatureContainerAdaptor->max_snp_distance(100000);<br class="">my $population = $pop_adaptor->fetch_by_name("1000GENOMES:phase_3:EUR");<br class="">my $variation = $v_adaptor->fetch_by_name("rs1406071");<br class="">my $vfref = $variation->get_all_VariationFeatures();<br class="">my $query_variation_feature;<br class="">print "variation-feature chromosomes:\n";<br class="">foreach my $vf (@{$vfref}) {<br class="">    print "\t".$vf->seq_region_name."\n";<br class="">    if ($vf->seq_region_name eq "17") {<br class="">        $query_variation_feature = $vf;<br class="">    }<br class="">}<br class="">print "allele frequencies:\n";<br class="">my $alleles = $query_variation_feature->variation->get_all_Alleles();<br class="">foreach my $allele (@{$alleles}) {<br class="">    next unless (defined $allele->population);<br class="">    my $allele_string   = $allele->allele;<br class="">    my $frequency       = $allele->frequency || 'NA';<br class="">    my $population_name = $allele->population->name;<br class="">    if ($population_name=~/EUR/ && $population_name=~/1000/) {<br class="">        printf("\tAllele %s has frequency: %s in population %s.\n", $allele_string, $frequency, $population_name);<br class="">    }<br class="">}<br class="">my $variant_sets = $query_variation_feature->get_all_VariationSets();<br class="">print "variant sets:\n";<br class="">if (defined $variant_sets) {<br class="">    foreach my $vs (@{$variant_sets}){<br class="">        if ($vs->name()=~/EUR/) {<br class="">            print "\t".$vs->name()."\n";<br class="">        }<br class="">    }<br class="">}<br class="">print "LD results >= r2 0.99:\n";<br class="">my 
$ldFeatureContainer = $ldFeatureContainerAdaptor->fetch_by_VariationFeature($query_variation_feature,$population);<br class="">foreach my $r_square (@{$ldFeatureContainer->get_all_r_square_values}){<br class="">    if ($r_square->{r2} >= 0.99){<br class="">        print "\t".$r_square->{variation1}->variation_name." ".$r_square->{variation2}->variation_name." ".$r_square->{r2}."\n";<br class="">    }<br class="">}<br class=""><br class=""><br class=""><br class=""><br class="">_______________________________________________<br class="">Dev mailing list    <a href="mailto:Dev@ensembl.org" class="">Dev@ensembl.org</a><br class="">Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" class="">http://lists.ensembl.org/mailman/listinfo/dev</a><br class="">Ensembl Blog: <a href="http://www.ensembl.info/" class="">http://www.ensembl.info/</a><br class=""></div></div></blockquote></div><br class=""></div></body></html>