<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hello again!
<div class=""><br class="">
</div>
<div class="">Thank you for the bug fix. My apologies: I read this old correspondence and mistakenly thought you still calculated LD on the fly instead of storing it in the database: <a href="https://www.biostars.org/p/109785/#147784" class="">https://www.biostars.org/p/109785/#147784</a></div>
<div class="">
<div class=""><br class="">
</div>
<div class="">I use all LD populations for now, as you suggested. However, a couple more questions appeared:</div>
<div class=""><br class="">
</div>
<div class="">1) When I call $ldfc_adaptor->fetch_by_VariationFeature($vf, $ld_population), many VCF index (.tbi) files are downloaded. The terminal prints «[get_local_version] downloading the index file…» each time one is fetched, and the following files appear in my folder:</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr10.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr11.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr12.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr14.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr15.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr16.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr19.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr2.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr3.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr4.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr5.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr8.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
<div class="">ALL.chr9.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi</div>
</div>
<div class=""><br class="">
</div>
<div class="">What are these files? Why are they downloaded? Will release 84 have a different way of doing it?</div>
<div class=""><br class="">
</div>
<div class="">2) When I try to use a large file of SNPs (example attached to this email), I get the following message after all 14 of the files listed above have been downloaded:</div>
<div class="">
<div class=""><font face="Courier New" class="">[kftp_connect_file]</font></div>
<div class=""><font face="Courier New" class="">[main] fail to open the data file.</font></div>
<div class=""><font face="Courier New" class="">[get_local_version] downloading the index file...</font></div>
<div class=""><font face="Courier New" class="">[kftp_connect_file] 530 You may not have more than 15 concurrent connections at any time.</font></div>
<div class=""><font face="Courier New" class="">[download_from_remote] fail to open remote file.</font></div>
</div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class="">If it tries to download another file just as the connection limit is reached, I get this error message:</div>
<div class=""><font face="Courier New" class="">
<div class="">[get_local_version] downloading the index file...</div>
<div class="">[kftp_connect_file] 530 You may not have more than 15 concurrent connections at any time.</div>
<div class="">[download_from_remote] fail to open remote file.</div>
<div class="">[tabix] failed to load the index file.</div>
<div class="">[get_local_version] downloading the index file...</div>
<div class="">Can't use an undefined value as an ARRAY reference at /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm line 570, <> line 22.</div>
</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class="">Does this mean I cannot connect to all chromosomes at the same time? Is LD only computed between variants on the same chromosome? What is the reason behind this restriction? I am not very familiar with the reasoning behind the structural choices in LD data sets, so these questions are based on my impression that variants can be in LD even when they are located on different chromosomes. If you see this differently, I would be happy to hear it!</div>
<div class=""><br class="">
</div>
<div class="">My script is the following (basically a copy of your previous suggestions :) )</div>
<div class="">
<div class=""><font face="Courier New" class="">use strict;</font></div>
<div class=""><font face="Courier New" class="">use warnings;</font></div>
<div class=""><font face="Courier New" class="">use Bio::EnsEMBL::Registry;</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class=""><font face="Courier New" class="">my $start_run = time();</font></div>
<div class=""><font face="Courier New" class="">open(OUT, '>', 'expandedSNPS.txt') or die "Cannot open expandedSNPS.txt for writing: $!";</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class=""><font face="Courier New" class="">my $registry = 'Bio::EnsEMBL::Registry';</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class=""><font face="Courier New" class="">$registry->load_registry_from_db(</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>-host => '<a href="http://ensembldb.ensembl.org" class="">ensembldb.ensembl.org</a>',</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>-user => 'anonymous'</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>);</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class=""><font face="Courier New" class=""># Connect to the databases:</font></div>
<div class=""><font face="Courier New" class="">my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation');</font></div>
<div class=""><font face="Courier New" class="">my $ldfc_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'ldfeaturecontainer');</font></div>
<div class=""><font face="Courier New" class="">my $pop_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'population');</font></div>
<div class=""><font face="Courier New" class="">$variation_adaptor->db->use_vcf(1);</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class=""><font face="Courier New" class="">my $ld_population = $pop_adaptor->fetch_by_name('1000GENOMES:phase_3:CEU');</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class=""><font face="Courier New" class=""># Loop through all SNPs available and find SNPs in LD</font></div>
<div class=""><font face="Courier New" class="">while(<>) {</font></div>
<div class=""><span style="font-family: 'Courier New';" class=""><span class="Apple-tab-span" style="white-space:pre"></span>chomp;</span></div>
<div class=""><span class="Apple-tab-span" style="font-family: 'Courier New'; white-space: pre;"></span><span style="font-family: 'Courier New';" class="">my $variation_name = $_;</span></div>
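<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>my $variation = $variation_adaptor->fetch_by_name($variation_name);</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>next unless defined $variation; # skip names that are not found in the database</font></div>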
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>my @var_features = @{ $variation->get_all_VariationFeatures() };</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>foreach my $vf (@var_features) {</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>my $rsid = $vf->name;</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>my $start = $vf->start;</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>my $region = $vf->seq_region_name;</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>print OUT "$rsid\t$start\t$region\n";</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>print $ld_population->name, "\n";</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>my $ldfc = $ldfc_adaptor->fetch_by_VariationFeature($vf, $ld_population);</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>my @ld_values = @{ $ldfc->get_all_ld_values() };</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>foreach my $ld_hash (@ld_values) {</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>my $r2 = $ld_hash->{r2};</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>my $variation_name1 = $ld_hash->{variation1}->variation_name;</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>my $variation_name2 = $ld_hash->{variation2}->variation_name;</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>print OUT "\t$variation_name1\t$variation_name2\tr2=$r2\n";</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>}</font></div>
<div class=""><font face="Courier New" class=""><span class="Apple-tab-span" style="white-space:pre"></span>}</font></div>
<div class=""><font face="Courier New" class="">}</font></div>
<div class=""><font face="Courier New" class=""><br class="">
</font></div>
<div class=""><font face="Courier New" class="">close OUT;</font></div>
<div class=""><font face="Courier New" class="">my $end_run = time();</font></div>
<div class=""><font face="Courier New" class="">my $run_time = $end_run - $start_run;</font></div>
<div class=""><font face="Courier New" class="">print "Job took $run_time seconds\n";</font></div>
</div>
</div>
</body>
</html>