<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


<meta name="Generator" content="Microsoft Word 15 (filtered medium)">


<style><!--


/* Font Definitions */


@font-face


        {font-family:"Cambria Math";


        panose-1:2 4 5 3 5 4 6 3 2 4;}


@font-face


        {font-family:Calibri;


        panose-1:2 15 5 2 2 2 4 3 2 4;}


/* Style Definitions */


p.MsoNormal, li.MsoNormal, div.MsoNormal


        {margin:0cm;


        font-size:11.0pt;


        font-family:"Calibri",sans-serif;


        mso-fareast-language:EN-US;}


a:link, span.MsoHyperlink


        {mso-style-priority:99;


        color:#0563C1;


        text-decoration:underline;}


span.EmailStyle17


        {mso-style-type:personal-compose;


        font-family:"Calibri",sans-serif;


        color:windowtext;}


.MsoChpDefault


        {mso-style-type:export-only;


        mso-fareast-language:EN-US;}


@page WordSection1


        {size:612.0pt 792.0pt;


        margin:72.0pt 72.0pt 72.0pt 72.0pt;}


div.WordSection1


        {page:WordSection1;}


--></style>


</head>


<body lang="EN-GB" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">


<div class="WordSection1">


<p class="MsoNormal"><span lang="EN-US">Hello,<o:p></o:p></span></p>


<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>


<p class="MsoNormal"><span lang="EN-US">I am downloading various Ensembl data sets to build a new custom database for a product called
<a href="http://www.geneweaver.org/">GeneWeaver</a>.<o:p></o:p></span></p>


<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>


<p class="MsoNormal"><span lang="EN-US">I can ingest GVF/GTF files reasonably quickly using a
<a href="https://mvnrepository.com/artifact/org.geneweaver/gweaver-stream-io">library</a>; over the full data set it averages around 0.01 ms per variant or gene-like object.<o:p></o:p></span></p>


<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>


<p class="MsoNormal"><span lang="EN-US">The database also adds other species and links their orthologues to the human genes. To find orthologues I call the web service at
<a href="http://rest.ensembl.org">http://rest.ensembl.org</a> using the "/homology/id/{geneId}" endpoint. This lookup is slow: I am seeing 3-4 s per node (admittedly it is only needed for genes, not for every variant).<o:p></o:p></span></p>
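<p class="MsoNormal"><span lang="EN-US">For reference, the per-gene lookup described above amounts to one blocking HTTP GET per gene, roughly as in this minimal Python sketch using only the standard library. The helper names homology_url and fetch_orthologues are hypothetical, and the query parameters (content-type, type=orthologues and the optional target_species) are assumptions based on the public REST documentation; the path follows the "/homology/id/{geneId}" form quoted above, which may differ between REST releases.<o:p></o:p></span></p>
<pre>
```python
import json
import urllib.parse
import urllib.request

# Public Ensembl REST server mentioned in the email.
SERVER = "https://rest.ensembl.org"


def homology_url(gene_id, target_species=None):
    """Build the URL for one /homology/id/{geneId} lookup.

    Assumed query parameters: "content-type" asks for JSON, "type"
    restricts results to orthologues and the optional "target_species"
    limits the response to a single species.
    """
    params = {"content-type": "application/json", "type": "orthologues"}
    if target_species:
        params["target_species"] = target_species
    path = "/homology/id/" + urllib.parse.quote(gene_id)
    # safe="/" keeps "application/json" readable instead of %2F-encoded.
    return SERVER + path + "?" + urllib.parse.urlencode(params, safe="/")


def fetch_orthologues(gene_id, target_species=None):
    """Fetch and decode the homology response for one gene.

    This issues one blocking GET per call, i.e. one network round trip
    per gene, which is the lookup pattern described in the email.
    """
    url = homology_url(gene_id, target_species)
    with urllib.request.urlopen(url, timeout=60) as response:
        return json.load(response)
```
</pre>
<p class="MsoNormal"><span lang="EN-US">For example, fetch_orthologues("ENSG00000157764") would return the parsed JSON response for a single human gene; calling it once per gene is the pattern whose cost is reported above.<o:p></o:p></span></p>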


<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>


<p class="MsoNormal"><span lang="EN-US">That makes finding orthologues very time-consuming: it takes longer than loading all of the roughly 700 million human variants.<o:p></o:p></span></p>
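<p class="MsoNormal"><span lang="EN-US">The comparison can be made concrete using only the figures quoted in this email (0.01 ms per object, roughly 700 million variants, 3-4 s per gene lookup). A small Python sketch of the arithmetic follows; the constant and function names are illustrative only.<o:p></o:p></span></p>
<pre>
```python
# Inputs are the figures quoted in the email.
N_VARIANTS = 700_000_000        # "700 mill odd" human variants
MS_PER_OBJECT = 0.01            # average ingest time per variant/gene, in ms
LOOKUP_SECONDS = (3.0, 4.0)     # observed REST time per gene, low and high


def variant_load_seconds(n_variants=N_VARIANTS, ms_each=MS_PER_OBJECT):
    """Total time to ingest all variants, in seconds."""
    return n_variants * ms_each / 1000.0


def break_even_genes(seconds_per_lookup):
    """Gene count at which the REST lookups take as long as the variant load."""
    return variant_load_seconds() / seconds_per_lookup


if __name__ == "__main__":
    total = variant_load_seconds()
    print(f"variant load: {total:.0f} s ({total / 3600:.1f} h)")
    for seconds in LOOKUP_SECONDS:
        genes = break_even_genes(seconds)
        print(f"at {seconds:.0f} s per gene: break-even after ~{genes:.0f} genes")
```
</pre>
<p class="MsoNormal"><span lang="EN-US">With these inputs the variant load comes to about 7,000 s (just under two hours), so at 3-4 s per gene the orthologue lookups overtake it after only about 1,750 to 2,300 genes.<o:p></o:p></span></p>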


<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>


<p class="MsoNormal"><span lang="EN-US">My question is whether this can be done faster. For instance, can I download the orthologue data as a file or a database? Or download the data in the form in which it is held and parse it myself? Is there a bulk export, or a way to fetch orthologues for more than one gene per request? (If this is not the correct mailing list, which one is?
</span><span lang="EN-US" style="font-family:'Apple Color Emoji'">😊</span><span lang="EN-US">)<o:p></o:p></span></p>


<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>


<p class="MsoNormal"><span lang="EN-US">Thanks,<o:p></o:p></span></p>


<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>


<p class="MsoNormal"><span lang="EN-US">Matt Gerring<o:p></o:p></span></p>


<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>


</div>


</body>


</html>