<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 23, 2024 at 10:43 PM Allan Kamau <<a href="mailto:kamauallan@gmail.com">kamauallan@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 23, 2024 at 6:37 PM Benjamin Moore <<a href="mailto:bmoore@ebi.ac.uk" target="_blank">bmoore@ebi.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Allan,<br>
<br>
I think the most straightforward way to retrieve the 5'UTR sequences <br>
for a list of Ensembl features (I assume you have a list of gene IDs, <br>
ENSG...) using the REST API is to call the Lookup endpoints with the <br>
optional expand and utr parameters, which return the genomic coordinates <br>
of the 5'UTRs of each transcript for your list of genes:<br>
<br>
<a href="https://rest.ensembl.org/documentation/info/lookup" rel="noreferrer" target="_blank">https://rest.ensembl.org/documentation/info/lookup</a><br>
<br>
Then, you can use the coordinates from the first step as the input for <br>
the Sequence/region endpoints to retrieve the genomic sequence of the <br>
5'UTRs:<br>
<br>
<a href="https://rest.ensembl.org/documentation/info/sequence_region" rel="noreferrer" target="_blank">https://rest.ensembl.org/documentation/info/sequence_region</a><br>
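<br>
The first step can be sketched in Python as below. This is only an illustration, not part of the REST API: the helper name five_prime_utrs and the sample record are hypothetical, and it assumes the decoded JSON nests transcripts under "Transcript" and UTR records under "UTR" with "type" set to "five_prime_utr", as in the lookup response quoted later in this thread. A 5'UTR that spans several exons would presumably appear as several records, so the helper returns a list.<br>
<pre>
```python
def five_prime_utrs(gene):
    """Collect the 5' UTR segments of every transcript in a gene record.

    `gene` is the decoded JSON for one gene from a lookup call made with
    expand=1 and utr=1 (assumed layout, see the lead-in).
    Returns a list of (transcript_id, seq_region_name, start, end, strand).
    """
    segments = []
    for transcript in gene.get("Transcript", []):
        for utr in transcript.get("UTR", []):
            if utr.get("type") == "five_prime_utr":
                segments.append((
                    transcript["id"],
                    utr["seq_region_name"],
                    utr["start"],
                    utr["end"],
                    utr["strand"],
                ))
    return segments


# Hypothetical, trimmed-down record shaped like the response in this thread.
example_gene = {
    "id": "ENSMUSG00000041075",
    "Transcript": [
        {
            "id": "ENSMUST00000114246",
            "UTR": [
                {"type": "five_prime_utr", "seq_region_name": "1",
                 "start": 59521583, "end": 59522118, "strand": 1},
                {"type": "three_prime_utr", "seq_region_name": "1",
                 "start": 59523838, "end": 59526114, "strand": 1},
            ],
        }
    ],
}

utr_segments = five_prime_utrs(example_gene)
print(utr_segments)
```
</pre>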
<br>
I hope this helps.<br>
<br>
Best wishes<br>
<br>
Ben<br>
<br>
On 23/04/2024 15:30, Allan Kamau wrote:<br>
> Is there a way to obtain 5'UTR sequences given a list of ensembl ids <br>
> programmatically?<br>
><br>
> I have a list of ensembl ids for which I would like to obtain the <br>
> 5'UTR region for each one of them programmatically hopefully via <br>
> ensembl rest using wget, or python (ensembl-rest).<br>
><br>
> Kindly assist.<br>
><br>
> Thanks.<br>
><br>
> -Allan.<br>
><br>
><br>
><br>
><br>
> _______________________________________________<br>
> Dev mailing list    <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>
> Posting guidelines and subscribe/unsubscribe info: <a href="https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org" rel="noreferrer" target="_blank">https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org</a><br>
> Ensembl Blog: <a href="http://www.ensembl.info/" rel="noreferrer" target="_blank">http://www.ensembl.info/</a><br>
<br>
-- <br>
Dr. Ben Moore (he/him)<br>
Ensembl Outreach Manager<br>
<br>
European Bioinformatics Institute (EMBL-EBI)<br>
European Molecular Biology Laboratory<br>
Wellcome Trust Genome Campus<br>
Hinxton<br>
Cambridge<br>
CB10 1SD<br>
UK<br>
<br>
<a href="mailto:bmoore@ebi.ac.uk" target="_blank">bmoore@ebi.ac.uk</a><br>
+44 (0)1223 494265<br>
<br></blockquote><div><br></div><div>Thank you Ben for your response. I am now stuck on constructing the URL for the <a href="https://rest.ensembl.org/sequence/region/" target="_blank">https://rest.ensembl.org/sequence/region</a> resource.</div><div><br></div><div>I am using the Ensembl ID "ENSMUSG00000041075" in this example.</div><div>The URL below returns the features of the Ensembl object "ENSMUSG00000041075".</div><div><br></div><div><a href="https://rest.ensembl.org/lookup/id/ENSMUSG00000041075?content-type=application/json;expand=1;utr=1" target="_blank">https://rest.ensembl.org/lookup/id/ENSMUSG00000041075?content-type=application/json;expand=1;utr=1</a></div><div><br></div><div>This returns the response below:</div><div><br>{<br>    "ENSMUSG00000041075": {<br>        "seq_region_name": "1",
<br>        "logic_name": "ensembl_havana_gene_mus_musculus",<br>        "end": 59526114,<br>        "biotype": "protein_coding",<br>        "version": 9,<br>        "db_type": "core",
<br>        "object_type": "Gene",<br>        "strand": 1,<br>        "start": 59521583,<br>        "canonical_transcript": "ENSMUST00000114246.4",<br>        "Transcript": [
<br>            {<br>                "biotype": "protein_coding",<br>                "version": 4,<br>                "db_type": "core",<br>                "object_type": "Transcript",<br>                "seq_region_name": "1",<br>                "logic_name": "ensembl_havana_transcript_mus_musculus",<br>                "end": 59526114,<br>                "Translation": {<br>                    "id": "ENSMUSP00000109884",<br>                    "length": 572,<br>                    "start": 59522119,<br>                    "end": 59523837,<br>                    "version": 3,<br>                    "object_type": "Translation",<br>                    "db_type": "core",<br>                    "Parent": "ENSMUST00000114246",<br>                    "species": "mus_musculus"<br>                },<br>                "assembly_name": "GRCm39",<br>                "Parent": "ENSMUSG00000041075",<br>                "is_canonical": 1,<br>                "display_name": "Fzd7-201",<br>                "Exon": [<br>                    {<br>                        "species": "mus_musculus",<br>                        "version": 4,<br>                        "db_type": "core",<br>                        "object_type": "Exon",<br>                        "assembly_name": "GRCm39",<br>                        "id": "ENSMUSE00000698652",<br>
             "start": 59521583,<br>                        "end": 59526114,<br>                        "strand": 1,<br>                        "seq_region_name": "1"<br>                    }<br>                ],<br>                "UTR": [<br>                    {<br>                        "object_type": "five_prime_UTR",<br>                        "db_type": "core",<br>                        "assembly_name": "GRCm39",<br>                        "Parent": "ENSMUST00000114246",<br>                        "type": "five_prime_utr",<br>                        "species": "mus_musculus",<br>                        "seq_region_name": "1",<br>                        "strand": 1,<br>                        "id": "ENSMUST00000114246",<br>                        "source": "ensembl_havana",<br>                        "start": 59521583,<br>                        "end": 59522118<br>                    },<br>                    {<br>                        "type": "three_prime_utr",<br>                        "species": "mus_musculus",<br>                        "db_type": "core",<br>                        "object_type": "three_prime_UTR",<br>                        "assembly_name": "GRCm39",<br>                        "Parent": "ENSMUST00000114246",<br>                        "id": "ENSMUST00000114246",<br>                        "source": "ensembl_havana",<br>                        "end": 59526114,<br>                        "start": 59523838,<br>                        "seq_region_name": "1",<br>                        "strand": 1<br>                    }<br>                ],<br>                "species": "mus_musculus",<br>                "strand": 1,<br>                "start": 59521583,<br>                "id": "ENSMUST00000114246",<br>                "source": "ensembl_havana",<br>                "length": 4532<br>            }<br>        ],<br>        "id": "ENSMUSG00000041075",<br>        "description": "frizzled class receptor 7 [Source:MGI 
Symbol;Acc:MGI:108570]",<br>        "source": "ensembl_havana",<br>        "assembly_name": "GRCm39",<br>        "display_name": "Fzd7",<br>        "species": "mus_musculus"<br>    }<br>}<br></div><div><br></div><div><div>What would be the correct form of the "<a href="https://rest.ensembl.org/sequence/region/" target="_blank">https://rest.ensembl.org/sequence/region/</a>" URL for the 5'UTR region of the gene given above?</div><div><br></div><div>Below is the step where I am stuck.</div><div><a href="https://rest.ensembl.org/sequence/region/mus_musculus/" target="_blank">https://rest.ensembl.org/sequence/region/mus_musculus/</a>&lt;what_goes_here&gt;:59522119..59523837:1?coord_system=seqlevel;content-type=text/x-fasta</div></div><div><br></div><div>-Allan.</div><div> </div></div></div></blockquote><div><br></div><div>What would be the URL to obtain the five_prime_UTR region?</div><div><br></div><div>I have tried the URL below but it finds no slice.</div><div><a href="https://rest.ensembl.org/sequence/region/mus_musculus/GRCm39:59521583..59522118:1?coord_system=seqlevel;content-type=text/x-fasta">https://rest.ensembl.org/sequence/region/mus_musculus/GRCm39:59521583..59522118:1?coord_system=seqlevel;content-type=text/x-fasta</a> </div><div><br></div><div>-Allan.</div></div></div>
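<div><br></div><div>For illustration, one way the region URL might be assembled from a UTR record is sketched below. The sequence/region endpoint documents the region as name:start..end:strand, where the name is a sequence region such as a chromosome, so the sketch uses the "seq_region_name" value ("1") rather than the assembly name "GRCm39". The helper name region_url is hypothetical. Dropping the coord_system parameter is an assumption, and the resulting URL has not been checked against the live server here.</div>
<pre>
```python
def region_url(species, utr, server="https://rest.ensembl.org"):
    """Build a sequence/region URL (FASTA output) for one UTR record.

    `utr` is a dict with the seq_region_name, start, end and strand keys
    found in the UTR entries of the lookup response.
    """
    # The region is chromosome:start..end:strand; the assembly name
    # (for example GRCm39) is not part of the region string.
    region = "{}:{}..{}:{}".format(
        utr["seq_region_name"], utr["start"], utr["end"], utr["strand"]
    )
    return "{}/sequence/region/{}/{}?content-type=text/x-fasta".format(
        server, species, region
    )


# The five_prime_utr record from the response quoted in this thread.
five_prime_utr = {
    "type": "five_prime_utr",
    "seq_region_name": "1",
    "start": 59521583,
    "end": 59522118,
    "strand": 1,
}

url = region_url("mus_musculus", five_prime_utr)
print(url)
```
</pre>
<div>If a specific assembly needs to be requested, the endpoint also lists an optional coord_system_version parameter, which is where a value such as GRCm39 would go.</div>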