<div dir="ltr"><div>In short, is there a way to download the 5' UTR and CDS sequences for the mouse genome?</div><div><br></div>Any update will be appreciated.<div><br></div><div>-Allan.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Sep 10, 2024 at 4:03 PM Allan Kamau &lt;<a href="mailto:kamauallan@gmail.com">kamauallan@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I would like to obtain the sequences for the 5' UTR and CDS for the mouse genome.<div>I began by filtering all the records having "five_prime_UTR" as their third field from the chromosome.&lt;chromosome_name&gt;.gff3.gz files at "<a href="https://ftp.ensembl.org/pub/release-112/gff3/mus_musculus/" target="_blank">https://ftp.ensembl.org/pub/release-112/gff3/mus_musculus/</a>". I obtained 95,358 records, which seems too high given that the mouse genome has approximately 25,000 genes.</div><div><br></div><div>I did similar filtering for records having the value "CDS" as their third field and obtained 522,159 entries, which is a large number considering that there are only about 25,000 genes in the GRCm39 genome.</div><div><br></div><div>What would be the preferred way to obtain the 5' UTR and CDS sequences for the entire mouse genome?</div><div><br></div><div>Regards,</div><div>-Allan.</div><div><br></div><div><br></div></div>
</blockquote></div>
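<div><br></div><div>For reference, the filtering described in the quoted message can be sketched roughly as follows. This is a minimal sketch and not the exact commands used: the function names <code>count_features</code> and <code>count_features_in_file</code> are hypothetical, and it assumes the Ensembl GFF3 convention that "five_prime_UTR" and "CDS" rows carry a <code>Parent=transcript:...</code> attribute in the ninth column. Besides the raw row counts it also counts distinct parent transcripts, because a transcript with several coding exons contributes one "CDS" row per exon and a gene can have several transcripts, which may explain row counts well above the gene count.</div>

```python
import gzip
from collections import Counter, defaultdict


def count_features(lines, wanted=("five_prime_UTR", "CDS")):
    """Count GFF3 rows per feature type and distinct Parent IDs per type.

    `lines` is any iterable of GFF3 text lines. The feature type is read
    from the third tab-separated column, as in the original filtering.
    """
    rows = Counter()
    parents = defaultdict(set)
    for line in lines:
        # Skip blank lines and comment/directive lines (starting with '#').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in wanted:
            continue
        rows[fields[2]] += 1
        # Column 9 holds ';'-separated key=value attributes.
        # Assumption: the owning transcript is given by the Parent key.
        for attribute in fields[8].split(";"):
            key, _, value = attribute.partition("=")
            if key == "Parent":
                parents[fields[2]].update(value.split(","))
    return dict(rows), {ftype: len(ids) for ftype, ids in parents.items()}


def count_features_in_file(path, wanted=("five_prime_UTR", "CDS")):
    """Apply count_features to one gzip-compressed GFF3 file."""
    with gzip.open(path, "rt") as handle:
        return count_features(handle, wanted)
```

<div>Calling <code>count_features_in_file</code> on each downloaded chromosome file and summing the results would give the genome-wide totals. If the distinct-parent counts come out far lower than the row counts, the rows are per-exon segments rather than per-gene records.</div>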