<div dir="ltr">Hi Joao,<div><br></div><div>The transcript objects stored in the cache are "stripped" after being loaded from the database and before being serialized to disk. This serves two purposes: first, to reduce the size of the cache by removing unnecessary or redundant information; and second, to exclude components that cannot be serialized (such as database connections and adaptors).</div><div><br></div><div>The external transcript name is not used by the VEP code, so it falls into the unnecessary category and is stripped out. You can see which fields are stripped in the clean_transcript() subroutine of VEP.pm:</div><div><br></div><div><a href="https://github.com/Ensembl/ensembl-variation/blob/master/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm#L4907-L4936">https://github.com/Ensembl/ensembl-variation/blob/master/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm#L4907-L4936</a><br></div><div><br></div><div>Regards</div><div><br></div><div>Will McLaren<br>Ensembl Variation</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 3 October 2016 at 16:06, João Eiras <span dir="ltr">&lt;<a href="mailto:joao.eiras@gmail.com" target="_blank">joao.eiras@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi.<br>
<br>
I see that the common transcript name is not included in the cache, so<br>
Bio::EnsEMBL::Transcript->external_name returns nothing.<br>
<br>
For instance, transcript ENSMUST00000036136 has the name Colec-201 [1].<br>
In the cache, however, only its gene has a name assigned ("Colec").<br>
<br>
This variant on GRCm38 falls within that transcript:<br>
#CHROM POS ID REF ALT QUAL FILTER INFO<br>
chr12 28612863 . C T 999 . .<br>
<br>
Or, to inspect the cache directly:<br>
$ gzip -dc &lt;path to vep cache&gt;/mus_musculus_merged/85_GRCm38/12/28000001-29000000.gz | \<br>
  perl -e 'use Storable qw(fd_retrieve); use Data::Dumper; print Dumper(fd_retrieve(\*STDIN));' | \<br>
  grep -a -C 6 -e ENSMUST00000036136 -e Colec<br>
<br>
Is there a reason the name is not included, given that the information is<br>
available in the GTF files [2]?<br>
<br>
Thank you.<br>
<br>
[1] <a href="http://www.ensembl.org/Mus_musculus/Transcript/Summary?db=core;g=ENSMUSG00000036655;r=12:28594173-28623290;t=ENSMUST00000036136" rel="noreferrer" target="_blank">http://www.ensembl.org/Mus_musculus/Transcript/Summary?db=core;g=ENSMUSG00000036655;r=12:28594173-28623290;t=ENSMUST00000036136</a><br>
<br>
[2] <a href="ftp://ftp.ensembl.org/pub/release-85/gtf/mus_musculus/Mus_musculus.GRCm38.85.gtf.gz" rel="noreferrer" target="_blank">ftp://ftp.ensembl.org/pub/release-85/gtf/mus_musculus/Mus_musculus.GRCm38.85.gtf.gz</a>, line 1323644<br>
<br>
______________________________<wbr>_________________<br>
Dev mailing list    <a href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>
Posting guidelines and subscribe/unsubscribe info: <a href="http://lists.ensembl.org/mailman/listinfo/dev" rel="noreferrer" target="_blank">http://lists.ensembl.org/<wbr>mailman/listinfo/dev</a><br>
Ensembl Blog: <a href="http://www.ensembl.info/" rel="noreferrer" target="_blank">http://www.ensembl.info/</a><br>
</blockquote></div><br></div>
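<div>The stripping step described in the reply above can be illustrated with a small sketch. VEP itself is written in Perl and serializes its cache with Storable; the Python below uses pickle as a stand-in, and every name in it (strip_for_cache, load_transcript, the two field sets) is hypothetical rather than taken from VEP.pm. It shows both motivations: a field the annotation code never reads (external_name) is dropped to save space, and a live database handle (an in-memory sqlite3 connection standing in for an Ensembl adaptor) must be dropped because it cannot be serialized at all.</div>

```python
import pickle
import sqlite3

# Hypothetical field lists; VEP's real list lives in clean_transcript() in VEP.pm.
UNNECESSARY_FIELDS = {"external_name"}  # assumed not read by the annotation code
UNSERIALIZABLE_FIELDS = {"adaptor"}     # live database handles cannot be pickled


def strip_for_cache(transcript: dict) -> dict:
    """Return a copy of the transcript without the stripped fields."""
    drop = UNNECESSARY_FIELDS | UNSERIALIZABLE_FIELDS
    return {key: value for key, value in transcript.items() if key not in drop}


def load_transcript(conn: sqlite3.Connection) -> dict:
    """Hypothetical stand-in for fetching a transcript from the database.

    The returned object keeps a reference to the connection, much as Ensembl
    API objects keep a reference to their adaptor.
    """
    return {
        "stable_id": "ENSMUST00000036136",
        "external_name": "Colec-201",
        "adaptor": conn,
    }


conn = sqlite3.connect(":memory:")
transcript = load_transcript(conn)

# Serializing the raw object fails because of the live connection.
try:
    pickle.dumps(transcript)
except (TypeError, pickle.PicklingError) as err:
    print("cannot serialize raw transcript:", err)

# After stripping, the object round-trips through pickle without error,
# but external_name is gone, mirroring what the cached transcript lacks.
cached = pickle.loads(pickle.dumps(strip_for_cache(transcript)))
conn.close()
print(cached)  # {'stable_id': 'ENSMUST00000036136'}
```

<div>This sketch returns a filtered copy so the original database-backed object stays intact; that is a choice made for the illustration, not necessarily how VEP handles it.</div>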