<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix"><br>
      Thanks for the clarification, Will,<br>
      much appreciated!<br>
      Cheers<br>
      Seb<br>
      <br>
      <br>
      <br>
      On 06/02/13 20:36, Will McLaren wrote:<br>
    </div>
    <blockquote
cite="mid:CAMVEDX0Vpza7CoTMMxbuYNzqNf5z=ZPUkEgvpxGiBQntZi4cig@mail.gmail.com"
      type="cite">
      <div dir="ltr">Hello Seb,
        <div><br>
        </div>
        <div style="">Apologies if this is a little confusing.</div>
        <div style=""><br>
        </div>
        <div style="">The --refseq flag only applies when you are using
          the VEP with a database (public or local). It asks the VEP to
          use the *_otherfeatures_* database instead of the *_core_*
          database; this contains RefSeq transcripts rather than ENST
          Ensembl transcripts.</div>
        <div style=""><br>
        </div>
        <div style="">We provide <span
            style="font-size:13px;font-family:arial,sans-serif">homo_sapiens_refseq_vep_70.tar.gz so
            that you can get these same annotations in --offline or
            --cache mode. To use it, you do not need to (and should
            not) use --refseq; simply point to the directory
            containing the unpacked cache using --dir.</span></div>
        <div style=""><span
            style="font-size:13px;font-family:arial,sans-serif"><br>
          </span></div>
        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px">homo_sapiens_refseq_vep_70.tar.gz has
            some limitations, one of which is that it does not contain
            HGNC identifiers for genes, which is why --hgnc does not work
            for you in this situation. Unfortunately, these mappings
            between RefSeq and HGNC are somewhat hard to retrieve in
            Ensembl because of the way our Xref (external reference)
            schema is put together. It is possible to retrieve these
            mappings from other resources, for example the UCSC MySQL
            server, and from there you could add this annotation to your
            VEP output.</span><span
            style="font-size:13px;font-family:arial,sans-serif"><br>
          </span></div>
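        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px"><br>
          </span></div>
        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px">As a
            rough sketch of that last step (an illustration only, not an
            official Ensembl recipe): assume you have already exported a
            two-column, tab-separated table of RefSeq accession and gene
            symbol, for example from the name and name2 columns of the
            UCSC refGene table, and that your VEP output is in the
            default tab-delimited format with the transcript ID in the
            fifth (Feature) column. The function name add_symbols, the
            file arguments and the added "Symbol" column are all
            hypothetical, and UCSC gene symbols are not guaranteed to
            match current HGNC symbols.</span></div>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: append a gene symbol column to tab-delimited VEP output.
#   $1 = two-column TSV: RefSeq accession (no version suffix), gene symbol
#   $2 = VEP output file (transcript ID assumed to be in column 5, "Feature")
add_symbols() {
    awk -F '\t' -v OFS='\t' '
        # First file: build the accession-to-symbol lookup table.
        NR == FNR { symbol[$1] = $2; next }
        # Pass "##" meta lines through untouched.
        /^##/ { print; next }
        # Column header line: name the new column.
        /^#/ { print $0, "Symbol"; next }
        {
            id = $5                    # Feature column holds the transcript ID
            sub(/\.[0-9]+$/, "", id)   # drop a version suffix such as ".2"
            print $0, ((id in symbol) ? symbol[id] : "-")
        }
    ' "$1" "$2"
}
```
</pre>
        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px">Called
            with the mapping table as the first argument and the VEP
            output file as the second, this writes the annotated table
            to standard output; transcripts that are missing from the
            mapping table get "-" in the new column.</span></div>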
        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px"><br>
          </span></div>
        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px">Hope
            this helps!</span></div>
        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px"><br>
          </span></div>
        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px">Regards</span></div>
        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px"><br>
          </span></div>
        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px">Will
            McLaren</span></div>
        <div style=""><span
            style="font-family:arial,sans-serif;font-size:13px">Ensembl
            Variation</span></div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On 6 February 2013 04:29, NextGenSeb <span
            dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:nextgenseb@gmail.com" target="_blank">nextgenseb@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
            <br>
            Dear all,<br>
            <br>
            I recently came across your variant effect predictor, so
            first of all<br>
            thanks for making that available, great tool!<br>
            I have a few questions however pertaining to the --refseq
            and --hgnc flags.<br>
            <br>
            First of all for the --refseq flag:<br>
            <br>
            After downloading both the homo_sapiens_refseq_vep_70.tar.gz and<br>
            homo_sapiens_vep_70.tar.gz caches from your ftp site and
            putting them<br>
            into /Data/VEP/homo_sapiens_refseq and
            /Data/VEP/homo_sapiens folders,<br>
            respectively, I tried the following two scenarios:<br>
            <br>
            perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl
            -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite
            --refseq --cache --dir /Data/VEP/ --offline --everything
            --species homo_sapiens<br>
            <br>
            perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl
            -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite
            --refseq --cache --dir /Data/VEP/ --offline --everything
            --species homo_sapiens_refseq<br>
            <br>
            Both commands ran without error; however, neither
            incorporated the RefSeq<br>
            annotation in the output. Only after deleting both caches
            and<br>
            unpacking homo_sapiens_refseq_vep_70.tar.gz into
            /Data/VEP/homo_sapiens<br>
            did the RefSeq NM_ identifiers get incorporated, and then even
            with --refseq<br>
            not set. This behavior did not change when removing the
            --offline or<br>
            the --everything flag.<br>
            <br>
            In principle that would be fine; however, the --hgnc flag
            appears to only<br>
            work with the homo_sapiens_vep_70.tar.gz data set and not
            the RefSeq<br>
            one, again regardless of whether it is run in online or offline
            mode.<br>
            <br>
            Now removing all cache files and running<br>
            <br>
            perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl
            -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite
            --refseq --cache --dir /Data/VEP/ --write_cache --species homo_sapiens<br>
            <br>
            does actually give the correct output. However, in this case
            adding the<br>
            --everything flag not only puts a huge strain on the
            server<br>
            connection when using datasets bigger than my current test
            one, but also<br>
            throws an exception (see end of email). Hence I would rather
            work off an<br>
            offline cache. Could you please advise me whether there is a
            way to fix<br>
            this issue?<br>
            <br>
            Thanks in advance for your help,<br>
            Cheers<br>
            Seb<br>
            <br>
            <br>
            <br>
            perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl
            -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite
            --refseq --cache --dir /Data/VEP/test --write_cache --everything
            --species homo_sapiens --host <a moz-do-not-send="true"
              href="http://useastdb.ensembl.org" target="_blank">useastdb.ensembl.org</a><br>
            <br>
            #----------------------------------#<br>
            # ENSEMBL VARIANT EFFECT PREDICTOR #<br>
            #----------------------------------#<br>
            <br>
            version 2.8<br>
            <br>
            By Will McLaren (<a moz-do-not-send="true"
              href="mailto:wm2@ebi.ac.uk" target="_blank">wm2@ebi.ac.uk</a>)<br>
            <br>
            Configuration options:<br>
            <br>
            cache              1<br>
            canonical          1<br>
            ccds               1<br>
            core_type          otherfeatures<br>
            dir                /Data/VEP/test<br>
            domains            1<br>
            everything         1<br>
            force_overwrite    1<br>
            format             vcf<br>
            gmaf               1<br>
            hgnc               1<br>
            hgvs               1<br>
            host               <a moz-do-not-send="true"
              href="http://useastdb.ensembl.org" target="_blank">useastdb.ensembl.org</a><br>
            input_file         ./test.short.vcf<br>
            numbers            1<br>
            output_file        test.vep<br>
            polyphen           b<br>
            port               5306<br>
            protein            1<br>
            refseq             1<br>
            regulatory         1<br>
            sift               b<br>
            species            homo_sapiens<br>
            toplevel_dir       /Data/VEP/test<br>
            verbose            1<br>
            write_cache        1<br>
            <br>
            --------------------<br>
            <br>
            <br>
            <br>
            Will only load v70 databases<br>
            Species 'homo_sapiens' loaded from database
            'homo_sapiens_core_70_37'<br>
            Species 'homo_sapiens' loaded from database
            'homo_sapiens_cdna_70_37'<br>
            Species 'homo_sapiens' loaded from database
            'homo_sapiens_vega_70_37'<br>
            Species 'homo_sapiens' loaded from database<br>
            'homo_sapiens_otherfeatures_70_37'<br>
            Species 'homo_sapiens' loaded from database
            'homo_sapiens_rnaseq_70_37'<br>
            homo_sapiens_variation_70_37 loaded<br>
            homo_sapiens_funcgen_70_37 loaded<br>
            Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the
            following<br>
            compara databases will be ignored: ensembl_compara_70<br>
            ensembl_ancestral_70 loaded<br>
            ensembl_ontology_70 loaded<br>
            ensembl_stable_ids_70 loaded<br>
            2013-02-06 11:21:35 - Connected to core version 70 database
            and<br>
            variation version 70 database<br>
            2013-02-06 11:21:35 - INFO: Cache directory<br>
            /Data/VEP/test/homo_sapiens/70 not found - it will be
            created<br>
            2013-02-06 11:21:40 - INFO: Database will be accessed when
            using --hgvs<br>
            2013-02-06 11:21:40 - INFO: Database will be accessed when
            using --sift;<br>
            consider using the complete cache containing sift data (see<br>
            documentation for details)<br>
            2013-02-06 11:21:40 - INFO: Database will be accessed when
            using<br>
            --polyphen; consider using the complete cache containing
            polyphen data<br>
            (see documentation for details)<br>
            2013-02-06 11:21:40 - INFO: Database will be accessed when
            using<br>
            --regulatory; consider using the complete cache containing
            regulatory<br>
            data (see documentation for details)<br>
            2013-02-06 11:21:40 - Starting...<br>
            2013-02-06 11:21:42 - Read 62 variants into buffer<br>
            2013-02-06 11:21:42 - Reading transcript data from cache
            and/or database<br>
            [======================================================================================================]<br>
            [ 100% ]<br>
            2013-02-06 12:01:12 - Retrieved 409 transcripts (0 mem, 0
            cached, 411<br>
            DB, 2 duplicates)<br>
            2013-02-06 12:01:12 - Reading regulatory data from cache
            and/or database<br>
            [======================================================================================================]<br>
            [ 100% ]<br>
            2013-02-06 12:05:08 - Retrieved 1738 regulatory features (0
            mem, 0<br>
            cached, 1738 DB, 0 duplicates)<br>
            2013-02-06 12:05:08 - Checking for existing variations<br>
            [======================================================================================================]<br>
            [ 100% ]<br>
            2013-02-06 12:05:33 - Analyzing chromosome 1<br>
            2013-02-06 12:05:33 - Analyzing variants<br>
            [======================================================================================================]<br>
            [ 100% ]<br>
            2013-02-06 12:05:33 - Analyzing RegulatoryFeatures<br>
            [======================================================================================================]<br>
            [ 100% ]<br>
            2013-02-06 12:05:33 - Analyzing MotifFeatures<br>
            [======================================================================================================]<br>
            [ 100% ]<br>
            2013-02-06 12:05:33 - Calculating consequences<br>
            [=======> ]    [ 9% ]<br>
            -------------------- EXCEPTION --------------------<br>
            MSG: Got to have an Exon object, not a<br>
            STACK Bio::EnsEMBL::Translation::start_Exon<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Translation.pm:271<br>
            STACK Bio::EnsEMBL::Transcript::transfer<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Transcript.pm:2511<br>
            STACK Bio::EnsEMBL::Variation::BaseTranscriptVariation::_three_prime_utr<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/BaseTranscriptVariation.pm:638<br>
            STACK<br>
            Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_alternate_cds<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:1169<br>
            STACK<br>
            Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_fs_peptides<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:1090<br>
            STACK<br>
            Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_hgvs_peptides<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:930<br>
            STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::hgvs_protein<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:705<br>
            STACK Bio::EnsEMBL::Variation::Utils::VEP::tva_to_line<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1639<br>
            STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1472<br>
            STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1275<br>
            STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences<br>
            /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1056<br>
            STACK main::main<br>
            /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl:270<br>
            STACK toplevel<br>
            /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl:116<br>
            Date (localtime)    = Wed Feb  6 12:05:33 2013<br>
            Ensembl API version = 70<br>
            ---------------------------------------------------<br>
            <br>
            <br>
            <br>
            <br>
            _______________________________________________<br>
            Dev mailing list    <a moz-do-not-send="true"
              href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>
            Posting guidelines and subscribe/unsubscribe info: <a
              moz-do-not-send="true"
              href="http://lists.ensembl.org/mailman/listinfo/dev"
              target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
            Ensembl Blog: <a moz-do-not-send="true"
              href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a><br>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>