<html><head><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Marc,<br><div><br><div><div>On 29 Aug 2013, at 08:09, Marc Hoeppner <<a href="mailto:mphoeppner@gmail.com">mphoeppner@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Thibaut et al, <br>
<br>
sorry to bother you guys again - last question(s), I promise. <br>
<br>
1) Using WU-Blast<br>
<br>
I have been trying to switch to wublast to remove a bit of
overhead from setting up the pipeline with NCBI. However, when
following the documentation, I get this as a command line call:<br>
<br>
wublastp /references/databases/uniprot_swissprot/uniprot.fasta
/tmp/seq.9741.42380.fa -gi -cpus=1<br>
<br>
The thing is that my version of Wu-blast (BLASTP 2.0a19MP-WashU
[14-Jul-1998] [Build linux-x86 18:51:39 30-Jul-1998]) does not
have a '-cpus' flag and I didn't set it. Hence, it fails.<br>
<br>
The flag is set within the Blast module(s) automatically,
like in Analysis::Runnable::Blast.pm:<br>
<br>
$self->options('-cpus=1') if(!$self->options);<br></div></div></blockquote>This is probably a bug. You can simply delete or comment out this line. I will have a look.<br><blockquote type="cite"><div bgcolor="#FFFFFF" text="#000000"><div class="moz-cite-prefix">
<br>
So I am guessing you guys must be using some other version of
Wu-Blast? I can't find any other than the one I got, since the
original code from <a class="moz-txt-link-freetext" href="http://blast.wustl.edu/">http://blast.wustl.edu/</a> is now hosted elsewhere
and the only legacy version of their blast suite (Wu-blast) is the
one I downloaded. <br>
<br>
Any idea where I can get the version used by EnsEMBL? Or is it the
same and the error lies elsewhere?<br>
<br>
2) Having to specify '-p blastn' as a parameter for DNA-Peptide
blasts when running the NCBI suite.<br>
(This is from a conversation with Thibaut directly - the pipeline
fails to determine the right blastall program to use when running
DNA/Peptide blasts)<br></div></div></blockquote>I don't see it as a problem; when you prepare your analysis, you just need to specify it in the config file.</div><div><br></div><div><span style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); ">[unigene] </span><br style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); "><span style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); ">db=unigene </span><br style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); "><span style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); ">db_file=/data2/projects/annotation/EnsEMBL/chicken/refseqs/unigene.fa </span><br style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); "><span style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); ">program=blastall </span><br style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); "><span style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); ">program_file=blastall </span><br style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); "><span style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); ">parameters=-p blastn -A 40 -a 1</span><br style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); "><span style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); ">module=BlastGenscanDNA </span><br style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); "><span style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); ">input_id_type=CONTIG </span></div><div><span style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); "><br></span></div><div><span 
style="font-family: -moz-fixed; font-size: 12px; background-color: rgb(255, 255, 255); ">The parameters line contains program-specific parameters. I'm not sure about -A 40, but it seems to me to be similar to -hitdist 40.</span></div><div><font face="-moz-fixed"><br></font></div><div><font face="-moz-fixed">I suppose that we were using NCBI blast for the protein alignment and WU blast for the DNA alignment, but I cannot say for sure.</font></div><div><font face="-moz-fixed"><br></font></div><div><font face="-moz-fixed">You can modify our code to suit your needs.</font></div><div><font face="-moz-fixed"><br></font></div><div><font face="-moz-fixed">I will look into it later.</font></div><div><font face="-moz-fixed"><br></font></div><div><font face="-moz-fixed">Regards</font></div><div><font face="-moz-fixed">Thibaut</font></div><div><font face="-moz-fixed"><br></font><blockquote type="cite"><div bgcolor="#FFFFFF" text="#000000"><div class="moz-cite-prefix">
<br>
So I had a look at the code, since I still didn't understand why
BlastGenscanPep.pm would automatically figure out which blastall
program (blastall -p blastp) to use and BlastGenscanDna.pm
wouldn't. It seems this decision process is coded into the
BlastGenscanPep.pm module, but was left out of BlastGenscanDna.pm.
I wonder why that is? It makes using NCBI for the pipeline a little
awkward (which I wouldn't mind if I could get the stupid Wu-Blast
to work ;))<br>
<br>
From BlastGenscanPep:<br>
<br>
if ( $blast{-type}=~m/ncbi/ ) {<br>
if ( $options{-query_type}=~m/pep/ ) {<br>
if ( $options{-database_type}=~m/pep/ ) {<br>
$options_string = '-p blastp' ; <br>
} elsif ( $options{-database_type}=~m/dna/ ) {<br>
$options_string = '-p tblastn' ;<br>
}<br>
}<br>
<br>
if ( $options{-query_type}=~m/dna/ ) {<br>
if ( $options{-database_type}=~m/dna/ ) {<br>
$options_string = '-p blastn' ;<br>
}elsif ( $options{-database_type}=~m/pep/ ) {<br>
$options_string = '-p blastx' ;<br>
}<br>
}<br>
}<br>
<br>
<br>
<br>
Cheers,<br>
<br>
Marc<br>
</div>
<blockquote cite="mid:2007B28F-7CC5-4B62-BAAC-F8887EBB7EBC@sanger.ac.uk" type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
Hi Marc,
<div><br>
<div>
<div>On 28 Aug 2013, at 14:35, Marc Hoeppner <<a moz-do-not-send="true" href="mailto:mphoeppner@gmail.com">mphoeppner@gmail.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Thibaut,<br>
<br>
thanks for the help, been able to fix my problems. <br>
<br>
Turns out that yes, the coordinate system rank seems to
matter - but I think the problem was that Pmatch was
configured to write to the GENEWISE_DB, which existed
and had tables and everything, but wasn't loaded with
any data (such as a coordinate system). <br>
</div>
</div>
</blockquote>
Whenever you write to a database using the Genebuild pipeline,
you need to check that some tables are in sync with your
reference database. Here are the tables:</div>
<div>analysis</div>
<div>assembly</div>
<div>assembly_exception</div>
<div>coord_system</div>
<div>seq_region</div>
<div>seq_region_attrib</div>
<div>meta</div>
<div>attrib_type</div>
<div>seq_region_synonym</div>
<div>
<div><br>
</div>
</div>
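<div>As a rough sketch of how the tables listed above could be copied from the reference database into a new output database: the function below only prints a mysqldump/mysql pipeline, it does not run it. The function name and both database names are placeholders invented for this example, and the connection settings are assumed to come from a MySQL option file, so adapt it to your own setup.</div>
<pre wrap="">
```shell
# Tables listed above that should match the reference database.
SYNC_TABLES="analysis assembly assembly_exception coord_system seq_region seq_region_attrib meta attrib_type seq_region_synonym"

# Print (do not run) a command that copies those tables from a reference
# database into a target database. Both names are placeholders, and the
# host/user/password are assumed to be set in ~/.my.cnf.
sync_tables_cmd() {
  ref_db=$1
  target_db=$2
  echo "mysqldump $ref_db $SYNC_TABLES | mysql $target_db"
}

# Example call with made-up database names.
sync_tables_cmd my_reference_db my_genewise_db
```
</pre>
<div>Review the printed command before running it: with mysqldump's default options, loading the dump replaces any existing copies of these tables in the target database.</div>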
<div><br>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix"> <br>
For the unigene blast, adding the blastall program
option manually into the parameter string solved that
problem, but I still wonder if there is a bug in the
code (it should do this automatically).<br>
</div>
</div>
</blockquote>
As it is a blastall option and can be set in the analysis
configuration, I don't see it as a bug.</div>
<div><br>
</div>
<div>Regards</div>
<div>Thibaut</div>
<div> <br>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix"> <br>
Anyway, thanks again!<br>
<br>
/Marc<br>
</div>
<blockquote cite="mid:30269147-DC7E-41DE-8828-14B070152902@sanger.ac.uk" type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
Hi Marc,
<div><br>
<div>
<div>On 27 Aug 2013, at 13:53, Marc Hoeppner <<a moz-do-not-send="true" href="mailto:mphoeppner@gmail.com">mphoeppner@gmail.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Thibaut,<br>
<br>
thanks for the feedback. Answers to your
comments in line:<br>
<br>
<br>
</div>
<blockquote cite="mid:8721613D-12DB-4AA8-8B36-141F116838B6@sanger.ac.uk" type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Hi Marc,
<div><br>
<div>
<div>On 26 Aug 2013, at 10:59, Marc
Hoeppner <<a moz-do-not-send="true" href="mailto:mphoeppner@gmail.com">mphoeppner@gmail.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<meta http-equiv="content-type" content="text/html;
charset=ISO-8859-1">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-text-flowed" style="font-family: -moz-fixed;
font-size: 12px;" lang="x-western">Hi
EnsEMBL team, <br>
<br>
been playing with the pipeline
again, but am having problems
(again). Please see below for
details - am happy about any
suggestions. <br>
<br>
Cheers, <br>
<br>
Marc <br>
<br>
######## <br>
1) Pmatch <br>
######## <br>
<br>
I set up a pmatch analysis as per the
documentation and it runs fine on my
test dataset (small chicken
chromosome) when I try it with
test_RunnableDB. However, when I run
the pipeline, I get this: <br>
<br>
TARGET 0.064u 0.008s 0+0k 0pf 0sw <br>
BUILD 0.116u 0.040s 0+0k 0pf 0sw <br>
SEARCH 22.949u 0.172s 0+0k 0pf 0sw
<br>
WARN: For multiple species use
species attribute in
DBAdaptor->new() <br>
WRITING: Lost the will to live Error
<br>
Job 1198 failed: [ <br>
-------------------- EXCEPTION
-------------------- <br>
MSG: Problems for Pmatch writing
output for
chromosome:vchicken_test:10:1:19911089:1
[Can't call method "version" on an
undefined value at
/opt/bioinformatics/ensembl-70/ensembl/modules/Bio/EnsEMBL/DBSQL/MetaContainer.pm
line 218. <br>
] <br>
STACK
Bio::EnsEMBL::Pipeline::Job::run_module
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/Job.pm:720<br>
STACK (eval)
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/runner.pl:219<br>
STACK main::run_jobs_with_lsfcopy
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/runner.pl:218<br>
STACK toplevel
/opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/runner.pl:128<br>
Date (localtime) = Fri Aug 23
14:53:27 2013 <br>
Ensembl API version = 70 <br>
<br>
</div>
</div>
</blockquote>
We would need to see how your coord_system
and meta tables are populated.</div>
<div>The API complains that it can't find
the version of your assembly. Your
coord_system table should look like this
one:</div>
<div>
<div>+-----------------------+----------------+------------------+------------+-------+--------------------------------+</div>
<div>| coord_system_id | species_id | name | version | rank | attrib |</div>
<div>+-----------------------+----------------+------------------+------------+-------+--------------------------------+</div>
<div>| 1 | 1 | contig | NULL | 3 | default_version,sequence_level |</div>
<div>| 2 | 1 | scaffold | oryCun2 | 2 | default_version |</div>
<div>| 3 | 1 | chromosome | oryCun2 | 1 | default_version |</div>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-text-flowed" style="font-family: -moz-fixed;
font-size: 12px;" lang="x-western">
<br>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
This is my coord_system table:<br>
<br>
+-----------------+------------+------------+---------------+------+--------------------------------+<br>
| coord_system_id | species_id | name | version | rank | attrib |<br>
+-----------------+------------+------------+---------------+------+--------------------------------+<br>
| 1 | 1 | chromosome | vchicken_test | 1 | default_version |<br>
| 2 | 1 | contig | vchicken_test | 3 | default_version,sequence_level |<br>
+-----------------+------------+------------+---------------+------+--------------------------------+<br>
<br>
<br>
I don't have a supercontig layer, since I am
faking contigs from assembled sequences for
testing purposes. I think I discussed that
on this mailing list as well and was told that
the API code should be able to deal with a
contig-chromosome setup. Anything suspicious
here?<br>
</div>
</blockquote>
I'm not sure that this is the problem, but you should
change the rank of your contig coordinate system to 2. The API
might be looking for the coordinate system with
rank 2 and failing to find it at the moment.</div>
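<div>A minimal sketch of that rank change, assuming the coord_system columns are named as in the table quoted above. The function name is invented for this example, and it only prints the SQL statement so it can be checked before it is sent to the database. The backticks around rank are there because newer MySQL versions treat it as a reserved word.</div>
<pre wrap="">
```shell
# Print an UPDATE statement that sets the rank of one coordinate system.
# Arguments: coordinate system name, new rank.
rank_update_sql() {
  printf "UPDATE coord_system SET \`rank\` = %d WHERE name = '%s';\n" "$2" "$1"
}

# Inspect the statement first, then pipe it into the mysql client, e.g.
#   rank_update_sql contig 2 | mysql my_reference_db   (placeholder name)
rank_update_sql contig 2
```
</pre>
<div>If several coordinate systems share the same name with different versions, the WHERE clause would also need to filter on version.</div>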
<div><br>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">
<blockquote cite="mid:8721613D-12DB-4AA8-8B36-141F116838B6@sanger.ac.uk" type="cite">
<div>
<div>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-text-flowed" style="font-family: -moz-fixed;
font-size: 12px;" lang="x-western">
########## <br>
2) Unigene <br>
########## <br>
<br>
This one really bothers me <span class="moz-smiley-s3" title=";)"></span>
I think everything is set up
correctly (downloaded the unigene
file, header seems to comply with
the reference formatting in Blast.pm
etc), but I cannot for the life of
me get it to work. Specifically, I
am trying to use ncbi blast and the
command just looks off - seems like
it tries to do a mix of Wublast and
Ncbi blast (works fine with Uniprot
though - so perhaps something with
the BlastGenscanDna module?). <br>
<br>
Running job 1791 <br>
Module is BlastGenscanDNA <br>
Input id is
contig:vchicken_test:10_68:1:50000:1
<br>
Analysis is unigene <br>
Files are
/data2/projects/annotation/EnsEMBL/chicken/output//unigene/0/contig:vchicken_test:10_114:1:50000:1.unigene.55.retry2.out
/data2/projects/annotation/EnsEMBL/chicken/output//unigene/0/contig:vchicken_test:10_114:1:50000:1.unige$
<br>
<br>
-------------------- WARNING
---------------------- <br>
MSG: Error running Blast cmd
</usr/bin/blastall -d
/data2/projects/annotation/EnsEMBL/chicken/refseqs/unigene.fa
-i /tmp/seq.22305.24863.fa -cpus=1
2>&1 >
/tmp/unigene.fa.22305.5651.blast.out>.
Returned error 256 BLAST EXIT: '1',
SIGNA$ <br>
FILE: Analysis/Runnable/Blast.pm
LINE: 380 <br>
CALLED BY:
EnsEMBL/Analysis/Runnable.pm LINE:
729 <br>
Date (localtime) = Fri Aug 23
14:54:47 2013 <br>
Ensembl API version = 70 <br>
</div>
</div>
</blockquote>
Have you tried to run the command by
itself to see if it works? The error
message you have seems to be from the ncbi
blast program.</div>
<div>As the module dies, the temporary file
containing your chicken sequence should
still exist. If not, you will need to
comment out a line in the run method of
ensembl-analysis/modules/Bio/EnsEMBL/Analysis/Runnable.pm:</div>
<div><br>
</div>
<div> #$self->delete_files;</div>
<div><br>
</div>
<div>You probably need to change your
parameters in the analysis table of your
reference database. We use WU blast at the
moment.</div>
<div><br>
</div>
<div>Also, the parameters for blast should
be "-cpus 1 -hitdist 40" instead of "<span style="font-family: -moz-fixed;
font-size: 12px; ">-cpus => 1,
-hitdist => 40"</span></div>
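<div>To illustrate the difference, here is a small sketch that rewrites the Perl-hash style string into the plain command-line style. The function name is made up for this example. As far as I know, -cpus and -hitdist are WU-BLAST-style options, so an NCBI blastall setup would need its own option names instead.</div>
<pre wrap="">
```shell
# Turn "-cpus => 1, -hitdist => 40" into "-cpus 1 -hitdist 40":
# first drop the "=>" arrows, then the commas between options.
normalize_params() {
  printf '%s\n' "$1" | sed -e 's/ *=> */ /g' -e 's/, */ /g'
}

normalize_params '-cpus => 1, -hitdist => 40'
```
</pre>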
<div><br>
</div>
<div>Regards</div>
<div>Thibaut</div>
<div><br>
</div>
</div>
</blockquote>
I think the problem is that the blastall string
is malformed. It should be<br>
<br>
blastall -i input.fasta -d database -p blastn <br>
<br>
So it failed to determine which blast program to
use. Interestingly, it works fine for
protein-protein blast, but fails in this
protein-dna configuration. Hence my question
whether this may be a problem in the
BlastGenscanDna module. I can try wublast also,
but I think I had serious trouble getting that
to work. Are you guys calling your executables
wublastp, wublastn etc? Because the only thing I
could find was blastp, blastn etc. I assume this
would still work if I specify these binary names
in the configs? I gave up at some point because
it kept whining about something, so I went the
ncbi route.<br>
</div>
</blockquote>
Maybe blast looks at the database to decide which type
of search to run by default.<br>
For the moment you need to change the parameters and
add '-p blastn' and it should work.</div>
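<div>For reference, the usual blastall program names by query and database type can be sketched as below. The function name and the file names are invented for the example; only the -p, -d and -i options come from the commands quoted in this thread, and which combination is right for a given analysis is for you to decide.</div>
<pre wrap="">
```shell
# Map query and database sequence types (pep or dna) to the value
# passed to blastall -p. Returns non-zero for an unknown combination.
blastall_program() {
  case "$1:$2" in
    pep:pep) echo blastp ;;
    pep:dna) echo tblastn ;;
    dna:dna) echo blastn ;;
    dna:pep) echo blastx ;;
    *) return 1 ;;
  esac
}

# Build (but do not run) a command line for a DNA query against a DNA database.
prog=$(blastall_program dna dna)
echo "blastall -p $prog -d unigene.fa -i query.fa"
```
</pre>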
<div><br>
</div>
<div>If your blastn is equivalent to blastall -p blastn,
then you can change your program to blastn and
you don't need to add '-p blastn' to your parameters.<br>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000"> <br>
Oh and thanks for pointing out the parameter
issue, I actually took those from the
documentation, sooo... ;) But will update my
scripts. <br>
</div>
</blockquote>
Unfortunately, the available documentation is a bit
old, and updating it takes a lot of time.</div>
<div><br>
</div>
<div>Regards</div>
<div>Thibaut</div>
<div><br>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000"> <br>
All the best,<br>
<br>
Marc<br>
<br>
<br>
<blockquote cite="mid:8721613D-12DB-4AA8-8B36-141F116838B6@sanger.ac.uk" type="cite">
<div>
<div>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-text-flowed" style="font-family: -moz-fixed;
font-size: 12px;" lang="x-western">
<br>
And here the config for the unigene
search: <br>
<br>
[unigene] <br>
db=unigene <br>
db_file=/data2/projects/annotation/EnsEMBL/chicken/refseqs/unigene.fa
<br>
program=blastall <br>
program_file=blastall <br>
parameters=-cpus => 1, -hitdist
=> 40 <br>
module=BlastGenscanDNA <br>
input_id_type=CONTIG <br>
<br>
(Blast.pm is configured to use
'ncbi' as default type, so unigene
should inherit that, no?)<br>
<br>
</div>
</div>
</blockquote>
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-text-flowed" style="font-family: -moz-fixed;
font-size: 12px;" lang="x-western">
<br>
<div class="moz-txt-sig"><span class="moz-txt-tag">-- <br>
</span>Marc P. Hoeppner, PhD <br>
Department of Medical Biochemistry
and Microbiology <br>
Uppsala University, Sweden <br>
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:marc.hoeppner@imbim.uu.se">marc.hoeppner@imbim.uu.se</a> <br>
<br>
</div>
</div>
</div>
_______________________________________________<br>
Dev mailing list <a moz-do-not-send="true" href="mailto:Dev@ensembl.org">Dev@ensembl.org</a><br>
Posting guidelines and
subscribe/unsubscribe info: <a moz-do-not-send="true" href="http://lists.ensembl.org/mailman/listinfo/dev">http://lists.ensembl.org/mailman/listinfo/dev</a><br>
Ensembl Blog: <a moz-do-not-send="true" href="http://www.ensembl.info/">http://www.ensembl.info/</a><br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
</div>
<br>
-- The Wellcome Trust Sanger Institute is operated by Genome
Research Limited, a charity registered in England with number
1021457 and a company registered in England with number 2742969,
whose registered office is 215 Euston Road, London, NW1 2BE. <br>
</blockquote>
<br>
</div>
</blockquote></div><br></div></body></html>