<p>I think the method you were after, where you do pass a biotype, is fetch_all_by_biotype on a gene adaptor.</p>
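<p>A minimal sketch of that approach, assuming a core-database gene adaptor obtained from the registry (for example <code>$registry->get_adaptor('Human', 'Core', 'Gene')</code>). The helper name <code>protein_coding_stable_ids</code> is hypothetical; it simply collects the stable IDs of all genes with a given biotype into a hash so they can be used as a lookup set later.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hashref whose keys are the stable IDs of
# every gene with the requested biotype (default 'protein_coding').
# $gene_adaptor is expected to be a Bio::EnsEMBL::DBSQL::GeneAdaptor, whose
# fetch_all_by_biotype() returns a listref of gene objects.
sub protein_coding_stable_ids {
    my ($gene_adaptor, $biotype) = @_;
    $biotype = 'protein_coding' unless defined $biotype;

    my %ids;
    for my $gene (@{ $gene_adaptor->fetch_all_by_biotype($biotype) }) {
        $ids{ $gene->stable_id } = 1;
    }
    return \%ids;
}
```

<p>For example: <code>my $ids = protein_coding_stable_ids($registry->get_adaptor('Human', 'Core', 'Gene'));</code>. Fetching every gene of a species in one call may be slow over the public server.</p>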

<p>Dan</p>

<div class="gmail_quote">On Jul 13, 2012 7:26 PM, "Javier Herrero" &lt;<a href="mailto:jherrero@ebi.ac.uk">jherrero@ebi.ac.uk</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Chris<br>

<br>

You want to use<br>

<br>

if ($gene->biotype() eq "protein_coding") {}<br>

<br>

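instead. With $gene->biotype("protein_coding") you are setting the biotype of that gene, and because the method returns the value just set, the condition is always true and no gene is filtered out.<br>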
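<p>To see why the original test let every gene through, here is a self-contained sketch of the combined getter/setter accessor style. The <code>ToyGene</code> class is hypothetical: it only mimics the accessor style used by Ensembl objects such as Bio::EnsEMBL::Gene (passing an argument sets the value, and the method returns the current value) and is not the real Ensembl code.</p>

```perl
use strict;
use warnings;

package ToyGene;

sub new {
    my ($class, %args) = @_;
    my $self = { biotype => $args{biotype} };
    return bless $self, $class;
}

# Combined accessor: with an argument it overwrites the biotype;
# in both cases it returns the current value.
sub biotype {
    my $self = shift;
    $self->{biotype} = shift if @_;
    return $self->{biotype};
}

package main;

# Wrong: passing "protein_coding" SETS the biotype, and the returned
# string is always true, so every gene "passes".
sub is_coding_buggy { my ($gene) = @_; return $gene->biotype('protein_coding') ? 1 : 0 }

# Right: call the getter with no argument and compare with eq.
sub is_coding { my ($gene) = @_; return $gene->biotype eq 'protein_coding' ? 1 : 0 }
```

<p>Note that the buggy check also modifies the object: after calling it, a miRNA gene really does report protein_coding.</p>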

<br>

Kind regards<br>

<br>

Javier<br>

<br>

On 13/07/12 18:01, Christopher Kelly wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Having tried this, I've found that filtering with $gene->biotype("protein_coding") surprisingly does not remove ncRNAs: the list I fetched is still packed full of ncRNAs.<br>

<br>

Could anyone provide additional insight?<br>

<br>

Cheers,<br>

Chris<br>

On 2012-07-10, at 11:12 AM, José Afonso Guerra Assunção wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Both genes and transcripts have a method called biotype.<br>

You can use it to select for "protein_coding".<br>
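<p>Because genes and transcripts both provide <code>biotype</code>, one filter can serve for either kind of object. A minimal sketch, assuming you already hold a list of gene or transcript objects (for instance from <code>$gene->get_all_Transcripts</code>); the helper name <code>filter_by_biotype</code> is hypothetical.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the objects whose biotype() equals $wanted.
# Works for any object with a biotype() getter, e.g. a gene or a transcript.
# biotype is called with no argument (getter) and compared with 'eq'.
sub filter_by_biotype {
    my ($wanted, @objects) = @_;
    return grep { $_->biotype eq $wanted } @objects;
}
```

<p>For example: <code>my @coding = filter_by_biotype('protein_coding', @{ $gene->get_all_Transcripts });</code>. In scalar context the helper returns the number of matching objects.</p>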

<br>

HTH,<br>

Jose<br>

<br>

On Tue, Jul 10, 2012 at 7:06 PM, Christopher Kelly &lt;<a href="mailto:cpjkelly@gmail.com" target="_blank">cpjkelly@gmail.com</a>&gt; wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hello all,<br>

<br>

I am using a script to fetch all Members associated with a given genome_db_id, and to fetch the protein family (if any) that each member belongs to.<br>

<br>

This works fine. However, the comparative analysis program that uses the output of this script is producing less reliable results than would be desirable, since the human annotation seems to contain many ncRNA genes whose orthologues have not yet been identified in the annotations of many other species.<br>


<br>

In order to improve the accuracy of the analysis program, I would like to be able to filter out all ncRNA genes from my script output.<br>

<br>

The script usually fetches from ENSEMBLGENE. I have tried fetching from ENSEMBLPEP instead in order to filter out ncRNAs, but the output is still of lower quality than the analysis needs.<br>

<br>

Having sifted through a good deal of the Ensembl and Ensembl Compara Doxygen documentation, I have yet to find an accurate method that would do this for me.<br>

<br>

Is there an accurate API function/method for filtering ncRNA-genes from a list of member or gene objects?<br>

<br>

<br>

Thanks in advance,<br>

<br>

Chris Kelly<br>

<br>

<br>

<br>

Here is the relevant section of the script code:<br>

<br>

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~<br>

<br>

use strict;<br>

use warnings;<br>

use Bio::EnsEMBL::Registry;<br>

<br>

my $registry = 'Bio::EnsEMBL::Registry';<br>

<br>

$registry->load_registry_from_db(<br>

    -host => 'ensembldb.ensembl.org',<br>

    -user => 'anonymous',<br>

    -port => '5306'); # add -verbose => '1' for more verbose output<br>

                      # add -db_version => 'version' for a specific Ensembl release; otherwise the most recent release is used<br>

<br>

#get member and family adaptors<br>

my $member_adaptor = $registry->get_adaptor('Multi', 'compara', 'Member');<br>

<br>

my $family_adaptor = $registry->get_adaptor('Multi', 'compara', 'Family');<br>

<br>

my @member_list;<br>

<br>

#Fetch all members of given species (specified by genome_db_id) from given source.<br>

#Source options are: 'ENSEMBLGENE', 'ENSEMBLPEP', 'Uniprot/SPTREMBL',<br>

#'Uniprot/SWISSPROT', 'ENSEMBLTRANS', 'EXTERNALCDS'.<br>

#Each species has a unique genome_db_id in the current Ensembl Compara database version.<br>

#Defined once, before the loop: a named sub is compiled only once even when written inside a loop.<br>

sub get_members_list {<br>

<br>

    my ($source, $genome_db_id, @members) = @_;<br>

<br>

    #fetch members list - returns a listref of members<br>

    my $new_members_ref = $member_adaptor->fetch_all_by_source_genome_db_id($source, $genome_db_id);<br>

<br>

    #dereference the listref and append the new members to the list<br>

    push(@members, @$new_members_ref);<br>

<br>

    return @members;<br>

}<br>

<br>

#outer for-loop: runs the body of the program for each id specified on the command line<br>

#(@genome_db_id_list and $current_directory are assumed to be set earlier in the script)<br>

foreach my $species_genome_db_id (@genome_db_id_list){<br>

<br>

    @member_list = ();<br>

    my $file_path = "$current_directory/$species_genome_db_id";<br>

    unless (-d $file_path) {<br>

        mkdir $file_path, 0777 or die "Cannot create $file_path: $!";<br>

    }<br>

<br>

    #<br>

    #Get members from all sources for the given genome_db_id (denoting a specific species)<br>

    #<br>

    #@member_list = get_members_list('ENSEMBLGENE', $species_genome_db_id, @member_list);<br>

    #print "ENSEMBLGENE members fetched\n";<br>

    @member_list = get_members_list('ENSEMBLPEP', $species_genome_db_id, @member_list);<br>

    print "ENSEMBLPEP members fetched\n";<br>

    #@member_list = get_members_list('Uniprot/SPTREMBL', $species_genome_db_id, @member_list);<br>

    #print "Uniprot/SPTREMBL members fetched\n";<br>

    #@member_list = get_members_list('Uniprot/SWISSPROT', $species_genome_db_id, @member_list);<br>

    #print "Uniprot/SWISSPROT members fetched\n";<br>

    #@member_list = get_members_list('ENSEMBLTRANS', $species_genome_db_id, @member_list);<br>

    #print "ENSEMBLTRANS members fetched\n";<br>

    #@member_list = get_members_list('EXTERNALCDS', $species_genome_db_id, @member_list);<br>

    #print "EXTERNALCDS members fetched\n";<br>

}<br>

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~<u></u>~~~~~~~~~~~~<br>
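<p>One possible way to drop ncRNA genes from the member list is to combine it with the core gene adaptor: collect the stable IDs of protein-coding genes with <code>fetch_all_by_biotype</code>, then keep only the members whose <code>stable_id</code> is in that set. A minimal sketch, assuming the members come from the <code>ENSEMBLGENE</code> source (so their stable IDs are gene IDs; <code>ENSEMBLPEP</code> members carry peptide IDs and would not match) and that the core database is from the same release as the Compara database; the helper name <code>keep_protein_coding_members</code> is hypothetical.</p>

```perl
use strict;
use warnings;

# Hypothetical helper.
# $gene_adaptor: a core GeneAdaptor for the same species and release as the members.
# @members: ENSEMBLGENE members, whose stable_id() is an Ensembl gene stable ID.
sub keep_protein_coding_members {
    my ($gene_adaptor, @members) = @_;

    # Build a lookup set of protein-coding gene stable IDs.
    my %coding;
    $coding{ $_->stable_id } = 1
        for @{ $gene_adaptor->fetch_all_by_biotype('protein_coding') };

    # Keep only members whose gene is protein-coding.
    return grep { $coding{ $_->stable_id } } @members;
}
```

<p>In the script above, this would mean re-enabling the <code>ENSEMBLGENE</code> call and passing the resulting <code>@member_list</code> through the helper together with a core gene adaptor for the same species.</p>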

_______________________________________________<br>

Dev mailing list    <a href="mailto:Dev@ensembl.org" target="_blank">Dev@ensembl.org</a><br>

List admin (including subscribe/unsubscribe): <a href="http://lists.ensembl.org/mailman/listinfo/dev" target="_blank">http://lists.ensembl.org/mailman/listinfo/dev</a><br>

Ensembl Blog: <a href="http://www.ensembl.info/" target="_blank">http://www.ensembl.info/</a><br>

</blockquote>


</blockquote>


</blockquote>

<br>

<br>

<br>

-- <br>

Javier Herrero, PhD<br>

Ensembl Coordinator and Ensembl Compara Project Leader<br>

European Bioinformatics Institute (EMBL-EBI)<br>

Wellcome Trust Genome Campus, Hinxton<br>

Cambridge - CB10 1SD - UK<br>

<br>


</blockquote></div>