Hi,<br> As described previously, I'm trying to run the low-coverage annotation pipeline on our fish genome (~800 Mb), which was sequenced on an Illumina GAII.<br> The doc low_coverage_gene_build.txt tells me to prepare my own compara db, so I turned to ensembl-compara.<br>
For my fish genome, I have ~75k scaffolds (length >= 200bp, N50 ~1M), of which 2600 are longer than 1000bp. My reference genome is stickleback, and I followed the README-pairaligner doc.<br> As the reference genome has ~2000 chunks (size 1M), there would be 2000 x 75000 = 150M pairaligner jobs, which is too many to run at my institute.<br>
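For reference, here is a quick back-of-the-envelope sketch of that job count. It assumes one pairaligner job per (reference chunk, scaffold) pair, which is only my own estimate and not something stated in the docs; the function and variable names are made up for illustration.<br>

```python
# Rough estimate of the number of pairwise alignment jobs.
# ASSUMPTION: one job per (reference chunk, query scaffold) pair;
# the real pipeline may group or chunk sequences differently.

def pairaligner_jobs(ref_chunks: int, query_scaffolds: int) -> int:
    """Return the number of chunk-vs-scaffold pairs."""
    return ref_chunks * query_scaffolds

REF_CHUNKS = 2_000        # ~2000 stickleback chunks of size 1M
ALL_SCAFFOLDS = 75_000    # all scaffolds with length >= 200bp
LONG_SCAFFOLDS = 2_600    # scaffolds longer than 1000bp

all_jobs = pairaligner_jobs(REF_CHUNKS, ALL_SCAFFOLDS)
long_jobs = pairaligner_jobs(REF_CHUNKS, LONG_SCAFFOLDS)

print(f"all scaffolds:       {all_jobs:,} jobs")
print(f"scaffolds > 1000bp:  {long_jobs:,} jobs")
```

Under the same assumption, keeping only the 2600 longer scaffolds would bring the count down from 150M to about 5.2M jobs.<br>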
Here are my questions:<br> 1. Should I use only the scaffolds longer than 1000bp? <br clear="all"> 2. Am I following the right doc? Which doc should I read to produce an alignment in which 'each bp in the target genome should be represented at most once' (quoted from low_coverage_gene_build.txt)? I don't quite understand README-2xalignment and README-low-coverage-genome-aligner.<br><br>Thank you<br><br>Best regards<br><br>-- <br>Zhang Di<br>