TY  - JOUR
T1  - Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes
A1  - Lin, Michael F.
A1  - Deoras, Ameya N.
A1  - Rasmussen, Matthew D.
A1  - Kellis, Manolis
Y1  - 2008/04/18
N2  - Author Summary: Comparing the genomes of related species is a powerful approach to the discovery of functional elements such as protein-coding genes. Theoretically, using more species should lead to more discovery power. Many questions remain, however, surrounding the optimal choice of species to compare and how to best use multi-species alignments. It is even possible that practical limitations in the sequencing, assembly, and alignment of genomes could effectively negate the benefit of using more species. Here, we used 12 complete fly genomes to study a variety of metrics used to identify protein-coding genes, including methods that analyze only the genome of interest and comparative methods that examine evolutionary signatures in genome alignments. We found that species over a surprisingly broad range of phylogenetic distances were effective in comparative analyses, and that discovery power continued to scale with each additional species without apparent saturation. We also examined whether comparative methods systematically miss genes considered fast-evolving, and studied how performance is influenced by genome alignment strategies. Our results can help guide species selection for future comparative studies and provide methodological guidance for a variety of gene identification tasks, including the design of future de novo gene predictors and the search for unusual gene structures.
JF  - PLOS Computational Biology
JA  - PLOS Computational Biology
VL  - 4
IS  - 4
UR  - https://doi.org/10.1371/journal.pcbi.1000067
SP  - e1000067
EP  - 
PB  - Public Library of Science
M3  - doi:10.1371/journal.pcbi.1000067
ER  - 