The authors have declared that no competing interests exist.
Each year, the International Society for Computational Biology (ISCB;
The recipients were chosen by the ISCB's awards committee chaired by Alfonso Valencia at the CNIO (Spanish National Cancer Research Centre) in Madrid. The winners will receive their awards at the ISCB's annual meeting, where they will also deliver keynote talks. This meeting, ISMB/ECCB 2011
If computational biology seems challenging in the second decade of the 21st century, spare a thought for those who pioneered the discipline in the 1980s. Michael Ashburner (
Ashburner began his career with a degree in genetics from the University of Cambridge in 1964. He stayed on to do a PhD, studying
In the late 1970s, Ashburner turned his attention to the study of the
Two people came to his aid. The first was Walter Bodmer, director of the Imperial Cancer Research Fund, who gave Ashburner the use of a DEC computer with access to the early network. “We could access this machine by dial-up and do some analysis,” he says. The second was Doug Brutlag at Stanford University, who was developing MOLGEN, an early bioinformatics system, which he allowed Ashburner to access.
That presented a significant obstacle, however. Getting a computer in the United Kingdom to speak to one in Stanford was not straightforward. Today, everybody uses the Internet, defined by the TCP/IP protocol. But in the early ’80s, the UK and United States used different systems. The US was pioneering TCP/IP while the UK had a standard called the Coloured Book protocols. “The only place that had an interface between the two protocols was University College, London, and they were very helpful,” says Ashburner, “giving us 5 kb of disk space.”
The process of connecting to Stanford was far from simple. “The way you did it was to dial up your local packet switching exchange at the Post Office and connect to the Rutherford Appleton Laboratory. You then typed in some code which connected you to UCL where you could use TCP/IP,” he says. The signal was routed via Goonhilly satellite station in Cornwall to Carnegie Mellon University and from there to Stanford. “I had a dumb terminal, that is a box with no memory, so everything had to be captured by a printer in parallel.” Ashburner was far from deterred, however.
At about that time, the European Molecular Biology Laboratory (EMBL) in Heidelberg and GenBank in the US released the first nucleotide sequence libraries in quick succession. Using this network access, Ashburner and his colleagues, in collaboration with MOLGEN, set up one of the first bulletin boards, called BioNet, to keep people informed of changes to the library and to software. “This became well used and things evolved from there,” he says.
As the field of bioinformatics grew, the need for an institution to house the data and conduct research increased. So in 1992, the EMBL decided to set up an institute of bioinformatics that would house this library and carry out research. This organisation became known as the European Bioinformatics Institute, based in Hinxton, UK, with Ashburner and John Sulston having led the UK bid to host it. “I was persuaded to become the first program coordinator and took half-time leave from Cambridge to do that,” he says. He eventually took over as joint-director, a post he held until 2001. “At first, the finances were sticky and the politics were horrendous. But it has since gone from strength to strength,” he says.
At the same time, Ashburner continued his interest in
So in 1989 he proposed that the community set up an electronic database to take over the role of the printed one. In 1992, the NIH funded the project that became known as FlyBase, one of the first genetic and now genomic databases.
FlyBase was a crucial factor in triggering Ashburner's interest in a structured, controlled vocabulary, a formal representation of knowledge about genes and gene products. He began to define terms for gene products by their biological processes, such as wing development, and then defined the data structure in which these terms were related to each other. “It occurred to me that if you were able to do this for several model species, you'd have a fantastic tool,” he says.
But this insight initially met with little interest. “My first presentation, at ISMB in Greece in 1997, went down like a lead balloon,” he recalls. Eventually, he and three like-minded colleagues settled the matter in a bar at the Montreal ISMB in 1998. Together, they decided to set up a cross-species ontology to be used by the
He went on to collaborate with Gerry Rubin and Craig Venter in sequencing the
“We're lucky to have such an inspirational figure in the community,” says Valencia. “This award has been well deserved for a number of years.”
In the spring of 1997, Olga Troyanskaya (
And so began the career of one of the most promising young researchers in bioinformatics, and a deserving winner of this year's Overton Prize. “She is one of these forces of nature, full of energy,” says Alfonso Valencia, chair of the ISCB awards committee.
Troyanskaya herself talks with infectious enthusiasm about her work. “I've always been fascinated by the problems of biology,” she says. “I was just better at computer science and math than the wet lab research. And it seemed to me that there had to be a lot you could contribute with computer science that you couldn't do with experimental techniques alone.”
From the University of Richmond, Troyanskaya moved to Stanford University to complete a PhD in biomedical informatics, under the supervision of Russ Altman, a bioinformatician, and David Botstein, a geneticist. “I wanted a setup that was close to real biological problems, and I got exactly that. I learned a great deal from both of them,” she says.
In 2003, she moved to Princeton University as an assistant professor in the Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics. “I am fortunate that the computer science department appreciates the impact of computing in biology, and that I have many wonderful colleagues at both the department and in the Institute. I found several amazing collaborators, and this allowed me to begin a number of interesting projects.”
One of the key problems she focuses on is making better use of the vast but unwieldy biological datasets in databases around the world. “So instead of focusing on one study, we can take the entirety of published data. That allows you to ask very specific questions in a data-driven way and to develop novel biological hypotheses,” she says.
An important goal is to predict the function of genes or proteins. There have been many experimental approaches to determine what genes do and how they are controlled inside the cell. But this work tends to produce datasets that are large and noisy. Troyanskaya's approach is to develop new ways for extracting useful information from these datasets using techniques from computer science such as machine learning and data mining.
“Computation by itself is often not enough to discover new biology but it can direct experimental work,” she says. And she has set up a wet lab to help test and validate the hypotheses that the computer science helps generate. In 2009, for example, she used this approach to identify 109 new proteins involved in mitochondrial biogenesis in yeast.
This combined approach is one of the things that sets Troyanskaya apart, says Valencia. “She is one of the first to have come from the computational side and then moved into the experimental area to combine both,” he says.
Understanding the function of individual genes is only a small part of a much bigger story. Many genes and proteins play multiple roles within a cell as parts of various networks of biological processes. Mapping out these networks and understanding how they work and interact with each other is yet another strand of her research. “She has made important contributions to systems biology,” says Valencia.
Evaluating and validating computational predictions requires broad collaboration to develop standards and methods that can be used to reach a consensus about the results. To this end, Troyanskaya is collaborating with the curators of model organism databases and members of the Gene Ontology Consortium.
Another problem that many researchers face is handling the data avalanches currently being generated. So Troyanskaya, in collaboration with Princeton colleagues Kai Li and Moses Charikar, is looking at ways to better search and visualise these huge datasets, something that is challenging because of high noise levels and the enormous volume of the data. “We are developing better ways to do this,” she says.
The awards committee was also impressed by Troyanskaya's service to the community. She is involved in the Society's two official journals,
And there is surely more to come. Troyanskaya points to numerous questions that are driving her research forward. She wants to know, for example, how we can predict which genes are involved in kidney disease, to understand their function and their clinical role on a molecular level. She works on these questions in close collaboration with experimental researchers, such as Matthias Kretzler and his group from the University of Michigan, Ann Arbor. And she is passionate about finding ways to ask questions in a data-driven way, not just in a knowledge-driven way that relies on what we already know about biology. “These are the questions that I'm really interested in,” she says. “And we really haven't yet harnessed the full potential of our data collections.”
The full conference agenda and registration information for ISMB/ECCB 2011, where these ISCB award winners will speak along with four other distinguished keynote lecturers, can be found on the conference Web site at
For a review of past ISCB award winners, please see
BJ Morrison McKay is the Executive Officer of the International Society for Computational Biology.